C strcat, strcpy and armageddon, part 1

I am pretty sure writing crash-proof code used to be a thing. Back when computers were simpler, and an entire project was done by one developer, it was much rarer to find a glaring bug in a piece of released software.

Today I’d like to share some things you may never have known, or thought about, when it comes to standard C library string manipulation calls.

Upon the conclusion of this series, you will know some very simple things to do to make your program a bit more crash proof just by using “safer” versions of C string calls. Let’s begin with the “less safe” calls.

From the very-useful cplusplus.com website:

  • strcat( char *destination, const char *source ) – Concatenate strings. Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.
  • strcpy( char *destination, const char *source ) – Copy string. Copies the C string pointed by source into the array pointed by destination, including the terminating null character (and stopping at that point).
    To avoid overflows, the size of the array pointed by destination shall be long enough to contain the same C string as source (including the terminating null character), and should not overlap in memory with source.

These simple functions allow us to copy, append and compare null terminated strings. A null is a zero that indicates the end of a string. Consider the string “HELLO WORLD”:

char *message = "HELLO WORLD";

In memory would be the characters “HELLO WORLD” followed by a 0 to indicate the end of the string:

+---+---+---+---+---+---+---+---+---+---+---+---+---+
| H | E | L | L | O |   | W | O | R | L | D | ! | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+

When we control everything, and know exactly what we are doing, strings are safe. But in the real world, there are some very serious problems that can happen with strings that can cause crashes from memory corruption. At best, this may just be annoying if your Arduino sketch won’t run, but at worst, it could mean Armageddon if the corruption is on a system that controls nuclear missiles.

Let’s discuss some very simple habits to get in to when using strings in C that will reduce the chances of an atomic meltdown. Or your blinking LEDs from stopping blinking…

First, here is an example of the problem. Suppose you have a 10-byte string buffer:

char buffer[10];

Somewhere in memory will be 10 bytes reserved for that buffer:

+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |   | <- buffer
+---+---+---+---+---+---+---+---+---+---+

You can load a string in to that buffer using strcpy() like this:

strcpy( buffer, "HELLO" );

Then buffer will contain your string, including the null character at the end:

+---+---+---+---+---+---+---+---+---+---+
| H | E | L | L | O | 0 |   |   |   |   | <- buffer
+---+---+---+---+---+---+---+---+---+---+

Anything else in the buffer may be random junk from what might have been in that memory before the buffer was allocated. If this was a concern, we could have initialized that buffer memory ahead of time by doing something like this:

char buffer[10] = { 0 };

…or in your code, you could use memset() to clear the contents of the buffer:

memset( buffer, 0, 10 ); // Initialize buffer with zeros

But I digress. (Not to self: I should share some pros and cons of initializing data like that sometime.)

When you control the world, you can do things like this safely. But what if you forget how large buffer is, and try to load something larger in to it?

strcpy( buffer, "I forgot how much room I have" );

strcpy() will gladly start copying everything you give it up until it finds a null at the end of the string. In C, when you created a “quoted string” it automatically has a null added to the end of it.

Thus, it begins copying “I forgot how…” in to the 10 bytes of buffer, and keeps on going well past the end of buffer, overwriting whatever happens to be in memory past that area. This is a very common mistake.
Suppose you had created two buffers, and the compiler created them in memory next to each other:

char buffer[10];
char launchCode[10];

As the above strcpy() began writing to buffer, it would fill up buffer then continue to write over part of launchCode, corrupting whatever was there. This is bad, and you might not even notice if if you hadn’t checked launchCode to see if it was still intact.

+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h | <- buffer
+---+---+---+---+---+---+---+---+---+---+
| o | w |   | m | u | c | h |   | r | o | <- launchCode
+---+---+---+---+---+---+---+---+---+---+
| o | m |   | I |   | h | a | v | e | 0 | ...other memory...
+---+---+---+---+---+---+---+---+---+---+

Worse, you might have had other types of variables there for counters, high scores or mom’s birthday, and overwriting the buffer might end up corrupting some numeric variable and cause your program to behave in unexpected ways. This can be madness to figure out :)

The solution is to use a special version of string copy that lets you specify the maximum number of characters it will copy before it stops.

  • strncpy( char *destination, const char *source, size_t num ) – Copy characters from string. Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of numcharacters have been written to it.
    No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case,destination shall not be considered a null terminated C string (reading it as such would overflow).
    destination and source shall not overlap (see memmove for a safer alternative when overlapping).

By using this, and specifying the max size of the destination buffer, we can prevent the overrun and life will be perfect and happy:

strncpy( buffer, "I forgot how much room I have", 10 );

…or will it?

+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h | <- buffer
+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |   | <- launchCode
+---+---+---+---+---+---+---+---+---+---+

That looks fine, doesn’t it?

In the next part, I’ll demonstrate why the above example is bad, and show you an easy way to make it more bullet-proof. Maybe you see it already… (Hint: The description of strncpy specifically mentions the problem.)

I’ll also discuss appending strings with strcat() and the similar problems that happen there, plus a few unexpected things I did not realize about these functions until I was digging in to them this week for a work project.

To be continued…

8 thoughts on “C strcat, strcpy and armageddon, part 1

  1. Pingback: C strcat, strcpy and armageddon, part 2 | Sub-Etha Software

  2. Pingback: C strcat, strcpy and armageddon, part 3 | Sub-Etha Software

  3. Pingback: C strcat, strcpy and armageddon, part 4 | Sub-Etha Software

  4. Pingback: C strcat, strcpy and armageddon, part 5 | Sub-Etha Software

  5. Pingback: C strcat, strcpy and armageddon, part 6 | Sub-Etha Software

  6. Pingback: Building safer C string functions, part 1 | Sub-Etha Software

  7. Pingback: C strcat, strcpy and armageddon, part 4 | Sub-Etha Software

  8. Pingback: C strcat, strcpy and armageddon, part 5 | Sub-Etha Software

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.