C strcat, strcpy and armageddon, part 2

In part 1, I gave a quick overview of the C library call, strcpy(), and showed how it is used to copy a string in to a buffer.  I specifically wanted to show a problem that can happen if you copy more data than the buffer can hold (a buffer overrun), and show a simple way to fix it using strncpy().

The final example would limit how much data was copied to the number of bytes the buffer could hold:

char buffer[10];
strncpy( buffer, "I forgot how much room I have", 10);

Not only does strncpy() limit how many characters will be copied, it also does something else. If the source string is shorter than the maximum number you specify to copy, it pads the rest of the destination buffer with null characters (zeros). If you give it a null terminated string shorter than the max…

strncpy( buffer, "HI", 10);

…that would result in:

+---+---+---+---+---+---+---+---+---+---+
| H | I | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+---+

Even though it’s clearly stated in the C library manual, I guess I never read that part. I had no idea it did that.
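If you want to see the padding for yourself, here is a minimal sketch. The helper name showPadding() is made up for this illustration. It pre-fills the buffer with '#' characters so that any byte strncpy() does not touch would stand out, then prints each byte, using 0 for a null.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not part of the C library): fill the buffer with
// '#', copy "HI" in to it with strncpy(), then print every byte.
void showPadding(char *buffer, size_t size)
{
    // Pre-fill so untouched bytes would show up as '#'.
    memset(buffer, '#', size);

    // Copy a short string. strncpy() pads the remainder with nulls.
    strncpy(buffer, "HI", size);

    for (size_t i = 0; i < size; i++)
    {
        if (buffer[i] == '\0')
            printf("| 0 ");             // Show a null byte as 0.
        else
            printf("| %c ", buffer[i]); // Show any other byte as a character.
    }
    printf("|\n");
}
```

Called with a 10 byte buffer, no '#' characters should survive: every byte after "HI" has been overwritten with a null.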

If I were to write my own crude implementation of strncpy(), it might look something like this:


#include <stddef.h> // for size_t

// My own crude implementation of strncpy()
char * myStrncpy( char *destination, const char *source, size_t num )
{
    size_t i;

    // Now we will begin copying characters until we copy a null (0) or
    // we have copied num characters.
    for (i=0; i<num ; i++)
    {
        // If we have found a null character...
        if (source[i]=='\0') break; // Exit the for loop.
        // Copy a character over.
        destination[i] = source[i];
    }
    // Here we have copied 'i' characters.
    // If less than num, fill the rest with nulls.
    while(i < num)
    {
        destination[i] = '\0';
        i++;
    }
    return destination; // Why? Because it just does.
}

While strncpy() will prevent us from a buffer overrun, passing in a string that is too large will create a different problem: It will not leave us with a usable C string. Per the cplusplus.com page on strncpy:

No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow).

strncpy() will copy characters until it has copied a null (0) or has copied a max number of bytes you specify with the third parameter. If you copy that full amount, and no null was copied, the destination buffer will not have a null terminator. It won't be a C string, and using it as one will do strange things.

char buffer[10];
strncpy( buffer, "I forgot how much room I have", 10);
+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h |
+---+---+---+---+---+---+---+---+---+---+

In the above example, you end up with buffer full of 10 bytes of data without a null character at the end. You won't be able to use it with any of the C string functions like printf(), strlen(), strcat(), etc. Well, you can, but you might not get what you expect.

If you try to use it, these functions will continue through memory until they find the next null character after the buffer. If the next byte in memory after the buffer just happens to be a zero, everything will work and you won't see the problem. If the next 1K of memory is garbage that has NO zeros in it, the functions would begin at buffer and keep going as if they had found a really long 1K string that starts out with what you expected, followed by garbage.

On operating systems that have memory protection, the program might even get terminated if it seeks through memory (still looking for that null byte) and ends up trying to read RAM that doesn't belong to that process.

Don't do that. Instead, when using strncpy(), always make sure you manually null terminate the buffer in case this situation occurs:

// Copy up to 10 characters.
strncpy( buffer, "I forgot how much room I have", 10);
// null terminate destination at 10th byte (bytes 0-9)
buffer[9] = '\0';

The end result of that will be up to 9 characters followed by a null. (See #3 below for an optimization tip.)

+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | 0 |
+---+---+---+---+---+---+---+---+---+---+
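One way to check whether a buffer actually holds a usable C string, without reading past the end of it, is to search only the buffer's own bytes for a null. This is just a sketch: hasTerminator() is a hypothetical helper name, and it relies on the standard memchr() call, which never looks beyond the size you give it.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: returns true if a null character exists somewhere
// in the first 'size' bytes of buffer. Unlike strlen(), memchr() stops
// at 'size' bytes, so it is safe on a buffer with no terminator.
bool hasTerminator(const char *buffer, size_t size)
{
    return memchr(buffer, '\0', size) != NULL;
}
```

Right after the 10 byte strncpy() of the long string, hasTerminator(buffer, 10) would return false. After manually setting the last byte to a null, it would return true.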

Because of needing a null at the end, if you truly did want a 10 character string, you would have to make the buffer 11 bytes large, so it can hold the 10 characters you want plus the null character at the end.

Therefore, my suggestion for doing string copies is to do the following:

  1. For clarity, use a #define for how large the buffer is, so you have that define available to use in strncpy(). This will mean only one place in code to change the buffer size if you need to (rather than having to edit every function instance of strncpy() that used it).
  2. Always manually null terminate the last byte of the destination buffer after the copy, just in case a string as large or larger than the buffer is copied to it. (Yes, this will add to your code and take more CPU time, but we are going for safety here.)
  3. Optimization tip: Since you are going to manually terminate the final byte anyway, you can copy one less byte in strncpy(). (There is no need to have strncpy() write out to the final byte only to have you write to that final byte again.)

#define BUFSIZE 10

char buffer[BUFSIZE];

strncpy(buffer, "Here is my stuff", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';

Above, we will allow up to 9 bytes to be copied in to the buffer (BUFSIZE-1), then we will null terminate the 10th byte (remember bytes are numbered starting at zero, so 0-9, therefore BUFSIZE-1 is 9, the 10th byte of the buffer).

If you truly did want to be able to hold a 10 character string, just add one to the BUFSIZE define. None of the other code will have to be touched.

Get in to that simple habit and your strncpy() will now not create a buffer overflow, and will never create a non-terminated C string.
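If you find yourself typing those two lines over and over, one option is to wrap the habit in a small function. The sketch below is only a suggestion: copyString() is a made-up name, and the true/false return value (telling you whether the source had to be cut short) is my own addition rather than anything strncpy() gives you.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical wrapper around the strncpy() habit described above.
// destSize is the full size of the destination buffer in bytes.
// Returns true if the whole source fit, false if it was truncated.
bool copyString(char *destination, size_t destSize, const char *source)
{
    // A zero-size buffer cannot even hold the null terminator.
    if (destSize == 0) return false;

    // Copy at most destSize-1 bytes, leaving room for the null.
    strncpy(destination, source, destSize - 1);

    // Always terminate the final byte, just in case.
    destination[destSize - 1] = '\0';

    // The source fit only if its length is less than the buffer size.
    return strlen(source) < destSize;
}
```

With a 10 byte buffer, copyString(buffer, sizeof(buffer), "Here is my stuff") would leave "Here is m" in the buffer and return false.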

And before you say it . . .

Now, obviously, if YOU are copying a literal "string" inside your own code, you can know exactly how large it is so using strcpy() is perfectly safe (and more efficient if you don't need the rest of the destination buffer padded with zeros). BUT, if you code using strncpy() all the time, you will reduce the chances of you changing something later on that might cause you grief:

strcpy( buffer, "Game Over" );
...later changed to...
strcpy( buffer, "Game over, dude! You suck!" );

Above, a simple change to a message might end up crashing your program because you forgot how large buffer was.
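For comparison, here is a sketch of that same edit made with the strncpy() habit in place. The function name showGameOver() and the 10 byte BUFSIZE are assumptions for this illustration. The longer message gets cut short instead of running off the end of the buffer.

```c
#include <stdio.h>
#include <string.h>

#define BUFSIZE 10

// Hypothetical function standing in for wherever the message gets set.
// The caller must pass a buffer of at least BUFSIZE bytes.
void showGameOver(char *buffer)
{
    // The message is longer than the buffer, but only BUFSIZE-1 bytes
    // are copied, and the final byte is always set to a null.
    strncpy(buffer, "Game over, dude! You suck!", BUFSIZE - 1);
    buffer[BUFSIZE - 1] = '\0';

    printf("%s\n", buffer); // Prints "Game over"
}
```

The message is truncated, which may not be what you wanted on screen, but nothing past the end of the buffer is written.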

In the next installment, we will take a look at strcat() and appending strings.

2 thoughts on “C strcat, strcpy and armageddon, part 2”

  1. wb8nbs

    Is there a way in C to define a variable (which you would fill with null) that is guaranteed to come right after buffer? I remember in Fortran we used to force variable alignment with a common block statement. Maybe a perverse form of union would work, define buffer[11] over the top of buffer[10].

    1. Allen Huffman Post author

      I don’t think anything in C guarantees where local variables live. For instance, you could do:

      char buffer[10];
      char nullify = '\0';

      …and they may live just like that in memory, though some of my tests (trying to detect overruns by dumping the variable after) were not getting corrupted, so I don’t think this is guaranteed.

      I bet something like that could be done with a union, just like you mention (two buffers). I am not sure what kind of code that generates, but I may toy with that just to see what it does…

      #define BUFSIZE 20

      union {
      char buffer[BUFSIZE];
      char padding[BUFSIZE+1];
      } MyBuffer;

      // Set it to all zeros…
      memset(MyBuffer.padding, 0, sizeof(MyBuffer.padding));

      strncpy(MyBuffer.buffer, "This string is going to be way too long!", BUFSIZE);

      printf("Buffer is: '%s'\n", MyBuffer.buffer);

      …appears to work. But, you might as well just do this:

      char buffer[BUFSIZE+1];
      buffer[BUFSIZE] = '\0';

      …from now on, anything using buffer as BUFSIZE will be ignoring the extra byte at the end, which is a null unless someone changes it.

      That “unless someone changes it” (someone else makes a buffer overrun) is the main reason this could go horribly wrong ;-)
