Building safer C string functions, part 1

In an earlier series, I discussed some easy ways to prevent buffer overrun problems when doing copies of C strings. As part of this, I created a few of my own implementations for things like strncpy() and strnlen(). Much of what I did as a workaround could be simplified if we had a smarter string concatenate function, so today I’d like to present one.

Fixing strncat

The max-limited string copy (strncpy) works well enough to prevent buffer overruns when copying strings, but the max-limited string concatenate (strncat) does not. It only limits how many characters it copies from the source buffer, without any regard to how much room is left in the destination buffer. (Is this of any use?)
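To make that concrete, here is a minimal sketch of the arithmetic a caller has to do by hand to use the standard strncat() safely. The helper name append_with_room and its dst_size parameter are made up for this illustration, and the sketch assumes the destination already holds a terminated string that fits in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not a library function): append src to dst,
// where dst_size is the total size of the dst array.
// Assumes dst already holds a null terminated string shorter than dst_size.
char *append_with_room(char *dst, size_t dst_size, const char *src)
{
    size_t used;

    assert(dst != NULL && src != NULL);

    used = strlen(dst);
    assert(used < dst_size);

    // strncat() takes the max number of characters to copy from src,
    // and then writes a null after them. So the caller has to subtract
    // what is already used, plus one more for the terminator.
    strncat(dst, src, dst_size - used - 1);

    return dst;
}
```

With an 8-byte buffer that already holds "new:", appending "abcdefgh" this way leaves "new:abc" in the buffer: three characters of room, plus the null.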

We can do better.

The first thing strcat has to do is look at the destination buffer and seek to the end of whatever null terminated C string is there. Since it is already doing this work, it would be easy for it to limit how many characters it copies based on being told the maximum size of the destination buffer (as opposed to strncat, which limits based on a maximum size of the source buffer).

I am envisioning a function that looks like strncat(), but the max number passed in is for the destination buffer. Thus, if I try to append a string of 10 characters to a buffer that can hold up to 40 characters, I’d just append with a value of 40, and the function would check how much is already in the buffer and do the math for me. Because math is hard.

Here is what it might look like:

char * strncatdst( char * destination, const char * source, size_t num )
{
  size_t len;
  size_t left;

  // Step 1 - find out how much data is in the destination buffer.
  len = strlen( destination );

  // If string len is longer than we want...
  if (len > num)
  {
    // ...limit the len to be the max num.
    len = num;
  }

  // len comes back with how much is in the buffer or maxed to num.

  // Step 2 - find out how much room is left
  left = num - len;

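  // Step 3 - copy up to a null, or until we hit the max size.
  // We always copy one less because strncat() adds an extra null.
  // If there is no room left at all, skip the copy: left-1 would
  // wrap around to a huge value, since size_t is unsigned.
  if (left > 0)
  {
    strncat( destination, source, left-1 );
  }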

  // Return destination pointer, because C says so.
  return destination;
}

This demonstrates a simple way to make a custom version of strncat() by using other standard C library functions. It could be used like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 20

int main(void)
{
  char buffer[BUFSIZE];
  const char *string = "This is a long string";

  // Put an initial string in the buffer.
  strcpy(buffer, "new:");

  // Using our own strncatdst() function: 
  strncatdst(buffer, string, BUFSIZE );

  printf("buffer = '%s'\n", buffer);

  return EXIT_SUCCESS;
}

Of course, we probably want to use a safer strcpy() as well, so going back to an earlier article I wrote, I should have done this:

// Put an initial string in the buffer.
strncpy(buffer, "new:", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';

// Using our own strncatdst() function:
strncatdst(buffer, string, BUFSIZE );

printf("buffer = '%s'\n", buffer);

Even though I know I am only copying four characters (“new:”) there, I might not be sure of the length if I was copying in some string that was created somewhere else, or if I (or someone) changed that string to something longer without thinking about the buffer size.

Fixing strncpy

strncpy() could also use a bit more work because it does not put in a null terminator (required for C strings) if the string being copied is as long as (or longer than) the max length specified.
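A small sketch shows the missing terminator. The helper names has_terminator and strncpy_terminates are invented for this example, and the 4-byte buffer size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns 1 if the first 'size' bytes of buf
// contain a null terminator, 0 if they do not.
int has_terminator(const char *buf, size_t size)
{
    assert(buf != NULL);

    return memchr(buf, '\0', size) != NULL;
}

// Hypothetical demo: pre-fill a 4-byte buffer with 'x', strncpy() src
// into it with num set to the full buffer size, and report whether the
// result is a terminated C string.
int strncpy_terminates(const char *src)
{
    char buf[4];

    assert(src != NULL);

    memset(buf, 'x', sizeof buf);
    strncpy(buf, src, sizeof buf);

    return has_terminator(buf, sizeof buf);
}
```

Here strncpy_terminates("abc") returns 1, because three characters plus the null fit in four bytes. strncpy_terminates("abcd") and strncpy_terminates("abcdef") both return 0: all four bytes are filled with characters and no null is written.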

Let’s see if we can make an improved strncpy() that handles the null terminator:

char * strncpynull( char * destination, const char * source, size_t num )
{
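  // If num is 0 there is no room for anything, not even a null.
  // (num-1 would also wrap around, since size_t is unsigned.)
  if (num == 0) return destination;

  // Step 1 - copy up to 1 less than num characters.
  strncpy( destination, source, num-1 );

  // Step 2 - make sure there is null terminator, in case num reached.
  destination[num-1] = '\0';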

  return destination;
}

As you can see, this is just a wrapper for the standard strncpy() library function that adds a final null just in case the string is that long.

Now we can use it like this:

strncpynull(buffer, "This is a really long string.", BUFSIZE);

printf("buffer = '%s'\n", buffer);

…and it will make sure it trims long strings to num-1 characters, and adds a null terminator (which standard strncpy does not do).
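As a quick check of that trimming, the sketch below repeats the wrapper under a hypothetical name, strncpynull_checked, so it can be compiled on its own. The guard for num == 0 is an assumption about the desired behavior (do nothing), not something the standard library defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Stand-alone copy of the wrapper above, under a hypothetical name.
// Assumption: when num is 0 there is no room for anything, so the
// destination is left untouched.
char *strncpynull_checked(char *destination, const char *source, size_t num)
{
    assert(destination != NULL && source != NULL);

    if (num == 0)
    {
        return destination;
    }

    // Copy up to 1 less than num characters...
    strncpy(destination, source, num - 1);

    // ...and always terminate in the last byte of the buffer.
    destination[num - 1] = '\0';

    return destination;
}
```

With a 20-byte buffer, copying "This is a really long string." leaves the 19 characters "This is a really lo" followed by the null.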

Is it fixed yet?

These two functions should protect us against string buffer overruns, provided we know the size of the destination buffer.

However, by calling existing library functions, we are adding extra overhead. If those functions are highly optimized and very well done, this may still be more efficient than doing them yourself (and it is certainly easier to leverage existing functions rather than rolling your own). However, there are still a few potential issues that bother me.

For instance, for my strncatdst() function, the first thing I do is use strlen() to get the length of the string. It does this by starting at the first byte of the destination buffer and walking through it until it finds a null character. If this was a corrupt pointer, it might find itself walking through bogus memory until it happens to find a zero, potentially crashing the program from a memory access exception (if the operating system has such).

It is also not efficient because, after strlen(), the standard strncat() is used, and internally, it also must start with the first character in the destination buffer and walk through all the bytes until it finds a null (or the max num is reached), to know where it can start appending the source string. If we were doing this to a buffer containing a large 1K string, it would be walking through that 1K twice!

We can do better than that. Let’s see if we can create versions of these functions that do not use the existing C library functions.

Fixing strlen

First, recall my proposal for a version of strlen() that has a limit, since there is no such strnlen() function as part of the ANSI-C standard library (POSIX defines one, but standard C does not):

size_t strnlen( const char * str, size_t num )
{
  size_t len;

  len = strlen(str);

  // If string len is longer than we want...
  if (len > num)
  {
    // ...limit the len to be the max num.
    len = num;
  }

  // Return actual len, or max len.
  return len;
}

From the caller’s perspective, that seems fine, but this won’t actually solve the problem — it only hides it, and is still calling strlen() internally.

For this one, we really do need to create our own version so we can prevent it from scanning through 1K of memory if we know the string we expect to find should never be that long.

size_t strnlen2( const char * str, size_t num )
{
  size_t len;

  len = 0;

  while(len < num)
  {
    if (str[len]=='\0') break;
    len++;
  }

  return len;
}

This function will now stop counting at a value you specify. The original version I created used strlen(), so it would keep counting until it found a 0, and only then clip the result (if too big) to the num passed in. The new version is better for those times when we are passed a bad pointer. (Not that that EVER happens, right?) It’s just not as efficient as it could be, so we’ll address that later.
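For comparison, here is one possible variation that hands the bounded scan to the standard memchr() function instead of a hand-written loop. The name strnlen_memchr is hypothetical, and whether this is actually faster depends on the C library, so treat it as a sketch rather than a recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical alternative to strnlen2(): memchr() looks for a null in
// at most 'num' bytes, so the scan is bounded the same way.
size_t strnlen_memchr(const char *str, size_t num)
{
    const char *end;

    assert(str != NULL);

    end = memchr(str, '\0', num);

    // No null within num bytes: report the max, like strnlen2() does.
    return (end != NULL) ? (size_t)(end - str) : num;
}
```

Like strnlen2(), this returns 5 for "hello" with a limit of 10, and 3 for the same string with a limit of 3.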

Next, let’s use that code inside a new strncatdst() function:

char * strncatdst2( char * destination, const char * source, size_t num )
{
  size_t len;
  size_t index;

  // Step 1 - find out how much data is in the destination buffer.
  // This is basically the strnlen2() code, above.
  len = 0;

  while(len < num)
  {
    if (destination[len]=='\0') break;
    len++;
  }

  // len comes back with how much is in the buffer or maxed to num.

  // Step 2 - copy characters until we are out of room.
  index = 0;

  while(len < num)
  {
    destination[len] = source[index];
    if (source[index]=='\0') break;
    len++;
    index++;
  }

  // Step 3 - make sure string is null terminated. We only need to
  // do this if len==num, meaning we ran out of room before we
  // copied the source's null. (The num > 0 check keeps num-1 from
  // wrapping around when the buffer size is 0.)
  if (len == num && num > 0)
  {
    destination[num-1] = '\0';
  }

  // Return destination pointer, because C says so.
  return destination;
}

Now we can append short or long strings to a destination buffer, and ensure we never copy more than the size of that buffer, including a null terminator we will add if needed.
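To check that claim, here is a compact restatement of the same three steps under a hypothetical name, strncatdst_checked, so it compiles on its own. It is a sketch and not a drop-in replacement: it reserves the last byte for the terminator during the copy instead of fixing it up afterwards, and it assumes that a num of 0 should leave the buffer untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical restatement of strncatdst2(). num is the total size of
// the destination buffer, including room for the null terminator.
char *strncatdst_checked(char *destination, const char *source, size_t num)
{
    size_t len = 0;
    size_t index = 0;

    assert(destination != NULL && source != NULL);

    // Assumption: a zero-sized buffer is left untouched.
    if (num == 0)
    {
        return destination;
    }

    // Step 1 - bounded scan for the end of the existing string.
    while (len < num && destination[len] != '\0')
    {
        len++;
    }

    // Step 2 - copy characters, always leaving one byte for the null.
    while (len < num - 1 && source[index] != '\0')
    {
        destination[len] = source[index];
        len++;
        index++;
    }

    // Step 3 - terminate. If no null was found in step 1, len == num,
    // so the terminator goes in the last byte of the buffer.
    if (len == num)
    {
        len = num - 1;
    }
    destination[len] = '\0';

    return destination;
}
```

Appending "This is a long string" to a 20-byte buffer holding "new:" leaves "new:This is a long " (19 characters plus the null), while appending "cd" to an 8-byte buffer holding "ab" simply gives "abcd".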

Next is a version of strncpynull() that does not use library functions. I previously shared a simple strncpy() implementation to demonstrate what it did (padding short strings with nulls). Using that as a reference, we have:

char * strncpynull2( char * destination, const char * source, size_t num )
{
  size_t index;

  // Step 1 - copy up to 1 less than num characters.
  // (Written as index+1 < num so that a num of 0 does not wrap
  // around, since size_t is unsigned.)
  index = 0;

  while(index+1 < num) // One less, to leave room for null.
  {
    if (source[index]=='\0') break; // Exit the while loop.
    destination[index] = source[index];
    index++;
  }

  // Here we have copied 'index' characters.

  // If less than num, fill the rest with nulls.
  while(index < num)
  {
    destination[index] = '\0';
    index++;
  }

  return destination;
}
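The null padding mentioned above is easy to observe with the standard strncpy(). In this sketch the helper names trailing_nulls and padding_after_strncpy are made up for the illustration, and the 8-byte buffer size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: count how many bytes at the end of buf are nulls.
size_t trailing_nulls(const char *buf, size_t size)
{
    size_t count = 0;

    assert(buf != NULL);

    while (count < size && buf[size - 1 - count] == '\0')
    {
        count++;
    }

    return count;
}

// Hypothetical demo: pre-fill an 8-byte buffer with 'x', strncpy() src
// into it using the full buffer size, and count the nulls at the end.
size_t padding_after_strncpy(const char *src)
{
    char buf[8];

    assert(src != NULL);

    memset(buf, 'x', sizeof buf);
    strncpy(buf, src, sizeof buf);

    return trailing_nulls(buf, sizeof buf);
}
```

Copying "ab" leaves six nulls after the two characters, copying a seven character string leaves exactly one, and copying an eight character string leaves none at all.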

Now we have our own free-standing enhanced versions of strncpy(), strncat() and strnlen().

We should probably look at optimizing them so they are less stupid!

To be continued…

2 thoughts on “Building safer C string functions, part 1”

  1. William Astle

    It’s nice to see sensible discussion of the problems with the standard string functions in C.

    I never understood why strncpy() doesn’t always leave a NUL terminated string when the source string is too long. I mean, it would have been simple to have it stop copying at “n - 1” and always write a final NUL. Presumably it’s due to some misguided attempt to optimize the implementation way back in the dark ages. And, frankly, having to write “strncpy(dest, src, n - 1); dest[n - 1] = 0;” all the time is hardly a net gain compared to having strncpy() do the right thing in the first place.

    As for strncat()’s behaviour being useful, I can think of a very few oddball cases where limiting the size copied from the source is useful. But in the general case, strncat() is completely useless. I don’t think I’ve ever used it, to be honest, and I’ve done a massive amount of C programming.

    I think it’s interesting that nearly every non-trivial project I’ve seen in C has its own implementations of some of the string functions and almost always has to implement functions that don’t exist but should.
