Category Archives: C Programming

C musing of the day: signed ints

I ran across some code today that puzzled me. It was an infinite loop that used a counter to determine if things took too long. Something like this:

int main()
{
  int count;
  int status;

  count = 0;

  do
  {
    status = GetStatus();

    count++;

  } while( status == 0 );

  if (count < 0)
  {
    printf("Time out! count = %d\n", count);

    return EXIT_FAILURE;
  }

  printf("Done. count = %d\n", count);

  return EXIT_SUCCESS;
}

The code would read some hardware status (“GetStatus()” in this example) and drop out of the do/while loop once it had a value. Inside that loop, it increments a count variable. Once done, it would check for “count < 0” and exit with a timeout if that were true.

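Count is only incremented, so the only way it could ever be less than zero is if it got so high that it rolled over. With a signed 8-bit value (int8_t), you count from 0 to 127, then it typically rolls over to -128 and counts back up to 0. (Strictly speaking, the C standard leaves overflow of a signed int undefined, so this rollover is what most hardware does rather than something the language guarantees.)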

So with an “int” being a 32-bit value (on the system I am using), the rollover happens at 2147483647.

And that seems like quite a weird value.

But I suppose if it took that long, it certainly timed out.

I think if I was going to do it, I would have probably used an unsigned integer, and just specified an appropriate value:

unsigned int count;

...

if (count > 10000)
{
  printf("Time out! count = %u\n", count);
  return EXIT_FAILURE;
}
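
A minimal sketch of that idea, with the limit checked inside the loop so the code gives up as soon as the limit is reached instead of waiting for a rollover. The names wait_for_status, status_fn and MAX_POLLS are made up for this example, and the status source is passed in as a function pointer so the sketch does not depend on real hardware:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_POLLS 10000u /* assumed limit; tune for the real hardware */

/* Hypothetical callback type standing in for GetStatus(). */
typedef int (*status_fn)(void);

/* Polls until get_status() returns non-zero or max_polls is reached.
   Returns true on success, false on timeout. The number of polls made
   is stored in *polls_out when polls_out is not NULL. */
bool wait_for_status(status_fn get_status, unsigned int max_polls,
                     unsigned int *polls_out)
{
  unsigned int count = 0;
  bool ready = false;

  while (count < max_polls)
  {
    count++;

    if (get_status() != 0)
    {
      ready = true;
      break;
    }
  }

  if (polls_out != NULL)
  {
    *polls_out = count;
  }

  return ready;
}

/* Stand-in status sources for trying the function out. */
int never_ready(void)
{
  return 0;
}

int ready_on_third_poll(void)
{
  static int calls = 0;

  calls++;
  return (calls >= 3);
}
```

Because the counter is unsigned and bounded by max_polls, it never gets near the point where it would wrap.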

What says you?

C warnings, %d versus %u and more C fun.

Code cleanup on aisle five…

I recently spent two days at work going through projects to clean up compiler warnings. In GNU C, you can enable options such as “-Wall” (all warnings), “-Wextra” (extra warnings) and “-Werror” (warnings as errors). By doing steps like these, the compiler will scream at you and fail to build code that has warnings in it.

Many of these warnings don’t impact how your code runs. They just ask you “are you sure this is what you are meaning to do?”

For example, if you leave out a “break” in a switch/case block, the compiler can warn you about that:

  x = 1;

  switch( x )
  {
  case 1:
    printf("x is one\n");
    // did I mean to not have a break here?

  case 2:
    printf("x is two\n");
    break;

  default:
    printf("I don't know what X is\n");
    break;
  }

This code would print:

x is one
x is two

…because without the “break” in the “case 1”, the code drops down to the following case. I found several places in our embedded TCP/IP stack where this was being done intentionally, and the author had left comments like “/* falls through below */” to let future observers know their intent. But, with warnings cranked up, it would no longer build for me, even though it was perfectly fine code working as designed.

I found there was a GCC thing you could do where you put in “//no break” as a comment and it removes that warning. I expect there are many more “yes, I really mean to do this” comments GCC supports, but I have not looked in to it yet.
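
For what it is worth, GCC 7 and later document a -Wimplicit-fallthrough warning that can be silenced either by a comment such as /* fall through */ placed right before the next case, or by the statement attribute __attribute__((fallthrough)). A minimal sketch of both, where FALLTHROUGH and count_cases_run are names invented for this example:

```c
/* Expands to GCC's statement attribute where it is known to exist
   (GCC 7 and later). Elsewhere it is an empty statement and only the
   comment documents the intent. */
#if defined(__GNUC__) && (__GNUC__ >= 7)
#define FALLTHROUGH __attribute__((fallthrough))
#else
#define FALLTHROUGH ((void)0)
#endif

/* Returns how many case bodies ran for a given x. */
int count_cases_run(int x)
{
  int ran = 0;

  switch (x)
  {
  case 1:
    ran++;
    FALLTHROUGH; /* fall through */

  case 2:
    ran++;
    break;

  default:
    break;
  }

  return ran;
}
```

The macro keeps the intent visible in the code itself, which helps when a different compiler does not look at comments.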

Size (of int) matters

Another issue I ran into was warnings about using the wrong specifier in a printf. Code might compile fine without warning on a PC, but generate all kinds of warnings on a different architecture where an “int” might be a different size. For example:

int answer = 42;
printf("The answer is %d\n", answer);

On my PC, “%d” can print an “int” type just fine. But, if I had used a “long” data type, it would error out:

long answer = 42;
printf("The answer is %d\n", answer);

This produces this warning/error:

error: format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Werror=format=]|

You need to use the “l” (long) specifier (“%ld”) to be correct:

long answer = 42;
printf("The answer is %ld\n", answer);

I found that code that compiled without warnings on the PC would not do the same on one of my embedded target devices.
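
One way to keep format strings correct across architectures is to pair fixed-width types with the macros from the C99 header inttypes.h, and to use “%zu” for size_t. This is a sketch that assumes a C99 library (some older or smaller C libraries do not support “%zu”); format_values is a hypothetical helper, not anything from the code described above:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Formats three differently sized values with matching specifiers.
   Returns what snprintf() returns: the length of the full text. */
int format_values(char *out, size_t outsize,
                  long answer, int32_t reading, size_t length)
{
  assert(out != NULL && outsize > 0);

  return snprintf(out, outsize,
                  "answer=%ld reading=%" PRId32 " length=%zu",
                  answer, reading, length);
}
```

PRId32 expands to whatever specifier matches int32_t on the target, so the same line builds cleanly whether int32_t is an int or a long there.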

%u versus %d: Fight!

Another warning I had to deal with was printf() and using “%d” versus “%u”. Most code I see always uses %d, which is for a signed value which can be positive or negative. It seems to work just fine if you print an unsigned integer type:

unsigned int z;

z = 100;
printf("z is %d\n", z);

Even though the data type for z is unsigned, the value is positive so it prints out a positive number. After all, a signed value can be positive.

But, it is more correct to use “%u” when printing unsigned values. And, here is an example of why it is important to use the proper specifier… Consider this:

#include <limits.h> // for UINT_MAX

unsigned int x;

x = UINT_MAX; // largest unsigned int

printf("x using %%d is %d\n", x);
printf("x using %%u is %u\n", x);

This prints:

x using %d is -1
x using %u is 4294967295

In this case, %d is not giving you what you expect. For a 32-bit int (in this example), UINT_MAX of 4294967295 is all bits set:

11111111 11111111 11111111 11111111

That represents a -1 if the value was a signed integer, and that’s what %d is told it is. Thus, while %d works fine for smaller values, any value large enough to set that end bit (that represents a negative value for a signed int) will produce incorrect results.

So, yeah, it will work if you *know* you are never printing values that large, but %u would still be the proper one to use when printing unsigned integers… And you won’t get that warning :)

C warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Trick C question time … what will this print?

#include <stdio.h>
#include <stdlib.h>

int main()
{
  int x;
  unsigned int y;

  x = -1;
  y = 2;

  printf("x = %d\n", x);
  printf("y = %u\n", y);

  if ( x > y )
  {
    printf("x > y\n");
  }
  else if (x < y)
  {
    printf("x < y\n");
  }
  else
  {
    printf("x == y\n");
  }

  return EXIT_SUCCESS;
}

I recently began looking in to various compiler warnings in some code I am using, and I found quite a few warnings about comparing signed and unsigned values:

warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

I thought I could safely ignore these, since it seems plausible to compare a signed value with an unsigned value. A signed value of -42 should be less than an unsigned value of 42, right?

In the above example, it will print the following:

x = -1
y = 2
x > y

Nope. I was wrong. According to C, -1 is greater than 2.

C does something that I either never knew, or knew and have long since forgotten. I guess I generally try to write code that has no warnings at all, so I’ve avoided doing this. And now I know (or re-know) the reason why.

When a signed int is compared with an unsigned int, C converts the signed value to unsigned before comparing (the “usual arithmetic conversions”). Thus, “-1” becomes whatever -1 would be as an unsigned value of that data type.

char  achar  = -1;
short ashort = -1;
int   aint   = -1;
long  along  = -1;

printf("char  -1 as unsigned: %u\n", (unsigned int)(unsigned char)achar);
printf("short -1 as unsigned: %u\n", (unsigned int)(unsigned short)ashort);
printf("int   -1 as unsigned: %u\n", (unsigned int)aint);
printf("long  -1 as unsigned: %lu\n", (unsigned long)along);

This outputs:

char  -1 as unsigned: 255
short -1 as unsigned: 65535
int   -1 as unsigned: 4294967295
long  -1 as unsigned: 4294967295

Thus, on a PC, a -1 converted to an 8-bit unsigned type becomes 255, and to a 16-bit unsigned type 65535. (In an actual comparison, char and short values are first promoted to int, so it is the int and long cases that bite.) It seems an int and a long are both 32 bits on my system, but these could all be different on other architectures (on Arduino, an int is 16 bits, I believe).

So, without this warning enabled, any comparison that looks correct might be doing something quite wrong.
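
If a signed value really does need to be compared with an unsigned one, the negative case can be handled explicitly before any conversion happens. A small sketch, with function names invented for this example:

```c
#include <stdbool.h>

/* What "a < b" effectively does when a is int and b is unsigned int:
   a is converted to unsigned first, so a negative a becomes huge. */
bool naive_less_than(int a, unsigned int b)
{
  return (unsigned int)a < b;
}

/* Sign-aware version: any negative a is less than any unsigned b. */
bool int_less_than_uint(int a, unsigned int b)
{
  if (a < 0)
  {
    return true;
  }

  /* a is known to be non-negative here, so the cast keeps its value. */
  return (unsigned int)a < b;
}
```

With these, naive_less_than(-1, 2u) reports false, matching the surprising output above, while int_less_than_uint(-1, 2u) reports true.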

Warnings are our friends. Even if we hate them and want them to go away.

const-ant confusion in C.

Updates:

  • 2017-11-30 – Fixing description of MyStructure example. Thanks, Lost Wiz!

Embedded Life

I currently make my living doing embedded C programming. I am not quite sure how to define what “embedded” programming is other than to say: you probably don’t have everything you expect.

You often program on systems without file systems, without gigabytes of RAM and without an operating system to do most of your work for you. For example, at my previous job one of my main platforms (a TI MSP430 processor) had only 10K of RAM and the program was limited to 40K of flash storage. At my current job, I work on several variations of ARM processors, with one configured to give me only 7K of RAM and 36K of program space.

These systems are much closer to an Arduino UNO, which has 32K of flash and 2K of RAM, than a desktop Windows or Linux machine.

Not all embedded systems are this tiny, of course. There are many embedded systems that run Linux, but once you have a full operating system and a file system, the definition of “embedded” seems to be used to just mean “smaller than Windows”.

But I digress…

Over the past six years, I’ve worked on code that was created and maintained by many different programmers before I worked on it. I have learned some cool tricks and also seen some very un-cool tricks (i.e., just wrong). I expect I will be leaving my own cool/un-cool bits of code for future programmers to find.

With that said, there is one item that keeps turning up repeatedly that many of us C programmers don’t seem to really understand because we keep misusing it. And by “we” I include myself.

“const”

There is a keyword in C called “const” which, according to Wikipedia, “indicates that the data is read only.” For example, suppose you wrote a function that accepts a string (actually, pointer to a bunch of characters) like this:

void PrintErrorMessage( char *message )
{
  fputs( "ERROR: ", stderr );
  fputs( message, stderr );
}

In this example, whatever string passed in will be displayed with “ERROR: ” prepended to it.

int main( int argc, char **argv )
{
  PrintErrorMessage("File Not Found");

  return EXIT_SUCCESS;
}

That would display a message to standard error output:

ERROR: File Not Found

But, there is nothing preventing the function from trying* to modify the string that was passed in.

* Key word is “trying”… If that string were embedded in the binary and it was running out of ROM or Flash, attempts to modify it would be rejected by the “hardware can’t do that” exception ;-)

void PrintErrorMessage( char *message )
{
  fputs( "ERROR: ", stderr );

  // Convert message to uppercase
  for (int i=0; i<strlen(message); i++)
  {
    message[i] = toupper(message[i]);
  }
  fputs( message, stderr );
}

The intent here would be to display the error message in uppercase, such as “ERROR: FILE NOT FOUND”. This would work if the string being passed in was modifiable, such as:

char message[80];

strcpy(message, "File Not Found");
PrintErrorMessage( message );

…but after returning from that call, the string would have been modified by the function and would now be “FILE NOT FOUND” in memory. This is fine, if that is the intent of the function, but if you do not intend the string to be modified, you can take steps to prevent the function from being able to do it.

Only read this…

“const” will tell the compiler to not allow code to be built that modifies the variable. You see it used all the time by standard C library functions that take strings, such as puts(), strcpy(), etc.

int puts ( const char * str );

For puts() and similar functions, the use of const disallows modifying that string within the function. In the earlier example, we could make the passed-in string “read only” like this:

void PrintErrorMessage( const char *message )
{
  int i;
  fputs( "ERROR: ", stderr );

  // Convert message to uppercase
  for (i=0; i<strlen(message); i++)
  {
    message[i] = toupper(message[i]);
  }

  fputs( message, stderr );
}

With that change made, the compiler now should issue warnings (or errors) on attempts to modify the “message” string inside the function:

error: assignment of read-only location '*(message + (sizetype)((unsigned int)i * 1u))'|

The moment the function tries to modify “message[i]” causes a problem, because “const” has told the compiler whatever is passed in should not be modified.

Because of the usefulness of this extra compile-time error checking, const is a good thing to use.

And many of us do.

Incorrectly.

There is a bit of confusion in how const works. In the above example, we pass in the pointer to some memory that contains a string. We do not want the memory that is being passed in to be modified, so we use “const char *message”. According to the “C Gibberish” website, that means:

declare message as pointer to const char

We might also want to prevent the pointer itself from being modified by using “const char const * message” … and that would be incorrect. That is not the proper syntax for “const”:

declare message as pointer to const const char

The confusion comes from const allowing two ways of doing the same thing. Did you know that:

const char

…is the same as:

char const

In C, const really applies to whatever is immediately to its left, as in “char is a constant” or “* is a constant”. Only when const is the first thing in the declaration, with nothing to its left, does it apply to what comes right after it. Since that leading form is the one we see all the time, many of us seem to think that const always describes what comes after it. Which it doesn’t.

To properly declare that the pointer and the data it points to should be read-only, it should be:

char const * const message;

C Gibberish agrees:

declare message as const pointer to const char

We need to learn this double “before or after is the same thing” use, or only use the “after” use and be consistent.

// declare message as const pointer to const char
const char * const message;

is the same as

// declare message as const pointer to const char
char const * const message;
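
To see the two placements side by side, here is a small sketch. The function names count_char and fill_buffer are invented for this example; the commented-out lines are the ones a compiler should reject:

```c
#include <stddef.h>

/* Pointer to const char: the pointer may move, but the characters
   may not be written through it. */
size_t count_char(const char *text, char wanted)
{
  size_t found = 0;

  while (*text != '\0')
  {
    if (*text == wanted)
    {
      found++;
    }

    text++;        /* allowed: the pointer itself is not const */
    /* *text = 'x';   rejected: the data it points to is const */
  }

  return found;
}

/* Const pointer to char: the characters may be written, but the
   pointer may not be aimed somewhere else. */
void fill_buffer(char * const buffer, size_t size, char value)
{
  size_t i;

  for (i = 0; i < size; i++)
  {
    buffer[i] = value; /* allowed: the data is not const */
  }

  /* buffer++;            rejected: the pointer is const */
}
```

Reading each declaration from right to left (“buffer is a const pointer to char”) gives the same answer the C Gibberish site does.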

My most recent encounter with this was in code at work that did something like this:

void Initialize(const MyStructure *config);

They must have intended this to mean “I’m passing in a constant MyStructure pointer which cannot be modified” but, in reality, what they were getting was:

declare config as pointer to const struct MyStructure

They were telling the compiler that the structure being pointed to was read-only and should not be modified. But the function’s purpose was to modify elements of that structure:

config->type = 10;
config->foo = 'a';

Because of this misuse of const, many compiler warnings were generated.

||=== Build: Debug in Const (compiler: GNU GCC Compiler) ===|
main.c||In function 'Initialize':|
main.c|16|error: assignment of member 'type' in read-only object|
main.c|17|error: assignment of member 'foo' in read-only object|
||=== Build failed: 2 error(s), 0 warning(s) (0 minute(s), 0 second(s)) ===|

The fix was to correct the prototype and function to do what was actually intended:

// declare config as const pointer to struct MyStructure
void Initialize(MyStructure * const config);

This now disallows the “config” pointer from being changed, but not the structure it points to. Thus, this would not work:

config++; // Increment config pointer.

…but the intended structure modifications would:

config->type = 10;
config->foo = 'a';

I’m sad to say I’ve been using “const” incorrectly as long as I can remember using “const.”

For a future article, I may dive in to some deep const-ant confusion I recently found myself in, and see if someone out there can tell me if I am finally doing it correctly or not.

Until then…

Nested ternary operators in C

I started learning C programming back in the late 1980s. I was using the original Microware OS-9 C compiler on a Radio Shack Color Computer 3. It was a K&R compiler, meaning it was the original version of C that was before the ANSI-C standard. Back then, I recall reading a magazine article that claimed C would be “the language of the 80s” so I decided to see what the fuss was about.

A Commodore-using friend of mine, Mark, was helping me learn C. He loaned me a Pocket C reference guide. That, and the Microware documentation, was all I had for reference material. At the time, Mark had moved from his Commodore 64 to a powerful Amiga computer. He would dial in to my OS-9 BBS system and upload source code he wrote on his Amiga and then compile it on my machine to see if it ran there, too. It was amazing that this was even possible, considering how non-portable BASIC programs were from machine to machine.

It was a fun time.

Over the years, I learned much about C from books and friends and just general experimentation. One thing I learned was this weird conditional assignment operation:

a = (b == 10 ? 100 : 200);

It is basically doing this:

condition ? value_if_true : value_if_false

The value of a would be set to 100 if b was 10, otherwise it would be set to 200. It was a shortcut to writing the code like this:

if (b == 10)
{
  a = 100;
}
else
{
  a = 200;
}

…or…

switch( b )
{
  case 10:
    a = 100;
    break;
  default:
    a = 200;
    break;
}

I have used this many times over the years, but didn’t even know what it was called. I asked a coworker, and they told me it was a “ternary operator”. Here is the Wikipedia entry on how it works in C:

https://en.wikipedia.org/wiki/%3F:#C

It is a great shortcut for response strings. For instance, turning a boolean true/false result in to a string:

printf( "Status: %s\n", (status == true ? "Enabled" : "Disabled") );

This will print “Status: Enabled” if status==true, or “Status: Disabled” if status==false. What a neat shortcut.

Recently, I saw a nested use of this ternary operator. It never dawned on me that you could do this. It was something like this:

const char *colorStr = (value == RED) ? "Red" : (value == BLUE) ? "Blue" : "Unknown";

It was using a second ternary operator for the “value_if_false” condition, allowing it to have three conditions rather than just two. I realized you could nest these in many different ways to create rather complex things… Though, readability would likely suffer. I think I’d just stick with simple if/then or switch/case things for anything more than two choices, but in this case it seemed simple enough.
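
For comparison, here is the nested form laid out one condition per line, next to a switch that does the same job. This is only a sketch: the Color enum is assumed (GREEN is added just to have a value that falls through to “Unknown”), and the function names are invented:

```c
#include <stdbool.h>
#include <string.h>

typedef enum { RED, GREEN, BLUE } Color; /* assumed for illustration */

/* Nested ternary, one condition per line. */
const char *color_name_nested(Color value)
{
  return (value == RED)  ? "Red"
       : (value == BLUE) ? "Blue"
       :                   "Unknown";
}

/* The same mapping written as a switch. */
const char *color_name_switch(Color value)
{
  switch (value)
  {
  case RED:
    return "Red";

  case BLUE:
    return "Blue";

  default:
    return "Unknown";
  }
}

/* True when both versions produce the same text for a value. */
bool names_agree(Color value)
{
  return strcmp(color_name_nested(value), color_name_switch(value)) == 0;
}
```

Lining up the question marks and colons makes the chain read like a small table, which helps a lot once there are more than two choices.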

I thought it was neat, and decided I’d share it here in this quick article.

Floating point is hard.

In my day job, I am an embedded C programmer. Our devices work with flow measurement and use a bunch of floating point math for accumulators and such. In the past three years, I have learned some interesting things about floating point numbers and how they cannot represent certain values. I was therefore amused to see an easy example of the floating point problem when I was paying for postage last night:

[Image: floating_point]

I wonder how much it deducted from my PayPal account . . .
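
The effect is easy to reproduce. In the sketch below (function names invented, IEEE 754 doubles assumed), adding 0.1 ten times in floating point typically does not compare equal to 1.0, while the same sum kept as whole cents is exact:

```c
#include <stdbool.h>

/* Adds 'price' to a running total 'count' times, the way an
   accumulator might. */
double accumulate(double price, int count)
{
  double total = 0.0;
  int i;

  for (i = 0; i < count; i++)
  {
    total += price;
  }

  return total;
}

/* The same accumulation done in whole cents, which is exact. */
long accumulate_cents(long price_cents, int count)
{
  long total = 0;
  int i;

  for (i = 0; i < count; i++)
  {
    total += price_cents;
  }

  return total;
}

/* Compares two doubles within a tolerance instead of with ==. */
bool nearly_equal(double a, double b, double tolerance)
{
  double diff = a - b;

  if (diff < 0.0)
  {
    diff = -diff;
  }

  return diff <= tolerance;
}
```

Keeping money as an integer count of the smallest unit, or comparing with a tolerance, are two common ways around the problem.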

sizeof() matters

Updates:

  • 2016/02/29 – Per a comment by James, I corrected my statement that sizeof() is a macro. It is not. It is a keyword. My bad.

In C, the sizeof() operator can be used to determine the size of a variable type or structure. For instance, if you need to know the size of an “int” on your system, you can use sizeof(int). If you have a variable like “int i;” or “long i;”, you can also use sizeof(i).

On the Arduino, an int is 16-bits:

void setup() {
  // put your setup code here, to run once:
  Serial.begin(9600);
  Serial.print("sizeof(int) is ");
  Serial.println(sizeof(int));
}

void loop() {
  // put your main code here, to run repeatedly:
}

On the Arduino, that produces:

sizeof(int) is 2

On a Windows system, an int is 32-bits:

int main()
{
  printf("sizeof(int) is %lu\n", (unsigned long)sizeof(int));

  return EXIT_SUCCESS;
}

That displays:

sizeof(int) is 4

Note: sizeof() is not a library function. It is a C keyword (an operator) that the compiler evaluates at compile time, not something the preprocessor expands. In the compiled code, it ends up as a plain number representing the size, much as if a constant had been typed there. (C99 variable-length arrays are the one exception, where the size is worked out at run time.)

You should avoid making any assumptions about the size of data types beyond what the C standard tells you. For example, an “int” should be “at least 16-bit”. Thus, even a PC compiler could have chosen to make an “int” be 16-bits instead of 32.

A better way to use data types was added in the C99 specification, where you can include stdint.h and then request specific types of variables:

uint8_t unsignedByte;

uint16_t unsignedWord;

int32_t signed32bit;
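
Size assumptions can also be checked at build time. This sketch assumes a C11 compiler for _Static_assert, and saturating_add_u8 is a made-up helper that shows a fixed-width type doing real work:

```c
#include <limits.h>
#include <stdint.h>

/* C11 compile-time checks: the build fails if an assumption is wrong. */
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t must be 16 bits");
_Static_assert(sizeof(int) * CHAR_BIT >= 16, "int must be at least 16 bits");

/* Adds two 8-bit values, clamping at 255 instead of wrapping. */
uint8_t saturating_add_u8(uint8_t a, uint8_t b)
{
  /* a and b are promoted to int for the addition, so the sum
     (at most 510) cannot overflow and fits in 16 bits. */
  uint16_t sum = (uint16_t)(a + b);

  return (sum > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}
```

The function behaves the same on a 16-bit Arduino and a 32-bit PC because nothing in it depends on the size of int.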

But I digress.

The point of this article was to mention that you can also use sizeof() on strings IF they are known to the compiler at compile time. You can, of course, get the size of a pointer:

char *ptr;

printf("sizeof(ptr) is %lu\n", (unsigned long)sizeof(ptr));

Depending on the size of a pointer on your system (16-bits on the Arduino, 32 on the PC), you will get back 2 or 4 (or 8 if it’s a 64-bit pointer, I suppose).

And the pointer is still the same size regardless of what it points to. You still get the same size even if you had something like this:

char *msgPtr = "This is my message.";

printf("sizeof(msgPtr) is %lu\n", (unsigned long)sizeof(msgPtr));

But, if you had declared that string as an array of characters, rather than a pointer to a character, you get something different because the compiler knows a bit about what you are pointing to:

char msgArray[] = "This is my message.";

printf("sizeof(msgArray) is %lu\n", (unsigned long)sizeof(msgArray));

There, you see the compiler actually substitutes the size of the array of characters:

sizeof(msgArray) is 20

This is an instance where using “char *ptr =” is different than “char ptr[] = ” even though, in most expressions, both end up giving you a pointer to some memory location where those characters exist.

At work, I ran across a bunch of test code that did this:

const char    PROMPT[] = "Shell: ";
const uint8_t PROMPT_LEN = 7;

const char    LOGIN[] = "Login: ";
const uint8_t LOGIN_LEN = 7;

Those strings would be used elsewhere, and the length needed to be known by some write()-type function. Counting bytes in a quoted string and keeping that number updated sounds like work, so instead they could have used sizeof(). Since it returns the size of the array (including the NUL zero byte at the end), they’d need to subtract one like this:

const char    PROMPT[] = "Shell: ";
const uint8_t PROMPT_LEN = sizeof(PROMPT)-1;

const char    LOGIN[] = "Login: ";
const uint8_t LOGIN_LEN = sizeof(LOGIN)-1;

At compile time, the size of the character array is known, and the compiler substitutes that length where the “sizeof()” is. If the string is changed, that value also changes (at compile time).

Of course, since we are using NUL terminated strings, you could also just use the strlen() function. But, that is more for strings of unknown length, and it runs code that counts every character until the NUL zero, which is wasted CPU use and code space if you don’t actually need to do that.

My optimization tip for today is: If you are using hard coded constant strings, and you need to know the size of them, declare them as C arrays (not a pointer to the string) and use sizeof() as a constant. Use strlen() only for times when the compiler cannot know the size of the character array (dynamic strings or things being passed in to a function from the outside).
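
The subtract-one detail can be tucked into a macro so it is written only once. A sketch, where ARRAY_STRLEN and prompt_length_at_runtime are names invented for this example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string held in a char ARRAY, not
   counting the terminator. It gives the wrong answer on a pointer. */
#define ARRAY_STRLEN(array) (sizeof(array) - 1)

const char PROMPT[] = "Shell: ";
const char LOGIN[]  = "Login: ";

/* Compile-time constants; they follow the strings if those change. */
enum
{
  PROMPT_LEN = ARRAY_STRLEN(PROMPT),
  LOGIN_LEN  = ARRAY_STRLEN(LOGIN)
};

/* The run-time way of getting the same number, for comparison. */
size_t prompt_length_at_runtime(void)
{
  return strlen(PROMPT);
}
```

Using an enum here is a design choice: it makes the lengths true constants, usable as array sizes or case labels, without taking up any storage.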

Speaking of that … as long as the compiler can “see” where the array is declared, sizeof() will work. But if you had something like this:

void showSize(const char *ptr)
{
  printf("showSize - sizeof(ptr) = %lu\n", (unsigned long)sizeof(ptr));
}

int main()
{
  const char    LOGIN[] = "Login: ";

  showSize(LOGIN);

  return EXIT_SUCCESS;
}

…that will not work. By the time you pass in just a “pointer to” the array, all the compiler sees (inside that showSize function) is a pointer, and thus can only tell you the size of the pointer, and not what it points to.
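
The usual way around this is to take the size at the call site, where the array is still visible, and pass it along with the pointer. A sketch with invented function names:

```c
#include <stddef.h>

/* Inside the function, only the pointer's own size is visible. */
size_t size_seen_through_pointer(const char *ptr)
{
  return sizeof(ptr);
}

/* Hypothetical alternative: the caller, who can still see the array,
   supplies its size as a second argument. */
size_t size_passed_by_caller(const char *ptr, size_t size)
{
  (void)ptr; /* a real function would use the data as well */

  return size;
}
```

A call such as size_passed_by_caller(LOGIN, sizeof(LOGIN)) evaluates sizeof() where LOGIN is still an array, so the function receives 8 for "Login: " (seven characters plus the terminator).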

As you see, this tip is of limited use, but I think it is still neat and a potential way to save some CPU cycles and program space bytes from time to time. Since I have worked on a number of Arduino Sketches that have gotten too big to fit (also on some TI MSP430 projects), small tricks like this can make a very big difference in getting something to fit or not fit.

sizeof() can matter :-)

Building safer C string functions, part 1

In an earlier series, I discussed some easy ways to prevent buffer overrun problems when doing copies of C strings. As part of this, I created a few of my own implementations for things like strncpy() and strnlen(). Much of what I did as a workaround could be simplified if we had a smarter string concatenate function, so today I’d like to present one.

Fixing strncat

The max-limited string copy (strncpy) works well enough to prevent buffer overruns when copying strings, but the max-limited string concatenate (strncat) does not. It only limits how many characters it copies from the source buffer, without any regard to how much room is left in the destination buffer. (Is this of any use?)

We can do better.

The first thing strcat has to do is look at the destination buffer and seek to the end of whatever null terminated C string is there. Since it is already doing this work, it would be easy for it to limit how many characters it copies based on being told the maximum size of the destination buffer (as opposed to strncat, which limits based on a maximum size of the source buffer).

I am envisioning a function that looks like strncat(), but the max number passed in is for the destination buffer. Thus, if I try to append a string of 10 characters to a buffer that can hold up to 40 characters, I’d just append with a value of 40, and the function would check how much is already in the buffer and do the math for me. Because math is hard.

Here is what it might look like:

char * strncatdst( char * destination, const char * source, size_t num )
{
  size_t len;
  size_t left;

  // Step 1 - find out how much data is in the destination buffer.
  len = strlen( destination );

  // If string len is longer than we want...
  if (len > num)
  {
    // ...limit the len to be the max num.
    len = num;
  }

  // len comes back with how much is in the buffer or maxed to num.

  // Step 2 - find out how much room is left
  left = num - len;

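  // Step 3 - copy up to a null, or until we hit the max size.
  // We always copy one less because strncat() adds an extra null.
  // If there is no room at all (left is 0), left-1 would wrap around
  // to a huge value, so skip the append in that case.
  if (left > 0)
  {
    strncat( destination, source, left-1 );
  }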

  // Return destination pointer, because C says so.
  return destination;
}

This demonstrates a simple way to make a custom version of strncat() by using other standard C library functions. It could be used like this:

#define BUFSIZE 20

int main()
{
  char buffer[BUFSIZE];
  char *string = "This is a long string";

  // Put an initial string in the buffer.
  strcpy(buffer, "new:");

  // Using our own strncatdst() function: 
  strncatdst(buffer, string, BUFSIZE );

  printf("buffer = '%s'\n", buffer);

  return EXIT_SUCCESS;
}

Of course, we probably want to use a safer strcpy() as well, so going back to an earlier article I wrote, I should have done this:

// Put an initial string in the buffer.
strncpy(buffer, "new:", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';

// Using our own strncatdst() function:
strncatdst(buffer, string, BUFSIZE );

printf("buffer = '%s'\n", buffer);

Even though I know I am only copying four characters (“new:”) there, I might not be sure of the length if I was copying in some string that was created somewhere else, or if I (or someone) changed that string to something longer without thinking about the buffer size.

Fixing strncpy

strncpy() could also use a bit more work because it does not put in a null terminator (required for C strings) if the string being copied is as long (or longer) as the max length specified.
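
The missing terminator is easy to demonstrate. In this sketch (helper names invented), memchr() looks for a null only within the buffer, so the check itself cannot run off the end:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if a null terminator exists within the first 'size' bytes. */
bool is_terminated(const char *buffer, size_t size)
{
  return memchr(buffer, '\0', size) != NULL;
}

/* Copies with plain strncpy() and reports whether the destination
   ended up holding a properly terminated C string. */
bool copy_is_terminated(char *destination, size_t size, const char *source)
{
  strncpy(destination, source, size);

  return is_terminated(destination, size);
}
```

With a 4-byte destination, copying "abc" leaves a terminated string, while copying "abcd" or anything longer fills all four bytes and leaves no terminator.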

Let’s see if we can make an improved strncpy() that handles the null terminator:

char * strncpynull( char * destination, const char * source, size_t num )
{
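  // Step 0 - a zero-sized buffer cannot hold even the null terminator,
  // and num-1 would wrap around to a huge value, so do nothing.
  if (num == 0)
  {
    return destination;
  }

  // Step 1 - copy up to 1 less than num characters.
  strncpy( destination, source, num-1 );

  // Step 2 - make sure there is null terminator, in case num reached.
  destination[num-1] = '\0';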

  return destination;
}

As you can see, this is just a wrapper for the standard strncpy() library function that adds a final null just in case the string is that long.

Now we can use it like this:

strncpynull(buffer, "This is a really long string.", BUFSIZE);

printf("buffer = '%s'\n", buffer);

…and it will make sure it trims long strings to num-1 characters, and adds a null terminator (which standard strncpy does not do).

Is it fixed yet?

These two functions should protect us against string buffer overruns, provided we know the size of the destination buffer.

However, by calling existing library functions, we are adding extra overhead. If those functions are highly optimized and very well done, this may still be more efficient than doing them yourself (and it is certainly easier to leverage existing functions rather than rolling your own). However, there are still a few potential issues that bother me.

For instance, for my strncatdst() function, the first thing I do is use strlen() to get the length of the string. It does this by starting at the first byte of the destination buffer and walking through it until it finds a null character. If this was a corrupt pointer, it might find itself walking through bogus memory until it happens to find a zero, potentially crashing the program from a memory access exception (if the operating system has such).

It is also not efficient because, after strlen(), the standard strncat() is used, and internally, it also must start with the first character in the destination buffer and walk through all the bytes until it finds a null (or the max num is reached), to know where it can start appending the source string. If we were doing this to a buffer containing a large 1K string, it would be walking through that 1K twice!

We can do better than that. Let’s see if we can create versions of these functions that do not use the existing C library functions.

Fixing strlen

First, recall my proposal for a version of strlen() that has a limit since there is no such strnlen() function as part of the ANSI-C standard library:

size_t strnlen( const char * str, size_t num )
{
  size_t len;

  len = strlen(str);

  // If string len is longer than we want...
  if (len > num)
  {
    // ...limit the len to be the max num.
    len = num;
  }

  // Return actual len, or max len.
  return len;
}

From the caller’s perspective, that seems fine, but this won’t actually solve the problem — it only hides it, and is still calling strlen() internally.

For this one, we really do need to create our own version so we can prevent it from scanning through 1K of memory if we know the string we expect to find should never be that long.

size_t strnlen2( const char * str, size_t num )
{
  size_t len;

  len = 0;

  while(len < num)
  {
    if (str[len]=='\0') break;
    len++;
  }

  return len;
}

This function will now stop counting at a value you specify. The original version I created used strlen() so it would count endlessly until it found a 0 before returning and having that value (if too big) clipped to the num passed in. This seems to be better for those chances when we are passed a bad pointer. (Not that that EVER happens, right?) It’s just not as efficient as it could be, so we’ll address that later.

Next, let’s use that code inside a new strncatdst() function:

char * strncatdst2( char * destination, const char * source, size_t num )
{
  size_t len;
  size_t index;

  // Step 1 - find out how much data is in the destination buffer.
  // This is basically the strnlen2() code, above.
  len = 0;

  while(len < num)
  {
    if (destination[len]=='\0') break;
    len++;
  }

  // len comes back with how much is in the buffer or maxed to num.

  // Step 2 - copy characters until we are out of room.
  index = 0;

  while(len < num)
  {
    destination[len] = source[index];
    if (source[index]=='\0') break;
    len++;
    index++;
  }

  // Step 3 - make sure string is null terminated. We only need to do
  // this if len==num (the buffer filled up before a null was copied),
  // and only if the buffer has any room at all (num > 0).
  if ((len == num) && (num > 0))
  {
    destination[num-1] = '\0';
  }

  // Return destination pointer, because C says so.
  return destination;
}

Now we can append short or long strings to a destination buffer, and ensure we never copy more than the size of that buffer, including a null terminator we will add if needed.
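As a usage sketch, the block below carries a condensed copy of strncatdst2() so it compiles on its own, plus make_greeting(), a hypothetical helper I made up for illustration. Note that num is the total size of the destination buffer, not the number of characters to append.

```c
#include <stddef.h>

/* Condensed copy of strncatdst2() from above. num is the TOTAL
   size of the destination buffer. */
char * strncatdst2( char * destination, const char * source, size_t num )
{
  size_t len = 0;
  size_t index = 0;

  /* Find the end of what is already in the buffer. */
  while (len < num && destination[len] != '\0') len++;

  /* Copy until the source ends or the buffer is full. */
  while (len < num)
  {
    destination[len] = source[index];
    if (source[index] == '\0') break;
    len++;
    index++;
  }

  /* Buffer full: force termination (guarding against num == 0). */
  if (num > 0 && len == num) destination[num-1] = '\0';

  return destination;
}

/* Hypothetical usage helper: builds "Hello, <name>" in out. */
char * make_greeting( char * out, size_t outSize, const char * name )
{
  if (outSize == 0) return out; /* Nothing we can safely write. */

  out[0] = '\0'; /* Start with an empty string. */
  strncatdst2( out, "Hello, ", outSize );
  strncatdst2( out, name, outSize );

  return out;
}
```

With a 32-byte buffer and the name "World", the result is the full greeting. With a 10-byte buffer, the result is cut off after nine characters and still null terminated.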

Next is a version of strncpynull() that does not use library functions. I previously shared a simple strncpy() implementation to demonstrate what it did (padding short strings with nulls). Using that as a reference, we have:

char * strncpynull2( char * destination, const char * source, size_t num )
{
  size_t index;

  // Step 1 - copy up to 1 less than num characters.
  index = 0;

  while(index + 1 < num) // One less, to leave room for null (and safe if num is 0).
  {
    if (source[index]=='\0') break; // Exit the while loop.
    destination[index] = source[index];
    index++;
  }

  // Here we have copied 'index' characters.

  // If less than num, fill the rest with nulls.
  while(index < num)
  {
    destination[index] = '\0';
    index++;
  }

  return destination;
}
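Here is a small sketch of the padding behavior. It repeats strncpynull2() in condensed form, using "index + 1 < num" so that a num of zero copies nothing instead of wrapping around. It also adds count_trailing_nulls(), a hypothetical helper that exists only for this demonstration.

```c
#include <stddef.h>

/* Condensed copy of strncpynull2() from above. */
char * strncpynull2( char * destination, const char * source, size_t num )
{
  size_t index = 0;

  /* Copy up to num-1 characters, stopping at the source's null. */
  while (index + 1 < num && source[index] != '\0')
  {
    destination[index] = source[index];
    index++;
  }

  /* Fill whatever is left (at least one byte) with nulls. */
  while (index < num)
  {
    destination[index] = '\0';
    index++;
  }

  return destination;
}

/* Hypothetical helper: counts the nulls at the end of a buffer. */
size_t count_trailing_nulls( const char * buffer, size_t size )
{
  size_t count = 0;

  while (count < size && buffer[size-1-count] == '\0') count++;

  return count;
}
```

Copying "Hi" into an 8-byte buffer leaves six trailing nulls. Copying a string that is too long leaves exactly one, in the last byte.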

Now we have our own free-standing enhanced versions of strncpy(), strncat() and strnlen().

We should probably look at optimizing them so they are less stupid!

To be continued…

C strcat, strcpy and armageddon, part 6

See also: part 1, part 2, part 3, part 4 and part 5.

Updates:

  • 3/3/2016 – Updated note about the problem with strlen(), referencing the discussion in part 5, and the solution in the summary.

And now, a simple one-page summary of how to make copying/appending C strings safer and hopefully avoid potential buffer overrun crashes (or other problems).

Copying Strings

Instead of using strcpy(), which does absolutely no checking to see if it’s copying more data than the destination buffer can hold, use strncpy() to limit how much can be copied. If the max amount is copied, strncpy() will not null terminate the destination, so you need to do that yourself.

#define BUFSIZE 30 // size of destination buffer

char buffer[BUFSIZE] = { 0 }; // Initialize buffer with zeros.
char *longString = "Copy this long string to the buffer.";
char *shortString = "Short string.";

// Instead of strcpy(buffer, shortString), do this:
// Copy up to max buffer size-1 (leaving room for a null).
// In case of max-sized string, make sure to null terminate. 
strncpy( buffer, shortString, BUFSIZE-1 );
buffer[BUFSIZE-1] = '\0';

printf( "Buffer: '%s'\n", buffer );

// Instead of strcpy(buffer, longString), do this:

// Copy up to max buffer size-1 (leaving room for a null).
// In case of max-sized string, make sure to null terminate. 
strncpy( buffer, longString, BUFSIZE-1 );
buffer[BUFSIZE-1] = '\0';

printf( "Buffer: '%s'\n", buffer );

That should produce the following output:

Buffer: 'Short string.'
Buffer: 'Copy this long string to the '

Appending Strings

Instead of using strcat() to append a string to an existing string buffer, use strncat() and some math to not copy more than the buffer can hold (counting how many characters are already in it). NOTE: As mentioned in part 5, the use of strlen() is still a point of failure so this is actually NOT a reliable fix.

#define BUFSIZE 30 // size of destination buffer

char buffer[BUFSIZE] = { 0 }; // Initialize buffer with zeros.
char *string1 = "Buffer start.";
char *string2 = "This is what we will be appending.";

printf( "Initial buffer  : '%s'\n", buffer );

// Let's put something in the buffer to demonstrate.
// In case of max-sized string, make sure to null terminate. 
strncpy( buffer, string1, BUFSIZE-1 );
buffer[BUFSIZE-1] = '\0';

printf( "Copied buffer   : '%s'\n", buffer );

// Instead of strcat(buffer, string2), do this:

// Let's "safely" append something to the buffer.
strncat( buffer, string2, BUFSIZE-strlen(buffer)-1 );
// NOTE: strlen() can cause a failure if the original buffer
// was already bad with too much data. We need an strnlen()
// but no such function exists in ANSI-C. See the followup
// article for a "roll your own" set of functions to solve this.
//strncat( buffer, string2, BUFSIZE-strnlen(buffer, BUFSIZE)-1);

printf( "Appended buffer: '%s'\n", buffer );

This should produce the following output:

Initial buffer  : ''
Copied buffer   : 'Buffer start.'
Appended buffer: 'Buffer start.This is what we '
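Two things are worth noting about the commented-out strnlen() line. If the buffer has no null at all, strnlen() returns BUFSIZE, and BUFSIZE-BUFSIZE-1 wraps around to a huge unsigned value. Also, strncat() itself still scans the destination for its null before appending. Below is a rough sketch of one way around both, using only standard memchr() and memcpy(). The name append_bounded() and its return codes are my own invention, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): appends source to
   buffer, where size is the TOTAL size of buffer. Returns 0 on
   success, -1 if buffer had no terminator within size bytes. */
int append_bounded( char * buffer, size_t size, const char * source )
{
  /* memchr() never looks past size bytes, unlike strlen(). */
  const char * end = memchr( buffer, '\0', size );
  size_t used;
  size_t room;
  size_t count;

  if (end == NULL) return -1; /* Buffer already bad; leave it alone. */

  used = (size_t)(end - buffer);
  room = size - used - 1; /* Cannot wrap: used < size here. */

  /* Work out how much of source fits, without calling strlen(). */
  count = 0;
  while (count < room && source[count] != '\0') count++;

  memcpy( buffer + used, source, count );
  buffer[used + count] = '\0';

  return 0;
}
```

Called twice on a zeroed 30-byte buffer with string1 and string2 from the example, this should leave the same contents as the strncat() version, while a buffer with no terminator is rejected instead of being scanned past its end.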

Comparing Strings

And, instead of using strcmp() to compare strings, use strncmp() so you can set the maximum string size expected.

#define FIRSTNAME_SIZE 20

char firstName[FIRSTNAME_SIZE] = {0}; // Initialize with zeros.

...

// Instead of strcmp(buffer, firstName), do this:

if (strncmp( buffer, firstName, FIRSTNAME_SIZE ) == 0)
{
  ...
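One thing to keep in mind with the limit: strncmp() stops comparing after the given count, so two strings that agree on the first FIRSTNAME_SIZE characters compare as equal even if they differ after that. A tiny sketch, where first_names_match() is a hypothetical wrapper of my own:

```c
#include <string.h>

#define FIRSTNAME_SIZE 20

/* Hypothetical helper: returns 1 if the two names match within
   the first FIRSTNAME_SIZE characters, 0 otherwise. */
int first_names_match( const char * a, const char * b )
{
  return strncmp( a, b, FIRSTNAME_SIZE ) == 0;
}
```

Two names that share their first 20 characters are reported as a match here, which is fine as long as nothing longer than the buffer is ever expected.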

Follow these three simple steps and you will greatly reduce your chance of a buffer overrun that could cause unexpected problems.

Look for an upcoming article that will expand on this with replacement functions you can use to make things easier and more efficient.