Building safer C string functions, part 1

In an earlier series, I discussed some easy ways to prevent buffer overrun problems when doing copies of C strings. As part of this, I created a few of my own implementations for things like strncpy() and strnlen(). Much of what I did as a workaround could be simplified if we had a smarter string concatenate function, so today I’d like to present one.

Fixing strncat

The max-limited string copy (strncpy) works well enough to prevent buffer overruns when copying strings, but the max-limited string concatenate (strncat) does not. It only limits how many characters it copies from the source buffer, without any regard to how much room is left in the destination buffer. (Is this of any use?)

We can do better.

The first thing strcat has to do is look at the destination buffer and seek to the end of whatever null terminated C string is there. Since it is already doing this work, it would be easy for it to limit how many characters it copies based on being told the maximum size of the destination buffer (as opposed to strncat, which limits based on a maximum size of the source buffer).

I am envisioning a function that looks like strncat(), but the max number passed in is for the destination buffer. Thus, if I try to append a string of 10 characters to a buffer that can hold up to 40 characters, I’d just append with a value of 40, and the function would check how much is already in the buffer and do the math for me. Because math is hard.

Here is what it might look like:

```char * strncatdst( char * destination, const char * source, size_t num )
{
size_t len;
size_t left;

// Step 1 - find out how much data is in the destination buffer.
len = strlen( destination );

// If string len is longer than we want...
if (len > num)
{
// ...limit the len to be the max num.
len = num;
}

// len comes back with how much is in the buffer or maxed to num.

// Step 2 - find out how much room is left
left = num - len;

// Step 3 - copy up to a null, or until we hit the max size.
// We always copy one less because strncat() adds an extra null.
strncat( destination, source, left-1 );

// Return destination pointer, because C says so.
return destination;
}
```

This demonstrates a simple way to make a custom version of strncat() by using other standard C library functions. It could be used like this:

```#define BUFSIZE 20

int main()
{
char buffer[BUFSIZE];
char *string = "This is a long string";

// Put an initial string in the buffer.
strcpy(buffer, "new:");

// Using our own strncatdst() function:
strncatdst(buffer, string, BUFSIZE );

printf("buffer = '%s'\n", buffer);

return EXIT_SUCCESS;
}
```

Of course, we probably want to use a safer strcpy() as well, so going back to an earlier article I wrote, I should have done this:

```// Put an initial string in the buffer.
strncpy(buffer, "new:", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';

// Using our own strncatdst() function:
strncatdst(buffer, string, BUFSIZE );

printf("buffer = '%s'\n", buffer);```

Even though I know I am only copying four characters (“new:”) there, I might not be sure of the length if I was copying in some string that was created somewhere else, or if I (or someone) changed that string to something longer without thinking about the buffer size.

Fixing strncpy

strncpy() could also use a bit more work because it does not put in a null terminator (required for C strings) if the string being copied is as long (or longer) as the max length specified.

Let’s see if we can make an improved strncpy() that handles the null terminator:

```char * strncpynull( char * destination, const char * source, size_t num )
{
// Step 1 - copy up to 1 less than num characters.
strncpy( destination, source, num-1 );

// Step 2 - make sure there is null terminator, in case num reached.
destination[num-1] = '\0';

return destination;
}
```

As you can see, this is just a wrapper for the standard strncpy() library function that adds a final null just in case the string is that long.

Now we can use it like this:

```strncpynull(buffer, "This is a really long string.", BUFSIZE);

printf("buffer = '%s'\n", buffer);
```

…and it will make sure it trims long strings to to len-1, and adds a null terminator (which standard strncpy does not do).

Is it fixed yet?

These two functions should protect us against string buffer overruns, provided we know the size of the destination buffer.

However, by calling existing library functions, we are adding extra overhead. If those functions are highly optimized and very well done, this may still be more efficient than doing them yourself (and it is certainly easier to leverage existing functions rather than rolling your own). However, there are still a few potential issues that bother me.

For instance, for my strncatdst() function, the first thing I do is use strlen() to get the length of the string. It does this by starting at the first byte of the destination buffer and walking through it until it finds a null character. If this was a corrupt pointer, it might find itself walking through bogus memory until it happens to find a zero, potentially crashing the program from a memory access exception (if the operating system has such).

It is also not efficient because, after strlen(), the standard strncat() is used, and internally, it also must start with the first character in the destination buffer and walk through all the bytes until it finds a null (or the max num is reached), to know where it can start appending the source string. If we were doing this to a buffer containing a large 1K string, it would be walking through that 1K twice!

We can do better than that. Let’s see if we can create versions of these functions that do not use the existing C library functions.

Fixing strlen

First, recall my proposal for a version of strlen() that has a limit since there is no such strnlen() function as part of the ANSI-C standard library:

```size_t strnlen( const char * str, size_t num )
{
size_t len;

len = strlen(str);

// If string len is longer than we want...
if (len > num)
{
// ...limit the len to be the max num.
len = num;
}

// Return actual len, or max len.
return len;
}
```

From the caller’s perspective, that seems fine, but this won’t actually solve the problem — it only hides it, and is still calling strlen() internally.

For this one, we really do need to create our own version so we can prevent it from scanning through 1K of memory if we know the string we expect to find should never be that long.

```size_t strnlen2( const char * str, size_t num )
{
size_t len;

len = 0;

while(len < num)
{
if (str[len]=='\0') break;
len++;
}

return len;
}
```

This function will now stop counting at a value you specify. The original version I created used strlen() so it would count endlessly until it found a 0 before returning and having that value (if too big) clipped to the num passed in. This seems to be better for those chances when we are passed a bad pointer. (Not that that EVER happens, right?) It’s just not as efficient as it could be, so we’ll address that later.

Next, let’s use that code inside a new strncatdst() function:

```char * strncatdst2( char * destination, const char * source, size_t num )
{
size_t len;
size_t index;

// Step 1 - find out how much data is in the destination buffer.
// This is basically the strnlen2() code, above.
len = 0;

while(len < num)
{
if (destination[len]=='\0') break;
len++;
}

// len comes back with how much is in the buffer or maxed to num.

// Step 2 - copy characters until we are out of room.
index = 0;

while(len < num)
{
destination[len] = source[index];
if (source[index]=='\0') break;
len++;
index++;
}

// Step 3 - make sure string is null terminated. We really only
// need to do this is len==num, but the overhead of adding the
// check is probably more than just always doing it.
if (len == num)
{
destination[num-1] = '\0';
}

// Return destination pointer, because C says so.
return destination;
}
```

Now we can append short or long strings to a destination buffer, and ensure we never copy more than the size of that buffer, including a null terminator we will add if needed.

Next is a version of strncpynull() that does not use library functions. I previously shared a simple strncpy() implementation to demonstrate what it did (padding short strings with nulls). Using that as a reference, we have:

```char * strncpynull2( char * destination, const char * source, size_t num )
{
size_t index;

// Step 1 - copy up to 1 less than num characters.
index = 0;

while(index < num-1) // One less, to leave room for null.
{
if (source[index]=='\0') break; // Exit the for loop.
destination[index] = source[index];
index++;
}

// Here we have copied 'index' characters.

// If less than num, fill the rest with nulls.
while(index < num)
{
destination[index] = '\0';
index++;
}

return destination;
}
```

Now we have our own free-standing enhanced versions of strncpy(), strncat() and strnlen().

We should probably look at optimizing them so they are less stupid!

To be continued…

C strcat, strcpy and armageddon, part 6

• 3/3/2016 – Updated note about the problem with strlen(), referencing the discussion in part 5, and the solution in the summary.

And now, a simple one-page summary of how to make copying/appending C strings safer and hopefully avoid potential buffer overrun crashes (or other problems).

Copying Strings

Instead of using strcpy(), which does absolutely no checking to see if it’s copying more data than the destination buffer can hold, use strncpy() to limit how much can be copied. If the max amount is copied, strncpy() will not null terminate the destination, so you need to do that yourself.

```#define BUFSIZE 30 // size of desination buffer

char buffer[BUFSIZE] = { 0 }; // Initialize buffer with zeros.
char *longString = "Copy this long string to the buffer.";
char *shortString = "Short string.";

// Instead of strcpy(buffer, shortString), do this:
// Copy up to max buffer size-1 (leaving room for a null).
// In case of max-sized string, make sure to null terminate.
strncpy( buffer, shortString, BUFSIZE-1 );
buffer[BUFSIZE-1] = '\0';

printf( "Buffer: '%s'\n", buffer );

// Instead of strcpy(buffer, longString), do this:

// Copy up to max buffer size-1 (leaving room for a null).
// In case of max-sized string, make sure to null terminate.
strncpy( buffer, longString, BUFSIZE-1 );
buffer[BUFSIZE-1] = '\0';

printf( "Buffer: '%s'\n", buffer );```

That should produce the following output:

```Buffer: 'Short string.'
Buffer: 'Copy this long string to the '```

Appending Strings

Instead of using strcat() to append a string to an existing string buffer, use strncat() and some math to not copy more than the buffer can hold (counting how many characters are already in it). NOTE: As mentioned in part 5, the use of strlen() is still a point of failure so this is actually NOT a reliable fix.

```#define BUFSIZE 30 // size of destination buffer

char buffer[BUFSIZE] = { 0 }; // Initialize buffer with zeros.
char *string1 = "Buffer start.";
char *string2 = "This is what we will be appending.";

printf( "Initial buffer  : '%s'\n", buffer );

// Let's put something in the buffer to demonstrate.
// In case of max-sized string, make sure to null terminate.
strncpy( buffer, string1, BUFSIZE-1 );
buffer[BUFSIZE-1] = '\0';

printf( "Copied buffer   : '%s'\n", buffer );

// Instead of strcat(buffer, string2), do this:

// Let's "safely" append something to the buffer.
strncat( buffer, string2, BUFSIZE-strlen(buffer)-1 );
// NOTE: strlen() can cause a failure if the original buffer
// was already bad with too much data. We need an strnlen()
// but no such function exists in ANSI-C. See the followup
// article for a "roll your own" set of functions to solve this.
//strncat( buffer, string2, BUFSIZE-strnlen(buffer, BUFSIZE)-1);

printf( "Appended buffer: '%s'\n", buffer );```

This should produce the following output:

```Initial buffer : ''
Copied buffer : 'Buffer start.'
Appended buffer: 'Buffer start.This is what we '```

Comparing Strings

And, instead of using strcmp() to compare strings, use strncmp() so you can set the maximum string size expected.

```#define FIRSTNAME_SIZE 20

char firstName[FIRSTNAME_SIZE] = {0}; // Initialize with zeros.

...

// Instead of strcmp(buffer, firstname), do this:

if (strncmp( buffer, firstName, FIRSTNAME_SIZE ) == 0)
{
...```

Follow these three simple steps and you will greatly reduce your chance of a buffer overrun that could cause unexpected problems.

Look for an upcoming article that will expand on this with replacement functions you can use to make things easier and more efficient.

C strcat, strcpy and armageddon, part 5

In our quest to find the safest (or at least a safer) way to copy or append strings, we have been discussing the C library functions strncpy() and strncat(). In the previous installment I mentioned there is a potential problem. That problem is with strlen(), which we are using for strncat() to determine how much data is already in the destination buffer:

```// Create full name by starting with first name...
strncpy( fullName, firstName, FULLNAME_SIZE-1 );
fullName[FULLNAME_SIZE-1] = '\0';

// ...then appending a space...
strncat( fullName, " ", FULLNAME_SIZE-strlen(fullname)-1 );

// ...then appending the last name.
strncat( fullName, lastName, FULLNAME_SIZE-strlen(fullName)-1 );```

We are assuming that the length of the data in the buffer is never greater than what the buffer can hold. But what if, somehow, the existing string was too large? If strlen(fullName) was larger than FULLNAME_SIZE, the math would break. Let’s say FULLNAME_SIZE is 20, but the string copied there was 25 (a buffer overrun):

`FULLNAME_SIZE-strlen(fullName)-1`

…is 20-25-1 which is -6 which means:

`strncat( fullName, " ", -6 ); // error!`

I would think that strncat() should fail because it is being passed a negative number for the count, but that third parameter is of type size_t, which, according to the cplusplus.com page, is:

Unsigned integral type
Alias of one of the fundamental unsigned integer types.It is a type able to represent the size of any object in bytes: size_t is the type returned by the `sizeof` operator and is widely used in the standard library to represent sizes and counts.

Unsigned numbers can only be positive (0 and up), while signed numbers can be negative or positive. If you pass in a -1 as an unsigned value, it will actually look like a huge positive number. (For signed numbers, one of the bits is used to indicate if it is positive or negative. If you use an 8-bit value, that can represent the unsigned values of 0-255, or the signed values of -128 to 127).

Thus, because of bad math, strncat() will operate as it was instructed, thinking the limit is some huge number.

Gotcha.

What we really need is a version or strlen() that stops counting when it his a maximuim size (like a strnlen() call). If we did, we could also give it the limit of our buffer:

`FULLNAME_SIZE-strnlen(fullName, FULLNAME_SIZE)-1`

That way, strnlen() would never return anything greater than FULLNAME_SIZE, so if it didn’t find a null (0) terminator before that character, it would just return the max. Thus, the above example would be 20-20-1 which is… -1.

Crap.

We would have to treat strnlen() the same way we do the other calls, and subtract one from it (a 40 character max buffer would return 39, always leaving room for an extra null terminator):

`FULLNAME_SIZE-strnlen(fullName, FULLNAME_SIZE-1)-1`

That would give us 20-19-1 which is 0, and passing in a 0 to strncat() should fail, not appending anything.

Problem solved.

Unfortunately, the ANSI-C standard has no such function (though some do offer it as a non-standard function). This means to be safe in these situations, you would have to create our own strnlen() function. A simple wrapper to the existing strnlen() should work:

```// Return the length of the string, or num,
// whichever is smaller.
size_t strnlen( const char * str, size_t num )
{
size_t len;

len = strlen(str);

// If string len is longer than we want...
if (len > num)
{
// ...limit the len to be the max num.
len = num;
}

// Return actual len, or max len.
return len;
}
```

Problem actually solved.

This may seem to be a very unlikely error, but maybe you have a function written to deal with a buffer up up to 40 characters, but the caller was creating one that could hold up to 60. They might have a string there longer than you expect, so when you go to append (thinking 40 is the max) there might already be a longer string there.

This type of problem can be quite common when using code from other libraries or projects.

Or, perhaps someone properly used the buffer size, but used strncpy() and it filled it up without a null terminator to stop strlen(). Ah! This seems like a much more common issue.

Pity my proposed strnlen() is still potentially inefficient (and could even cause a crash). Do you see why? I guess we will have to fix that, too.

In the next part, I will summarize all of this on one simple “better practices” page for dealing with string copies or appends.

See you then…

C strcat, strcpy and armageddon, part 4

The story so far . . . String copies can be made much safer by using strncpy() instead of strcpy(). strncpy() will take up slightly more code space than strcpy() and may be slower.

Now let’s move on to appending strings with strcat() (“string concatenate”). From the very-useful cplusplus.com website:

• strcat( char *destination, const char *source ) – Concatenate strings. Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.

As an example of normal use, perhaps you want to take two string buffers that contain a first and last name, and put them together and make a full name string buffer:

```char firstName[40];
char lastName[40];
char fullName[81]; // firstName + space + lastName

strcpy( firstName, "Zaphod" );

strcpy( lastName, "Beeblebrox" );

// Create full name by starting with first name...
strcpy( fullName, firstName );

// ...then appending a space...
strcat( fullName, " " );

// ...then appending the last name.
strcat( fullName, lastName );```

The fullName buffer looks like this:

```// strcpy( fullName, firstName )
+---+---+---+---+---+---+---+---+---+---+---+---+-...
| Z | a | p | h | o | d | 0 |   |   |   |   |   | ...
+---+---+---+---+---+---+---+---+---+---+---+---+-...

// strcat( fullName, " " );
+---+---+---+---+---+---+---+---+---+---+---+---+-...
| Z | a | p | h | o | d |   | 0 |   |   |   |   | ...
+---+---+---+---+---+---+---+---+---+---+---+---+-...

// strcat( fullName, lastName )
+---+---+---+---+---+---+---+---+---+---+---+---+-...
| Z | a | p | h | o | d |   | B | e | e | b | l | ...
+---+---+---+---+---+---+---+---+---+---+---+---+-...
```

Hopefully you get the idea.

In this simple example, we are controlling the length of the name strings being copied in. We know that fullName can hold 81 bytes, so we know firstName and lastName plus the space in between must be less than 81 bytes long to avoid overflowing the fullName buffer.

In a perfect world, that is fine.  But in a perfect world, we would never need any error checking. Let’s just pretend the world isn’t perfect and do this the safe(r) way.

Just like strcpy() has strncpy(), strcat() also has a safer version. It is called strncat():

• char * strncat ( char * destination, const char * source, size_t num ) – Append characters from string. Appends the first num characters of source to destination, plus a terminating null-character.
If the length of the C string in source is less than num, only the content up to the terminating null-character is copied.

For strncat(), num specifies how many characters of the source to append to the destination buffer. If you know the destination buffer (fullName, in this example) is 81 characters (because it is, in this example), you might think you could just do this:

`strncat( fullName, firstName, 81 );`

While you can do that, that would be wrong. The num count only controls how many characters are appended — it does not have anything to do with the count of how many characters are already in the destination buffer. For example, say fullName already has a 6 character firstName copied to it:

```+---+---+---+---+---+---+---+---+--...--+---+---+---+
| Z | a | p | h | o | d | 0 |   |  ...  |   |   |   | <- fullName
+---+---+---+---+---+---+---+---+--...--+---+---+---+```

If fullname is able to hold 81 characters, and already contains “Zaphod” (6 characters, not counting the null), the maximum size of a string we could append would be 75 (81-6) characters. Remember that the null (0) character just marks the end of a C string, and gets overwritten by the next strcat() data.

This means what we really want is:

`strncat( fullName, firstName, 75 );`

Or do we? Actually, there’s another difference with strncat() versus strncpy(), and it was clearly stated in the function description:

Appends the first num characters of source to destination, plus a terminating null-character.

strncat() always appends the null (0) character, meaning if you gave it a long source string and told it to append up to 10 characters, it would append 10 characters plus the null (0) character. Thus, that 75 could actually append 76 characters! We have to always subtract one to avoid a buffer overrun.

```// Copy up to 74 characters + null to the 81 character
// fullName buffer which already contains 6 characters.
strncat( fullName, firstName, 74 );```

But hard-coding the numbers like that isn’t possible if we don’t know how many characters are already in the destination buffer. Instead, we can use some other C string calls to determine that and then do some math.

strlen() will return a count of how many characters are in a string buffer. It counts up to the first null (0) it finds. Thus, strlen(fullName) would return 6 (“Zaphod”). If we know the full size of the buffer, we can use strlen() to determine what is already there, and then simply subtract to know how many free bytes the buffer has. And since strncat() always adds a null, we subtract 1:

`strncat( fullName, firstName, 81-strlen(fullName)-1 );`

This is getting messy, but that is the basic way to ensure that strcat() doesn’t append too much data. Let’s update the original example a bit more:

```#define FIRSTNAME_SIZE 40
#define LASTNAME_SIZE  40
// firstName + space + lastName
#define FULLANME_SIZE  FIRSTNAME_SIZE + 1 + LASTNAME_SIZE

char firstName[FIRSTNAME_SIZE];
char lastName[LASTNAME_SIZE];
char fullName[FULLNAME_SIZE];

strncpy( firstName, "Zaphod", FIRSTNAME_SIZE-1 );
firstName[FIRSTNAME_SIZE-1] = '\0';

strncpy( lastName, "Beeblebrox", LASTNAME_SIZE-1 );
lastName[LASTNAME_SIZE-1] = '\0';

// Create full name by starting with first name...
strncpy( fullName, firstName, FULLNAME_SIZE-1 );
fullName[FULLNAME_SIZE-1] = '\0';

// ...then appending a space...
strncat( fullName, " ", FULLNAME_SIZE-strlen(fullname)-1 );

// ...then appending the last name.
strncat( fullName, lastName, FULLNAME_SIZE-strlen(fullName)-1 );```

So much extra code, but necessary to avoid a potential buffer overrun if the string being appended was passed in from another function where you don’t know the length.

Now we have a “safe” string append that can’t possibly write past the end of the destination buffer (fullName). If the string is too long, it just stops and puts a null there, truncating the string.

If every string copy is done using strncpy() to assure the destination buffer is never overran, and if every string append is done using strncat() with checks to limit how many characters can be appended, you practically eliminate the chance that a buffer overrun could occur and corrupt or crash your program.

However… If I were writing code to run a nuclear reactor, I might still take some extra steps to make sure the data I am using is valid.

Next time, we will look at a problem with this code. (Hint: What if something is wrong with the initial buffer that you are trying to append to?)

Until then…

64K TRS-80 CoCo memory test

• 2016/9/2 – Removed a duplicat line in the 2nd listing. Maybe fixed a typo or two.

On startup, a cassette-based CoCo has 24871 bytes available for BASIC.

Recently, Richard Ivey became the latest person in the Facebook TRS-80 / Color Computer group to ask how to tell if an old Radio Shack Color Computer had 64K without opening the case. The problem has to do with backwards compatibility. When the original Radio Shack TRS-80 Color Computer was released in 1980, it was sold as either a 4K or 16K system. Later, a 32K model would be available, and the Microsoft COLOR BASIC would give about 24K of free memory (with the rest of the memory used by the video display, cassette buffers, BASIC input buffer, etc.).

Update: See this earlier article about getting more memory for BASIC. Here is an excerpt:

64K NOTE: The reason BASIC memory is the same for 32K and 64K is due to legacy designs. The 6809 processor can only address 16-bits of memory space (64K). The BASIC ROMs started in memory at \$8000 (32768, the 32K halfway mark). This allowed the first 32K to be RAM for programs, and the upper 32K was for BASIC ROM, Extended BASIC ROM, Disk BASIC ROM and Program Pak ROMs. Early CoCo hackers figured out how to piggy-pack 32K RAM chips to get 64K RAM in a CoCo, but by default that RAM was “hidden” under the ROM address space. In assembly language, you could map out the ROMs and access the full 64K of RAM. But, since a BASIC program needed the BASIC ROMs, only the first 32K was available.

When 64K upgraded became available, the original BASIC would still only report about 24K free since it had never been modified to make use of the extra memory. Thus, typing “PRINT MEM” on a 32K CoCo 1 shows the same thing it does on a 512K (or greater) CoCo 3.

So how do you tell? One easy way is to just try to load a program or game that requires 64K and see if it works. But, if all you have is the CoCo, there is a short program you could type in to test. (NOTE: See a better listing later in this article.)

```10 READ A\$:IF A\$="X" THEN END
20 POKE 20000+N,VAL("&H"+A\$)
30 N=N+1:GOTO 10
40 DATA 34,01,1A,50,10,8E,80,00
50 DATA B7,FF,DE,EC,A4,AE,22,EE
60 DATA 24,B7,FF,DF,ED,A1,AF,A1
70 DATA EF,A1,10,8C,FE,FC,25,E8
80 DATA 10,8C,FF,00,24,0C,B7,FF
90 DATA DE,EC,A4,B7,FF,DF,ED,A1
100 DATA 20,EE,35,01,39
110 DATA X```

Thanks to Juan Castro for passing this along. I reformatted it so no line would be longer than the 32 column screen, hoping it makes it a bit easier to type in (though it does make the program longer, needing more, shorter lines).

On a 64K CoCo, this program will copy the ROMs to RAM and then switch in to all-RAM mode. RUN it, then type EXEC 20000 to execute it.

I believe this is the classic “ROM TO RAM” (or ROM2RAM) program that appeared somewhere in Rainbow magazine back probably around 1983. Basically, it places the system in to 64K mode (where memory addresses &H0000 to &HFFFF are all RAM) and copies the COLOR BASIC and, if installed, the EXTENDED and DISK BASIC ROMs in to that upper 64K so the system can still work.

If you type this in on a CoCo 1 or 2, then RUN it, it loads a small machine language program in starting ad memory address 20000. After this, you can type “EXEC 20000” to execute that machine language program. If you typed it in correctly, it should just return to BASIC and nothing should seem any different.

But, at this point (if it worked), the BASIC ROM is now in RAM, which means you can POKE to those memory locations and make changes.

Juan suggests an easy hack of changing where the prompt “OK” appears. He says:

Now try POKE &HABEF,89 — OK should become OY.

It worked for me in the Xroar emulator (configuration to emulated a 64K CoCo 2):

The BASIC “OK” prompt is changed to read “OY” (after placing the 64K CoCo in to all-RAM mode).

Thus, if you can run this program without it crashing, and that POKE works to change something formerly in ROM, your CoCo is operating in all-RAM mode and must have more than 32K.

In the future, I will have to track down a simpler way to test for 64K. Until then, happy typing…

Update: In the comments, Juan suggested making the following changes, so the program will execute the machine language program for you:

```10 READ A\$:IF A\$=”X” THEN 35
20 POKE 20000+N,VAL("&H"+A\$)
30 N=N+1:GOTO 10
35 EXEC 20000:POKE &HABEF,89:END
40 DATA 34,01,1A,50,10,8E,80,00
50 DATA B7,FF,DE,EC,A4,AE,22,EE
60 DATA 24,B7,FF,DF,ED,A1,AF,A1
70 DATA EF,A1,10,8C,FE,FC,25,E8
80 DATA 10,8C,FF,00,24,0C,B7,FF
90 DATA DE,EC,A4,B7,FF,DF,ED,A1
100 DATA 20,EE,35,01,39
110 DATA X```

With those changes, now all you have to do is type RUN and it will load the machine language program, execute it (to copy ROM in to RAM), then poke the “OK” prompt to say “OY”. Thanks, Juan!

C strcat, strcpy and armageddon, part 3

Previously, I discussed a way to make string copies safer by using strncpy(). I mentioned there would be a bit of extra overhead and I’d like to discuss that. This made me wonder: how much overhead? I decided to try and find out.

First, I created a very simple test program that copied a string using strcpy(), or strncpy() (with the extra null added).

```#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 20

//#define SMALL

int main()
{
char buffer[BUFSIZE];

#ifdef SMALL
// Smaller and faster, but less safe.
strcpy(buffer, "Hello");
#else
// Larger and slower, but more safe.
strncpy(buffer, "Hello", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';
#endif

// If you don't use 'buffer', it may be optimized out.
puts(buffer);

return EXIT_SUCCESS;
}
```

Since I am lazy, I didn’t want to make two separate test programs. Instead, I used a #define to conditionally compile which version of the string copy code I would use.

When I built this using GCC for Windows (using the excellent Code::Blocks editor/IDE), I found that each version produced a .exe that was 17920 bytes. I expect the code size difference might start showing up after using a bunch of these calls, so this test program was not good on a Windows compiler.

Instead, I turned to the Arduino IDE (currently version 1.6.7). It still uses GCC, but since it targets a smaller 16-bit AVR processor, it creates much smaller code and lets me see size differences easier. I modified the code to run inside the setup() function of an Arduino sketch:

```#define BUFSIZE 10
#define SMALL

void setup() {
// volatile to prevent optimizer from removing it.
volatile char buffer[BUFSIZE];

#ifdef SMALL
// Smaller and faster, but less safe.
strcpy((char*)buffer, "Hello");
#else
// Larger and slower, but more safe.
strncpy((char*)buffer, "Hello", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';
#endif
}

void loop() {
// put your main code here, to run repeatedly:
}
```

Then I selected Sketch Verify/Compile (Ctrl-R, or the checkmark button). Here are the results:

• SMALL (strcpy) – 540 bytes
• LARGE (strncpy) – 562 bytes

It seems moving from strcpy() to strncpy() would add only 22 extra bytes to my sketch. (Without the “buffer[BUFSIZE-1] = ‘\0’;” line, it was 560 bytes.)

Now, this does not mean that every use of strncpy() is going to add 20 bytes to your program. When the compiler links in that library code, only one copy of the strncpy() function will exist, so this is more of a “one time” penalty. To better demonstrate this, I created a program that would always link in both strcpy() and strncpy() so I could then test the overhead of the actual call:

```#define BUFSIZE 10
#define SMALL

void setup() {
// volatile to prevent optimizer from removing it.
volatile char buffer[BUFSIZE];

// For inclusion of both strcpy() and strncpy()
strcpy((char*)buffer, "Test");
strncpy((char*)buffer, "Test", BUFSIZE);

#ifdef SMALL
// Smaller and faster, but less safe.
strcpy((char*)buffer, "Hello");
#else
// Larger and slower, but more safe.
strncpy((char*)buffer, "Hello", BUFSIZE-1);
//buffer[BUFSIZE-1] = '\0';
#endif
}

void loop() {
// put your main code here, to run repeatedly:
}
```

Now, with both calls used (and trying to make sure the optimizer didn’t remove them), the sketch compiles to 604 bytes for SMALL, or 610 bytes for the larger strncpy() version. (Again, without the “buffer[BUFSIZE-1] = ‘\0’;” line it would be 608 bytes.)

Conclusions:

1. The strncpy() library function is larger than strcpy(). On this Arduino, it appeared to add 20 bytes to the program size. This is a one-time cost just to include that library function.
2. Making a call to strncpy() is larger than a call to strcpy() because it has to deal with an extra parameter. On this Arduino, each use would be 4 bytes larger.
3. Adding the null obviously adds extra code. On this Arduino, that seems to be 2 bytes. (The optimizer is probably doing something. Surely it takes more than two bytes to store a 0 in a buffer at an offset.)

Since the overhead of each use is only a few bytes, there’s not much of an impact to switch to doing string copies this safer way. (Assuming you can spare the extra 20 bytes to include the library function.)

Now we have a general idea about code space overhead, but what about CPU overhead? strncpy() should be slower since it is doing more work during the copy (checking for the max number of characters to copy, and possibly padding with null bytes).

To test this, I once again used the Arduino and it’s timing function, millis(). I created a sample program that would do 100,000 string copies and then print how long it took.

```#define BUFSIZE 10
//#define SMALL

void setup() {
// volatile to prevent optimizer from removing it.
volatile char buffer[BUFSIZE];
unsigned long startTime, endTime;

Serial.begin(115200); // So we can print stuff.

// For inclusion of both strcpy() and strncpy()
strcpy((char*)buffer, "Test");
strncpy((char*)buffer, "Test", BUFSIZE);

// Let's do this a bunch of times to test.
startTime = millis();

Serial.print("Start time: ");
Serial.println(startTime);

for (unsigned long i = 0; i < 100000; i++)
{
#ifdef SMALL
// Smaller and faster, but less safe.
strcpy((char*)buffer, "Hello");
#else
// Larger and slower, but more safe.
strncpy((char*)buffer, "Hello", BUFSIZE - 1);
buffer[BUFSIZE - 1] = '\0';
#endif
}
endTime = millis();

Serial.print("End time  : ");
Serial.println(endTime);

Serial.print("Time taken: ");
Serial.println(endTime - startTime);
}

void loop() {
// put your main code here, to run repeatedly:
}
```

When I ran this using SMALL strcpy(), it reports taking 396 milliseconds. When I run it using strncpy() with the null added, it reports 678 milliseconds. strcpy() appears to take about 60% of the time strncpy() does, at least for this test. (Maybe. Math is hard.)

Now, this is a short string that requires strncpy() to pad out the rest of the buffer. If I change it to use a 9 character string (leaving one byte for the null terminator):

```#ifdef SMALL
// Smaller and faster, but less safe.
strcpy(buffer, "123456789");
#else
// Larger and slower, but more safe.
strncpy(buffer, "123456789", BUFSIZE - 1);
buffer[BUFSIZE-1] = '\0';
#endif
```

…no padding will be done. Without padding, the SMALL version takes 572 and the strncpy()/null version takes… 478!?!

Huh? How can this be? How did the “small” version suddenly get SLOWER? Well, before, strcpy() only had to copy the five characters of “Hello” plus a null then it was done, while strncpy() had to copy “Hello” then pad out five nulls to fill the buffer. Once both had to do the same amount of work (copying nine bytes and a null), it appears that strncpy() is actually faster! (Your mileage may vary. Different compilers targeting different processors may generate code in vastly different ways.)

Perhaps there is just some optimization going on when the destination buffer size is know. (Note to self: Look in to the GCC strncpy source code and see what it does versus strcpy.)

Conclusion:

• strncpy() isn’t necessarily going to be slower (at least on this Arduino)!
• strncpy() might be significantly slower if you copy a very short string (“Hi”) in to a very long buffer (char buffer[80];).

I am sure glad we (didn’t) clear that up. In the next part, I’ll get back to talking about appending strings using strcat() and how to make that safer.

To be continued…

C strcat, strcpy and armageddon, part 2

In part 1, I gave a quick overview of the C library call, strcpy(), and showed how it is used to copy a string in to a buffer.  I specifically wanted to show a problem that can happen if you copy more data than the buffer can hold (a buffer overrun), and show a simple way to fix it using strncpy().

The final example would limit how much data was copied to the number of bytes the buffer could hold:

```char buffer[10];
strncpy( buffer, "I forgot how much room I have", 10);```

Not only does strncpy() limit how many characters will be copied, it also does something else. If the source string is shorter than the maximum number you specify to copy, it pads the rest of the destination buffer with null characters (zeros). If you give it a null terminated string shorter than the max…

`strncpy( buffer, "HI", 10);`

…that would result in:

```+---+---+---+---+---+---+---+---+---+---+
| H | I | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+---+```

Even though it’s clearly stated in the C library manual, I guess I never read that part. I had no idea it did that.

If I were to write my own crude implementation of strncpy(), it might look something like this:

```// My own crude implementation of strncpy()
char * myStrncpy( char *destination, const char *source, size_t num )
{
size_t i;

// Now we will begin copying characters until we copy a null (0) or
// we have copied num characters.
for (i=0; i<num ; i++)
{
// If we have found a null character...
if (source[i]=='\0') break; // Exit the for loop.

// Copy a character over.
destination[i] = source[i];
}
// Here we have copied 'i' characters.

// If less than num, fill the rest with nulls.
while(i < num)
{
destination[i] = '\0';
i++;
}

return destination; // Why? Because it just does.
}
```

While strncpy() will prevent us from a buffer overrun, passing in a string that is too large will create a different problem: It will not leave us with a usable C string. Per the cplusplus.com page on strncpy:

No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow).

strncpy() will copy characters until it has copied a null (0) or has copied a max number of bytes you specify with the third parameter. If you copy that full amount, and no null was copied, the destination buffer will not have a null terminator. It won’t be a C string, and using it as one will do strange things.

```char buffer[10];
strncpy( buffer, "I forgot how much room I have", 10);```
```+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h |
+---+---+---+---+---+---+---+---+---+---+```

In the above example, you end up with buffer full of 10 bytes of data without a null character at the end. You won’t be able to use it with any of the C string functions like printf(), strlen(), strcat(), etc. Well, you can, but you might not get what you expect.

If you try to use it, these functions will continue through memory until they find the next null character after the buffer. If the next byte in memory after the buffer just happens to be a zero, everything will work and you won’t see the problem. If the next 1K of memory is garbage that has NO zeros in it, the functions would begin at buffer and keep going as it if found a really long 1K string that starts out with what you expected, followed by garbage.

On operating systems that have memory protection, the program might even get terminated if it seeks through memory and ends up trying to read (still looking for that null byte) RAM that doesn’t belong that that process.

Don’t do that. Instead, when using strncpy(), always make sure you manually null terminate the buffer in case this situation occurs:

```// Copy up to 10 characters.
strncpy( buffer, "I forgot how much room I have", 10);
// null terminate destination at 10th byte (bytes 0-9)
buffer[9] = '\0';
```

The end result of that will be up to 9 characters followed by a null. (See #3 below for an optimization tip.)

```+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | 0 |
+---+---+---+---+---+---+---+---+---+---+```

Because of needing a null at the end, if you truly did want a 10 character string, you would have to make the buffer 11 bytes large, so it can hold the 10 characters you want plus the null character at the end.

Therefore, my suggestion for doing string copies is to do the following:

1. For clarity, use a #define for how large the buffer is, so you have that define available to use in strncpy(). This will mean only one place in code to change the buffer size if you need to (rather than having to edit every function instance of strncpy() that used it).
2. Always manually null terminate the last byte of the destination buffer after the copy, just in case a string as large or larger than the buffer is copied to it. (Yes, this will add to your code and take more CPU time, but we are going for safety here.)
3. Optimization tip: Since you are going to manually terminate the final byte anyway, you can copy one less byte in strncpy(). (There is no need to have strncpy() write out to the final byte only to have you write to that final byte again.)
```#define BUFSIZE 10

char buffer[BUFSIZE];

strncpy(buffer, "Here is my stuff", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';```

Above, we will allow up to 9 bytes to be copied in to the buffer (BUFSIZE-1), then we will null terminate the 10th byte (remember bytes are numbered starting at zero, so 0-9, therefore BUFSIZE-1 is 9, the 10th byte of the buffer).

If you truly did want to be able to hold a 10 character string, just add one to the BUFSIZE define. None of the other code will have to be touched.

Get in to that simple habit and your strncpy() will now not create a buffer overflow, and will never create a non-terminated C string.

And before you say it . . .

Now, obviously, if YOU are copying a literal “string” inside your own code, you can know exactly how large it is so using strcpy() is perfectly safe (and more efficient if you don’t need the rest of the destination buffer padded with zeros). BUT, if you code using strncpy() all the time, you will reduce the chances of you changing something later on that might cause you grief:

```strcpy( buffer, "Game Over" );
...later changed to...
strcpy( buffer, "Game over, dude! You suck!" );
```

Above, a simple change to a message might end up crashing your program because you forgot how large buffer was.

In the next installment, we will take a look at strcat() and appending strings.

C strcat, strcpy and armageddon, part 1

I am pretty sure writing crash-proof code used to be a thing. Back when computers were simpler, and an entire project was done by one developer, it was much rarer to find a glaring bug in a piece of released software.

Today I’d like to share some things you may never have known, or thought about, when it comes to standard C library string manipulation calls.

Upon the conclusion of this series, you will know some very simple things to do to make your program a bit more crash proof just by using “safer” versions of C string calls. Let’s begin with the “less safe” calls.

From the very-useful cplusplus.com website:

• strcat( char *destination, const char *source ) – Concatenate strings. Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.
• strcpy( char *destination, const char *source ) – Copy string. Copies the C string pointed by source into the array pointed by destination, including the terminating null character (and stopping at that point).
To avoid overflows, the size of the array pointed by destination shall be long enough to contain the same C string as source (including the terminating null character), and should not overlap in memory with source.

These simple functions allow us to copy, append and compare null terminated strings. A null is a zero that indicates the end of a string. Consider the string “HELLO WORLD”:

`char *message = "HELLO WORLD";`

In memory would be the characters “HELLO WORLD” followed by a 0 to indicate the end of the string:

```+---+---+---+---+---+---+---+---+---+---+---+---+---+
| H | E | L | L | O |   | W | O | R | L | D | ! | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+```

When we control everything, and know exactly what we are doing, strings are safe. But in the real world, there are some very serious problems that can happen with strings that can cause crashes from memory corruption. At best, this may just be annoying if your Arduino sketch won’t run, but at worst, it could mean Armageddon if the corruption is on a system that controls nuclear missiles.

Let’s discuss some very simple habits to get in to when using strings in C that will reduce the chances of an atomic meltdown. Or your blinking LEDs from stopping blinking…

First, here is an example of the problem. Suppose you have a 10-byte string buffer:

`char buffer[10];`

Somewhere in memory will be 10 bytes reserved for that buffer:

```+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |   | <- buffer
+---+---+---+---+---+---+---+---+---+---+```

You can load a string in to that buffer using strcpy() like this:

`strcpy( buffer, "HELLO" );`

Then buffer will contain your string, including the null character at the end:

```+---+---+---+---+---+---+---+---+---+---+
| H | E | L | L | O | 0 |   |   |   |   | <- buffer
+---+---+---+---+---+---+---+---+---+---+```

Anything else in the buffer may be random junk from what might have been in that memory before the buffer was allocated. If this was a concern, we could have initialized that buffer memory ahead of time by doing something like this:

```char buffer[10] = { 0 };
```

…or in your code, you could use memset() to clear the contents of the buffer:

```memset( buffer, 0, 10 ); // Initialize buffer with zeros
```

But I digress. (Not to self: I should share some pros and cons of initializing data like that sometime.)

When you control the world, you can do things like this safely. But what if you forget how large buffer is, and try to load something larger in to it?

`strcpy( buffer, "I forgot how much room I have" );`

strcpy() will gladly start copying everything you give it up until it finds a null at the end of the string. In C, when you created a “quoted string” it automatically has a null added to the end of it.

Thus, it begins copying “I forgot how…” in to the 10 bytes of buffer, and keeps on going well past the end of buffer, overwriting whatever happens to be in memory past that area. This is a very common mistake.
Suppose you had created two buffers, and the compiler created them in memory next to each other:

```char buffer[10];
char launchCode[10];```

As the above strcpy() began writing to buffer, it would fill up buffer then continue to write over part of launchCode, corrupting whatever was there. This is bad, and you might not even notice if if you hadn’t checked launchCode to see if it was still intact.

```+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h | <- buffer
+---+---+---+---+---+---+---+---+---+---+
| o | w |   | m | u | c | h |   | r | o | <- launchCode
+---+---+---+---+---+---+---+---+---+---+
| o | m |   | I |   | h | a | v | e | 0 | ...other memory...
+---+---+---+---+---+---+---+---+---+---+```

Worse, you might have had other types of variables there for counters, high scores or mom’s birthday, and overwriting the buffer might end up corrupting some numeric variable and cause your program to behave in unexpected ways. This can be madness to figure out :)

The solution is to use a special version of string copy that lets you specify the maximum number of characters it will copy before it stops.

• strncpy( char *destination, const char *source, size_t num ) – Copy characters from string. Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of numcharacters have been written to it.
No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case,destination shall not be considered a null terminated C string (reading it as such would overflow).
destination and source shall not overlap (see memmove for a safer alternative when overlapping).

By using this, and specifying the max size of the destination buffer, we can prevent the overrun and life will be perfect and happy:

`strncpy( buffer, "I forgot how much room I have", 10 );`

…or will it?

```+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h | <- buffer
+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |   | <- launchCode
+---+---+---+---+---+---+---+---+---+---+```

That looks fine, doesn’t it?

In the next part, I’ll demonstrate why the above example is bad, and show you an easy way to make it more bullet-proof. Maybe you see it already… (Hint: The description of strncpy specifically mentions the problem.)

I’ll also discuss appending strings with strcat() and the similar problems that happen there, plus a few unexpected things I did not realize about these functions until I was digging in to them this week for a work project.

To be continued…