Category Archives: C Programming

C strcat, strcpy and armageddon, part 5

See also: part 1, part 2, part 3 and part 4.

In our quest to find the safest (or at least a safer) way to copy or append strings, we have been discussing the C library functions strncpy() and strncat(). In the previous installment I mentioned there is a potential problem. That problem is with strlen(), which we are using for strncat() to determine how much data is already in the destination buffer:

// Create full name by starting with first name...
strncpy( fullName, firstName, FULLNAME_SIZE-1 );
fullName[FULLNAME_SIZE-1] = '\0';

// ...then appending a space...
strncat( fullName, " ", FULLNAME_SIZE-strlen(fullName)-1 );

// ...then appending the last name.
strncat( fullName, lastName, FULLNAME_SIZE-strlen(fullName)-1 );

We are assuming that the length of the data in the buffer is never greater than what the buffer can hold. But what if, somehow, the existing string was too long? If strlen(fullName) was FULLNAME_SIZE or larger, the math would break. Let’s say FULLNAME_SIZE is 20, but the string copied there was 25 characters long (a buffer overrun):

FULLNAME_SIZE-strlen(fullName)-1

…is 20-25-1 which is -6 which means:

strncat( fullName, " ", -6 ); // error!

I would think that strncat() should fail because it is being passed a negative number for the count, but that third parameter is of type size_t, which, according to the cplusplus.com page, is:

Unsigned integral type
Alias of one of the fundamental unsigned integer types. It is a type able to represent the size of any object in bytes: size_t is the type returned by the sizeof operator and is widely used in the standard library to represent sizes and counts.

Unsigned numbers can only be positive (0 and up), while signed numbers can be negative or positive. If you pass in a -1 as an unsigned value, it will actually look like a huge positive number. (For signed numbers, one of the bits is used to indicate if it is positive or negative. An 8-bit value can represent the unsigned values 0-255, or the signed values -128 to 127.) In our case, the -6 never even exists: strlen() also returns a size_t, so the subtraction itself is done with unsigned math and wraps around to a huge positive number.

Thus, because of bad math, strncat() will operate as it was instructed, thinking the limit is some huge number.

Gotcha.

What we really need is a version of strlen() that stops counting when it hits a maximum size (like a strnlen() call). If we had one, we could also give it the limit of our buffer:

FULLNAME_SIZE-strnlen(fullName, FULLNAME_SIZE)-1

That way, strnlen() would never return anything greater than FULLNAME_SIZE, so if it didn’t find a null (0) terminator before that character, it would just return the max. Thus, the above example would be 20-20-1 which is… -1.

Crap.

We would have to treat strnlen() the same way we do the other calls, and subtract one from its limit (for a 40-character buffer, it would return at most 39, always leaving room for the null terminator):

FULLNAME_SIZE-strnlen(fullName, FULLNAME_SIZE-1)-1

That would give us 20-19-1 which is 0, and passing in a 0 to strncat() appends nothing. (strncat() does not report an error; with a count of 0 it simply copies no characters.)

Problem solved.

Unfortunately, the ANSI-C standard has no such function (though some C libraries do offer it as a non-standard extension; it is part of POSIX, for example). This means that to be safe in these situations, you would have to create your own strnlen() function. A simple wrapper around the existing strlen() should work:

// Return the length of the string, or num,
// whichever is smaller.
size_t strnlen( const char * str, size_t num )
{
  size_t len;

  len = strlen(str);

  // If string len is longer than we want...
  if (len > num)
  {
    // ...limit the len to be the max num.
    len = num;
  }

  // Return actual len, or max len.
  return len;
}

Problem actually solved.

This may seem to be a very unlikely error, but maybe you have a function written to deal with a buffer of up to 40 characters, and the caller passed in one that could hold up to 60. The caller might have put a string there that is longer than you expect, so when you go to append (thinking 40 is the max), the existing string is already too long.

This type of problem can be quite common when using code from other libraries or projects.

Or, perhaps someone properly used the buffer size, but used strncpy() without adding a null afterward, and it filled the buffer with no null terminator to stop strlen(). Ah! This seems like a much more common issue.

Pity my proposed strnlen() is still potentially inefficient (and could even cause a crash). Do you see why? I guess we will have to fix that, too.

In the next part, I will summarize all of this on one simple “better practices” page for dealing with string copies or appends.

See you then…

C strcat, strcpy and armageddon, part 4

See also: part 1, part 2 and part 3.

The story so far . . . String copies can be made much safer by using strncpy() instead of strcpy(). strncpy() will take up slightly more code space than strcpy() and may be slower.

Now let’s move on to appending strings with strcat() (“string concatenate”). From the very-useful cplusplus.com website:

  • strcat( char *destination, const char *source ) – Concatenate strings. Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.

As an example of normal use, perhaps you want to take two string buffers that contain a first and last name, and put them together and make a full name string buffer:

char firstName[40];
char lastName[40];
char fullName[81]; // firstName + space + lastName

// Load first name.
strcpy( firstName, "Zaphod" );

// Load last name.
strcpy( lastName, "Beeblebrox" );

// Create full name by starting with first name...
strcpy( fullName, firstName );

// ...then appending a space...
strcat( fullName, " " );

// ...then appending the last name.
strcat( fullName, lastName );

The fullName buffer looks like this:

// strcpy( fullName, firstName )
+---+---+---+---+---+---+---+---+---+---+---+---+-...
| Z | a | p | h | o | d | 0 |   |   |   |   |   | ...
+---+---+---+---+---+---+---+---+---+---+---+---+-...

// strcat( fullName, " " );
+---+---+---+---+---+---+---+---+---+---+---+---+-...
| Z | a | p | h | o | d |   | 0 |   |   |   |   | ...
+---+---+---+---+---+---+---+---+---+---+---+---+-...

// strcat( fullName, lastName )
+---+---+---+---+---+---+---+---+---+---+---+---+-...
| Z | a | p | h | o | d |   | B | e | e | b | l | ...
+---+---+---+---+---+---+---+---+---+---+---+---+-...

Hopefully you get the idea.

In this simple example, we are controlling the length of the name strings being copied in. We know that fullName can hold 81 bytes, so we know firstName and lastName plus the space in between must be less than 81 bytes long to avoid overflowing the fullName buffer.

In a perfect world, that is fine.  But in a perfect world, we would never need any error checking. Let’s just pretend the world isn’t perfect and do this the safe(r) way.

Just like strcpy() has strncpy(), strcat() also has a safer version. It is called strncat():

  • char * strncat ( char * destination, const char * source, size_t num ) – Append characters from string. Appends the first num characters of source to destination, plus a terminating null-character.
    If the length of the C string in source is less than num, only the content up to the terminating null-character is copied.

For strncat(), num specifies how many characters of the source to append to the destination buffer. If you know the destination buffer (fullName, in this example) is 81 characters (because it is, in this example), you might think you could just do this:

strncat( fullName, firstName, 81 );

While you can do that, that would be wrong. The num count only controls how many characters are appended — it does not have anything to do with the count of how many characters are already in the destination buffer. For example, say fullName already has a 6 character firstName copied to it:

+---+---+---+---+---+---+---+---+--...--+---+---+---+
| Z | a | p | h | o | d | 0 |   |  ...  |   |   |   | <- fullName
+---+---+---+---+---+---+---+---+--...--+---+---+---+

If fullName is able to hold 81 characters, and already contains “Zaphod” (6 characters, not counting the null), the maximum size of a string we could append would be 75 (81-6) characters. Remember that the null (0) character just marks the end of a C string, and gets overwritten by the next strcat() data.

This means what we really want is:

strncat( fullName, firstName, 75 );

Or do we? Actually, there’s another difference with strncat() versus strncpy(), and it was clearly stated in the function description:

Appends the first num characters of source to destination, plus a terminating null-character.

strncat() always appends the null (0) character, meaning if you gave it a long source string and told it to append up to 10 characters, it would append 10 characters plus the null (0) character. Thus, a count of 75 could actually write 76 bytes (75 characters plus the null)! We always have to subtract one to avoid a buffer overrun.

// Copy up to 74 characters + null to the 81 character
// fullName buffer which already contains 6 characters.
strncat( fullName, firstName, 74 );

But hard-coding the numbers like that isn’t possible if we don’t know how many characters are already in the destination buffer. Instead, we can use some other C string calls to determine that and then do some math.

strlen() will return a count of how many characters are in a string buffer. It counts up to the first null (0) it finds. Thus, strlen(fullName) would return 6 (“Zaphod”). If we know the full size of the buffer, we can use strlen() to determine what is already there, and then simply subtract to know how many free bytes the buffer has. And since strncat() always adds a null, we subtract 1:

strncat( fullName, firstName, 81-strlen(fullName)-1 );

This is getting messy, but that is the basic way to ensure that strncat() doesn’t append too much data. Let’s update the original example a bit more:

#define FIRSTNAME_SIZE 40
#define LASTNAME_SIZE  40
// firstName + space + lastName
#define FULLNAME_SIZE  (FIRSTNAME_SIZE + 1 + LASTNAME_SIZE)

char firstName[FIRSTNAME_SIZE];
char lastName[LASTNAME_SIZE];
char fullName[FULLNAME_SIZE]; 

// Load first name.
strncpy( firstName, "Zaphod", FIRSTNAME_SIZE-1 );
firstName[FIRSTNAME_SIZE-1] = '\0';

// Load last name.
strncpy( lastName, "Beeblebrox", LASTNAME_SIZE-1 );
lastName[LASTNAME_SIZE-1] = '\0';

// Create full name by starting with first name...
strncpy( fullName, firstName, FULLNAME_SIZE-1 );
fullName[FULLNAME_SIZE-1] = '\0';

// ...then appending a space...
strncat( fullName, " ", FULLNAME_SIZE-strlen(fullName)-1 );

// ...then appending the last name.
strncat( fullName, lastName, FULLNAME_SIZE-strlen(fullName)-1 );

So much extra code, but necessary to avoid a potential buffer overrun if the string being appended was passed in from another function where you don’t know the length.

Now we have a “safe” string append that can’t possibly write past the end of the destination buffer (fullName). If the string is too long, it just stops and puts a null there, truncating the string.

If every string copy is done using strncpy() to ensure the destination buffer is never overrun, and if every string append is done using strncat() with checks to limit how many characters can be appended, you practically eliminate the chance that a buffer overrun could occur and corrupt or crash your program.

However… If I were writing code to run a nuclear reactor, I might still take some extra steps to make sure the data I am using is valid.

Next time, we will look at a problem with this code. (Hint: What if something is wrong with the initial buffer that you are trying to append to?)

Until then…

C strcat, strcpy and armageddon, part 3

See also: part 1 and part 2.

Previously, I discussed a way to make string copies safer by using strncpy(). I mentioned there would be a bit of extra overhead and I’d like to discuss that. This made me wonder: how much overhead? I decided to try and find out.

First, I created a very simple test program that copied a string using strcpy(), or strncpy() (with the extra null added).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 20

//#define SMALL

int main()
{
 char buffer[BUFSIZE];

#ifdef SMALL
 // Smaller and faster, but less safe.
 strcpy(buffer, "Hello");
#else
 // Larger and slower, but more safe.
 strncpy(buffer, "Hello", BUFSIZE-1);
 buffer[BUFSIZE-1] = '\0';
#endif

 // If you don't use 'buffer', it may be optimized out.
 puts(buffer);

 return EXIT_SUCCESS;
}

Since I am lazy, I didn’t want to make two separate test programs. Instead, I used a #define to conditionally compile which version of the string copy code I would use.

When I built this using GCC for Windows (using the excellent Code::Blocks editor/IDE), I found that each version produced a .exe that was 17920 bytes. I expect the code size difference might only start showing up after using a bunch of these calls, so this test program was not a good measure with a Windows compiler.

Instead, I turned to the Arduino IDE (currently version 1.6.7). It still uses GCC, but since it targets a smaller 8-bit AVR processor, it creates much smaller code and lets me see size differences more easily. I modified the code to run inside the setup() function of an Arduino sketch:

#define BUFSIZE 10
#define SMALL

void setup() {
 // volatile to prevent optimizer from removing it.
 volatile char buffer[BUFSIZE];
 
#ifdef SMALL
 // Smaller and faster, but less safe.
 strcpy((char*)buffer, "Hello");
#else
 // Larger and slower, but more safe.
 strncpy((char*)buffer, "Hello", BUFSIZE-1);
 buffer[BUFSIZE-1] = '\0';
#endif
}

void loop() {
 // put your main code here, to run repeatedly:
}

Then I selected Sketch > Verify/Compile (Ctrl-R, or the checkmark button). Here are the results:

  • SMALL (strcpy) – 540 bytes
  • LARGE (strncpy) – 562 bytes

It seems moving from strcpy() to strncpy() would add only 22 extra bytes to my sketch. (Without the “buffer[BUFSIZE-1] = ‘\0’;” line, it was 560 bytes.)

Now, this does not mean that every use of strncpy() is going to add 20 bytes to your program. When the compiler links in that library code, only one copy of the strncpy() function will exist, so this is more of a “one time” penalty. To better demonstrate this, I created a program that would always link in both strcpy() and strncpy() so I could then test the overhead of the actual call:

#define BUFSIZE 10
#define SMALL

void setup() {
 // volatile to prevent optimizer from removing it.
 volatile char buffer[BUFSIZE];

 // For inclusion of both strcpy() and strncpy()
 strcpy((char*)buffer, "Test");
 strncpy((char*)buffer, "Test", BUFSIZE);

#ifdef SMALL
 // Smaller and faster, but less safe.
 strcpy((char*)buffer, "Hello");
#else
 // Larger and slower, but more safe.
 strncpy((char*)buffer, "Hello", BUFSIZE-1);
 //buffer[BUFSIZE-1] = '\0';
#endif
}

void loop() {
 // put your main code here, to run repeatedly:
}

Now, with both calls used (and trying to make sure the optimizer didn’t remove them), the sketch compiles to 604 bytes for SMALL, or 610 bytes for the larger strncpy() version. (Again, without the “buffer[BUFSIZE-1] = ‘\0’;” line it would be 608 bytes.)

Conclusions:

  1. The strncpy() library function is larger than strcpy(). On this Arduino, it appeared to add 20 bytes to the program size. This is a one-time cost just to include that library function.
  2. Making a call to strncpy() is larger than a call to strcpy() because it has to deal with an extra parameter. On this Arduino, each use would be 4 bytes larger.
  3. Adding the null obviously adds extra code. On this Arduino, that seems to be 2 bytes. (The optimizer is probably doing something. Surely it takes more than two bytes to store a 0 in a buffer at an offset.)

Since the overhead of each use is only a few bytes, there’s not much of an impact to switch to doing string copies this safer way. (Assuming you can spare the extra 20 bytes to include the library function.)

Now we have a general idea about code space overhead, but what about CPU overhead? strncpy() should be slower since it is doing more work during the copy (checking for the max number of characters to copy, and possibly padding with null bytes).

To test this, I once again used the Arduino and its timing function, millis(). I created a sample program that would do 100,000 string copies and then print how long it took.

#define BUFSIZE 10
//#define SMALL

void setup() {
 // volatile to prevent optimizer from removing it.
 volatile char buffer[BUFSIZE];
 unsigned long startTime, endTime;

 Serial.begin(115200); // So we can print stuff.

 // For inclusion of both strcpy() and strncpy()
 strcpy((char*)buffer, "Test");
 strncpy((char*)buffer, "Test", BUFSIZE);

 // Let's do this a bunch of times to test.
 startTime = millis();

 Serial.print("Start time: ");
 Serial.println(startTime);

 for (unsigned long i = 0; i < 100000; i++)
 {
#ifdef SMALL
 // Smaller and faster, but less safe.
 strcpy((char*)buffer, "Hello");
#else
 // Larger and slower, but more safe.
 strncpy((char*)buffer, "Hello", BUFSIZE - 1);
 buffer[BUFSIZE - 1] = '\0';
#endif
 }
 endTime = millis();

 Serial.print("End time  : ");
 Serial.println(endTime);

 Serial.print("Time taken: ");
 Serial.println(endTime - startTime);
}

void loop() {
 // put your main code here, to run repeatedly:
}

When I ran this using SMALL strcpy(), it reports taking 396 milliseconds. When I run it using strncpy() with the null added, it reports 678 milliseconds. strcpy() appears to take about 60% of the time strncpy() does, at least for this test. (Maybe. Math is hard.)

Now, this is a short string that requires strncpy() to pad out the rest of the buffer. If I change it to use a 9 character string (leaving one byte for the null terminator):

#ifdef SMALL
    // Smaller and faster, but less safe.
    strcpy((char*)buffer, "123456789");
#else
    // Larger and slower, but more safe.
    strncpy((char*)buffer, "123456789", BUFSIZE - 1);
    buffer[BUFSIZE-1] = '\0';
#endif

…no padding will be done. Without padding, the SMALL version takes 572 milliseconds and the strncpy()/null version takes… 478!?!

Huh? How can this be? How did the “small” version suddenly get SLOWER? Well, before, strcpy() only had to copy the five characters of “Hello” plus a null, then it was done, while strncpy() had to copy “Hello” then pad out four more nulls to fill the nine bytes it was told to write. Once both had to do the same amount of work (copying nine bytes and a null), it appears that strncpy() is actually faster! (Your mileage may vary. Different compilers targeting different processors may generate code in vastly different ways.)

Perhaps there is just some optimization going on when the destination buffer size is known. (Note to self: Look into the GCC strncpy source code and see what it does versus strcpy.)

Conclusion:

  • strncpy() isn’t necessarily going to be slower (at least on this Arduino)!
  • strncpy() might be significantly slower if you copy a very short string (“Hi”) into a very long buffer (char buffer[80];).

Buyer (er, Programmer) beware!

I am sure glad we (didn’t) clear that up. In the next part, I’ll get back to talking about appending strings using strcat() and how to make that safer.

To be continued…

C strcat, strcpy and armageddon, part 2

In part 1, I gave a quick overview of the C library call, strcpy(), and showed how it is used to copy a string into a buffer. I specifically wanted to show a problem that can happen if you copy more data than the buffer can hold (a buffer overrun), and show a simple way to fix it using strncpy().

The final example would limit how much data was copied to the number of bytes the buffer could hold:

char buffer[10];
strncpy( buffer, "I forgot how much room I have", 10);

Not only does strncpy() limit how many characters will be copied, it also does something else. If the source string is shorter than the maximum number you specify to copy, it pads the rest of the destination buffer with null characters (zeros). If you give it a null terminated string shorter than the max…

strncpy( buffer, "HI", 10);

…that would result in:

+---+---+---+---+---+---+---+---+---+---+
| H | I | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+---+

Even though it’s clearly stated in the C library manual, I guess I never read that part. I had no idea it did that.

If I were to write my own crude implementation of strncpy(), it might look something like this:

// My own crude implementation of strncpy()
char * myStrncpy( char *destination, const char *source, size_t num )
{
  size_t i;

  // Now we will begin copying characters until we copy a null (0) or
  // we have copied num characters.
  for (i=0; i<num ; i++)
  {
    // If we have found a null character...
    if (source[i]=='\0') break; // Exit the for loop.

    // Copy a character over.
    destination[i] = source[i];
  }
  // Here we have copied 'i' characters.

  // If less than num, fill the rest with nulls.
  while(i < num)
  {
    destination[i] = '\0';
    i++;
  }

  return destination; // Why? Because it just does.
}

While strncpy() will prevent us from a buffer overrun, passing in a string that is too large will create a different problem: It will not leave us with a usable C string. Per the cplusplus.com page on strncpy:

No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow).

strncpy() will copy characters until it has copied a null (0) or has copied a max number of bytes you specify with the third parameter. If you copy that full amount, and no null was copied, the destination buffer will not have a null terminator. It won’t be a C string, and using it as one will do strange things.

char buffer[10];
strncpy( buffer, "I forgot how much room I have", 10);
+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h |
+---+---+---+---+---+---+---+---+---+---+

In the above example, you end up with buffer full of 10 bytes of data without a null character at the end. You won’t be able to use it with any of the C string functions like printf(), strlen(), strcat(), etc. Well, you can, but you might not get what you expect.

If you try to use it, these functions will continue through memory until they find the next null character after the buffer. If the next byte in memory after the buffer just happens to be a zero, everything will work and you won’t see the problem. If the next 1K of memory is garbage that has NO zeros in it, the functions would begin at buffer and keep going as if they had found a really long 1K string that starts out with what you expected, followed by garbage.

On operating systems that have memory protection, the program might even get terminated if it seeks through memory and ends up trying to read (still looking for that null byte) RAM that doesn’t belong to that process.

Don’t do that. Instead, when using strncpy(), always make sure you manually null terminate the buffer in case this situation occurs:

// Copy up to 10 characters.
strncpy( buffer, "I forgot how much room I have", 10);
// null terminate destination at 10th byte (bytes 0-9)
buffer[9] = '\0';

The end result of that will be up to 9 characters followed by a null. (See #3 below for an optimization tip.)

+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | 0 |
+---+---+---+---+---+---+---+---+---+---+

Because of needing a null at the end, if you truly did want a 10 character string, you would have to make the buffer 11 bytes long, so it can hold the 10 characters you want plus the null character at the end.

Therefore, my suggestion for doing string copies is to do the following:

  1. For clarity, use a #define for how large the buffer is, so you have that define available to use in strncpy(). This will mean only one place in code to change the buffer size if you need to (rather than having to edit every call to strncpy() that used it).
  2. Always manually null terminate the last byte of the destination buffer after the copy, just in case a string as large or larger than the buffer is copied to it. (Yes, this will add to your code and take more CPU time, but we are going for safety here.)
  3. Optimization tip: Since you are going to manually terminate the final byte anyway, you can copy one less byte in strncpy(). (There is no need to have strncpy() write out to the final byte only to have you write to that final byte again.)
#define BUFSIZE 10

char buffer[BUFSIZE];

strncpy(buffer, "Here is my stuff", BUFSIZE-1);
buffer[BUFSIZE-1] = '\0';

Above, we will allow up to 9 bytes to be copied into the buffer (BUFSIZE-1), then we will null terminate the 10th byte (remember bytes are numbered starting at zero, so 0-9, therefore BUFSIZE-1 is 9, the 10th byte of the buffer).

If you truly did want to be able to hold a 10 character string, just add one to the BUFSIZE define. None of the other code will have to be touched.

Get into that simple habit and your strncpy() calls will never create a buffer overflow, and will never leave a non-terminated C string.

And before you say it . . .

Now, obviously, if YOU are copying a literal “string” inside your own code, you can know exactly how large it is so using strcpy() is perfectly safe (and more efficient if you don’t need the rest of the destination buffer padded with zeros). BUT, if you code using strncpy() all the time, you will reduce the chances of you changing something later on that might cause you grief:

strcpy( buffer, "Game Over" );
...later changed to...
strcpy( buffer, "Game over, dude! You suck!" );

Above, a simple change to a message might end up crashing your program because you forgot how large buffer was.

In the next installment, we will take a look at strcat() and appending strings.

C strcat, strcpy and armageddon, part 1

I am pretty sure writing crash-proof code used to be a thing. Back when computers were simpler, and an entire project was done by one developer, it was much rarer to find a glaring bug in a piece of released software.

Today I’d like to share some things you may never have known, or thought about, when it comes to standard C library string manipulation calls.

Upon the conclusion of this series, you will know some very simple things to do to make your program a bit more crash proof just by using “safer” versions of C string calls. Let’s begin with the “less safe” calls.

From the very-useful cplusplus.com website:

  • strcat( char *destination, const char *source ) – Concatenate strings. Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.
  • strcpy( char *destination, const char *source ) – Copy string. Copies the C string pointed by source into the array pointed by destination, including the terminating null character (and stopping at that point).
    To avoid overflows, the size of the array pointed by destination shall be long enough to contain the same C string as source (including the terminating null character), and should not overlap in memory with source.

These simple functions allow us to copy and append null-terminated strings. A null is a zero that indicates the end of a string. Consider the string “HELLO WORLD!”:

char *message = "HELLO WORLD!";

In memory would be the characters “HELLO WORLD!” followed by a 0 to indicate the end of the string:

+---+---+---+---+---+---+---+---+---+---+---+---+---+
| H | E | L | L | O |   | W | O | R | L | D | ! | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+

When we control everything, and know exactly what we are doing, strings are safe. But in the real world, there are some very serious problems that can happen with strings that can cause crashes from memory corruption. At best, this may just be annoying if your Arduino sketch won’t run, but at worst, it could mean Armageddon if the corruption is on a system that controls nuclear missiles.

Let’s discuss some very simple habits to get into when using strings in C that will reduce the chances of an atomic meltdown. Or of your blinking LEDs stopping blinking…

First, here is an example of the problem. Suppose you have a 10-byte string buffer:

char buffer[10];

Somewhere in memory will be 10 bytes reserved for that buffer:

+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |   | <- buffer
+---+---+---+---+---+---+---+---+---+---+

You can load a string into that buffer using strcpy() like this:

strcpy( buffer, "HELLO" );

Then buffer will contain your string, including the null character at the end:

+---+---+---+---+---+---+---+---+---+---+
| H | E | L | L | O | 0 |   |   |   |   | <- buffer
+---+---+---+---+---+---+---+---+---+---+

Anything else in the buffer may be random junk from what might have been in that memory before the buffer was allocated. If this was a concern, we could have initialized that buffer memory ahead of time by doing something like this:

char buffer[10] = { 0 };

…or in your code, you could use memset() to clear the contents of the buffer:

memset( buffer, 0, 10 ); // Initialize buffer with zeros

But I digress. (Note to self: I should share some pros and cons of initializing data like that sometime.)

When you control the world, you can do things like this safely. But what if you forget how large buffer is, and try to load something larger into it?

strcpy( buffer, "I forgot how much room I have" );

strcpy() will gladly start copying everything you give it up until it finds a null at the end of the string. In C, when you create a “quoted string”, it automatically has a null added to the end of it.

Thus, it begins copying “I forgot how…” into the 10 bytes of buffer, and keeps on going well past the end of buffer, overwriting whatever happens to be in memory past that area. This is a very common mistake.

Suppose you had created two buffers, and the compiler created them in memory next to each other:

char buffer[10];
char launchCode[10];

As the above strcpy() began writing to buffer, it would fill up buffer then continue to write over part of launchCode, corrupting whatever was there. This is bad, and you might not even notice if you hadn’t checked launchCode to see if it was still intact.

+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h | <- buffer
+---+---+---+---+---+---+---+---+---+---+
| o | w |   | m | u | c | h |   | r | o | <- launchCode
+---+---+---+---+---+---+---+---+---+---+
| o | m |   | I |   | h | a | v | e | 0 | ...other memory...
+---+---+---+---+---+---+---+---+---+---+

Worse, you might have had other types of variables there for counters, high scores or mom’s birthday, and overwriting the buffer might end up corrupting some numeric variable and cause your program to behave in unexpected ways. This can be madness to figure out :)

The solution is to use a special version of string copy that lets you specify the maximum number of characters it will copy before it stops.

  • strncpy( char *destination, const char *source, size_t num ) – Copy characters from string. Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.
    No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow).
    destination and source shall not overlap (see memmove for a safer alternative when overlapping).

By using this, and specifying the max size of the destination buffer, we can prevent the overrun and life will be perfect and happy:

strncpy( buffer, "I forgot how much room I have", 10 );

…or will it?

+---+---+---+---+---+---+---+---+---+---+
| I |   | f | o | r | g | o | t |   | h | <- buffer
+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |   | <- launchCode
+---+---+---+---+---+---+---+---+---+---+

That looks fine, doesn’t it?

In the next part, I’ll demonstrate why the above example is bad, and show you an easy way to make it more bullet-proof. Maybe you see it already… (Hint: The description of strncpy specifically mentions the problem.)

I’ll also discuss appending strings with strcat() and the similar problems that happen there, plus a few unexpected things I did not realize about these functions until I was digging into them this week for a work project.

To be continued…

Splitting a 16-bit value to two 8-bit values in C

Recently in my day job, I came across some C code that just felt inefficient. It was code that appeared to take a 16-bit integer and split the high and low bytes into two 8-bit integers. In all my years of C coding, I had never seen it done this way, so obviously it must be wrong.

NOTE: In this example, I am using the C99 <stdint.h> types for 8-bit and 16-bit unsigned values (uint8_t and uint16_t), because “int” may be different on different systems (the C standard only requires it to be at least 16 bits; on the Arduino it is 16 bits, and on my PC it is 32 bits).

uint8_t  bytes[2];
uint16_t value;

value = 0x1234;

bytes[0] = *((uint8_t*)&(value)+1); //high byte (0x12)
bytes[1] = *((uint8_t*)&(value)+0); //low byte  (0x34)

This code just felt bad to me because I had previously seen how much larger a program becomes when you are accessing structure elements like “foo.a” repeatedly in code. Each access was a bit larger, so if you used it more than a few times in a block of code you were better off putting it in a temporary variable like “temp = foo.a” and using “temp” over and over. Surely all this “address of” and math (+1) would be generating something like that, right?

Traditionally, the way I always see this done is using bit shifting and bitwise AND:

uint8_t  bytes[2];
uint16_t value;

value = 0x1234;

bytes[0] = value >> 8;     // high byte (0x12)
bytes[1] = value & 0x00FF; // low byte (0x34)

Above, bytes[0] starts out with the 16-bit value and shifts it right 8 bits. That turns 0x1234 into 0x0012 (the 0x34 falls off the end) so you can set bytes[0] to 0x12.

bytes[1] uses bitwise AND with 0x00FF to keep only the right (low) 8 bits, turning 0x1234 into 0x0034.

I did a quick test on an Arduino, and was surprised to see that the first example compiled into 512 bytes, and the second (using bit shift) was 516. I had expected a simple AND and bit shift to be smaller, but apparently, on this processor/compiler, getting a byte from an address was smaller. (I did not test to see which one used more clock cycles, and did no experiments with compiler optimizations.)

On a Windows PC under GNU-C, the first compiled to 784 bytes, and the second to 800. Interesting.

I ran across this code in a project targeting the Texas Instruments MSP430 processor. The MSP430 Launchpad is very Arduino-like, and previous developers had to do many tricks to get the most out of the limited RAM, flash and CPU cycles of these small devices.

I do not know if I can get in the habit of doing my integer splits this way, but perhaps I should retrain myself since this does appear incrementally better.

Update: Timing tests (using millis() on Arduino, and clock() on PC) show that it is also faster.

Here is my full Arduino test program. Note the use of “volatile” variable types. This prevents the compiler from optimizing them out (since they are never used unless you uncomment the prints to display them).

#define OURWAY

void setup() {
  volatile char bytes[2];
  volatile uint16_t  value;

  //Serial.begin(9600);
  
  value = 0x1234;

#ifdef OURWAY  
  // 512 bytes:
  bytes[0] =  *((uint8_t*)&(value)+1); //high byte (0x12)
  bytes[1] =  *((uint8_t*)&(value)+0); //low byte  (0x34)
#else  
  // 516 bytes:
  bytes[0] = value >> 8;     // high byte (0x12)
  bytes[1] = value & 0x00FF; // low byte  (0x34)
#endif

  //Serial.println(bytes[0], HEX); // 0x12
  //Serial.println(bytes[1], HEX); // 0x34
}

void loop() {

  // put your main code here, to run repeatedly: 
  
}

Using Termcap – part 3

See also: Part 1 and Part 2.

The termcap file is a text file found in /etc/termcap on a Unix system. As it was ported to other operating systems, the default location changed accordingly. The contents of the termcap file are basically a series of entries, one for each supported terminal. Each entry contains various elements separated by colons. To make the file more readable, a backslash can be used at the end of a line to break it up.

See the Wikipedia page for a summary, or the GNU Software page for a more descriptive one. Or, for a vague summary, an entry is laid out like this:

2characterterminalname|longerterminalname|verboseterminalname:\
  :capability=charactersequence:\
  :capability=charactersequence:

The 2 character name is legacy and is no longer used, but remains for ancient backwards compatibility. For a DEC VT-52 terminal, it might look like this:

dw|vt52|DEC vt52:\
  :cr=^M:do=^J:nl=^J:bl=^G:\
  ...etc...

Each capability has a two character abbreviation. Above, we see that to generate a carriage return (cr) we send a control-M (enter key). A new line (nl) is ^J. The bell (bl) is a ^G (beep!). There are many other simple codes.

For moving the cursor position, a DEC VT100 terminal used the sequence: ESCAPE, followed by [ (left bracket), followed by the line, followed by a semicolon, followed by the column, followed by H.

ESCAPE [ 10 ; 4 H

That would mean move the cursor to line 10, column 4. To represent sequences like this with variables inside of them (line, column, etc.), there are more complex termcap entries:

:cm=\E[%i%d;%dH:

Above, \E represents ESCAPE (just like ^ represents CONTROL). %i is a special flag that means “increment the two values supplied” (for base 1 numbering), and the two %ds are where the values go, similar to a C printf() call.

The %i is there because termcap assumes base 0, so an 80 column screen would be 0-79. The VT100 (and PC ANSI, I think) assume base 1, 1-80. To make it universal, all termcap applications expect a screen that is base 0 (0-79), and the entry knows whether to output 0-79 or 1-80. Fun.

Termcap has pages of codes for all kinds of features, like cursor up, delete line, clear screen, clear to end of line, etc. If a terminal does not support a feature, the entry is not present. Applications that use termcap will query these capabilities then use what they can. In my situation, I needed “cm” for cursor movement — and if that feature was not there, I couldn’t work (or, better, I could default to a mode of just lines of text).

There are more advanced features where a termcap entry can reference another entry. For instance, a manufacturer would make a series of terminals, and as new models came out, they added new features but maintained the earlier ones as well. The first version terminal would have an entry, then the “v2” terminal would have an entry that described the new features, but by adding a capability of “tc=terminal-v1” or whatever, it would get any other capabilities from the “terminal-v1” entry.

This cuts down on redundant information but also means you can’t just look at one termcap entry and necessarily know everything the terminal does. If you were writing your own code to parse a termcap file, you would have to take this into consideration.

In a C program that links with the termcap library, you need a buffer to load the desired terminal entry into (2K is the defined max size):

char term_buffer[2048];

…and then you just use the termcap tgetent() function:

status = tgetent(term_buffer, "ANSI");

If the termcap file is found, and there is an entry called “ANSI”, it will be copied into term_buffer. By checking the return value (always a good idea), you will know if the entry was not found.

But hard coding is bad. What if this code ever runs on a non-ANSI terminal? Termcap programs typically read the TERM environment variable, then load whatever entry that names. In Windows you might “set TERM=ansi” and on Linux you might “export TERM=vt100”. Then the C program would query that environment variable first:

char *termtype;
termtype = getenv("TERM");
if (termtype==NULL) { /* handle error if env var not set */ }

termtype will come back pointing to whatever the TERM environment variable is set to (“ansi” in the Windows example above, or “vt100” in the Linux example above). Then the tgetent() is done using that response:

status = tgetent(term_buffer, termtype);

If both of those are successful, the individual capabilities can be loaded using the tgetstr() function. tgetstr() will parse capabilities in the loaded termcap entry and write them to a buffer that is processed to be the actual output (less any variables that get substituted when the actual sequence is used later). For instance, the termcap entry might say:

:bl=^G:

…but when you use tgetstr() to parse for the “bl” entry, it will write out the control-G (ASCII 7) character in the output buffer. Basically, it converts all the \E (escape) and ^X (control) ASCII characters to what they really represent. This saves work later when they are output to the screen.

A second buffer (which must stay around for as long as the capabilities are used) needs to be allocated to store the resulting output. Most examples also use a 2K buffer:

char capbuff[2048]; // output sequences are stored here

Then, as each capability is obtained, you pass in a pointer to where the output should be written, and when the call returns, that pointer has been advanced to the next place in the buffer where the next capability will go. As tgetstr() is called over and over, the pointer moves along, filling up the output buffer with entries, and each call returns the location of the capability you asked for within that buffer.

char *tempPtr = capbuff; // start out pointing to our output buffer

If you want to know the code that clears the screen, it would be:

char *cl; // clear screen sequence

cl = tgetstr("cl", &tempPtr);

If cl comes back non-NULL, you now have a pointer to the byte sequence that will clear the screen. tempPtr returns with a higher value, so when you get the next capability you use it again:

ce = tgetstr("ce", &tempPtr);

This is repeated over and over for every code you wish to send. You check for NULL to know which capabilities actually exist, so you could write functions like this:

void clearscreen()
{
  if (cl == NULL)
  {
    printf("Sorry, I cannot clear the screen...");
  } else {
    tputs(cl, 1, outchar);
  }
}

And now we see how these pointers get used. The tputs() function is a special output routine that handles padding (time delays for slower terminals) and other features (though it ends up writing the character out using a function you specify — such as outchar() in this example).

For the cursor movement (cm) capability, it uses a special tgoto() function that knows how to substitute the X and Y values:

void setcursor(int x, int y)
{
  tputs(tgoto(cm, x, y), 1, outchar);
}

tgoto() processes the cm output string and returns one that has everything set up with the x and y coordinate in it.

By now, you may see where I am going with this… Read the termcap entry, parse the ones you care about, then create simple functions that output the screen code sequences:

void clearscreen();
void setcursor(int x, int y);
void underlineon();
void underlineoff();

…etc…

In the next installment, I will share with you my very basic and simple 1995 code that let me convert OS-9 L2 (and MM/1 K-Windows) text programs to run under Termcap on any supported type of terminal.

And then I will explain why I decided NOT to use termcap for my current project.

To be continued…

Using Termcap – part 2

Previously, I mentioned a bit about the ancient termcap system which is used to send display codes (clear, move cursor, underline, blink) to terminals of different kinds. In this modern GUI world, none of this is necessary … folks pretty much have to rewrite the whole program to work with native Mac, Windows, Linux, Java, etc. I suppose the modern equivalent to termcap would be a cross platform GUI (kind of like a graphical termcap?) which turns things like “create a pop-up window” or “create a menu with the following options” into whatever it takes to display them on the end operating system.

I suppose that’s truly what I should be learning right now: my program could make use of graphics on the Linux-based Raspberry Pi, Windows or Mac. However, since my program is not intended to be a windows-based program (no pull down menus, no mouse, etc.) and really has no use for graphics, I decided to write everything using text.

My original prototype was rather bland, spitting out 80 column descriptive text. This was perfect for debugging, but certainly not what we want the end-user to have to deal with:

Ticket system, 80 column window prototype.

Prototype running in a Mac Terminal window.

By writing the application as a strict ANSI-C program, and just using text, it would compile and run on a Windows PC as well, in a DOS-style COMMAND window (CMD.EXE):

Prototype running in a Windows CMD.EXE window.

The target system for this project would be Raspberry Pis, the $25-$35 micro-Linux computers designed for educational use. We would be using the $35 model, which has more RAM (512 MB) and USB/Ethernet. The Pis support HDMI and composite video output. Instead of using large (pricey) HDMI TVs or monitors, I found tiny composite color monitors for under $20 (4.3″, about the size of a GPS unit or large smartphone screen).

By setting the resolution of the Pi to match the display’s 480×272 resolution (/boot/config.txt: framebuffer_width=480 and framebuffer_height=272), and by choosing the largest font available (setfont /usr/share/consolefonts/Uni3-Terminus32x16.psf.gz) I was able to get a large, easy to read (on the tiny screen) 30×8 display. It would look something like this:

+------------------------------+
|123456789012345678901234567890|
|2                             |
|3                             |
|4                             |
|5                             |
|6                             |
|7                             |
|8                             |
+------------------------------+

The actual screen ends up much wider than this text drawing, since the displays I am using are 16:9 widescreen.

Have you noticed the lack of mentioning Termcap so far? Let’s correct that now.

If I were only going to use this particular screen on a Raspberry Pi, I could just hard-code everything to expect 30×8 characters, and whatever display codes were needed to clear the screen or change colors.

And that would be bad programming. Doing this “because this is all it is planned to ever run on” is like not having insurance because you never plan to get in an accident. It’s certainly fine to write anything for yourself any way you darn well please (I do that all the time, too), but I try to think ahead and write things to be as portable and as flexible as I can.

Fortunately, I wrote such portable code back around 1995. At the time, I had created a text-based user interface called EthaWin. It was written for OS-9 Level 2 on the Tandy Color Computer 3, and later ported to the OS-9/68000 MM/1 computer. The CoCo 3’s terminal window system supported all kinds of screen codes for basic things like color, blinking, move cursor, etc., much like ANSI graphics on a PC did. The MM/1 was meant to be a next generation replacement for the CoCo OS-9 users, so its K-Windows system replicated (and expanded upon) those same screen codes.

My EthaWin was not the “portable code” I am speaking of. It would only run on a CoCo or MM/1 under OS-9. When I began working for Microware, the company that had created OS-9, none of my stuff would run on the “headless” OS-9 computers in the building — most didn’t even have video displays. The way you accessed them was via an RS232 serial port and a hardware terminal (or software terminal program), or from telnetting in across the network.

OS-9, being created as a very Unix-like operating system, supported termcap, and text editors like vim, uMacs, etc. made use of this to give full screen editing. If I wanted EthaWin to work on these OS-9 machines, I was going to have to learn how termcap worked.

In the next installment, I will finally talk about termcap and show how very simple it is to do very simple things.

Until then…

Using Termcap – part 1

Before stand-alone computers became common, most computer time was spent in front of a dumb terminal — basically a keyboard and screen that would send whatever the user typed to a big computer somewhere, and display whatever it received back from the big computer:

http://en.wikipedia.org/wiki/Computer_terminal

As I mentioned in an earlier posting, my first interaction with a computer in the 1970s was via a printing terminal at my elementary school. The next time I used a computer, it was a TRS-80 at Radio Shack. I kind of missed the whole dumb terminal phase of computing, but I certainly spent endless time with my home computer acting as a dumb terminal as it dialed in across the phone lines to other computers running BBSes (bulletin board systems).

In a way, this concept lives on via the Internet and cloud computing. Our computers are just far smarter “dumb terminals” when they display all the content generated from Facebook’s servers, or display virtual shopping catalogs that are indexed at Amazon.

I guess there really isn’t anything new.

While today, a modern smart “dumb terminal” may be running JavaScript or (can you believe it?) Flash, decades ago dumb terminals were doing similar rendering – though limited to simple things like moving a cursor around the screen, or turning on underlined text. Or blinking. Anyone remember when things blinked?

Frighteningly enough, you probably still see examples of this at some modern businesses. My car dealer still uses some text-based program in its service department, and it is not that uncommon to see the same in banks or other businesses.

Dedicated terminals were still alive and well in the mid-1990s when I took my first dream job and was teaching week-long OS-9 courses around America (and sometimes in Canada). I would arrive on Sunday night, and at the hotel would be a bunch of boxes that had been delivered. We would rent dumb terminals from local suppliers, and I would unbox and set up eight of them and wire them all up via RS232 serial cables up to the multi-user OS-9 computer I brought with me.

I would then go through the task of configuring the settings on each terminal (they were pretty smart for dumb terminals) to make sure all the settings matched what we would need for the class (like baud rate and serial port settings, as well as emulation mode).

Just like today, there were competing standards back then. A DEC VT100 terminal might expect a particular series of bytes to indicate “clear the screen”, but a terminal from another manufacturer might use a different set. What a mess this must have been in the early years!

Fortunately, smart people came up with smart solutions to deal with all these different standards. One such solution was called Termcap – which stood for terminal capabilities.

http://en.wikipedia.org/wiki/Termcap

Created in 1978, termcap was a database of various terminal types with entries describing how each one did things like move the cursor or clear the screen. Programs could be written to use the termcap library and then, on startup, they would load the proper codes to match whatever terminal type the user was using — provided it was in the database.

This must have been a major breakthrough for writing portable apps, much like Java was a breakthrough to let the same app run on Mac, Unix and Windows… Today, HTML5 and JavaScript allow web content to run on desktops, tablets or phones.

I guess there really isn’t anything new.

During my OS-9 days, termcap was important since almost all connections were done via a terminal (or terminal/telnet program). Most industrial OS-9 machines did not have video screens. I found this surprising, since I had come from the hobbyist OS-9 world where all our systems (Radio Shack Color Computer, Delmar, Tomcat, MM/1, etc.) had graphics and user interfaces. But in the embedded/industrial space, they were just a box of realtime computing, and if the user did need to interact with it, they hooked up a terminal.

But I digress.

Recently, I began working on a new ticket barcode project that would ultimately run on small Raspberry Pi computers. I was doing all the prototyping work on Windows, and also compiling the same for Mac. Since I do not know anything about writing GUI programs, let alone how I would write something that would work on Windows, Mac and ultimately Linux, I was writing everything as an old style text-mode program.

This was more than enough to test all the functions, even if ultimately we would be using small 7″ color displays at each system. (I sense that I will be writing some articles on Raspberry Pi video in the future.)

Because I am now at the stage where I want to do more than just scroll text, I decided to revisit the old Termcap system and see if I could at least write fancy text programs that could use color, and create fancier text screens, and actually work on all my development platforms. I actually looked into termcap a few months ago when I wanted to do something similar on Arduinos, but at the time I decided it was impossible due to limited memory. (Update: That may not actually be the case.)

My goals for the next few articles will be:

  1. Explain how termcap works.
  2. Explain how to get termcap running on a Windows system (as well as Mac, and Raspberry Pi).
  3. Create a simple library of common screen features (actually based on code I wrote in late 1995, to convert my EthaWin OS-9 user interface to run on the OS-9 machines at work via termcap).

…and after all this, I think I will have a version that works on Arduino as well.

More to come…

ANSI C and subtracting or adding time (hours, etc.) part 2

In my previous article, I rambled a bit about how I first learned to program in C in the late 1980s under the OS-9 operating system on a Radio Shack Color Computer 3. I mentioned this to point out that, even after 25 years, there were still things about C I had never used.

One of those things involves doing time math (adding or subtracting time). If you take a look at the various C related time functions (see time.h):

http://www.cplusplus.com/reference/ctime/

…you will see that the C library has two ways of representing time. One is a time_t value, which is some number (the standard does not define what that number means), and the other is a struct tm structure, which contains fields for things like hour, minute, second, day, month, year, etc. Some time functions work with time_t values, and some work with struct tm structures.

There are four main time functions. You can get the amount of processor time used by a program with clock(). You can get the current time using time() (as a time_t value of some sort). You can get the difference between two times using difftime(). And you can make a time using mktime().

The other time.h functions are conversion utilities: asctime() returns a string representing the date/time stored in a struct tm. ctime() returns a similar string but works on a time_t value. gmtime() converts a time_t value to a struct tm expressed in GMT (Greenwich Mean Time, the universal time zone). localtime() is like gmtime() but returns a struct tm in the local time zone. And lastly, strftime() is like a printf for time: it lets you create a custom string representation of date and time from a struct tm. This is useful if asctime() and ctime() do not return the format you need.

The current project I am working on deals with event tickets, and the system needs to know when a ticket is valid. By default, a ticket is valid on the day it is activated and then it shuts off. The problem was that a ticket would stop working after midnight, so I needed to implement a grace period. I wanted to define a ticket good for “Today” (or “Friday” or “5/4/2015”) and have it know that even after midnight (when the day became Tomorrow, Saturday or 5/5/2015) it would still be accepted for a certain amount of time.

Almost all examples of handling time I could find relied on knowing something about what the time_t number was. If you *knew* that time_t was “number of seconds since January 1 1970”, all you would have to do is add or subtract a certain amount of seconds from that value and you would be done.

But, according to the ANSI C standard, time_t is implementation specific. If you really want to write portable ANSI C code, you can’t assume anything about time_t other than it being some number.

My program is currently being built for Windows, Mac and Raspberry Pi. All three of these systems seem to handle time_t the same way, but what if my code gets ported later to some embedded operating system that did it some other way?

The good news is that it’s really not much work to do things the “proper” way, though I certainly understand the lazy programmer mentality of “if it works for me, ship it!”

Here is what I learned and what prompted me to write this article: you can create struct tm values with invalid values and the C time library functions can normalize them.

Here is an example… If you want to create a time of 2:30 a.m., the struct tm values would look like this:

tm_hour = 2;
tm_min = 30;
tm_sec = 0;

If all the tm values are properly formatted, you can pass them in to functions like asctime() and they work.

BUT, you can also represent 2:30 a.m. as “150 minutes after midnight” like this:

tm_hour = 0;
tm_min = 150;
tm_sec = 0;

This tm structure appears to be invalid since minutes is listed as being 0-59 in the references I looked at. Because of this, it never dawned on me I might be able to pass in a value other than 0-59.

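If you pass this out-of-range struct tm to asctime(), you should not expect a sensible result (the C standard leaves asctime()’s behavior undefined when fields are out of range). However, if you pass it to mktime(), it will normalize the values (adjusting them to the proper hour, minute and second values, and updating the structure itself) and return the result as a time_t. Interesting.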

It seemed I might be able to add time simply by adding a number of seconds or hours. 2 hours in the future might be as simple as:

tm_hour = tm_hour + 2;

…then I would use mktime() to get an adjusted time_t that now represents the time 2 hours in the future.

A few quick tests showed that this did work. Unfortunately, the way I was approaching my ticket expiration task required me to look 2 hours in the past. It didn’t seem possible, since that would mean using negative numbers and surely that wouldn’t work.

Or would it?

I had noticed that the tm_xxx variables were “int” values rather than “unsigned int”. Why? If values are 0-59 (minutes) or 0-23 (hour) or 0-365 (days since January 1), why would it ever need to be a signed negative value? But since a value could clearly be greater than 59 for seconds, perhaps negative values worked as well and that’s why they were “ints”.

Indeed, this is the case. I had never known you could do something like this:

tm_hour = tm_hour - 2;

By doing that, then converting it into a time_t using mktime(), you end up with a time_t representation of two hours in the past.

Simple, and portable, and “proper.” Even if it looks strange.

The only issue with doing it this way is you need a few more steps. In my case, I had a time_t value that represented when a ticket was activated. It began life as something like this:

time_t activationTime = time(NULL);

When I would be doing my time checks, I would need to know the current time:

time_t currentTime = time(NULL);

And since I was not concerned with the time of day of the activation, just the actual day (month/day/year), I would need to convert each of these into tm structures so I could look at those values:

struct tm *activationTmPtr;
int activationMonth, activationDay, activationYear;
activationTmPtr = localtime(&activationTime);
activationMonth = activationTmPtr->tm_mon;
activationDay = activationTmPtr->tm_mday;
activationYear = activationTmPtr->tm_year;

NOTE: I am copying the values into my own local variables because localtime(), gmtime() and other calls return a pointer to static data contained inside those functions. If I were to do something like this:

struct tm *activationTmPtr, *currentTmPtr;
activationTmPtr = localtime(&activationTime);
currentTmPtr = localtime(&currentTime);

…that might look proper, but each time localtime() is called, it handles the conversion and returns a pointer to the static memory inside the function. Every call to localtime() is returning the same pointer, so each call to localtime() updates that static memory.

activationTmPtr and currentTmPtr would both be the same address, and would both point to whatever the last localtime() conversion was.

Easy mistake to make, and one of the reasons returning pointers to static data is problematic. The caller has to understand this, and make copies of any data it wishes to keep. (Yeah, this is something I learned the hard way.)

With this in mind, I could get the parts of the local time the same way:

struct tm *currentTmPtr;
int currentMonth, currentDay, currentYear;
currentTmPtr = localtime(&currentTime);
currentMonth = currentTmPtr->tm_mon;
currentDay = currentTmPtr->tm_mday;
currentYear = currentTmPtr->tm_year;

Now to see if Activation Day was the same as Today, I could just compare:

if ((currentDay==activationDay) && (currentMonth==activationMonth) && (currentYear==activationYear))

Simple. Though in my application, the ticket could also specify a day-of-week, so there could be a weekend pass active only on “Saturday” and “Sunday”, or a pass good for only “Thursday”. I would do this the same way, but I would use the tm_wday variable (day of week, 0-6).

To deal with a grace period, my logic looked like this:

  • Check current day (either month/day/year or day of week) against target valid day (again, either a month/day/year value, or a specific day of week).
  • If invalid, try the comparison again, but this time have “current day” be X hours earlier. If X was 2 hours, then I could test at 1:30 a.m. and it would be comparing as if the time was still 11:30 p.m. the previous day, and it would pass.

Easy peasy.

To do this, I simply created a special graceTime and graceTmPtr like this:

time_t graceTime;
struct tm *graceTmPtr;
int graceMonth, graceDay, graceYear;

// Start with current time again and convert to a struct tm we can do math on.
graceTmPtr = localtime(&currentTime);
// Adjust the values to be 2 hours earlier.
graceTmPtr->tm_hour = graceTmPtr->tm_hour - 2;
// We need to normalize this so we can get the real Month/Year/Day.
graceTime = mktime(graceTmPtr);
// And now we need it back as a struct tm so we can get to those elements.
graceTmPtr = localtime(&graceTime);
// And now we can get to those values.
graceMonth = graceTmPtr->tm_mon;
graceDay = graceTmPtr->tm_mday;
graceYear = graceTmPtr->tm_year;

After this, I could do the same check:

if ((graceDay==activationDay) && (graceMonth==activationMonth) && (graceYear==activationYear))

Problem solved.

Maybe this will help someone else. I know I spent far too much time searching for how to do this before I stumbled on some old post somewhere that mentioned this.

Have fun!