Imagine running across a block of C code that is meant to create a comma separated list of items like this:
2024/10/18,12:30:06,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...
2024/10/18,12:30:07,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...
2024/10/18,12:30:08,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...
2024/10/18,12:30:09,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...
And the code that does it was modular, so new items could be inserted anywhere, easily. This is quite flexible:
snprintf (buffer, BUFFER_LENGTH, "%u", value1);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");
snprintf (buffer, BUFFER_LENGTH, "%u", value2);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");
snprintf (buffer, BUFFER_LENGTH, "%.1f", value3);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");
snprintf (buffer, BUFFER_LENGTH, "%.1f", value4);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");
snprintf is used to format the variable into a temporary buffer, then strcat is used to append that temporary buffer to the end of the line output buffer that will be written to the file. Then strcat is used again to append a comma… and so on.
Let’s ignore the use of the “unsafe” strcat which can trash past the end of a buffer is the NIL (“\0”) zero byte is not found. We’ll just say strncat exists for a reason and can prevent buffer overruns crashing a system.
Many C programmers never think about what goes on “behind the scenes” when a function is called. It just does what it does. Only if you are on a memory constrained system may you care about how large the code is, and only on a slow system may you care about how slow the code is. As an embedded C programmer, I care about both since my systems are often slow and memory constrained.
strcat does what?
The C string functions rely on finding a zero byte at the end of the string. strlen, for example, starts at the beginning of the string then counts until if finds a 0, and returns that as the size:
size_t stringLength = strlen (someString);
And strcat would do something similar, starting at the address of the string passed in, then moving forward until a zero is found, then copying the new string bytes there up until it finds a zero byte at the end of the string to be copied. (Or, in the case of strncat, it might stop before the zero if a max length is reached.)
I have previously written about these string functions, including showing implementations of “safe” functions that are missing from the standard C libraries. See my earlier article series about these C string functions. And this one.
But what does strcat look like? I had assumed it might be implemented like this:
char *myStrcat (char *dest, const char *src)
{
// Scan forward to find the 0 at the end of the dest string.
while (*dest != 0)
{
dest++;
}
// Copy src string to the end.
do
{
*dest = *src;
dest++;
src++;
} while (*src != 0);
return dest;
}
That is a very simple way to do it.
But why re-invent the wheel? I looked to see what GCC does, and their implementation of strcat makes use of strlen and strcpy:
/* Append SRC on the end of DEST. */
char *
STRCAT (char *dest, const char *src)
{
strcpy (dest + strlen (dest), src);
return dest;
}
When you use strcat on GCC, it does the following:
- Call strlen() to count all the characters in the destination string up to the 0.
- Call strcpy() to copy the source string to the destination string address PLUS the length calculated in step 1.
- …and strcpy() is doing it’s own thing where it copies characters until it finds the 0 at the end of the source string.
Reusing code is efficient. If I were to write a strcat that worked like GCC, it would be different than my quick-and-dirty implementation above.
This is slow, isn’t it?
In the code I was looking at…
snprintf (buffer, BUFFER_LENGTH, "%u", value1);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");
…there is alot going on. First snprint does alot of work to convert the variable in to string characters. Next, strcat is calling strlen on the outputLineBuffer, then calling strcpy. Finally, strcat calls strlen on the outputLineBuffer again, then calls strcpy to copy over the comma character.
That is alot of counting from the start of the destination string to the end, and each step along the way is more and more work since there are more characters to copy. Suppose you were going to write out ten five-digit numbers:
11111,22222,33333,44444,55555
The first number is quick because nothing is in the destinations string yet so strcat has nothing to count. “11111” is copied. Then, there is a scan of five characters to get to the end, then the comma is copied.
For the second number, strcat has to scan past SIX characters (“11111,”) and then the process continues.
The the third number has to scan past TWELVE (“11111,22222,” characters.
Each entry gets progressively slower and slower as the string gets longer and longer.
Can we make it faster?
If things were set in stone, you could do this all with one snprint like this:
snprintf ("%u,%u,%.1f,%.1f\n", value1, value2, value3, value4);
Since printf is being used to format all the values in to a buffer, then doing the whole string with one call to snprintf may be the smallest and fastest way to do this.
But if you are dealing with something with dozens of values, when you go in later to add one in the middle, there is great room for error if you get off by a comma or a variable in the parameter list somewhere.
I suspect this is why the code I am seeing was written the way it was. It makes adding something in the middle as easy as adding three new lines:
snprintf (buffer, BUFFER_LENGTH, "%u", newValue);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");
Had fifty parameters been inside some long printf, there is a much greater chance of error when updating the code later to add new fields. The “Clean Code” philosophy says we spend more time maintaining and updating and fixing code than we do writing it, so writing it “simple and easy to understand” initially can be a huge time savings. (Spend a bit more time up front, save much time later.)
So since I want to leave it as-is, this is my suggestion which will cut the string copies in half: just put the comma in the printf:
snprintf (buffer, BUFFER_LENGTH, "%u,", value1);
strcat (outputLineBuffer, buffer);
snprintf (buffer, BUFFER_LENGTH, "%u,", value2);
strcat (outputLineBuffer, buffer);
snprintf (buffer, BUFFER_LENGTH, "%.1f,", value3);
strcat (outputLineBuffer, buffer);
snprintf (buffer, BUFFER_LENGTH, "%.1f,", value4);
strcat (outputLineBuffer, buffer);
And that is a very simple way to reduce the times the computer has to spend starting at the front of the string and counting every character forward until a 0 is found. For fifty parameters, instead of doing that scan 100 times, now we only do it 50.
And that is a nice savings of CPU time, and also saves some code space by eliminating all the extra calls to strcat.
You know better, don’t you?
But I bet some of you have an even better and/or simpler way to do this.
Comment away…