Now maybe someone here can tell me if this makes any sense:
#define SOME_NAME "SomeName\0"
I ran across something like this in my day job and wondered what the purpose was of adding a “\0” zero byte to the end of the string. C already does that, doesn’t it?
C escape codes
I learned about using backslash to embed certain codes in strings when I was first learning C on my Radio Shack Color Computer. I was using OS-9/6809 and a pre-ANSI K&R C compiler.
I learned about “\n” at the end of a line, and that may be the only one I knew about back then. (I expect even K&R has “\r” and maybe “\t” too, but I never used them in any of my code back then.)
Wikipedia has a handy reference:
Escape sequences in C – Wikipedia
It lists many I was completely unaware of – like “vertical tab.” I’d have to look up what a vertical tab is, as well ;-)
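It was during my “modern” career that I learned you could embed any byte value in a string by escaping it with “\x” and a hex value:
#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS

int main()
{
    const char bytes[] = "\x01\x02\x03\x04\x05";

    printf ("sizeof(bytes) = %zu\n", sizeof(bytes));

    // size_t matches the type sizeof() returns.
    for (size_t idx=0; idx<sizeof(bytes); idx++)
    {
        // The cast keeps values of 0x80 and up from printing sign-extended.
        printf ("%02x ", (unsigned char)bytes[idx]);
    }
    printf ("\n");

    return EXIT_SUCCESS;
}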
This code makes a character array containing the bytes 0x01, 0x02, 0x03, 0x04 and 0x05. A zero follows, added by C to terminate the quoted string. The output looks like:
sizeof(bytes) = 6
01 02 03 04 05 00
I do not know how I learned it, but it was just two jobs ago when I used this to embed a bunch of data in a C program. I believe I was tokenizing some strings to reduce code size, and I had some kind of lookup table of strings, and then the “token” strings of bytes that referred back to the full string. Something like this, except less stupid:
#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <stdint.h>

const char *words[] =
{
    "I",
    "know",
    "you"
};

const uint8_t sentence[] = "\x01\x02\x03\x02\x01\x02";

int main()
{
    printf ("sizeof(sentence) = %zu\n", sizeof(sentence));

    // size_t matches the type sizeof() returns.
    for (size_t idx=0; idx<sizeof(sentence)-1; idx++)
    {
        printf ("%s ", words[sentence[idx]-1]);
    }
    printf ("\n");

    return EXIT_SUCCESS;
}
In this silly example, I have an array of strings, and then an encoded sentence with bytes representing each word. The encoded bytes will have a 0 at the end, so I use 1 for the first word, and so on, with 0 marking the end of the sequence. But, this example doesn’t actually look for the 0. It just uses the number of bytes in the sentence (minus one, to skip the 0 at the end) via sizeof().
It really should use the 0, so this could be a function. You could pass it the dictionary of words, and the sentence bytes, and let it decode them in a more flexible/modular way:
#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <stdint.h>

// Dictionary of words
const char *words[] =
{
    "I",
    "know",
    "you"
};

// Encoded sentence
const uint8_t sentence[] = "\x01\x02\x03\x02\x01\x02";

// Decoder
void showSentence(const char *words[], const uint8_t sentence[])
{
    int idx = 0;

    while (sentence[idx] != 0)
    {
        printf ("%s ", words[sentence[idx]-1]);
        idx++;
    }
    printf ("\n");
}

// Test
int main()
{
    printf ("sizeof(sentence) = %zu\n", sizeof(sentence));

    showSentence (words, sentence);

    return EXIT_SUCCESS;
}
But I digress. My point is — I’m still learning things in C, even after knowing it since the late 1980s.
So back to the original question: What is adding a “\0” to a string doing? It puts an extra zero byte in the data, ahead of the one C adds on its own. You only see that extra byte with sizeof(), which is one advantage of using sizeof() versus strlen(). strlen() will stop at the first 0, but sizeof() will tell you everything that is there.
#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for strlen()

int main()
{
    const char string[] = "This is a test.\0And so is this.\0And this is also.";

    printf ("strlen(string) = %zu\n", strlen(string));
    printf ("sizeof(string) = %zu\n", sizeof(string));

    return EXIT_SUCCESS;
}
The output:
strlen(string) = 15
sizeof(string) = 50
If you try to printf() that string, it will print only up to the first \0. But, there is more “hidden” data after the zero. If you have the sizeof(), that size could be used in a routine to print everything. But why? We can already do string arrays or just embed carriage returns in a string if we wanted to print multiple lines.
But it’s still neat.
Have you ever done something creative with C escape codes? Leave a comment…
Until then…
It normally makes zero sense to manually append a \0 to the end of a string constant in C since, as you say, C already does that. The only reason I can think of where it makes sense is if you need a set of string constants that are always the same number of bytes in which case manual NUL padding like that might make sense. But just adding a \0 at the end of every string is completely pointless unless you have some really weird usage that needs a double NUL or something.
I learn C things all the time (mostly from comments here) so I suspect maybe the original programmer did not know about the automatic NIL at the end? (er, is “\0” considered NIL? Am I remembering that correctly?)
Obviously the original programmer is definitely not a C programmer … ;)
That’s okay. Some say that about me too ;)
If you want to learn C, then there are two books you need to get. The first is The C Programming Language, 2nd Edition by Kernighan and Ritchie. This covers the language, and goes a bit into the standard C library. The second is The Standard C Library by P. J. Plauger. This talks about each function in the standard C library and goes into the history of why it is the way it is.
Those were the only two books in C that I found useful.
I am pretty sure I started with the K&R book when I started learning C back in the late 1980s! That, and a C pocket reference an Amiga friend loaned me (he was also teaching me C), were the only two resources for the language I had back then. I still remember the article in Rainbow touting “C: the language of the 80s” or something. Seemed like a good idea at the time!
It is unfortunate that most of the compilers I have worked with in the embedded space are so far behind the standard.