Updates:
- 2024-08-27 – Adding a note about strlen()/sizeof() that was mentioned by Dave in the comments.
I am used to using sizeof() to know the size of a structure, or size of a variable…
#include <stdio.h>
typedef struct {
    char a;
    short b;
    int c;
    long d;
} MyStruct;
int main(void)
{
    printf ("sizeof(MyStruct) is %zu\n", sizeof(MyStruct));
    MyStruct foo;
    printf ("sizeof(foo) is %zu\n", sizeof(foo));
    return 0;
}
…but every time I re-learn you can use it on strings, I am surprised:
#include <stdio.h>
#define VERSION_STRING __DATE__ " " __TIME__
int main(void)
{
    printf ("Build: %s\n", VERSION_STRING);
    printf ("sizeof(): %zu\n", sizeof(VERSION_STRING));
    return 0;
}
Normally, I see strlen() used, and that works for a string that is in a buffer, or a constant string:
#include <stdio.h>
#include <string.h> // for strlen()
#define VERSION_STRING "1.0.42-beta"
const char versionString[] = "1.0.42-beta";
int main(void) {
    printf ("strlen(VERSION_STRING) = %zu\n", strlen(VERSION_STRING));
    printf ("strlen(versionString) = %zu\n", strlen(versionString));
}
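…but if you know it is a #define string constant, you can use sizeof() instead, and the compiler turns that into a hard-coded value: the size of that string, which (as noted below) is one more than its length because it includes the terminating 0. This can make smaller, faster code, since strlen() has to scan through the string's memory at run time looking for the 0 at the end, counting along the way. Some optimizing compilers, GCC among them, can fold strlen() of a literal into a constant on their own, but not every compiler does.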
I wonder how many times I have posted about this over the years.
Additional Notes:
In the comments, Dave added:
sizeof a string literal includes the terminating nul character, so it will be strlen +1.
– Dave
Ah, yes – a very good thing to note. C strings have a 0 byte added to the end of them, so “hello” is really “hello\0”. The standard C string functions like strcpy(), strlen(), etc. look for that 0 to know when to stop.
#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for strlen()
#define STRING "hello"
int main(void)
{
    printf ("sizeof(STRING) = %zu\n", sizeof(STRING));
    printf ("strlen(STRING) = %zu\n", strlen(STRING));
    return EXIT_SUCCESS;
}
Output would show:
sizeof(STRING) = 6
strlen(STRING) = 5
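So if you are using sizeof() to memcpy() bytes somewhere without the overhead of a strlen() counting them first, you’d really want something like…
memcpy (buffer, STRING, sizeof(STRING)-1);
That copies the characters but not the terminating 0, so if buffer will be used as a C string, terminate it yourself (or copy all sizeof(STRING) bytes).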
Until next time…
Comments:
sizeof a string literal includes the terminating nul character, so it will be strlen +1.
sizeof "hello" == 6, while strlen("hello") == 5
That is a very good thing to note. I will update.
To better remember this, realize that arrays are not pointers, and string literals are arrays (which can decay to pointers).
const char arrayVersion[] = "1.0.42-beta";
const char* pointerString = "1.0.42-beta";
printf ("sizeof(arrayVersion) = %zu\n", sizeof(arrayVersion));
printf ("sizeof(pointerString) = %zu\n", sizeof(pointerString));
Outputs (on a system with 4-byte pointers):
sizeof(arrayVersion) = 12
sizeof(pointerString) = 4
You know, I don’t think I realized that. At some point, I started using pointers for all my strings, declaring them as char *namePtr = "foo" or whatever. I would intentionally put the “Ptr” at the end to remind me it was a pointer. Seeing this is enough to make me switch back to using “char name[]”, just so that sizeof(name) gives what I’d expect rather than the size of a pointer.
Except for situations where arrays are pointers:
~/dev/c/tests> cat arrays.c
#include <stdio.h>
void test(char arr[]) {
    printf("sizeof arr: %zu\n", sizeof(arr));
}
int main() {
    char a[] = "x\n";
    test(a);
}
~/dev/c/tests> gcc arrays.c -o arrays && ./arrays
sizeof arr: 8
Yes and no. The array object itself is of course not a pointer. The issue is that while the parameter is written with array syntax, its type is not an array. The C++ working draft N4917, Section 9.3.4.6, paragraph 5, explains that a parameter of type “array of T” is adjusted to “pointer to T” (C has the same rule).
So the parameter really is a pointer, and the array argument has already decayed to a pointer at the call. You can even give the “array” parameter a fixed size, and it will still output 8, the size of a pointer on that machine.
sizeof yields a value of type size_t, not int, so using %d is not correct.
Correct observation – compiler warnings should scream about that, if enabled. A cast to int or similar is what I would do. I don’t think we know what a size_t is other than a number. I had to use %d on some machines I work with, and %ld on others. Is there a printf format specifier that is better than one of those plus a cast? I only recently learned about %p. I learned C on a K&R pre-ANSI compiler, so a lot of this still feels new to me ;)
Thank you! New post coming, thanks to your comment, and another, pointing me to %zu.
How do both cases deal with Unicode characters? Do they behave the same?
I have never worked with Unicode. I see there are C escape sequences for putting Unicode characters in a C string. I wonder if my embedded C compiler even honors those. I will make a note to explore that.