sizeof() matters

Updates:

  • 2016/02/29 – Per a comment by James, I corrected my statement that sizeof() is a macro. It is not. It is a keyword. My bad.

In C, the sizeof() macro can be used to determine the size of a variable type or structure. For instance, if you need to know the size of an “int” on your system, you can use sizeof(int). If you have a variable like “int i;” or “long i;”, you can also use sizeof(i).

On the Arduino, an int is 16-bits:


void setup() {
// put your setup code here, to run once:
Serial.begin(9600);
Serial.print("sizeof(int) is ");
Serial.println(sizeof(int));
}

void loop() {
// put your main code here, to run repeatedly:
}

On the Arduino, that produces:

sizeof(int) is 2

On a Windows system, an int is 32-bits:


int main()
{
printf("sizeof(int) is %dn", sizeof(int));

return EXIT_SUCCESS;
}

That displays:

sizeof(int) is 4

Note: sizeof() is not a library function. It is a macro C keyword that is handled by the C preprocessor during compile time. It will be replaced with the number representing the size the same way a #define replaces the define in the source code. At least, I think that’s what it does.

You should avoid making any assumptions about the size of data types beyond what the C standard tells you. For example, an “int” should be “at least 16-bit”. Thus, even a PC compiler could have chosen to make an “int” be 16-bits instead of 32.

A better way to use data types was added in the C99 specification, where you can include stdint.h and then request specific types of variables:


uint8_t unsignedByte;

uint16_t unsignedWord;

int32_t signed32bit;

But I digress.

The point of this article was to mention that you can also use sizeof() on strings IF they are known to the compiler at compile time. You can, of course, get the size of a pointer:


char *ptr;

printf("sizeof(ptr) is %dn", sizeof(ptr));

Depending on the size of a pointer on your system  (16-bits on the Arduino, 32 on the PC), you will get back 2 or 4 (or 8 if it’s a 64-bit pointer, I suppose).

And the pointer is still the same size regardless of what it points to. You still get the same size even if you had something like this:


char *msgPtr = "This is my message.";

printf("sizeof(msgPtr) is %dn", sizeof(msgPtr));

But, if you had declared that string as an array of characters, rather than a pointer to a character, you get something different because the compiler knows a bit about what you are pointing to:


char msgArray[] = "This is my message.";

printf("sizeof(msgArray) is %dn", sizeof(msgArray));

There, you see the compiler actually substitutes the size of the array of characters:

sizeof(msgArray) is 20

This is an instance where using “char *ptr =” is different than “char ptr[] = ” even though, ultimately, they both are pointers to some memory location where those characters exist.

At work, I ran across a bunch of test code that did this:


const char    PROMPT[] = "Shell: ";
const uint8_t PROMPT_LEN = 7;

const char    LOGIN[] = "Login: ";
const uint8_t LOGIN_LEN = 7;

Those strings would be used elsewhere, and the length needed to be known by some write()-type function. Counting bytes in a quotes string and keeping that number updated sounds like work, so instead they could have used the sizeof() macro. Since it returns the size of the array (including the NIL zero byte at the end), they’d need to subtract one like this:


const char    PROMPT[] = "Shell: ";
const uint8_t PROMPT_LEN = sizeof(PROMPT)-1;

const char    LOGIN[] = "Login: ";
const uint8_t LOGIN_LEN = sizeof(LOGIN)-1;

At compile time, the size of the character array is known, and the compiler substitutes that length where the “sizeof()” macro is. If the string is changed, that value also changes (at compile time).

Of course, since we are using NIL terminated strings, you could also just use the strlen() function. But, that is more for strings of unknown length, and it runs code that counts every character until the NIL zero, which is wasted CPU use and code space if you don’t actually need to do that.

My optimization tip for today is: If you are using hard coded constant strings, and you need to know the size of them, declare them as C arrays (not a pointer to the string) and use the sizeof() macro as a constant. Use strlen() only for times when the compiler cannot know the size of the character array (dynamic strings or things being passed in to a function from the outside).

Speaking of that … as long as the compiler can “see” where the array is declared, sizeof() will work. But if you had something like this:


void showSize(char *ptr)
{
printf("showSize - sizeof(ptr) = %dn", sizeof(ptr));
}

int main()
{
const char    LOGIN[] = "Login: ";

showSize(LOGIN);

return EXIT_SUCCESS;
}

…that will not work. By the time you pass in just a “pointer to” the array, all the compiler sees (inside that showSize function) is a pointer, and thus can only tell you the size of the pointer, and not what it points to.

As you see, this tip is of limited use, but I think it is still neat and a potential way to save some CPU cycles and program space bytes from time to time. Since I have worked on a number of Arduino Sketches that have gotten too big to fit (also on some TI MSP430 projects), small tricks like this can make a very big difference in getting something to fit or not fit.

sizeof() can matter :-)

3 thoughts on “sizeof() matters

  1. James Jones

    The preprocessor doesn’t know sizeof. It’s not a macro; look closely at the standard and you’ll see that one form, I think the one where you hand it a type rather than an expression, doesn’t need the parentheses. The standard is set up to allow the preprocessor to be a separate program, and that means it may not know the sizes of types.

    Reply
  2. James Jones

    Looks like I got it backwards. Two of the alternatives for the nonterminal unary-expression in the C grammar are

    sizeof unary-expression
    sizeof ( type-name )

    Since you can derive a parenthesized expression from a unary-expression, I (and I suspect about everybody else) use the parentheses all the time, though you can write “sizeof a”, or even “sizeof -a”, if you want–but the fact that the parentheses are optional suffices to show that sizeof can’t be a macro. Whoever actually evaluates sizeof has to have access to the symbol table and knowledge of the target environment–not what you’d want a preprocessor to have to know.

    Reply
  3. Pingback: C structures and padding and sizeof | Sub-Etha Software

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.