C structures and padding and sizeof

This came up at my day job when two programmers were trying to get a block of data to be the size both expected it to be. Consider this example:

typedef struct
{
    uint8_t     byte1;  // 1
    uint16_t    word1;  // 2
    uint8_t     byte2;  // 1
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct1;

The above structure represents three 8-bit byte values and two 16-bit word values for a total of 7 bytes.

However, if you were to run this code in GCC for Windows, and print the sizeof() that structure, you would see it returns 10:

sizeof(MyStruct1) = 10

This is due to the compiler padding variables so they all start on a 16-bit boundary.

The expected data storage in memory feels like it should be:

[..|..|..|..|..|..|..] = 7 bytes
 |  |  |  |  |  |  |
 |  |  |  |  \  /  byte3
 |  |  |  |  word2
 |   \ /  byte2
 |  word1
 byte1

But, using GCC on a Windows 10 machine shows that each value is stored on a 16-bit boundary, leaving unused padding bytes after the 8-bit values:

[..|xx|..|..|..|xx|..|..|..|xx] = 10 bytes
 |     |  |  |     |  |  |
 |     |  |  |     \  /  byte3
 |     |  |  |     word2
 |      \ /  byte2
 |     word1
 byte1

As you can see, three extra bytes were added to the “blob” of memory that contains this structure. This is being done so each element starts on an even-byte address (0, 2, 4, etc.). Some processors require this, but if you were using one that allowed odd-byte access, you would likely get a sizeof() 7.

Do not rely on processor architecture

To create portable C, you must not rely on the behavior of how things work on your environment. The same can/will could produce different results on a different environment.

See also: sizeof() matters, where I demonstrated a simple example of using “int” and how it was quite different on a 16-bit Arduino versus a 32/64-bit PC.

Make it smaller

One easy thing to do to reduce wasted memory in structures is to try to group the 8-bit values together. Using the earlier structure example, by simple changing the ordering of values, we can reduce the amount of memory it uses:

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint8_t     byte3;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
                        // 7 bytes
} MyStruct2;

On a Windows 10 GCC compiler, this will produce:

sizeof(MyStruct1) = 8

It is still not the 7 bytes we might expect, but at least the waste is less. In memory, it looks like this:

[..|..|..|xx|..|..|..|..] = 8 bytes
 |  |  |     |  |  \  /
 |  |  |     \  /  word2
 |  |  |     word1
 |  |  byte3
 |  byte2
 byte1

You can see an extra byte of padding being added after the third 8-bit value. Just out of curiosity, I moved the third byte to the end of the structure like this:

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct3;

…but that also produced 8. I believe it is just adding an extra byte of padding at the end (which doesn’t seem necessary, but perhaps memory must be reserved on even byte boundaries and this just marks that byte as used so the next bit of memory would start after it).

[..|..|..|..|..|..|..|xx] = 8 bytes
 |  |  |  |  |  |  |
 |  |  |  |  \  /  byte3
 |  |  \  /  word2
 |  |  word1
 |  byte2
 byte1

Because you cannot ensure how a structure ends up in memory without knowing how the compiler works, it is best to simply not rely or expect a structure to be “packed” with all the bytes aligned like the code. You also cannot expect the memory usage is just the values contained in the structure.

I do frequently see programmers attempt to massage the structure by adding in padding values, such as:

typedef struct
{
    uint8_t     byte1;    // 1
    uint8_t     padding1; // 1

    uint16_t    word1;    // 2

    uint8_t     byte2;    // 1
    uint8_t     padding2; // 1

    uint16_t    word2;    // 2

    uint8_t     byte3;    // 1
    uint8_t     padding3; // 1
                          // 10 bytes
} MyPaddedStruct1;

At least on a system that aligns values to 16-bits, the structure now matches what we actually get. But what if you used a processor where everything was aligned to 32-bits?

It is always best to not assume. Code written for an Arduino one day (with 16-bit integers) may be ported to a 32-bit Raspberry Pi Pico at some point, and not work as intended.

Here’s some sample code to try. You would have to change the printfs to Serial.println() and change how it prints the sizeof() values, but then you could see what it does on a 16-bit Arduino UNO versus a 32-bit PC or other system.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct
{
    uint8_t     byte1;  // 1
    uint16_t    word1;  // 2
    uint8_t     byte2;  // 1
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct1;

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint8_t     byte3;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
                        // 7 bytes
} MyStruct2;

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct3;

int main()
{
    printf ("sizeof(MyStruct1) = %u\n", (unsigned int)sizeof(MyStruct1));
    printf ("sizeof(MyStruct2) = %u\n", (unsigned int)sizeof(MyStruct2));
    printf ("sizeof(MyStruct3) = %u\n", (unsigned int)sizeof(MyStruct3));

    return EXIT_SUCCESS;
}

Until next time…

4 thoughts on “C structures and padding and sizeof

  1. William Astle

    The padding at the end is to ensure an array of structures will have all elements aligned properly since offsets in arrays are basically “arraypointer + index * sizeof(element)”. (Calculating the offset is more efficient for some sizes as well.) If it’s not obvious why that needs to be the case, consider arrays allocated with malloc(). They have the same alignment requirements but malloc() has no idea that it’s allocating an array of structures. It just gets a number of bytes, which will be wrong if the alignment padding isn’t included in sizeof.

    Memory allocation itself may also benefit from certain sizes, but sizeof and structs don’t need to care about that. Only the allocator needs to care about that.

    Reply
    1. Lee

      I’m not actually sure why the C compiler adds the padding, and I may be wrong on some of what I’m about to say, but this is my understanding. Some CPUs can retrieve a word (2 bytes) in one “cycle” as long as it’s from an even memory address. If it’s from an odd memory address, the CPU must retrieve the first half (the byte before it and the first byte), then the second half (the second byte and the byte after) and piece the parts together. This is a lot more inefficient (requires more CPU cycles), so it could be that the compilers try to align multi-byte data (such as words) on even addresses for performance gains.

      Reply
      1. Allen Huffman Post author

        We have a 16-bit C compiler at work that takes int32s, floats and doubles. It puts global int8s side by side and we found a bug that happened when accessing a value from an ISR at the same time another side by side value was being modified. It would corrupt. I seem to recall that the architecture allowed direct access in lower RAM, but after a certain point it had to generate extra code t9 retrieve, modify and store the variable, so it only happened to variables that existed there. Weird but to track down and figure out but I found it.

        Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.