Category Archives: Programming

printf portability problems persist… possibly.

TL;DR – You all probably already knew this, but I just learned about inttypes.h. (Well, not actually “just”; I found out about it last year for a different reason, but I re-learned about it now for this issue…)

I was today years old when I learned that there was a solution to a bothersome warning that most C programmers probably never face: printing ints or longs in code that will compile on 16-bit or 32/64-bit systems.

For example, this code works fine on my 16-bit PIC24 compiler and a 32/64-bit compiler:

int x = 42;
printf ("X is %d\n", x);

long y = 42;
printf ("Y is %ld\n", y);

This is because “%d” represents an “int”, whatever size that is on the system (16-bit, 32-bit or 64-bit), and “%ld” represents a “long int”, whatever size that is on the system.

On my 16-bit PIC24 compiler, “int” is 16-bits and “long int” is 32-bits.

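On my PC compiler, “int” is 32-bits, and “long int” is 64-bits. (That is how 64-bit Linux and macOS compilers do it; 64-bit Windows compilers keep “long int” at 32 bits.)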

But int isn’t portable, is int?

As far as I recall, the C standard says an int is “at least 16-bits.” If you want to represent a 16-bit value in any compliant ANSI-C code, you can use int. It may be using 32 or 64 bits (or more?), but it will at least hold 16-bits.

What if you need to represent 32 bits? This code works fine on my PC compiler, but would not work as expected on my 16-bit system:

unsigned int value = 0xaabbaabb;

printf ("value: %u (0x%x) - ", value, value);

for (int bit = 31; bit >= 0; bit--)
{
    if ( (value & (1u << bit)) == 0)
    {
        printf ("0");
    }
    else
    {
        printf ("1");
    }
}
printf ("\n");

On a 16-bit system, an “unsigned int” only holds 16-bits, so the results will not be what one would expect. (A good compiler might even warn you about that, if you have warnings enabled… which you should.)

stdint.h, anyone?

In my embedded world, writing generic ANSI-C code is not always optimal. If we must have 32-bits, using “long int” works on my current system, but what if that code gets ported to a 32-bit ARM processor later? On that machine, “int” becomes 32-bits, and “long” might be 64-bits.

Having too many bits is not as much of an issue as not having enough, but the stdint.h header file solves this by letting us request what we actually want to use. For example:

#include <stdio.h>
#include <stdint.h> // added

int main()
{
    uint32_t value = 0xaabbaabb; // changed
    
    printf ("value: %u (0x%x) - ", value, value);
    
    for (int bit = 31; bit >= 0; bit--)
    {
        if ( (value & ((uint32_t)1 << bit)) == 0) // a 32-bit 1, so bit 31 works where int is 16 bits
        {
            printf ("0");
        }
        else
        {
            printf ("1");
        }
    }
    printf ("\n");

    return 0;
}

Now we have code that works on a 16-bit system as well as a 32/64-bit system.

Or do we?

There is a problem, which I never knew the solution to until recently.

printf ("value: %u (0x%x) - ", value, value);

That line will compile without warnings on my PC compiler, but I get a warning on my 16-bit compiler. On a 16-bit compiler, “%u” is for printing an “unsigned int”, as is “%x”. But on that compiler, the “uint32_t” represents a 32-bit value. Normal 16-bit compilers would probably call this an “unsigned long”, but my PIC24 compiler has its own internal variable types, so I see this in stdint.h:

typedef unsigned int32 uint32_t;

On the Arduino IDE, it looks more normal:

typedef unsigned long int uint32_t;

And a “good” compiler (with warnings enabled) should alert you that you are trying to print a variable larger than what “%u” or “%x” expects.

So while this works fine on my 32-bit compiler…

// For my 32/64-bit system:
uint32_t value32 = 42;
printf ("%u", value32);

…it gives a warning on the 16-bit ones. To make it compile on the 16-bit compiler, I change it to use “%lu” like this:

// For my 16-bit system:
uint32_t value32 = 42;
printf ("%lu", value32);

…but then that code will generate a compiler warning on my 32/64-bit system ;-)

There are some #ifdefs you can use to detect architecture, or make your own using sizeof() and such, that can make code that compiles without warnings, but C already solved this for us.

Hello, inttypes.h! Where have you been all my C-life?

On a whim, I asked ChatGPT about this the other day, and it pointed me to the macros in inttypes.h that take care of this.

If you want to print a 32-bit value, instead of using “%u” (on a 32/64-bit system) or “%lu” (on a 16-bit system), you can use PRIu32, which expands to whatever conversion is needed to print an unsigned (“u”) 32-bit value. On a 16-bit compiler, that might be defined as:

#define PRIu32 "lu"

Instead of this…

uint32_t value = 42;
printf ("value is %u\n", value);

…you do this:

uint32_t value = 42;
printf ("value is %" PRIu32 "\n", value);

Because the preprocessor swaps in the macro’s text, and the compiler then joins adjacent string literals into one, that ends up creating:

printf ("value is %lu\n", value); // %lu

But on a 32/64-bit compiler, that same header file might represent it as:

#define PRIu32 "u"

Thus, writing that same code using this define would produce this on the 32/64-bit system:

printf ("value is %u\n", value); // %u

Tada! Warnings eliminated.

And now I realize I have used this before, for a different reason:

uintptr_t
PRIxPTR

If you try to print the address of something, like this:

void *ptr = 0x1234;
printf ("ptr is 0x%x\n", ptr);

…you should get a compiler warning similar to this:

warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘void *’ [-Wformat=]

%x is for printing an “unsigned int”, and ptr is a “void *”. Over the years, I made this go away by casting:

printf ("ptr is 0x%x\n", (unsigned int)ptr);

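But on my 32/64-bit compiler, a pointer is 64 bits while an “unsigned int” is only 32 bits, so that cast throws away half of the address, and the compiler warns about casting a pointer to an integer of a different size. There, I would use “%lx” and cast to a “long int” instead.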

To make that go away, last year I learned about using PRIxPTR to represent the printf code for printing a pointer as hex:

printf ("pointer is 0x%" PRIxPTR "\n", (uintptr_t)ptr);

On my 16-bit compiler, it is:

#define PRIxPTR "lx"

This is because pointers are 32-bit on a PIC24 (even though an “int” on that same system is 16-bits).

On the 32/64-bit compiler (GNU-C in this case), it changes depending on whether the code is being built for 64-bit Windows (_WIN64) or not:

#ifdef _WIN64
...
#define PRIxPTR "I64x" // 64-bit mode
...
#else
...
#define PRIxPTR "x" // 32-bit mode
...
#endif

I64 is something new to me since I never write 64-bit code. It is a Microsoft-specific printf length modifier for 64-bit integers, used here where plain “%x” only covers the default int size (32 bits) and “%lx” the long size (also 32 bits on Windows).

Instead of casting to an “(unsigned int)” or “(unsigned long int)” before printing, there is a special “uintptr_t” type: an unsigned integer type big enough to hold a pointer, whatever size a pointer is on that system.

This gives me a warning:

printf ("ptr is 0x%" PRIxPTR "\n", (unsigned int)ptr);

warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘unsigned int’ [-Wformat=]

But I can simply change the casting of the pointer:

printf ("ptr is 0x%" PRIxPTR "\n", (uintptr_t)ptr);

You may have also noticed I still have a warning when declaring the pointer with a value:

void *ptr = 0x1234;

warning: initialization of ‘void *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]

Getting rid of this is as simple as making sure the value is cast to a “void *”:

void *ptr = (void*)0x1234;

This is what happens when you learn C on a K&R compiler in the late 1980s and go to sleep for a while without keeping up with all the subsequent standards, including one from 2023 that I just found out about while typing this up!

Per BING CoPilot…

  • C89/C90 (ANSI X3.159-1989): The first standard for the C programming language, published by ANSI in 1989 and later adopted by ISO as ISO/IEC 9899:1990.
  • C95 (ISO/IEC 9899/AMD1:1995): A normative amendment to the original standard, adding support for international character sets.
  • C99 (ISO/IEC 9899:1999): Introduced several new features, including inline functions, variable-length arrays, and new data types like long long int.
  • C11 (ISO/IEC 9899:2011): Added features like multi-threading support, improved Unicode support, and type-generic selection with _Generic.
  • C17 (ISO/IEC 9899:2018): A bug-fix release that addressed defects in the C11 standard without introducing new features.
  • C23 (ISO/IEC 9899:2024): The latest standard, published by ISO in 2024, which adds features such as nullptr, typeof, bool/true/false as keywords, and #embed.

The more you know…

Though, I assume all the younguns that grew up in the ANSI-C world already know this. I grew up when you had to write functions like this:

/* Function definition */
int add(x, y)
int x, y;
{
return x + y;
}

Now to get myself in the habit of never using “%u”, “%d”, etc. when using stdint.h types…

Until then…

Splitting up strings in C source code.

When printing out multiple lines of text in C, it is common to see code like this:

printf ("+--------------------+\n");
printf ("| Welcome to my BBS! |\n");
printf ("+--------------------+\n");
printf ("| C)hat    G)oodbye  |\n");
printf ("| E)mail   H)elp     |\n");
printf ("+--------------------+\n");

That looks okay, but it calls printf() once for every line. You could just as easily combine multiple lines and embed the “\n” newline escape code in one long string.

printf ("+--------------------+\n| Welcome to my BBS! |\n+--------------------+\n| C)hat    G)oodbye  |\n| E)mail   H)elp     |\n+--------------------+\n");

Not only does it make the code a bit smaller (no overhead of making the printf call multiple times), it should be a bit faster since it removes the overhead of going in and out of a function.

But man is that ugly.

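At some point, I learned about the automatic string concatenation C does: adjacent string literals are joined into one. (It is often credited to the preprocessor, but strictly speaking it happens in a compiler translation phase right after preprocessing.) That allows you to break up quoted lines like this: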

const char *message = "This is a very long message that is too wide for "
    "my source code editor so I split it up into separate lines.\n";

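“Back in the day” if you had C code that went to the next line, you were supposed to put a \ at the end of the line. (As far as the C language itself goes, that backslash line continuation is only required inside preprocessor directives, such as a multi-line #define.)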

if ((something == true) && \
    (somethingElse == false) && \
    (somethingCompletelyDifferent == banana))
{

…but modern compilers do not seem to care about source code line length, so you can usually do this:

printf ("+--------------------+\n"
        "| Welcome to my BBS! |\n"
        "+--------------------+\n"
        "| C)hat    G)oodbye  |\n"
        "| E)mail   H)elp     |\n"
        "+--------------------+\n");

That looks odd if you aren’t aware of it, but makes for efficient code that is easy to read.

However, not all compilers are created equal. A previous job used a compiler that did not allow constant strings any longer than 80 characters! If you did something like this, it would not compile:

printf ("12345678901234567890123456789012345678901234567890123456789012345678901234567890x");

I had to contact their support to have them explain the weird error it gave me. On that compiler, trying to do this would also fail:

printf ("1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890x");

But that is not important to the story. I just mention it to explain that my background as an embedded C programmer has me limited, often, by sub-standard C compilers that do not support all the greatness you might get on a PC/Mac compiler.

These days, I tend to break all my multi-line prints up like that, so the source code resembles the output:

printf ("This is the first line.\n"
        "\n"
        "And we skipped a line above and below.\n"
        "\n"
        "The end.\n");

I know that may look odd, but it visually indicates that there will be a skipped line between those lines of text, where this does not:

printf ("This is the first line.\n\n"
        "And we skipped a line above and below.\n\n"
        "The end.\n");

Do any of you do this?

And, while today any monitor will display more than 80 columns, printers still default to 80-column text. Sure, you can downsize the font (but the older I get, the less I want to read small print). Some coding standards I have worked under want source code lines to be under 80 characters, which does make doing a printout code review much easier.

And this led me to breaking up long lines like this…

printf ("This is a very long line that is too long for our "
        "80 character printout\n");

That code would print one line of text, but the source is short enough to fit within the 80 column width preferred by that coding standard.

And here is why I hate it…

I have split lines up like this in the past, and created issues when I later tried to find where in the code some message was generated. For example, if I wanted to find “This is a very long line that is too long for our 80 character printout” and searched for that full string, it would not show up. It does not exist in the source code. It has a break in between.

Even searching for “our 80 character” would not be found due to this.

And that’s the downside of what I just presented, and why you may not want to do it that way.

Thank you for coming to my presentation.

Fantastic C buffers and where to find them.

In my early days of learning C on the Microware OS-9 C compiler running on a Radio Shack Color Computer, I learned about buffers.

char buffer[80];

I recall writing a “line input” routine back then which was based on one I had written in BASIC and then later BASIC09 (for OS-9).

Thirty-plus years later, I find I still end up creating that code again for various projects. Here is a line input routine I wrote for an Arduino project some years ago:

LEDSign/LineInput.ino at master · allenhuffman/LEDSign (github.com)

Or this version, ported to run on a PIC24 using the CCS compiler:

https://www.ccsinfo.com/forum/viewtopic.php?t=58430

That routine looks like this:

byte lineInput(char *buffer, size_t bufsize);

In my code, I could have an input buffer, and call that function to let the user type stuff into it:

char buffer[80];

len = lineInput (buffer, 80); // 80 is max buffer size

Though, when I first learned this, I was always passing in the address of the buffer, like this:

len = lineInput (&buffer, 80); // 80 is max buffer size

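Both work and produce the same memory address, though not the same type: “&buffer” is a pointer to the whole 80-character array, so a modern compiler will warn when it is passed to a “char *” parameter. Meanwhile, for other variable types, it is quite different: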

int x;

function (x);
function (&x);

I think this may be why one of my former employers had a coding standard that specified passing buffers like this:

len = lineInput (&buffer[0], 80); // 80 is max buffer size

By writing it out as “&buffer[0]”, you can read it as “the address of the first byte in this buffer.” And that does seem much clearer than “buffer” or “&buffer”. Without more context, these don’t tell you what you need to know:

process (&in);
process (in);

Without looking up what “in” is, we might assume it is some numeric type. The first version passes the address in, so it can be modified, while the second version passes the value in, so if it is modified by the function, it won’t affect the variable outside of that function.

But had I seen…

process (&in[0]);

…I would immediately think that “in” is some kind of array of objects – char? int? floats? – and whatever they were, the function was getting the address of where that array was located in memory.

So thank you, C, for giving us multiple ways to do the same thing — and requiring programmers to know that these are all the same:

#include <stdio.h>

void showAddress (void *ptr)
{
    printf ("ptr = %p\n", ptr);
}

int main()
{
    char buffer[80];
    
    showAddress (buffer);
    
    showAddress (&buffer);
    
    showAddress (&buffer[0]);

    return 0;
}

How do you handle buffers? What is your favorite?

Comments welcome…

C escape codes

Now maybe someone here can tell me if this makes any sense:

#define SOME_NAME "SomeName\0"

I ran across something like this in my day job and wondered what the purpose was of adding a “\0” zero byte to the end of the string. C already does that, doesn’t it?

C escape codes

I learned about using backslash to embed certain codes in strings when I was first learning C on my Radio Shack Color Computer. I was using OS-9/6809 and a pre-ANSI K&R C compiler.

I learned about “\n” at the end of a line, and that may be the only one I knew about back then. (K&R C also had “\t”, “\r”, “\b” and a few others, but I never used them in any of my code back then.)

Wikipedia has a handy reference:

Escape sequences in C – Wikipedia

It lists many I was completely unaware of – like “vertical tab.” I’d have to look up what a vertical tab is, as well ;-)

It was during my “modern” career that I learned you could embed any byte value in a string by escaping it with “\x” and a hex value:

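#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS

int main()
{
    const char bytes[] = "\x01\x02\x03\x04\x05";
    
    printf ("sizeof(bytes) = %zu\n", sizeof(bytes));

    // size_t matches the type sizeof() returns
    for (size_t idx=0; idx<sizeof(bytes); idx++)
    {
        printf ("%02x ", (unsigned char)bytes[idx]);
    }
    
    printf ("\n");

    return EXIT_SUCCESS;
}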

This code makes a character array containing the bytes 0x01, 0x02, 0x03, 0x04 and 0x05. A zero follows, added by C to terminate the quoted string. The output looks like:

sizeof(bytes) = 6
01 02 03 04 05 00

I do not know how I learned it, but it was just two jobs ago when I used this to embed a bunch of data in a C program. I believe I was tokenizing some strings to reduce code size, and I had some kind of lookup table of strings, and then the “token” strings of bytes that referred back to the full string. Something like this, except less stupid:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <stdint.h>

const char *words[] =
{
    "I",
    "know",
    "you"
};

const uint8_t sentence[] = "\x01\x02\x03\x02\x01\x02";
int main()
{
    printf ("sizeof(sentence) = %zu\n", sizeof(sentence));

    for (int idx=0; idx<sizeof(sentence)-1; idx++)
    {
        printf ("%s ", words[sentence[idx]-1]);
    }
    
    printf ("\n");

    return EXIT_SUCCESS;
}

In this silly example, I have an array of strings, and then an encoded sentence with bytes representing each word. The encoded bytes will have a 0 at the end, so I use 1 for the first word, and so on, with 0 marking the end of the sequence. But, this example doesn’t actually look for the 0. It just uses the number of bytes in the sentence (minus one, to skip the 0 at the end) via sizeof().

It really should use the 0, so this could be a function. You could pass it the dictionary of words, and the sentence bytes, and let it decode them in a more flexible/modular way:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <stdint.h>

// Dictionary of words
const char *words[] =
{
    "I",
    "know",
    "you"
};

// Encoded sentence
const uint8_t sentence[] = "\x01\x02\x03\x02\x01\x02";

// Decoder
void showSentence(const char *words[], const uint8_t sentence[])
{
    int idx = 0;
    
    while (sentence[idx] != 0)
    {
        printf ("%s ", words[sentence[idx]-1]);
        
        idx++;
    }
    
    printf ("\n");
}

// Test
int main()
{
    printf ("sizeof(sentence) = %zu\n", sizeof(sentence));

    showSentence (words, sentence);

    return EXIT_SUCCESS;
}

But I digress. My point is — I’m still learning things in C, even after knowing it since the late 1980s.

So back to the original question: What is adding a “\0” to a string doing? This is one advantage of using sizeof() versus strlen(). strlen() will stop at the 0, but sizeof() will tell you everything that is there.

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for strlen()

int main()
{
    const char string[] = "This is a test.\0And so is this.\0And this is also.";

    printf ("strlen(string) = %zu\n", strlen(string));

    printf ("sizeof(string) = %zu\n", sizeof(string));
    
    return EXIT_SUCCESS;
}

The output:

strlen(string) = 15
sizeof(string) = 50

If you try to printf() that string, it will print only up to the first \0. But, there is more “hidden” data after the zero. If you have the sizeof(), that size could be used in a routine to print everything. But why? We can already do string arrays, or just embed newlines in a string if we wanted to print multiple lines.

But it’s still neat.

Have you ever done something creative with C escape codes? Leave a comment…

Until then…

C strings and pointers and arrays, revisited…

Previously, I posted more of my “stream of consciousness” ramblings, ending with this bit of code:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for strlen()

int main()
{
    const char *stringPtr = "hello";
    
    printf ("sizeof(stringPtr) = %zu\n", sizeof(stringPtr));
    printf ("strlen(stringPtr) = %zu\n", strlen(stringPtr));

    printf ("\n");

    const char string[] = "hello";

    printf ("sizeof(string) = %zu\n", sizeof(string));
    printf ("strlen(string) = %zu\n", strlen(string));

    return EXIT_SUCCESS;
}

Sean Patrick Conner commented:

I would expect the following:

sizeof(stringPtr) = 8; /* or 4 or 2, depending upon the pointer size */
strlen(stringPtr) = 5;

sizeof(string) = 6; /* because of the NUL byte at the end */
strlen(string) = 5;

– Sean Patrick Conner

Sean sees things much more clearly than I. When I tried it, I was initially puzzled by the output and had to get my old brain to see the obvious. His comments explain it clearly.

These musings led me to learning about “%zu” for printing a size_t, and a few other things, which I have now posted here in other articles.

I learn so much from folks who take time to post a comment.

More to come…

Yes, Virginia. You CAN printf a size_t! And pointers.

I always learn from comments. Sadly, I don’t mean the comments inside the million lines of code I maintain for my day job — they usually don’t exist ;-)

I have had two previous posts dealing with sizeof() being used on a string constant like this:

#define VERSION "1.0.42-beta"
printf ("sizeof(VERSION) = %d\n", sizeof(VERSION));

Several comments were left to make this more better.

Use %zu to print a size_t

The first pointed out that sizeof() is not returning a %d integer:

sizeof does not result in an int, so using %d is not correct.

– F W

Indeed, this code should generate a compiler warning on a good compiler. I would normally cast the sizeof() return value to an int like this:

printf ("sizeof(VERSION) = %d\n", (int)sizeof(VERSION));

BUT, I knew that really wasn’t a proper solution. The cast throws away the real type (and would truncate a size too big for an int), and without the cast, the right format code changes from system to system. An int might be 16-bits, 32-bits or 64-bits (or more?) depending on the system architecture, and the type behind size_t varies as well. I often write test code on a PC using Code::Blocks, which uses the GNU-C compiler. On that system, I would need to use “%ld” for a long int. When that code is used on an embedded compiler (such as the CCS compiler for PIC24 chips), I need to make that “%d”.

I just figured printf() pre-dates stuff like that and thus you couldn’t do anything about it.

But now I know there is a proper solution — if you have a compiler that supports it. In the comments again…

… when you want to print a size_t value, using %zu.

– Sean Patrick Conner

Thank you, Sean Patrick Conner! You have now given me new information I will use from now on. I was unaware of %z. I generally use the website www.cplusplus.com to look up C things, and sure enough, on the printf entry it mentions %z — just in a separate box below the one I always look at. I guess I’d never scrolled down.

cplusplus.com/reference/cstdio/printf/

This old dog just learned some new tricks!

int var = 123;

printf ("sizeof(var) = %zu\n", sizeof(var));

Thank you very much for pointing this out to me. Unfortunately, the embedded compiler I use for my day job does not support any of the new stuff, and only has a subset of printf, but the Windows compiler I use for testing does.

Bonus: printing pointers for fun and profit

I’d previously run into this when trying to print out a pointer:

int main()
{
    char *ptr = 0x12345678;
    
    printf ("ptr = 0x%x\n", ptr);

    return EXIT_SUCCESS;
}

A compiler should complain about that, like this:

warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘char *’ [-Wformat=]

…so I’d just do a bit of casting, to cast the pointer to what %x expects:

printf ("ptr = 0x%x\n", (unsigned int)ptr);

BUT, that assumes an “int” is a certain size. This casting might work fine on a 16-bit Arduino, then need to be changed for a 32-bit or 64-bit PC program.

And, the same needs to be done when trying to assign a number (int) to a char pointer. This corrects both issues, but does so the incorrect way:

int main()
{
    char *ptr = (char*)0x12345678;

    printf ("ptr = 0x%lx\n", (unsigned long)ptr);

    return EXIT_SUCCESS;
}

First, I had to cast the number to be a character pointer, else it would not assign to “char *ptr” without a warning.

Second, since %x expects an “unsigned int”, and pointers on this system are the size of a long, I had to change the printf to use “%lx” for a long version of %x, and cast the “ptr” itself to be an “unsigned long”.

Had I written this initially on a system that uses 16-bit ints (like Arduino, PIC24, etc.), I would have had to do it differently, casting things to “int” instead of “long.”

This always drove me nuts, and one day I wondered if modern C had a way to deal with this. And, indeed, it does: %p

This was something that my old compilers either didn’t have, or I just never learned. I only discovered this within the past five years at my current job. It solves the problem by handling a “pointer” in whatever size it is for the system the code is compiled on. AND, with GCC at least, it even includes the “0x” prefix in the output (the exact format of %p is left up to each implementation):

int main()
{
    char *ptr = (char*)0x12345678;

    printf ("ptr = %p\n", (void *)ptr); // %p expects a void *

    return EXIT_SUCCESS;
}

I suppose when I found there was a “real” way to print pointers I should have expected there was also a real way to print size_t … but it took you folks to teach me that.

And I thank you.

Until next time…

C strings and pointers and arrays…

In a previous post about using sizeof() on string literals, there was an interesting comment by S. Enevoldsen:

To better remember this realize that arrays are not pointers, and string literals are arrays (that can decay to pointers).

const char arrayVersion[] = "1.0.42-beta";
const char* pointerString = "1.0.42-beta";
printf ("sizeof(arrayVersion) = %d\n", sizeof(arrayVersion));
printf ("sizeof(pointerString) = %d\n", sizeof(pointerString));

Outputs

sizeof(arrayVersion) = 12
sizeof(pointerString) = 4

– S. Enevoldsen

If I knew this, I have long forgotten it. Over the years at my “day jobs” I have gotten used to making string pointers like this:

const char *versionStringPtr = "1.0.42-beta";

I generally add the “Ptr” at the end to remind me (or other programmers) that it is a pointer to a string. In my mind, I knew I could have done “char *string” or “char string[]” and gotten the same use from normal code, but I do not recall if I knew they were treated differently.

What do you expect the output of this to be?

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for strlen()

int main()
{
    const char *stringPtr = "hello";
    
    printf ("sizeof(stringPtr) = %zu\n", sizeof(stringPtr));
    printf ("strlen(stringPtr) = %zu\n", strlen(stringPtr));

    printf ("\n");

    const char string[] = "hello";

    printf ("sizeof(string) = %zu\n", sizeof(string));
    printf ("strlen(string) = %zu\n", strlen(string));

    return EXIT_SUCCESS;
}

Output would show … what?

sizeof(stringPtr) = ???
strlen(stringPtr) = ???

sizeof(string) = ???
strlen(string) = ???

To be continued…

In C, you can sizeof() a string constant?

Updates:

  • 2024-08-27 – Adding a note about strlen()/sizeof() that was mentioned by Dave in the comments.

I am used to using sizeof() to know the size of a structure, or size of a variable…

typedef struct {
   char a;
   short b;
   int c;
   long d;
} MyStruct;

printf ("sizeof(MyStruct) is %zu\n", sizeof(MyStruct));

MyStruct foo;
printf ("sizeof(foo) is %zu\n", sizeof(foo));

…but every time I re-learn you can use it on strings, I am surprised:

#include <stdio.h>

#define VERSION_STRING __DATE__" "__TIME__

int main()
{
    printf ("Build: %s\n", VERSION_STRING);

    printf ("sizeof(): %zu\n", sizeof(VERSION_STRING));

    return 0;
}

Normally, I see strlen() used, and that works for a string that is in a buffer, or a constant string:

#define VERSION_STRING "1.0.42-beta"
const char versionString[] = "1.0.42-beta";

printf ("strlen(VERSION_STRING) = %zu\n", strlen(VERSION_STRING));

printf ("strlen(versionString) = %zu\n", strlen(versionString));

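…but if you know it is a #define string constant, you can use sizeof() instead. It is evaluated at compile time, so it is replaced with a hard-coded number: the size of that string literal (including its terminating 0 byte, as noted below). This will be smaller code, and faster, since strlen() has to scan through the string memory at run time looking for the 0 byte at the end, counting along the way. (Many optimizing compilers will fold strlen() of a literal into a constant anyway, but you cannot count on that with every embedded compiler.)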

I wonder how many times I have posted about this over the years.

Additional Notes:

In the comments, Dave added:

sizeof a string literal includes the terminating nul character, so it will be strlen +1.

– Dave

Ah, yes – a very good thing to note. C strings have a 0 byte added to the end of them, so “hello” is really “hello\0”. The standard C string functions like strcpy(), strlen(), etc. look for that 0 to know when to stop.

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for strlen()

#define STRING "hello"

int main()
{
    printf ("sizeof(STRING) = %zu\n", sizeof(STRING));
    
    printf ("strlen(STRING) = %zu\n", strlen(STRING));

    return EXIT_SUCCESS;
}

Output would show:

sizeof(STRING) = 6
strlen(STRING) = 5

So if using sizeof() to memcpy() bytes somewhere without the overhead of a strlen() counting first, you’d really want something like…

memcpy (buffer, STRING, sizeof(STRING)-1);

Until next time…

C and returning values quickly or safely. But not both.

WARNING: This article contains a C coding approach that many will find uncomfortable.

In my day job as a mild-mannered embedded C programmer, I am usually too busy maintaining what was created before me to be creating something new for others to maintain after me. There was that one time I had two weeks that were very different, and fun, since they were almost entirely spent “creating” versus “maintaining.”

Today’s quick C tidbit is about getting parameters back from a C function. In C, you only get one thing back — typically a variable type like an int or float or whatever:

int GetTheUltimateAnswer()
{
    return 42;
}

int answer = GetTheUltimateAnswer();
printf ("The Ultimate Answer is %d\n", answer);

If you need more than one thing returned, it is common to pass in variables by reference (the address of, or pointer to, the variable in memory) and have the function modify that memory to update the variables:

void GetMinAndMax (int *min, int *max)
{
    *min = 0;
    *max = 100;
}

int min, max;
GetMinAndMax (&min, &max);
printf ("Min is %d and Max is %d\n", min, max);

The moment pointers come in to play, things get very dangerous. But fast.

When passing values in, they get copied into a new variable:

int variable = 42;

printf ("variable = %d\n", variable);
Function (variable);
printf ("variable = %d\n", variable);

void Function (int x)
{
    x = x + 1;
}

Try it: https://onlinegdb.com/WC3ihCAuj

Above, Function() gets a new variable (called “x” in this case) with the value of the variable that was passed in to the call. The function is like Las Vegas. Anything that happens to that variable inside the function stays inside the function – the variable disappears at the end of the function, while the original variable remains as-was.

C++ changes this, I have learned, so you can pass in variables that can be modified, but I am not a C++ programmer so this post is only about old-skool C.

Pointing to a variable’s memory

By passing in the address of a variable, the function can go to that memory and modify the caller’s variable directly, so the change sticks:

int variable = 42;

printf ("variable = %d\n", variable);
Function (&variable);
printf ("variable = %d\n", variable);

void Function (int *x)
{
    *x = *x + 1;   // changes the caller's variable through its address
}

Try it: https://onlinegdb.com/Y2Z9WUvFG

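Passing by value can be slower, since a new variable has to be created and the value copied into it, which adds up quickly for something large like a structure. Passing by reference just passes an address, and the code uses that address, so no new copy of the data is created.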

But using a reference just for speed is dangerous, because the function can modify the variable even if you didn’t want it to. Consider passing in a string buffer, which is a pointer to a series of character bytes:

void PrintError (char *message)
{
    printf ("ERROR: %s\n", message);
}

PrintError ("Human Detected");

We do this all the time, but since PrintError() has access to the memory passed in, it could try to modify it. Modifying a string literal like “Human Detected” is undefined behavior in C, and such a string would typically be in program memory (though this is not true for Harvard Architecture systems like the PIC and Arduino). At best, an operating system with memory protection would trap that access with an exception and kill the program. At worst, the program would silently modify itself, which is what happened when I learned this on OS-9/6809 back in the late 80s (no memory protection on my TRS-80 CoCo!).

void PrintError (char *message)
{
    message[0] = 42;   // 42 is '*' in ASCII; writes into the caller's memory
}

PrintError ("Human Detected");

The code above would likely crash. If the caller had instead passed in a writable buffer holding a string, that buffer would simply be modified:

void PrintError (char *message)
{
    message[0] = 42;
}

char buffer[80];
strncpy (buffer, "Hello, world!", 80);
printf ("buffer: %s\n", buffer);
PrintError (buffer);
printf ("buffer: %s\n", buffer);

Try it: https://onlinegdb.com/L50JRWYj

And your point is?

My point is that there are certainly times when speed is the most important thing, and it outweighs the potential problems/crashes that a bug in code using the pointer could cause. Take, for example, anything that passes in a buffer:

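#include <ctype.h>
#include <string.h>

void UppercaseString (char *buffer)
{
    // Relies on buffer being NUL-terminated; strlen() is re-run every pass.
    for (size_t idx=0; idx<strlen(buffer); idx++)
    {
        buffer[idx] = (char)toupper((unsigned char)buffer[idx]);
    }
}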

There are many bad things that could happen here. Because it uses “strlen”, the buffer MUST be a string that has a NUL (‘\0’) byte at the end. If it does not, this routine could end up trampling through memory, uppercasing bytes that are beyond the caller’s string.

It is wise to always add another parameter that is the max size of the buffer:

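void UppercaseString (char *buffer, int bufferSize)
{
    // Never touches more than bufferSize bytes, and stops early at the NUL.
    for (int idx=0; idx<bufferSize && buffer[idx] != '\0'; idx++)
    {
        buffer[idx] = (char)toupper((unsigned char)buffer[idx]);
    }
}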

That helps. But it is still up to the compiler to catch the wrong type of pointer being passed in.

int Number = 10;

UppercaseString (&Number, 100);   // wrong type: int * instead of char *

The compiler should not let you do that, but some may just issue a warning and build it anyway. (This is why I always try to have NO warnings in my code. The more warnings there are, the more likely you will start ignoring them.)

Try #1: Passing by Reference

Suppose we have a function that returns the date and time as individual values (year, month, day, hour, minute and second). Since we cannot get six values back from a function, we first try passing in six variables by reference and having the routine modify them:

void GetDateTime1 (int *year, int *month, int *day,
                   int *hour, int *minute, int *second)
{
    *year = 2023;
    *month = 8;
    *day = 19;
    *hour = 4;
    *minute = 20;
    *second = 0;
}

int year, month, day, hour, minute, second;
GetDateTime1 (&year, &month, &day, &hour, &minute, &second);
printf ("GetDateTime1: %d/%d/%d %02d:%02d:%02d\n",
        year, month, day, hour, minute, second);

That works fine … as long as you know the parameters are “ints” (whatever that is) and not shorts or longs or any other numeric type. This, for example, would be bad:

short year, month, day, hour, minute, second;

GetDateTime1 (&year, &month, &day, &hour, &minute, &second);

Above, we are passing a short (let’s say that is a 16-bit variable on this system) into a function that expects an int (let’s say that is a 32-bit signed variable on this system). The function would try to place 32 bits of information at the address of a 16-bit variable, likely overwriting whatever sits in the two bytes after it.

Bad things, as they say, can happen.

Try #2: Passing a structure by reference

Passing in six variable pointers is more work than passing in one, so if we put the values in a structure, we can pass in just the pointer to that structure. This has the benefit of making sure each value lands in a member of the right type (unlike passing in the address of something that might be 8, 16, 32 or 64 bits).

typedef struct
{
    int year;
    int month;
    int day;
    int hour;
    int minute;
    int second;
} TimeStruct;

void GetDateTime2 (TimeStruct *timePtr)
{
    timePtr->year = 2023;
    timePtr->month = 8;
    timePtr->day = 19;
    timePtr->hour = 4;
    timePtr->minute = 20;
    timePtr->second = 0;   
}

TimeStruct time;
GetDateTime2 (&time);
printf ("GetDateTime2: %d/%d/%d %02d:%02d:%02d\n",
        time.year, time.month, time.day,
        time.hour, time.minute, time.second);

This should greatly reduce the potential problems, since you only have one pointer to screw up. If you get the type correct (a TimeStruct), the values it contains should be fine, because the compiler handles the conversion when you assign to each member. For example, if a member were a “uint8_t” and you tried to set it to “65535”, the compiler would (hopefully) warn you, then store only the low 8 bits of that 16-bit value as a “loss of precision.”

Try #3: Returning the address of a static

An approach some standard C library functions take (localtime() and asctime(), for example) is to allocate fixed memory inside the function as a static variable and then return a pointer to that memory. The caller doesn’t allocate it, and therefore isn’t passing in a pointer that could be wrong.

TimeStruct *GetDateTime3 (void)
{
    static TimeStruct s_time;
    
    s_time.year = 2023;
    s_time.month = 8;
    s_time.day = 19;
    s_time.hour = 4;
    s_time.minute = 20;
    s_time.second = 0;

    return &s_time;
}

TimeStruct *timePtr;
timePtr = GetDateTime3 ();  
printf ("GetDateTime3: %d/%d/%d %02d:%02d:%02d\n",
       timePtr->year, timePtr->month, timePtr->day,
       timePtr->hour, timePtr->minute, timePtr->second);

This approach is better, since it keeps the speed of using a pointer and adds the safety of not being able to get the pointer wrong: the function tells you where the memory is, not the other way around.

BUT … once you have the address of that static memory, you can modify it.

TimeStruct *timePtr;
timePtr = GetDateTime3 ();
timePtr->year = 1969;

In a real date/time function (like localtime() in the C library), those variables are populated with the system time each time you call the function, so even if the user changed something like this, it would be set back the next time the function was called. The flip side is that every call reuses the same memory, so a pointer saved from an earlier call silently changes when the function is called again. I can see where that could cause issues with other types of functions that hold on to memory like this.

Plus, it’s always holding on to that memory whether anyone is using it or not. That is a no-no when working on memory-constrained systems like an Arduino with only a few kilobytes of RAM.

Try #4: Returning a copy of a structure

And now the point of today’s ramblings… I have rarely used this, since it’s probably the slowest way to do things, but… you don’t have to return just a data type like an int or a bool or a pointer. You can return a structure, and C will give the caller a copy of the structure.

TimeStruct GetDateTime4 (void)
{
    TimeStruct time;
    
    time.year = 2023;
    time.month = 8;
    time.day = 19;
    time.hour = 4;
    time.minute = 20;
    time.second = 0;

    return time;
}

TimeStruct time;
time = GetDateTime4 ();    
printf ("GetDateTime4: %d/%d/%d %02d:%02d:%02d\n",
       time.year, time.month, time.day,
       time.hour, time.minute, time.second);

Above is possibly the safest way to return data, since no pointers are used. The caller makes a new structure variable, the function creates its own local structure variable, and the return copies that structure into the caller’s structure.

Try it: https://onlinegdb.com/F6rR1V-xb

This is slower, and consumes more memory while all these copies are being made, BUT it’s far, far safer. Even ChatGPT agrees that, if you are aiming for “safe” code, this is the better approach.

And, at my day job, I experimented with this and it’s been working very well. It’s about the closest thing C has to “objects”. I even use it for a BufferStruct so I can pass a buffer around without using a pointer (though internally there is a pointer to the buffer memory). It looks something like this:

#include <stdio.h>
#include <string.h>

typedef struct
{
    char buffer[80];
    int  bufferSize;   // length of the string stored in buffer
} BufferStruct;

BufferStruct GetBuffer (void)
{
    BufferStruct buf;
    
    // strncpy() does not always NUL-terminate, so terminate explicitly.
    strncpy (buf.buffer, "Hello, world!", sizeof(buf.buffer) - 1);
    buf.buffer[sizeof(buf.buffer) - 1] = '\0';
    buf.bufferSize = (int)strlen(buf.buffer);
    
    return buf;   // the whole structure, array included, is copied out
}

void ShowBuffer (BufferStruct buf)
{
    printf ("Buffer: %s\n", buf.buffer);
    printf ("Size  : %d\n", buf.bufferSize);
}

int main (void)
{
    BufferStruct myBuffer;
    myBuffer = GetBuffer ();
    ShowBuffer (myBuffer);

    BufferStruct testBuffer;
    strncpy (testBuffer.buffer, "I put this in here",
             sizeof(testBuffer.buffer) - 1);
    testBuffer.buffer[sizeof(testBuffer.buffer) - 1] = '\0';
    testBuffer.bufferSize = (int)strlen (testBuffer.buffer);
    ShowBuffer (testBuffer);
    
    return 0;
}

The extra overhead may be a problem if you are coding for speed, but doing this trick (while trying not to think about all the extra work and copying the code is doing) gives you a simple way to pass things around without ever using a pointer. You could even do this:

typedef struct
{
    int year;
    int month;
    int day;
    int hour;
    int minute;
    int second;
} TimeStruct;

// Global time values.
int g_year, g_month, g_day, g_hour, g_minute, g_second;

void SetTime (TimeStruct time)
{
    // Pretend we are setting the clock.
    g_year = time.year;
    g_month = time.month;
    g_day = time.day;
    g_hour = time.hour;
    g_minute = time.minute;
    g_second = time.second;
}

TimeStruct GetTime ()
{
    TimeStruct time;

    // Pretend we are reading the clock.
    time.year = g_year;
    time.month = g_month;
    time.day = g_day;
    time.hour = g_hour;
    time.minute = g_minute;
    time.second = g_second;

    return time;
}

TimeStruct time;

time.year = 2023;
time.month = 8;
time.day = 19;
time.hour = 12;
time.minute = 4;
time.second = 20;
SetTime (time);

...

time = GetTime ();

And now a certain percentage of C programmers who stumble into this article should be having night terrors at what is going on here.

Until next time…