Category Archives: C Programming

Generating C functions and prototypes using macros – part 2

See Also: part 1 and part 2.

In the previous installment, I discussed how lazy I am and shared my general dislike of doing things manually when they can be automated.

So let’s automate…

If you program in C, you are likely familiar with using #define to set a value you can use elsewhere in the code:

#define NUMBER_OF_GUESSES 10

int answer = RandomNumber (100);
for (int try=1; try<=NUMBER_OF_GUESSES; try++)
{
    printf ("Try #%d of %d - guess a number from 1-100: ",
             try, NUMBER_OF_GUESSES);

    guess = InputNumber ();

    // ... logic continues ...
}

Now instead of updating multiple places in the file for the number of guesses, only the #define has to be changed. The #define has the advantage of not taking extra code or memory space like a variable would, which is important when you are working on embedded systems with 8K of RAM.

You probably have also seen #defines used for marking code to be included or not included in a program:

#define DEBUG_MODE

void Function ()
{
#if defined(DEBUG_MODE)
    printf ("Inside Function()\n");
#endif

    // ...logic continues...
}

With “#define DEBUG_MODE” there, the printf() will be included. Remove that #define (or comment it out) and it will not.

But #defines can also become macros with parameters, such as this:

#define SNOOZE(x) SleepMs(x*1000)

If you want to sleep for seconds, you could use a macro that turns into the call to the millisecond sleep with the passed-in value multiplied by 1000:

void Function ()
{
    printf ("Pausing for 5 seconds...\n");

    SNOOZE (5);

    // ...logic continues...
}

The C preprocessor will take that “5” and substitute where the “x” is in the replacement text, becoming:

void Function ()
{
    printf ("Pausing for 5 seconds...\n");

    SleepMs (5*1000);

    // ...logic continues...
}

Now if the code is ported to a system with a different sleep call, the #define can be changed to use whatever is available.
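One caveat with parameterized macros like this: the preprocessor pastes in text, not values, so an argument such as 2+3 would become 2+3*1000. Here is a minimal sketch of the difference. The SECONDS_TO_MS_UNSAFE and SECONDS_TO_MS_SAFE names are made up for illustration, and they only compute the millisecond value rather than calling a sleep function.

```c
/* Hypothetical macros for illustration. They only compute the millisecond
   value that would be passed to a sleep call. */
#define SECONDS_TO_MS_UNSAFE(x) (x*1000)     /* Parameter not parenthesized. */
#define SECONDS_TO_MS_SAFE(x)   ((x)*1000)   /* Parameter parenthesized. */

/* 2+3 is pasted in as text, so this becomes (2+3*1000). */
int UnsafeFiveSeconds (void)
{
    return SECONDS_TO_MS_UNSAFE (2+3);
}

/* Here it becomes ((2+3)*1000). */
int SafeFiveSeconds (void)
{
    return SECONDS_TO_MS_SAFE (2+3);
}
```

The unsafe version returns 3002 instead of 5000, which is why wrapping each macro parameter in parentheses is a common habit.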

I’d expect anyone who has programmed in C has done one or all of these things.

But wait, there’s more!

But a macro does not have to be a simple number, string or function name. It can be a whole block of code that gets substituted. You can put in many lines, just by adding a “\” at the end of the line to continue parsing the next line after it:

#define DISPLAY_COUNTDOWN(x) \
    for (int idx=(x); idx>=0; idx--) \
    { \
        printf ("%d...", idx); \
        SleepMs (1000); /* 1000ms, 1 second sleep */ \
    }

void Function ()
{
    printf ("Self-destruct activated...\n");

    DISPLAY_COUNTDOWN (10);

    // ...logic continues ...
}

And that would be processed to replace the “DISPLAY_COUNTDOWN(10)” with the C code in the #define:

void Function ()
{
    printf ("Self-destruct activated...\n");

    for (int idx=(10); idx>=0; idx--) {         printf ("%d...", idx);         SleepMs (1000); /* 1000ms, 1 second sleep */     };

    // ...logic continues ...
}

Yeah, it would look ugly if you could see how the C preprocessor puts it in, but it builds and runs and you never see it (unless you specifically look at preprocessed output files).

But that is probably dumb. You should just make a “DisplayCountdown()” function and have it be more normal.

But wait, there’s less dumb…

But in the case of my panel functions, each one of them had a unique panel name and panel identifier, so using a function for them was not really possible. Each one had to be its own function since the functions contained the name of the panel (“PanelMainInit()”, “PanelMainTerm()”, etc.).

But a #define can do that…

#define GENERATE_PANEL_PROTOTYPES(panel_name) \
    int panel_name##Init (void); \
    int panel_name##GetHandle (void); \
    int panel_name##Display (void); \
    int panel_name##Hide (void); \
    int panel_name##Term (void);

The macro uses “panel_name” as the substitution “variable” passed in, and will place whatever text is there anywhere in the macro where “panel_name” appears. Since I wanted to pass in the filename (without extension) of the panel (such as “PanelMain” or “PanelFaults”) and build a function name out of it, I use the ## concatenate feature that will glue the items before and after it together. That macro, used like this:

GENERATE_PANEL_PROTOTYPES(PanelMain)

GENERATE_PANEL_PROTOTYPES(PanelFaults)

GENERATE_PANEL_PROTOTYPES(PanelAdmin)

…effectively generates the prototypes like this:

int PanelMainInit (void);
int PanelMainGetHandle (void);
int PanelMainDisplay (void);
int PanelMainHide (void);
int PanelMainTerm (void);

int PanelFaultsInit (void);
int PanelFaultsGetHandle (void);
int PanelFaultsDisplay (void);
int PanelFaultsHide (void);
int PanelFaultsTerm (void);

int PanelAdminInit (void);
int PanelAdminGetHandle (void);
int PanelAdminDisplay (void);
int PanelAdminHide (void);
int PanelAdminTerm (void);

…though it actually looks like one long run-on line for each one if you look at the pre-processed C output, but the result is the same.

A similar macro could generate the actual functions:

#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)

#define GENERATE_PANEL_FUNCTIONS(panelName, panelResourceID) \
    static int S_##panelName##Handle = 0; /* Zero is not a valid panel handle. */ \
    \
    int panelName##Init (void) \
    { \
        int panelHandle = 0; \
        if (S_##panelName##Handle <= 0) \
        { \
            panelHandle = LoadPanel (0, TOSTRING(panelName)".uir", panelResourceID); \
            if (panelHandle > 0) \
            { \
                S_##panelName##Handle = panelHandle; \
                \
                panelName##UserInit (panelHandle); \
            } \
        } \
        else \
        { \
            panelHandle = S_##panelName##Handle; \
        } \
        return panelHandle; \
    } \
    \
    int panelName##GetHandle (void) \
    { \
         return panelName##Init (); \
    } \
    \
    int panelName##Display (void) \
    { \
        int status = UIEHandleInvalid; \
        int panelHandle = panelName##Init (); \
        if (panelHandle > 0) \
        { \
            status = DisplayPanel (panelHandle); \
        } \
        return status; \
    } \
    \
    int panelName##Hide (void) \
    { \
        int status = UIEHandleInvalid; \
        if (S_##panelName##Handle > 0) \
        { \
            status = HidePanel (S_##panelName##Handle); \
        } \
        return status; \
    } \
    \
    /* Unload the panel, if valid. */ \
    int panelName##Term (void) \
    { \
        int status = UIEHandleInvalid; \
        if (S_##panelName##Handle > 0) \
        { \
            status = DiscardPanel (S_##panelName##Handle); \
            if (status == UIENoError) \
            { \
                S_##panelName##Handle = 0; \
            } \
        } \
        return status; \
    }

That macro would be used like this:

GENERATE_PANEL_FUNCTIONS(PanelMain, PANEL_MAIN)

GENERATE_PANEL_FUNCTIONS(PanelFaults, PANEL_FAULTS)

GENERATE_PANEL_FUNCTIONS(PanelAdmin, PANEL_ADMIN)

…and it would create a fully populated set of functions for those panels.
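To try the idea without LabWindows/CVI installed, here is a cut-down sketch that swaps in a stub for LoadPanel. StubLoadPanel, StubLoadCount, and the resource ID values are made up for this example and are not part of the real library. Only the Init and GetHandle functions are generated, which is enough to show each panel getting its own static handle and being loaded on demand just once.

```c
/* Hypothetical stand-in for the LabWindows/CVI LoadPanel() call so this
   sketch builds anywhere. It is NOT the real library function. */
static int S_loadCount = 0;

static int StubLoadPanel (const char *filename, int resourceID)
{
    (void) filename;         /* A real loader would open this file. */
    S_loadCount++;
    return 100 + resourceID; /* Pretend handle, always greater than zero. */
}

/* Lets a caller see how many times the stub loader actually ran. */
int StubLoadCount (void)
{
    return S_loadCount;
}

#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)

/* Cut-down version of the generator: Init and GetHandle only. */
#define GENERATE_PANEL_FUNCTIONS(panelName, panelResourceID) \
    static int S_##panelName##Handle = 0; /* Zero is not a valid handle. */ \
    \
    int panelName##Init (void) \
    { \
        if (S_##panelName##Handle <= 0) \
        { \
            S_##panelName##Handle = \
                StubLoadPanel (TOSTRING(panelName) ".uir", panelResourceID); \
        } \
        return S_##panelName##Handle; \
    } \
    \
    int panelName##GetHandle (void) \
    { \
        return panelName##Init (); \
    }

/* Made-up resource IDs for the sketch. */
#define PANEL_MAIN   1
#define PANEL_FAULTS 2

GENERATE_PANEL_FUNCTIONS(PanelMain, PANEL_MAIN)
GENERATE_PANEL_FUNCTIONS(PanelFaults, PANEL_FAULTS)
```

Calling PanelMainGetHandle() and then PanelMainInit() returns the same handle both times, and the stub loader only runs once for that panel.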

This allowed me to have a header file that had those macros, such as “PanelMacros.h”, and then have a .c and .h for each panel, or one big file that had them all in it.

// Panels.h
GENERATE_PANEL_PROTOTYPES(PanelMain)
GENERATE_PANEL_PROTOTYPES(PanelFaults)
GENERATE_PANEL_PROTOTYPES(PanelAdmin)

// Panels.c
GENERATE_PANEL_FUNCTIONS(PanelMain, PANEL_MAIN)
GENERATE_PANEL_FUNCTIONS(PanelFaults, PANEL_FAULTS)
GENERATE_PANEL_FUNCTIONS(PanelAdmin, PANEL_ADMIN)

And it worked great! And, if I later decided I wanted to add debugging output or something else, instead of editing one hundred different panel functions I could just modify the macro. For example:

#define GENERATE_PANEL_FUNCTIONS(panelName, panelResourceID) \
    static int S_##panelName##Handle = 0; /* Zero is not a valid panel handle. */ \
    \
    int panelName##Init (void) \
    { \
        int panelHandle = 0; \
        if (S_##panelName##Handle <= 0) \
        { \
            panelHandle = LoadPanel (0, TOSTRING(panelName)".uir", panelResourceID); \
            if (panelHandle > 0) \
            { \
                DebugPrintf ("Panel %s loaded.\n", TOSTRING(panelName)); \
                S_##panelName##Handle = panelHandle; \
                \
                panelName##UserInit (panelHandle); \
            } \
        } \
        else \
        { \
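            DebugPrintf ("Panel %s already initialized.\n", TOSTRING(panelName)); \
            panelHandle = S_##panelName##Handle; \
        } \
        return panelHandle; \
    } \
    /* ...the remaining functions are unchanged from the earlier macro... */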

There are a few things to unpack in this example, such as the use of macros STRINGIFY(x) and TOSTRING(x), but those probably could be their own blog post.
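As a quick sketch of why there are two levels: the # operator turns its argument into a string before that argument gets macro-expanded, so a second macro is needed if you want the expanded value instead of the name. PANEL_COUNT below is a made-up macro used only to show the difference.

```c
#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)

/* Hypothetical macro used only to show the difference. */
#define PANEL_COUNT 3

/* # is applied before the argument is expanded, so this is "PANEL_COUNT". */
const char *OneLevel (void)
{
    return STRINGIFY (PANEL_COUNT);
}

/* The extra level lets PANEL_COUNT expand first, so this is "3". */
const char *TwoLevels (void)
{
    return TOSTRING (PANEL_COUNT);
}

/* Adjacent string literals are joined by the compiler: "PanelMain.uir". */
const char *PanelFilename (void)
{
    return TOSTRING (PanelMain) ".uir";
}
```

For a plain name like PanelMain that is not itself a macro, either level gives the same string, but using TOSTRING keeps things working if a macro is ever passed in.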

Anyway, if you are lazy, and faced with generating dozens or hundreds of almost identical functions, this macro approach can save a ton of time. The macros I made for my original project, dealing with message functions, are vastly more complex than these, but I figured if I started with those most would run away screaming. (I know I sure would if I had been presented them by a coworker.)

I am sure there will be more to say about this, so perhaps a part 3 will show up.

Until then, I’d love to hear what an experienced C macro programmer has to say about this. I bet there are some better techniques and things I am completely unaware of. I’d love it if you’d share.

Thanks…

Addendum: Since I began writing this post, I have converted about 50 panels at work using a much more complex set of #define macros. They keep evolving as I needed to add support for “parent/child” panels, or extra debugging, or even new functions to check if a panel is displayed at the moment. All I did was update the macro, and the next build could use the new functions. I expect it has already saved me days of typing…

Generating C functions and prototypes using macros – part 1

See Also: part 1 and part 2.

There is always a first time for everything, and my first time doing this was a few years ago with a day job task. I was going to create a Windows DLL (dynamic link library) that would handle hundreds of messages (write/read response) via the I2C protocol.

The message protocol featured a header that contained a Message Type and Command Code. There were several types of messages, but for this blog post let’s just look at two types:

  • DATA – a message sent to do something, with or without payload data. These messages get a response back that is either an ACKnowledgement (it worked) or a NAK (it did not work). For example “set the time to X”.
  • QUERY – a message sent to request information, such as “what time is it?”

The Command Code was a byte, which meant there could be up to 256 (0-255) commands per message type. For example, some DATA messages might be defined like this:

#define DATA_PING        0 // ACK if there, NAK if not
#define DATA_RESET_CPU   1 // Reset CPU
#define DATA_EXPLODE     2 // Cause the penguin on the TV set to explode.

And QUERY messages might be like:

#define QUERY_STATUS      0 // Returns a status response
#define QUERY_VOLTAGE     1 // Returns the voltage
#define QUERY_TEMPERATURE 2 // Returns the temperature

These are, of course, made up examples, but you get the idea.

We had a number of different board types on our system that could receive these messages. Some messages were only applicable to specific boards (like, you couldn’t get temperature from a board that did not have a thermometer circuit or whatever). My idea was to create simple C functions that represented the message sent to the specific board, like this:

resp = AlphaBoardPING ();
resp = BetaBoardPING ();
resp = DeltaBoardPING ();
...
resp = BetaBoardRESET_CPU ();
...
resp = DeltaBoardQUERY_STATUS (...);

…and so on. I thought it would make a super simple way to write code to send messages and get back the status or response payload or whatever.

This is what prompted me to write a post about returning values as full structures in C. That technique was used there, making it super simple to use (and no chance of a wrong pointer crashing things).

I experimented with these concepts on my own time to make sure this idea would work. Some of the things I did ended up on my GitHub page:

  • https://github.com/allenhuffman/GetPutData – routines that let you put bytes or other data types in a buffer. Code similar to this was used to create the message bytes (header, payload, checksum at the end, etc.)
  • https://github.com/allenhuffman/StructureToFromBuffer – routines that would let me define how bytes in a buffer should be copied into a C structure. I was proud of this approach since it let me pass a pointer to a buffer of bytes along with some tables and have it return a fully populated C structure ready to use. This greatly simplified the amount of work needed to use these messages.

But, as I like to point out, I am pretty lazy and hate doing so much typing. The thought of creating hundreds of functions by hand was not how I wanted to spend my time. Instead, I wanted to find a way to automate the creation of these functions. After all, they all followed the same pattern:

  • Header – containing Message Type, Command Code, number of bytes in a payload, etc.
  • Payload – any payload bytes
  • Checksum – at the end

The only thing custom would be what Message Type and Command Code to put in, and if there was a payload to send, populating those bytes with the appropriate data.

When a response was received, it would be parsed based on the Message Type and Command Code, and return a C structure matching the response payload.
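To make that pattern concrete, here is a rough sketch of building one message into a buffer. The byte layout, the BuildMessage name, and the checksum (a simple 8-bit sum) are all assumptions for illustration, not the real protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (not from the real protocol):
   [type][command][payload length][payload bytes...][checksum] */
#define MSG_HEADER_SIZE   3
#define MSG_CHECKSUM_SIZE 1

/* Returns the number of bytes written, or 0 on a bad argument or if the
   buffer is too small to hold the whole message. */
size_t BuildMessage (uint8_t *buffer, size_t bufferSize,
                     uint8_t messageType, uint8_t commandCode,
                     const uint8_t *payload, uint8_t payloadSize)
{
    size_t total = MSG_HEADER_SIZE + (size_t) payloadSize + MSG_CHECKSUM_SIZE;
    uint8_t checksum = 0;

    if ((buffer == NULL) || (bufferSize < total))
    {
        return 0;
    }
    if ((payloadSize > 0) && (payload == NULL))
    {
        return 0;
    }

    buffer[0] = messageType;
    buffer[1] = commandCode;
    buffer[2] = payloadSize;

    if (payloadSize > 0)
    {
        memcpy (&buffer[MSG_HEADER_SIZE], payload, payloadSize);
    }

    /* Assumed checksum: 8-bit sum of every byte before it. */
    for (size_t idx = 0; idx < total - MSG_CHECKSUM_SIZE; idx++)
    {
        checksum = (uint8_t) (checksum + buffer[idx]);
    }
    buffer[total - 1] = checksum;

    return total;
}
```

With a helper like this, each per-board function only differs in the Message Type, the Command Code, and how the payload bytes are filled in.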

A program to write a program?

Initially, I thought about making a program or script that would spit out hundreds of functions. But this was not lazy enough. Sure, I could have done this and created hundreds of functions, but what if those functions needed a change later? I’d have to update the program that created the functions and regenerate all the functions all over again.

There has to be a better lazier way.

Macros: a better lazier way

I realized I could make a set of #define macros that could insert proper C code or prototypes. Then, if I ever needed to change something, I only had to change the macro. There would be no regeneration needed, since the next compile would use the updated macro. Magic!

It worked very well, and created hundreds and hundreds of functions without me ever having to type more than the ones in the macro.

It worked so well that I ended up using this approach very recently for another similar task I was far too lazy to do the hard way. I thought I would share that much simpler example in case you are lazy as well.

A much simpler example

At work we use LabWindows/CVI, a Windows C compiler with its own GUI. It has a GUI editor where you create your window with buttons and check boxes and whatever, then you use functions to load the panel, display it, and hide it when done. They look like this:

int panelHandle = LoadPanel (0, "PanelMAIN.uir", PANEL_MAIN);

DisplayPanel (panelHandle);

// Do stuff...

HidePanel (panelHandle);

DiscardPanel (panelHandle);

Then, when you interact with the panel, you have callback functions (if the user clicks the “OK” button, it jumps to a function you might name “UserClickedOKButtonCallback()” or whatever).

If you need to manipulate the panel, such as changing the status of a Control (checkbox, text box, or whatever), you can set Values or Attributes of those Controls.

SetCtrlVal (panelHandle, PANEL_MAIN_OK_BUTTON, 1);
SetCtrlAttribute (panelHandle, PANEL_MAIN_CANCEL_BUTTON, ATTR_DIMMED, 1);

It is a really simple system and one that I, as a non-Windows programmer who had never worked with GUIs before, was able to pick up and start using quickly.

Simplicity can get complicated quickly…

One of the issues with this setup is that you had to have the panel handle in order to do something. If a message came in from a board indicating there was a fault, that code might need to toggle on some “RED LED” graphics on a GUI panel to indicate the faulted condition. But that callback function may not have any of the panel handles. The designer created a lookup function to work around this:

int mainPanelHandle = LookUpPanelHandle(MAIN_PANEL);
SetCtrlVal (mainPanelHandle, PANEL_MAIN_FAULT_LED, 1);

A function similar to that was in the same C file where all the panels were loaded. Their handles were saved in variables, and the LookUp function would go through a huge switch/case with special defines for every panel and return the actual panel handle that matched the define passed in.

It worked great but it was slower since it had to scan through that list every time we wanted to look up a panel. At some point, all the panel handles were just changed to global variables so they could be accessed quickly without any lookup:

SetCtrlVal (g_MainPanelHandle, PANEL_MAIN_FAULT_LED, 1);

This also worked great, but did not work from threads that did not have access to the main GUI context. Since I am not a Windows programmer, and have never used threads on any embedded systems, I do not actually understand the problem (but I hear there are “thread safe” variables that can be used for this purpose).

Self-contained panel functions for the win!

Instead of learning those special “thread safe” techniques, I decided to create a set of self-contained panel functions so you could do things like this:

int mainPanelHandle = PanelMainInit (); // Load/init the main panel.
PanelMainDisplay (); // Display the panel.
SetCtrlVal (mainPanelHandle, PANEL_MAIN_FAULT_LED, 1);
...
PanelMainHide ();
...
PanelMainTerm (); // Unload main panel and release the memory.

When I needed to access a panel from another routine, I would use a special function that returned the handle:

int mainPanelHandle = PanelMainGetHandle ();
SetCtrlVal (mainPanelHandle, PANEL_MAIN_FAULT_LED, 1);

I even made these functions automatically load the panel if needed, meaning a user could just start using a panel and it would be loaded on demand if it was not already loaded. Super nice!

Here is a simple version of what that code looks like:

static int S_panelHandle = 0; // Zero is not a valid panel handle.

int PanelMainInit (void)
{
    int panelHandle = 0;

    if (S_panelHandle <= 0) // Zero is not a valid panel handle. 
    {
        panelHandle = LoadPanel (0, "PanelMAIN.uir", PANEL_MAIN);

        // Only set our static global if this was successful.
        if (panelHandle > 0) // Zero is not a valid panel handle. 
        {
            S_panelHandle = panelHandle; 
        }
    }
    else // S_panelHandle was valid.
    {
        panelHandle = S_panelHandle;
    }	
    
    // Return handle or status in case anyone wants to error check.
    return panelHandle;
}

int PanelMainGetHandle (void)
{
    // Return handle or status in case anyone wants to error check.
    return PanelMainInit ();
}

int PanelMainTerm (void)
{
    int status = UIEHandleInvalid;

    if (S_panelHandle > 0) // Zero is not a valid panel handle.
    {
        status = DiscardPanel (S_panelHandle);
        if (status == UIENoError)
        {
            S_panelHandle = 0; // Zero is not a valid panel handle. 
        }
    }

    // Return status in case anyone wants to error check.
    return status;
}

int PanelMainDisplay (void)
{
    int status = UIEHandleInvalid;
    int panelHandle;
	
    panelHandle = PanelMainInit (); // Init if needed.
	
    if (panelHandle > 0) // Zero is not a valid panel handle.
    {
        status = DisplayPanel (panelHandle);
    }
	
    // Return status in case anyone wants to error check.
    return status;
}

int PanelMainHide (void)
{
    int status = UIEHandleInvalid;

    if (S_panelHandle > 0) // Zero is not a valid panel handle.
    {
        status = HidePanel (S_panelHandle);
    }
	
    // Return status in case anyone wants to error check.
    return status;
}

This greatly simplified dealing with the panels. Now they could “just be used” without worrying about loading, etc. There was no long Look Up table, and no global variables. The only place the panel handles were kept was inside the file where the panel’s functions were.

Nice and simple, and it worked even greater than the first two attempts.

…until you have to make a hundred of these functions…

…and then decide you need to change something and have to make that change in a hundred functions.

My solution was to use #define macros to generate the code and prototypes, then I would only have to change the macro to alter how all the panels work. (Spoiler: This worked even greater than the previous greater.)

In part 2, I will share a simple example of how this works. If you are lazy enough, you might actually find it interesting.

Until then…

C, I can be taught! At least about calloc.

A long, long time ago, I learned about malloc() in C. I could make a buffer like this:

char *buffer = malloc (1024);

…use it, then release it when I was done like this:

free (buffer);

I have discussed malloc here in the past, including a recent post about an error checking malloc I created to find a memory leak.

But today, the Bing CoPilot A.I. suggested I use calloc instead.

And I wasn’t even sure I even remembered this was a thing. And it has been there since before C was standardized…

void* calloc (size_t num, size_t size);

malloc() will return a block of memory and, depending on the operating system and implementation in the compiler, that memory may have old data in it. Hackers were known to write programs that would allocate blocks of memory then inspect it to see what they could find left over from another program previously using it.

Hey, everybody’s gotta have a hobby…

calloc() is a “clear allocation” where it will initialize the memory it returns to zero before returning it to you. It also takes two parameters instead of just one. While malloc() wants to know the number of bytes you wish to reserve, calloc() wants to know how many things (bytes, structures, etc.) you want to reserve, and the size of each thing.

To allocate 1024 bytes using calloc() you would use:

char *buffer = calloc (1, 1024);

…and get one thing of 1024 bytes. Or maybe you prefer reversing that:

char *buffer = calloc (1024, 1);

…so you get 1024 things that are 1 byte each.

Either way, what you get back is memory all set to zeros.

calloc() was suggested by the A.I. because I was allocating a set of structures like this:

// Typedefs
typedef struct
{
    int x;
    int y;
    int color;
} MyStruct;

MyStruct *array = malloc (sizeof(MyStruct) * 42);

The A.I. saw that, and suggested calloc() instead, like this:

MyStruct *array = calloc (42, sizeof(MyStruct));

I do think that looks a bit cleaner and more obvious, if you are familiar with calloc(), and as long as you don’t need the extra speed (setting that memory to 0 should take more time than not doing that), it seems like something to consider.
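Here is a minimal sketch of that calloc() call wrapped in a helper, plus a check that the memory really does come back zeroed. AllocateStructs and AllFieldsZero are names invented for this example. One other thing in calloc()'s favor, as far as I know: because it receives the count and size separately, typical implementations can return NULL if multiplying them would overflow, which a hand-written sizeof(MyStruct) * count passed to malloc() would not catch.

```c
#include <stdlib.h>

typedef struct
{
    int x;
    int y;
    int color;
} MyStruct;

/* Hypothetical helper: returns zeroed storage for count structures, or
   NULL on failure. The caller releases it with free(). */
MyStruct *AllocateStructs (size_t count)
{
    return calloc (count, sizeof (MyStruct));
}

/* Returns 1 if every field of every element is zero, otherwise 0. */
int AllFieldsZero (const MyStruct *array, size_t count)
{
    for (size_t idx = 0; idx < count; idx++)
    {
        if ((array[idx].x != 0) || (array[idx].y != 0) || (array[idx].color != 0))
        {
            return 0;
        }
    }
    return 1;
}
```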

And maybe that will break me (and other programmers who wrote code before me that I may one day maintain) from doing it manually like…

char *ptr = malloc (1024);
memset (ptr, 0x0, 1024);

I wonder if I will even remember this the next time I need to malloc() something.

I mean calloc() something.

Whatever.

Until then…

C and VLAs (Variable Length Arrays)

When you are old (or “experienced” if you prefer), you begin to realize how much of what you learned is wrong. Even if it was “right” when you learned it. I think of all my peers who went through computer courses at colleges back in the late 1980s or 1990s, learning now-obsolete languages and being taught methods and approaches that are today considered wrong.

When I learned C, it was on a pre-ANSI K&R C compiler. I learned it on my Radio Shack Color Computer 3 under the OS-9 operating system, with assistance from a friend of mine who had learned C on his Commodore Amiga.

I had a lot of new things to learn in 1995 when I took a job with Microware Systems Corporation (creator of OS-9 and the K&R compiler I had learned on). Their Ultra-C compiler was an ANSI compiler, and it did things quite differently.

In that era of the C89/C90 standard, arrays were just arrays and we liked it that way:

int array[42];

If you wanted things to be more flexible, you had to malloc() memory yourself.

int *array = malloc (sizeof(int)*42);

…and remember to stay within your boundaries and clean up/free that memory when you were done with it.

But C99 changed this, somewhat, with the introduction of VLAs (Variable Length Arrays). Now you could declare an array using a variable like this:

int x=42;

int array[x];

Neat. I do not think I have ever used this. One downside is you cannot do this with static variables, since those are created/reserved at compile time. But it is still neat.

But today I learned you couldn’t rely on VLAs if you were using C11. Apparently, they became optional that year. A compiler would define a special macro if it did not support them:

__STDC_NO_VLA__

But at least for twelve years of the standard, you could rely on them, before not being able to rely on them.
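If you wanted code that builds either way, a sketch like this could check for the standard __STDC_NO_VLA__ macro and fall back to a version without the array. SumOneToCount is a made-up function, and the array here is only scratch space to show the declaration.

```c
#include <stddef.h>

/* Hypothetical example: returns the sum of 1..count, using a VLA as scratch
   space when available. count must be greater than zero and small enough
   for the array to fit on the stack. */
int SumOneToCount (size_t count)
{
#if defined(__STDC_NO_VLA__)
    /* No VLA support: compute without the scratch array. */
    int total = 0;

    for (size_t idx = 0; idx < count; idx++)
    {
        total += (int) (idx + 1);
    }
    return total;
#else
    int values[count]; /* Size chosen at run time. */
    int total = 0;

    for (size_t idx = 0; idx < count; idx++)
    {
        values[idx] = (int) (idx + 1);
    }
    for (size_t idx = 0; idx < count; idx++)
    {
        total += values[idx];
    }
    return total;
#endif
}
```

Note that a VLA cannot have an initializer, and nothing checks that the requested size actually fits on the stack, which may be part of why some embedded compilers skip them.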

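And then C23 happened, which I just learned made part of this mandatory again: variably modified types (such as VLA function parameters and pointers to VLAs) have to be supported, though as far as I can tell VLAs declared as ordinary local variables are still optional.

So, uh, I guess if you have the latest and greatest, you can use some of them. For now. Until some future change makes them optional again. Or removes them. Or whatever.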

Still neat.

But I doubt any of the embedded C compilers I use for my day job support them.

A safer memcpy with very limited use cases

Here is a quick one… At my day job, I found lines of code like this:

memcpy(systemDataPtr->serialNumber, resp.serialNumber, 16);

A quick peek at systemDataPtr->serialNumber shows it defined as this:

unsigned char serialNumber[MAX_SERIAL_NUMBER_LENGTH];

…with that constant defined as:

#define MAX_SERIAL_NUMBER_LENGTH        16

So while 16 is correct, the use of hard-coded “magic numbers” (hat tip to a previous manager, Pete S., who introduced me to that term) is probably best to be avoided. Change that #define, and things could go horribly wrong with a memory overrun or massive nuclear explosion or something.

One simple fix is to use the #define in the memcpy:

memcpy(systemDataPtr->serialNumber, resp.serialNumber, MAX_SERIAL_NUMBER_LENGTH);

This, of course, assumes that resp.serialNumber is also 16. Let’s see:

char serialNumber[16];

Ah, magic number! In this case, it comes from a DLL header file that does not share that #define, and the header file for the DLL was made by someone who had never made a Windows DLL before (me) and did not make #defines for these various lengths.

What if the DLL value ever got out of sync? If it became larger, the worst case is that not all of the data would be copied (only 16 bytes). That seems fine. But if the DLL value became smaller, like 10, then the memcpy would still copy 16 bytes, copying the 10 from the DLL buffer plus 6 bytes of whatever is in memory after it. That is a read past the end of the source buffer.

In this case, since the destination buffer can hold 16 bytes, and we only copy up to 16 bytes, the worst case is we could get some unintended data in that buffer.

sizeof() exists for a reason.

One thing I tend to do is use sizeof() instead of hard-coded numbers or the #define, since it will continue to work if the destination buffer ever got changed from using the #define:

memcpy(systemDataPtr->serialNumber, resp.serialNumber, sizeof(systemDataPtr->serialNumber));

But this still has the same issue if the source resp.serialNumber became smaller.

A safer, and more ridiculous, memcpy

Naturally, I came up with a ridiculous “solution”: A safer memcpy() that is much more of a pain to use because you have to know the size of each buffer and tell it the size of each buffer so it can make sure not to copy something larger than will fit into the destination buffer.

Here is the prototype of memcpy():

void * memcpy ( void * destination, const void * source, size_t num );

It will blindly copy “num” bytes from “source” to “destination”. But a ridiculous safer memcpy might look like this:

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num );

Just think of the extra overhead to add two more parameters for every use! Plus, it is a longer function name so you get to type even more! Wonderful.

Here is a quick implementation:

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    // Use whichever size is the smallest.
    if ((num > sizeOfDestination) || (num > sizeOfSource))
    {
        if (sizeOfDestination < sizeOfSource)
        {
            num = sizeOfDestination;
        }
        else
        {
            num = sizeOfSource;
        }
    }
    
    return memcpy ( destination, source, num);
}

Now that the function knows the size of source and destination, it can do a simple check to make sure to use whichever is smallest for the max number of bytes to copy — regardless (or irregardless, depending on which you prefer) of how many bytes you specified.

If both buffers are at least as large as the “num” passed in, num is used. But if “num” is larger than either buffer, it is reduced to match the smaller of the two buffers.

Note that this does not do any NULL checks, so it is relying on the behavior of memcpy() which would likely be a crash if NULL is passed in. To improve, NULL checks could be added.
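When both buffers are real arrays (not pointers), a macro could fill in the sizes and save some of that extra typing. This is just a sketch. MEMCPY_SAFER_ARRAYS is a made-up name, and memcpy_safer() is repeated here in a shortened form with the same logic so the example stands alone.

```c
#include <stddef.h>
#include <string.h>

/* Same logic as the memcpy_safer() shown above, repeated so this
   sketch stands alone. */
void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    // Use whichever size is the smallest.
    if ((num > sizeOfDestination) || (num > sizeOfSource))
    {
        num = (sizeOfDestination < sizeOfSource) ? sizeOfDestination
                                                 : sizeOfSource;
    }

    return memcpy ( destination, source, num);
}

/* Hypothetical wrapper: only valid when both arguments are true arrays,
   because sizeof on a pointer gives the size of the pointer. It asks for
   the full destination size and lets memcpy_safer() reduce it if needed. */
#define MEMCPY_SAFER_ARRAYS(dst, src) \
    memcpy_safer ((dst), sizeof (dst), (src), sizeof (src), sizeof (dst))
```

With that, copying a 15-byte array into a 10-byte array (or the other way around) becomes MEMCPY_SAFER_ARRAYS (destination, source), and at most 10 bytes are copied either way.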

I decided to pass this function into Bing CoPilot and see what it “thought” about this:

Limitations & Risks

Silent truncation: If num is too large, the function silently reduces it. This can lead to subtle bugs if the caller expects all bytes to be copied.

No error signaling: There’s no return code or logging to indicate that truncation occurred.

Still not “safe” in all cases: If destination or source are NULL, or if the buffers overlap, memcpy may still cause undefined behavior.

No runtime buffer validation: It assumes the caller provides correct buffer sizes, which may not be guaranteed.

– Bing CoPilot

And, of course, the A.I. offered up these changes:

#include <string.h>
#include <stddef.h>
#include <stdio.h>

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    if (!destination || !source) {
        fprintf(stderr, "Error: NULL pointer passed to memcpy_safer.\n");
        return NULL;
    }

    if (num > sizeOfDestination || num > sizeOfSource) {
        size_t adjusted = (sizeOfDestination < sizeOfSource) ? sizeOfDestination : sizeOfSource;
        fprintf(stderr, "Warning: Truncating copy from %zu to %zu bytes.\n", num, adjusted);
        num = adjusted;
    }

    return memcpy(destination, source, num);
}

That version adds NULL checks, returns a NULL if either buffer passed in was NULL, and adds prints to standard error if a NULL happens or if the value was truncated.
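Printing to standard error may not be an option on an embedded target. Another possibility, sketched here under the made-up name memcpy_counted(), is to return the number of bytes actually copied so the caller can compare it against what was requested.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: returns the number of bytes actually copied so the
   caller can compare it against num. Returns 0 if either pointer is NULL. */
size_t memcpy_counted ( void * destination, size_t sizeOfDestination,
                        const void * source, size_t sizeOfSource,
                        size_t num )
{
    if ((destination == NULL) || (source == NULL))
    {
        return 0;
    }

    // Clamp to whichever buffer is the smallest.
    if (num > sizeOfDestination)
    {
        num = sizeOfDestination;
    }
    if (num > sizeOfSource)
    {
        num = sizeOfSource;
    }

    memcpy ( destination, source, num);

    return num;
}
```

The trade-off is that it no longer returns the destination pointer like memcpy() does, so it is not a drop-in replacement.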

Not bad, predictive language model.

My ridiculous test program

Here is my test program, which I wrote using the Online GDB C compiler:

/******************************************************************************

Welcome to GDB Online.
  GDB online is an online compiler and debugger tool for C, C++, Python, PHP, Ruby, 
  C#, OCaml, VB, Perl, Swift, Prolog, Javascript, Pascal, COBOL, HTML, CSS, JS
  Code, Compile, Run and Debug online from anywhere in world.

*******************************************************************************/
#include <stdint.h> // for uint8_t
#include <stdio.h>  // for printf()
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for memcpy()

/*---------------------------------------------------------------------------*/
// PROTOTYPES
/*---------------------------------------------------------------------------*/

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num );

void * memcpy_safer2 ( void * destination, size_t sizeOfDestination,
                       const void * source, size_t sizeOfSource,
                       size_t num );

void initializeBuffer (void *dataPtr, size_t dataSize, uint8_t value);

void dumpBuffer (const char* prefix, void *dataPtr, size_t dataSize);

/*---------------------------------------------------------------------------*/
// MAIN
/*---------------------------------------------------------------------------*/

int main()
{
    uint8_t smallerBuffer[10];
    uint8_t largerBuffer[15];
    
    // Test 1: copy longer buffer into smaller buffer.
    
    printf ("\nInitialized buffers:\n\n");    
    
    // Initialize buffers with something we can identify later.
    initializeBuffer (smallerBuffer, sizeof(smallerBuffer), 0x1);
    dumpBuffer ("smallerBuffer", smallerBuffer, sizeof(smallerBuffer));

    initializeBuffer (largerBuffer, sizeof(largerBuffer), 0x2);
    dumpBuffer ("largerBuffer ", largerBuffer, sizeof(largerBuffer));

    printf ("\nTest 1: Copying largerBuffer into smallerBuffer...\n\n");

    memcpy_safer (smallerBuffer, sizeof(smallerBuffer), largerBuffer, sizeof(largerBuffer), 42);

    dumpBuffer ("smallerBuffer", smallerBuffer, sizeof(smallerBuffer));

    // Test 2: copy smaller buffer into larger buffer.

    printf ("\nInitialized buffers:\n\n");

    // Initialize buffers with something we can identify later.
    initializeBuffer (smallerBuffer, sizeof(smallerBuffer), 0x1);
    dumpBuffer ("smallerBuffer", smallerBuffer, sizeof(smallerBuffer));

    initializeBuffer (largerBuffer, sizeof(largerBuffer), 0x2);
    dumpBuffer ("largerBuffer ", largerBuffer, sizeof(largerBuffer));

    printf ("\nTest 2: Copying smallerBuffer into largerBuffer...\n\n");

    memcpy_safer (largerBuffer, sizeof(largerBuffer), smallerBuffer, sizeof(smallerBuffer), 42);

    dumpBuffer ("largerBuffer ", largerBuffer, sizeof(largerBuffer));

    return EXIT_SUCCESS;
}


/*---------------------------------------------------------------------------*/
// FUNCTIONS
/*---------------------------------------------------------------------------*/

/*---------------------------------------------------------------------------*/
// My ridiculous "safer" memcpy.
/*---------------------------------------------------------------------------*/
void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    // Use whichever size is the smallest.
    if ((num > sizeOfDestination) || (num > sizeOfSource))
    {
        if (sizeOfDestination < sizeOfSource)
        {
            num = sizeOfDestination;
        }
        else
        {
            num = sizeOfSource;
        }
    }
    
    return memcpy ( destination, source, num);
}


/*---------------------------------------------------------------------------*/
// Bing CoPilot changes.
/*---------------------------------------------------------------------------*/
void * memcpy_safer2 ( void * destination, size_t sizeOfDestination,
                       const void * source, size_t sizeOfSource,
                       size_t num )
{
    if (!destination || !source) {
        fprintf(stderr, "Error: NULL pointer passed to memcpy_safer.\n");
        return NULL;
    }

    if (num > sizeOfDestination || num > sizeOfSource) {
        size_t adjusted = (sizeOfDestination < sizeOfSource) ? sizeOfDestination : sizeOfSource;
        fprintf(stderr, "Warning: Truncating copy from %zu to %zu bytes.\n", num, adjusted);
        num = adjusted;
    }

    return memcpy(destination, source, num);
}


/*---------------------------------------------------------------------------*/
// Utility function to initialize a buffer to a set value.
/*---------------------------------------------------------------------------*/
void initializeBuffer (void *dataPtr, size_t dataSize, uint8_t value)
{
    if (NULL != dataPtr)
    {
        memset (dataPtr, value, dataSize);
    }
}


/*---------------------------------------------------------------------------*/
// Utility function to dump bytes in a buffer, with an optional prefix.
/*---------------------------------------------------------------------------*/
void dumpBuffer (const char* prefix, void *dataPtr, size_t dataSize)
{
    if (NULL != dataPtr)
    {
        if (NULL != prefix)
        {
            printf ("%s: ", prefix);
        }

        for (size_t idx=0; idx<dataSize; idx++)
        {
            printf ("%02x ", ((uint8_t*)dataPtr)[idx]);
        }
        printf ("\n");
    }
}

// End of memcpy_safer.c

If you want to run it there, you can use this link:

https://onlinegdb.com/Eu7FToIcQ

But of course, I am not using this code. It is ridiculous and requires extra typing.

Besides, I know exactly what I am doing in C and never make any mistakes… Really.
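
If the extra typing is the main objection, a macro could fill in the sizes. This is only a sketch: the MEMCPY_ARRAYS name is hypothetical, and it assumes both arguments are real arrays. Pass it a pointer and sizeof would quietly report the pointer size instead.

```c
#include <stddef.h> // for size_t
#include <string.h> // for memcpy()

// Same logic as memcpy_safer() above, repeated so this sketch stands alone.
void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    // Use whichever size is the smallest.
    if ((num > sizeOfDestination) || (num > sizeOfSource))
    {
        num = (sizeOfDestination < sizeOfSource) ? sizeOfDestination
                                                 : sizeOfSource;
    }

    return memcpy (destination, source, num);
}

// Hypothetical helper macro: asks for the whole source array and lets
// memcpy_safer() clamp the count to the smaller array.
// Only valid when dst and src are arrays, not pointers.
#define MEMCPY_ARRAYS(dst, src) \
    memcpy_safer ((dst), sizeof(dst), (src), sizeof(src), sizeof(src))
```

With the two test buffers from the program above, MEMCPY_ARRAYS (smallerBuffer, largerBuffer) would copy 10 bytes, and MEMCPY_ARRAYS (largerBuffer, smallerBuffer) would also copy 10 bytes, leaving the last 5 bytes of largerBuffer untouched.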

Until next time…

I only understand this a bit…

NOTE: Credit to the content of the article, other than my comments about the content, goes to whatever A.I. engine Microsoft uses for CoPilot…

While trying out a feature of Microsoft’s CoPilot A.I., I decided to ask it a simple question:

What is a good way to count the set bits in a 32-bit integer in the C programming language?

At my day job, we collect a byte from boards that multiplex eight sub-boards. There can be up to four of these multiplexer boards for a total of 32 sub-boards being multiplexed. Bits in those bytes will be set to indicate if a sub-board was found. The original designer has code that retrieves a byte from each multiplexer board then combines them into a 32-bit integer. There is code that counts the set bits to make sure we found as many boards as the system is configured to expect.

To my surprise, CoPilot gave me five different routines. The first was easy to understand, and a bit more clever than what I would have created. It called this the Iterative Bitwise Method with a description of “simple and portable, but not the fastest.”

int countSetBits(int n) {
    int count = 0;
    while (n) {
        count += n & 1;
        n >>= 1;
    }
    return count;
}
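
One caveat with that version as written: it takes a signed int. Right-shifting a negative value is implementation-defined, and many compilers keep the sign bit while shifting, so n may never reach zero and the loop may never end. Here is a sketch of an unsigned variant. The countSetBits32 name and the use of uint32_t are my assumptions, not part of what CoPilot produced.

```c
#include <stdint.h> // for uint32_t

// Hypothetical unsigned variant of the iterative method. An unsigned
// right shift always brings in zeros at the top, so the loop ends after
// at most 32 passes.
int countSetBits32 (uint32_t n)
{
    int count = 0;

    while (n)
    {
        count += (int)(n & 1u); // add the lowest bit (0 or 1)
        n >>= 1;                // move the next bit into position
    }

    return count;
}
```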

Next was something called Brian Kernighan’s Algorithm, which was described as “efficient for sparse bit patterns.”

int countSetBits(int n) {
    int count = 0;
    while (n) {
        n &= (n - 1);
        count++;
    }
    return count;
}
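
The trick is the “n &= (n - 1)” line, which clears the lowest set bit on each pass, so the loop runs once per set bit instead of once per bit position. A sketch that pulls that step into its own function may make it easier to see. The function names are hypothetical, and I switched to uint32_t to stay clear of signed arithmetic.

```c
#include <stdint.h> // for uint32_t

// Clears the lowest set bit. Subtracting 1 turns that bit into a 0 and
// every zero below it into a 1, so the AND wipes out everything from
// that bit down.
uint32_t clearLowestSetBit (uint32_t n)
{
    return n & (n - 1u);
}

// Same loop as above: one pass per set bit.
int countSetBitsKernighan (uint32_t n)
{
    int count = 0;

    while (n)
    {
        n = clearLowestSetBit (n);
        count++;
    }

    return count;
}
```

For example, 0x2C (binary 101100) becomes 0x28 (101000), then 0x20 (100000), then 0, so the loop runs three times for three set bits.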

Third was a Lookup Table Method, described as “fast for repeated calls but requires extra memory.”

unsigned char lookupTable[256];

void initializeLookupTable() {
    for (int i = 0; i < 256; i++) {
        lookupTable[i] = countSetBits(i); // Use any method to count bits
    }
}

int countSetBitsUsingLookup(int n) {
    return lookupTable[n & 0xFF] +
           lookupTable[(n >> 8) & 0xFF] +
           lookupTable[(n >> 16) & 0xFF] +
           lookupTable[(n >> 24) & 0xFF];
}

Fourth was something I had never heard of: Compiler Intrinsics. It said “many modern compilers provide built-in functions to count set bits efficiently” and described it as “extremely fast but platform-dependent.”

#include <intrin.h>

int countSetBits(int n) {
    return __popcnt(n);
}

In my world of “barely C” embedded compilers that don’t even support the modern printf formats for printing pointers, I do not expect to have access to this.
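
For compilers that do have one, a wrapper could hide the platform dependence. This is a sketch under a few assumptions: __builtin_popcount is the GCC/Clang builtin, __popcnt is the Microsoft intrinsic CoPilot showed (it needs a CPU that has the POPCNT instruction), and countSetBitsPortable is a name I made up. Anything else falls back to a plain loop.

```c
// Hypothetical wrapper: use a compiler intrinsic when one is known to
// exist, otherwise fall back to a plain loop.
#if defined(_MSC_VER)
#include <intrin.h> // for __popcnt()
#endif

int countSetBitsPortable (unsigned int n)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcount (n);
#elif defined(_MSC_VER)
    return (int)__popcnt (n);
#else
    // Fallback: clear the lowest set bit until nothing is left.
    int count = 0;

    while (n)
    {
        n &= (n - 1u);
        count++;
    }

    return count;
#endif
}
```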

And fifth was Parallel Bitwise Operations, described as “fast and portable, but harder to understand.”

And that is the version that made me post this:

int countSetBits(int n) {
    n = n - ((n >> 1) & 0x55555555);
    n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
    n = (n + (n >> 4)) & 0x0F0F0F0F;
    n = n + (n >> 8);
    n = n + (n >> 16);
    return n & 0x3F;
}

Harder to understand is an understatement. My brane hurts just looking at that trying to figure out how it works.

And I thought I’d share that hurt with you.
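
For anyone who wants to dull the pain a little, here is my attempt at the same routine with a comment on each step. Treat it as a sketch: the countSetBitsParallel name and the switch to uint32_t are my choices, and the comments are my reading of what each line does.

```c
#include <stdint.h> // for uint32_t

int countSetBitsParallel (uint32_t n)
{
    // Step 1: each 2-bit pair now holds the number of set bits that
    // were in that pair (0, 1 or 2).
    n = n - ((n >> 1) & 0x55555555u);

    // Step 2: add neighboring pairs, so each 4-bit nibble holds a
    // count from 0 to 4.
    n = (n & 0x33333333u) + ((n >> 2) & 0x33333333u);

    // Step 3: add neighboring nibbles, so each byte holds a count
    // from 0 to 8.
    n = (n + (n >> 4)) & 0x0F0F0F0Fu;

    // Step 4: add each byte to the byte below it. The lowest byte now
    // holds the count for the low 16 bits.
    n = n + (n >> 8);

    // Step 5: add the upper half to the lower half. The lowest byte
    // now holds the count for all 32 bits.
    n = n + (n >> 16);

    // The total is at most 32, which fits in 6 bits. Mask off the
    // leftovers in the higher bits.
    return (int)(n & 0x3Fu);
}
```

Working 0xFFFFFFFF through by hand gives 0xAAAAAAAA, 0x44444444, 0x08080808, 0x08101010, 0x08101820 and finally 0x20, which is 32.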

Have a good C day…

Old C dog, new C tricks part 5: inline prototypes?

See Also: part 1, part 2, part 3, part 4 and part 5.

This post is a departure from what most of the others are like. I am most certainly not going to be using this “trick” I just learned.

Background

Recently in my day job, I was doing a code review and came across something that was most certainly not legal C code. In fact, I was confident that line wouldn’t even compile without issuing a warning. Yet, the developer said he did not see a warning about it.

The bit of code was supposed to be calling a function that returns a populated structure. Consider this silly example:

#include <stdio.h>

// typedefs
typedef struct {
    unsigned int major;
    unsigned int minor;
    unsigned int patch;
} VersionStruct;

// prototypes
VersionStruct GetVersion (void);

// main
int main()
{
    VersionStruct foo;
    
    foo = GetVersion ();
    
    printf ("Version %u.%u.%u\n", foo.major, foo.minor, foo.patch);

    return 0;
}

// functions
VersionStruct GetVersion ()
{
    VersionStruct ver;
    
    ver.major = 1;
    ver.minor = 0;
    ver.patch = 42;
    
    return ver;
}

But in the code, the call to the function was incomplete. There was no return variable, and it even had “void” inside the parens. It looked something like this:

int main()
{
    VersionStruct GetVersion (void);

    return 0;
}

I took one look at that and said “no way that’s working.” But there had been no compiler warning.

So off I went to the Online GDB Compiler to type up a quick example.

And it built without warning.

Well, maybe the default is to ignore this warning… So I added “-Wall” and “-Wextra” to the build flags. That should catch it :)

And it built without warning.

“How can this work? It looks like a prototype in the middle of a function!” I asked.

Yes, Virginia. You can have inline prototypes.

A brief bit of searching told me that, yes, inline prototypes were a thing.

This should give a compiler warning:

#include <stdio.h>

int main()
{
    function ();

    return 0;
}

void function (void)
{
    printf ("Inside function.\n");
}

When I built that, I received two compiler warnings:

main.c: At top level:
main.c:10:6: warning: conflicting types for ‘function’; have ‘void(void)’
   10 | void function (void)
      |      ^~~~~~~~
main.c:5:5: note: previous implicit declaration of ‘function’ with type ‘void(void)’
    5 |     function ();
      |     ^~~~~~~~

The first warning is not about the missing prototype, but about “conflicting types”. In older versions of C, a function that is called with no declaration in scope is assumed to be a function that returns an int, and this compiler still applies that rule here.

Had I made function like this…

int function (void)
{
    printf ("Inside function.\n");
    return 0;
}

…I’d see only one, but different, warning:

main.c: In function ‘main’:
main.c:5:5: warning: implicit declaration of function ‘function’ [-Wimplicit-function-declaration]
    5 |     function ();
      |     ^~~~~~~~

For the first example, the compiler makes an assumption about what this function should be, then finds code using it the wrong way. It warns me that I am not using it like the implied prototype says it should be used. Sorta.

For the next, my function matches the implied prototype, so those warnings go away, but a real “implicit declaration” warning is given.

Going back to the original “void” code, I can add an inline prototype in main() to make these all go away:

#include <stdio.h>

int main()
{
    void function(void); // Inline prototype?
    
    function ();

    return 0;
}

void function (void)
{
    printf ("Inside function.\n");
}

I had no idea that was allowed.

I have no idea why one would do that. BUT, I suppose if you wanted to get to one function without an include file with a prototype for it, you could just stick that right before you call the function…
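
As a small sketch of that idea (the function names are made up for illustration): a declaration inside a function body is only visible from that point to the end of the enclosing block, so the rest of the file does not see it.

```c
// Hypothetical example of a prototype declared inside a function.
int useHelper (void)
{
    int hiddenHelper (void); // visible only inside this block

    return hiddenHelper ();
}

// The definition can live later in this file, or in another file.
int hiddenHelper (void)
{
    return 42;
}
```

The catch is that nothing checks this hand-typed prototype against the real function if that function lives in a different file, which is the job a shared include file normally does.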

But why would you want to do that?

I learned this is possible. I do not think I want to ever do this. Am I missing some great benefit for being able to have a prototype inside a function like this? Is there some “clean code” recommendation that might actually say this is useful?

“It wouldn’t be in there if it didn’t have a reason.”

Let me know what you know in the comments. Until next time…

Old C dog, new C tricks part 4: no more passing buffers?

In the previous installment, I rambled on about the difference between “char *line;” and “char line[];”. The first is a “pointer to char(s)” and the second is “an array of char(s)”. But, when you pass them into a function, they both are treated as a pointer to those chars.

One “benefit” of using “line[]” was that you could use sizeof(line) on it and get the byte count of the array. This is faster than using strlen().

But if you pass it into a function, all you have is a pointer so strlen() is what you have to use.

While you can’t pass an “array of char” into a function as an array of char, you can pass a structure that contains an “array of char” and sizeof() will work on that:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h>
typedef struct
{
    char buffer[80];
} MyStruct;
void function (MyStruct test)
{
    printf ("sizeof(test.buffer) = %zu\n", sizeof(test.buffer));
}
int main(void)
{
    MyStruct test;
    
    strncpy (test.buffer, "This is a test", sizeof(test.buffer));
    
    function (test);
    return EXIT_SUCCESS;
}

You may notice that was passing a copy of the structure in, but stay with me for a moment.

If you have a function that is supposed to copy data into a buffer:

#define VERSION_STRING "1.0.42b-alpha"
void GetVersion (char *buffer)
{
    if (NULL != buffer)
    {
        strcpy (buffer, VERSION_STRING);
    }
}

…you can easily have a buffer overrun problem if the function writes more data than is available in the caller’s buffer. Because of this potential problem, I add a size parameter to such functions:

void GetVersion (char *buffer, size_t bufferSize)
{
    if (NULL != buffer)
    {
        // Copy up to bufferSize bytes.
        strncpy (buffer, VERSION_STRING, bufferSize);
    }
}

As long as the caller passes the correct parameters, this is safe:

char buffer[20];
GetVersion (buffer, 20);
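
One caveat with strncpy(): if the source string does not fit, it fills the buffer and does not add the zero at the end. A sketch of a variant that always leaves a terminated string, even when it has to truncate, might look like this. The GetVersionTerminated name is hypothetical.

```c
#include <stddef.h> // for size_t, NULL
#include <string.h> // for strncpy()

#define VERSION_STRING "1.0.42b-alpha"

// Hypothetical variant of GetVersion(): copies at most bufferSize-1
// characters, then always writes the end zero.
void GetVersionTerminated (char *buffer, size_t bufferSize)
{
    if ((NULL != buffer) && (bufferSize > 0))
    {
        strncpy (buffer, VERSION_STRING, bufferSize - 1);
        buffer[bufferSize - 1] = '\0';
    }
}
```

With a 5-byte buffer, this leaves “1.0.” plus the end zero instead of five characters and no terminator.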

But the caller could still screw up:

char buffer[20];
GetVersion (buffer, 200); // oops, one too many zeros

But if you use a structure, it is impossible for the caller to mess it up (though, of course, they could mess up the structure on their side before calling your function). The compiler type checking will flag if the wrong data type is passed in. The “buffer” will always be the “buffer.” No chance of a “bad pointer” or “buffer overrun” crashing the program.

To allow the buffer inside the structure to be modified, pass it in by reference:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h>
#define VERSION_STRING "1.0.42b-alpha"
typedef struct
{
    char buffer[80];
} MyStruct;
void GetVersion (MyStruct *test)
{
    strncpy (test->buffer, VERSION_STRING, sizeof(test->buffer));
}
int main(void)
{
    MyStruct test;
    
    GetVersion (&test);
    
    printf ("Version: %s\n", test.buffer);
    return EXIT_SUCCESS;
}

Using this approach, you can safely pass a “buffer” into functions and they can get the sizeof() the buffer to ensure they do not overwrite anything.

But wait, there’s more…

It is pretty easy for a function to get out of control if you are trying to get back more than one thing. If you just want an “int”, that’s easy…

int GetCounter ()
{
    static int s_count = 0;
    return s_count++;
}

But if you wanted to get the major, minor, patch and build version, you end up passing in ints by reference to get something like this:

void GetVersion (int *major, int *minor, int *patch, int *build)
{
   if (NULL != major)
   {
      *major = MAJOR_VERSION;
   }
   if (NULL != minor)
   {
      *minor = MINOR_VERSION;
   }
   if (NULL != patch)
   {
      *patch = PATCH_VERSION;
   }
   if (NULL != build)
   {
      *build = BUILD_VERSION;
   }
}

Of course, anytime pointers are involved, the caller could pass in the wrong pointer and things could get screwed up. Plus, look at all those NULL checks to make sure the pointer isn’t 0. (This does not help if the pointer is pointing to some random location in memory.)

Returning a structure instead avoids the pointers entirely:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#define MAJOR_VERSION 1
#define MINOR_VERSION 0
#define PATCH_VERSION 0
#define BUILD_VERSION 42
typedef struct
{
    int major;
    int minor;
    int patch;
    int build;
} VersionStruct;
VersionStruct GetVersion ()
{
    VersionStruct ver;
    
    ver.major = MAJOR_VERSION;
    ver.minor = MINOR_VERSION;
    ver.patch = PATCH_VERSION;
    ver.build = BUILD_VERSION;
    
    return ver;
}
int main(void)
{
    VersionStruct ver;
    
    ver = GetVersion ();
    
    printf ("Version: %d.%d.%d.%d\n",
        ver.major, ver.minor, ver.patch, ver.build);
    return EXIT_SUCCESS;
}

If you are concerned about overhead of passing structures, you can pass them by reference (pointer) and the compiler should still catch if a wrong pointer type is passed in:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#define MAJOR_VERSION 1
#define MINOR_VERSION 0
#define PATCH_VERSION 0
#define BUILD_VERSION 42
typedef struct
{
    int major;
    int minor;
    int patch;
    int build;
} VersionStruct;
void GetVersion (VersionStruct *ver)
{
    if (NULL != ver)
    {
        ver->major = MAJOR_VERSION;
        ver->minor = MINOR_VERSION;
        ver->patch = PATCH_VERSION;
        ver->build = BUILD_VERSION;
    }
}
int main(void)
{
    VersionStruct ver;
    
    GetVersion (&ver);
    
    printf ("Version: %d.%d.%d.%d\n",
        ver.major, ver.minor, ver.patch, ver.build);
    return EXIT_SUCCESS;
}

However, when dealing with pointers, there is always some risk. While the compiler will catch passing in the wrong structure pointer, there are still ways the caller can screw it up. For instance, void pointers:

int main(void)
{
    void *nothing = (void*)0x1234;
    
    GetVersion (nothing);
    return EXIT_SUCCESS;
}

Yep. Crash.

...Program finished with exit code 139
Press ENTER to exit console.

Give someone access to a function in your DLL and they might find a way to crash the program as simply as using a void pointer.

It is a bit trickier when you pass the full structure:

typedef struct
{
    int x;
} BogusStruct;
int main(void)
{
    BogusStruct ver;
    
    ver = GetVersion ();
    
    return EXIT_SUCCESS;
}

Compiler don’t like:

main.c: In function ‘main’:
main.c:38:11: error: incompatible types when assigning to type ‘BogusStruct’ from type ‘VersionStruct’
   38 |     ver = GetVersion ();
      |           ^~~~~~~~~~
main.c:36:17: warning: variable ‘ver’ set but not used [-Wunused-but-set-variable]
   36 |     BogusStruct ver;
      |                 ^~~

And you can’t really cast a return value like this:

int main(void)
{
    BogusStruct ver;

    (VersionStruct)ver = GetVersion ();

    return EXIT_SUCCESS;
}

Compiler don’t like:

main.c: In function ‘main’:
main.c:38:5: error: conversion to non-scalar type requested
   38 |     (VersionStruct)ver = GetVersion ();
      |     ^

Though maybe you could cast it if it was passed in as a parameter:

void ShowVersion (VersionStruct ver)
{
    printf ("Version: %d.%d.%d.%d\n",
        ver.major, ver.minor, ver.patch, ver.build);
}
int main(void)
{
    BogusStruct ver;
    
    ShowVersion ((VersionStruct)ver);
    
    return EXIT_SUCCESS;
}

Compiler still don’t like:

main.c: In function ‘main’:
main.c:44:5: error: conversion to non-scalar type requested
   44 |     ShowVersion ((VersionStruct)ver);
      |     ^~~~~~~~~~~

Hmm. Is there a way to screw this up? Let me know in the comments.

Until then…

Old C dog, new C tricks part 3: char *line vs char line[]

You, of course, already knew this. But I learn from your comments, so please leave some. Thanks!

This may be the next “old dog, new trick” I adopt.

When I started learning C in the late 1980s, I had a compiler manual (not very useful for learning the language) and a Pocket C Reference book — both for pre-ANSI K&R C. I may have had another “big” C book, but I mostly remember using the Pocket C Book.

Looking back at some of my early code, I find I was declaring “fixed” strings like this:

char version[5]="0.00"; /* Version number... */

Odd. Did I really count the bytes (plus 0 at the end) for every string like that? Not always. I found this one:

char filename[28]="cocofest3.map";

…but I think I remember why 28. In the OS-9/6809 operating system, directory entries were 32 bytes. The first 28 were the filename (yep, back in the 80s there were operating systems with filenames longer than FILENAME.EXT), and then three at the end were the LSN (logical sector number) where the File ID sector was. (More or less accurate.)

I also found arrays:

int *days[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

But why is that an int??? That must have been a typo/bug. Other places, I did it more correctly:

char *items[] = { /* employee info prompt thingies */
   "Employee :",
   "Min/Week :",
   "Max/Week :",
   "Min/Shift:",
   "Max/Shift:"
};

At some point in my C history, I just started using pointers to strings like this:

char *version = "1.0.42b-delta";

I guess I got tired of [brackets]. I mean, “they work the same way”, don’t they?

void function (char *line)
{
    if (NULL != line)
    {
        printf ("Line: '%s'\n", line);
    }
}

…and…

void function (char line[])
{
    if (NULL != line)
    {
        printf ("Line: '%s'\n", line);
    }
}

…both end up with line pointing to the start of wherever the bytes to that string are in memory. I’ve seen main() done the same ways:

int main (int argc, char *argv[] )

…and…

int main ( int argc, char **argv )

For years, I’ve been doing it the second way, but all my early code was *argv[] so I suspect that is how I learned it from my early K&R C books.

I have no idea why I changed, or when, but probably in the mid-to-late 1990s. I started working for Microware Systems Corporation in Des Moines, Iowa in 1995. This was the first place I used an ANSI-C compiler. In code samples from the training courses I taught, some used “*argv[]” but ones I wrote used “**argv”.

Does it matter?

Not really. But let’s talk about it anyway…

There was a comment left on one of my articles last year that pointed out something different I had not considered: sizeof

If you have “char *line” you cannot use sizeof() to give you anything but the size of the pointer (“sizeof(line)”) or the size of a character (or whatever data type used) that it points to (“sizeof(*line)”).

If you have “char line[]”, you can get the size of the array (number of characters, in this case) or the size of one of the elements in it:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
int main(void)
{
    char *line1 = "1234567890";
    
    char line2[] = "1234567890";
    
    printf ("sizeof(line1)  = %zu\n", sizeof(line1));
    printf ("sizeof(*line1) = %zu\n", sizeof(*line1));
    printf ("\n");
    printf ("sizeof(line2)  = %zu\n", sizeof(line2));
    printf ("sizeof(*line2) = %zu\n", sizeof(*line2));
    return EXIT_SUCCESS;
}

This produces:

sizeof(line1)  = 8  <- size of a 64-bit pointer
sizeof(*line1) = 1  <- size of a char

sizeof(line2)  = 11 <- size of the character array
sizeof(*line2) = 1  <- size of a char

I cannot remember ever using sizeof() on a string constant. You may recall I was surprised it worked when I learned about it a few months ago.

But, now that I am aware, I think I may start moving myself back to where I started and using the [brackets] when I have constant strings. Using sizeof() in the program just embeds a constant value, while strlen() is a function that walks through each byte looking for the end zero, thus adding more code space and more execution time.

If I wanted to copy some constant string into a buffer, I could try these two approaches:

// Copy message into buffer.
char *line1 = "This is a message.";
strncpy (buffer, line1, strlen(line1)); // strlen (the end zero is not copied)
printf ("Buffer: '%s'\n", buffer);
char line2[] = "This is a message.";
strncpy (buffer, line2, sizeof(line2)); // sizeof (includes the end zero)
printf ("Buffer: '%s'\n", buffer);

And the results are the same:

Buffer: 'This is a message.'
Buffer: 'This is a message.'

I would use the second version since using sizeof(line2) avoids the overhead of strlen() scanning through each byte in the string looking for the end zero.

NOTE: As was pointed out in the comments, strlen() returns the number of characters up to the zero. “Hello” is a strlen() of 5. But sizeof() is the full array of characters including the 0 at the end so “Hello” would have a sizeof() of 6.

char line[] = "1234567890";
printf ("strlen(line) = %zu\n", strlen(line));
printf ("sizeof(line) = %zu\n", sizeof(line));

strlen(line) = 10
sizeof(line) = 11

If you wanted them to be the same, it would be “sizeof(line)-1”.
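
As a sketch, that “minus one” could be wrapped in a macro. The ARRAY_STRLEN name and the two helper functions are hypothetical, and the macro only makes sense on a real char array that was initialized from a string constant. On a pointer it would give the wrong answer.

```c
#include <stddef.h> // for size_t
#include <string.h> // for strlen()

// Hypothetical macro: string length of a char array that was
// initialized from a string constant. Not valid on pointers.
#define ARRAY_STRLEN(array) (sizeof(array) - 1)

static const char s_line[] = "1234567890";

size_t lengthBySizeof (void)
{
    return ARRAY_STRLEN (s_line); // a constant worked out by the compiler
}

size_t lengthByStrlen (void)
{
    return strlen (s_line); // counts characters up to the end zero
}
```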

It’s all fun and games until you pass a parameter…

This “benefit” of sizeof() is not useful if you are passing the string in to a function. It just ends up like a pointer to wherever the string is stored:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h>
void function1 (char *line)
{
    printf ("function1():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}
void function2 (char line[])
{
    printf ("function2():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}
int main(void)
{
    char *line1 = "1234567890";
    printf ("Line 1: '%s'\n", line1);
    function1 (line1);
    function2 (line1);
    
    printf ("\n");
    char line2[] = "1234567890";
    printf ("Line 2: '%s'\n", line2);
    function1 (line2);
    function2 (line2);
    return EXIT_SUCCESS;
}

Above, I create a “*line” pointer to a string then pass it in to two functions. The first expects a *line as the parameter, and the second expects a line[].

Then I do a “line[]” array and pass it to the same two functions.

The results are the same:

Line 1: '1234567890'
function1():
sizeof(line)  = 8
sizeof(*line) = 1
strlen(line)  = 10
function2():
sizeof(line)  = 8
sizeof(*line) = 1
strlen(line)  = 10

Line 2: '1234567890'
function1():
sizeof(line)  = 8
sizeof(*line) = 1
strlen(line)  = 10
function2():
sizeof(line)  = 8
sizeof(*line) = 1
strlen(line)  = 10

And, if you use a “good compiler,” you may get a warning about doing sizeof() like this:

main.c: In function ‘function2’:
main.c:16:44: warning: ‘sizeof’ on array function parameter ‘line’ will return size of ‘char *’ [-Wsizeof-array-argument]
   16 |     printf ("sizeof(line)  = %zu\n", sizeof(line));

Notice that warning was from function2(), and not from function1(). This is one difference in using “*line” versus “line[]” in the functions. For function1(), no warning is given:

void function1 (char *line)
{
    printf ("function1():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}

Since the function takes a “pointer to one or more chars”, doing sizeof() on that pointer makes sense. It is what you asked for. The “C gibberish” website says:

declare line as pointer to char

https://cdecl.org/?q=char+*line

But for the second one…

void function2 (char line[])
{
    printf ("function2():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}

…a warning is given about the “sizeof(line)” because it cannot tell us the size of a line[] array; it became a pointer to the character memory when it went into the function. The warning shows up here because the function parameter was declared as “line[]”:

declare line as array of char

https://cdecl.org/?q=char+line%5B%5D

Doing sizeof() on an “array of char” is valid. But once it is passed into a function, even though the parameter is declared as line[], it arrives as a pointer to the data. I guess this is one of those “I’m sorry, Dave. I’m afraid I can’t do that” moments ;-)

Is this useful? It certainly will let you use sizeof() instead of strlen() on a string if you have direct access to the string variable. But passing strings into functions? Not so much. (Or am I mistaken?)
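
There is one way I know of to keep the size across a function call: make the parameter a pointer to an array of a fixed size. This is a sketch, with a hypothetical function name and the size 11 hard-coded to match the “1234567890” example (ten characters plus the end zero).

```c
#include <stddef.h> // for size_t

// Hypothetical example: the parameter is a pointer to an array of
// exactly 11 chars, so the array type (and its size) survives the call.
size_t sizeViaArrayPointer (char (*line)[11])
{
    return sizeof(*line); // size of the whole array, not of a pointer
}
```

The caller passes the address of the array, as in sizeViaArrayPointer (&line2), and gets 11 back. The price is that the function only accepts arrays of that one size, and the compiler should complain about any other, so it is not a general replacement for passing a pointer and a length.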

But I do think I am going to try to go back to using “line[]” for my string declarations. I like retro.

Until next time…

Old C dog, new C tricks part 2: i won’t use i

See Also: part 1, part 2, part 3, part 4 and part 5.

When I learned BASIC, manuals always showed the FOR/NEXT loop using the variable “I”:

FOR I=1 TO 100:NEXT I

“I” had no idea why this was, back then, but since then I have been told there was some history with some programming language that had certain letters reserved for certain features, such as I for loops.

But is this true? A quick Google search produced an “AI summary”:

The convention of using “i” as the loop counter variable originates from the mathematical notation where “i,” “j,” and “k” are commonly used as indices or subscripts to represent integers in sequences or arrays. This practice was adopted early in computer science and has persisted due to its conciseness and familiarity among programmers. While any valid variable name could technically be used, “i” serves as a readily recognizable and easily understood placeholder for the loop index, especially in simple iterations. In nested loops, “j” and “k” are conventionally used for the inner loop counters.

– Google AI Summary of my search result.

But that’s not important right now…

I suppose I had a bit of O.C.D. in me, because starting with “I” seemed weird.

I would do my loops starting with “A”.

FOR A=1 TO 100:NEXT A

When variables needed to mean something, I’d use the one or two-character variable name that made sense — NM$ for Name string, UC for User Count, etc. But for loops and other things, I’d use A, B, C, D, etc.

C books were similar, showing “i” for loops:

for (int i=0; i<100; i++)
{
    ...stuff...
}

For some reason, I always used “i” in C rather than a, b, c… My nested loops looked just like the AI summary described:

for (int i=0; i<10; i++)
{
    for (int j=0; j<10; j++)
    {
        for (int k=0; k<10; k++)
        {
            ....stuff....
        }
    }
}

It was at a previous job that another embedded programmer said something to me that changed this forever.

“You can’t search for i…”

– paraphrased quote from embedded programmer I used to work with.

Today there are modern editors that let you specify how a search works (full word or partial, ignore case or case sensitive, and even regular expressions), but you can never depend on having access to those tools. Some jobs are very restrictive about the software you are allowed to install on your work computer. Some simply don’t allow any installs by employees: you get the set of preconfigured apps for the position (Microsoft Office for some roles, compilers and such for others, etc.) and that might be it.

He told me he used “idx” for loops, rather than “i”. And that was enough to change my C coding habits instantly. Ever since, when I do a loop, it’s like this…

for (idx=0; idx<100; idx++)
{
    ...stuff...
}

And when I’m looping through things of a given type, I do things like:

for (int cardIdx=0; cardIdx<52; cardIdx++)
{
    ...stuff...
}

Who says you can’t teach an old dog new tricks?

What do you use for your loops? And why?

Comments, if you have them…