Category Archives: C Programming

C has its limits. If you know where to look.

Thank you, Bing Copilot (ChatGPT), for giving me another “thing I just learned” to blog about.

In the early days of “K&R C”, things were quite a bit different. C was not nearly as portable as it is today. While the ANSI-C standard helped quite a bit once it arrived, there were still issues when moving C code between machines of different architectures. For example:

int x;

What is x? According to the C standard, an “int” is “at least 16 bits.” On my Radio Shack Color Computer, an int was 16 bits (-32768 to 32767 signed, or 0-65535 unsigned). I expect on my friend’s Commodore Amiga, the int was 32 bits, though I really don’t know. And even when you “know”, assuming that to be the case is a “bad thing.”

I used a K&R C compiler on my CoCo, and later on my 68000-based MM/1 computer. That is when I became aware that an “int” was different. Code that worked on my CoCo would port fine to the MM/1, since it was written assuming an int was 16-bits. But trying to port anything from the MM/1 to the CoCo was problematic if the code had assumed an int was 32-bits.

When I got a job at Microware in 1995, I saw my first ANSI-C compiler: Ultra C. To deal with “what size is an int” issues, Microware created their own header file, types.h, which included their definitions for variables of specific sizes:

u_int32 x;
int32 y;

All the OS library calls were prototyped to use these special types, though if you knew an “unsigned long” was the same as a “u_int32” or a “short” was the same as an “int16” you could still use those.

But probably shouldn’t.

In those years, I saw other compilers do similar things, such as “U32 x;” and “I16 y”. I expect there were many variations of folks trying to solve this problem.

Some years later, I used the GCC compiler for the first time and learned that the C standard (since C99) now had its own types.h, called stdint.h. That gave us things like:

uint32_t x;
int32_t y;

It was easy to adopt these new standard definitions, and I have tried to use them ever since.
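One wrinkle with the fixed-width types is printing them, since the matching printf conversion depends on what uint32_t maps to on a given platform. The companion header inttypes.h provides macros such as PRIu32 for this. A minimal sketch, where the helper name FormatU32 is made up for illustration:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: format a uint32_t as decimal text into a
// caller-supplied buffer. Returns whatever snprintf returns (the number
// of characters needed, not counting the terminating 0).
int FormatU32 (char *buffer, size_t bufferSize, uint32_t value)
{
    // PRIu32 expands to the correct conversion specifier for uint32_t
    // on this platform ("u", "lu", etc.).
    return snprintf (buffer, bufferSize, "%" PRIu32, value);
}
```

Using "%u" or "%lu" directly would only be right on platforms where uint32_t happens to be unsigned int or unsigned long.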

I was also introduced to the defines that specify the largest value that will fit in an “int” or “long” on a system, found in limits.h:

...
#define CHAR_MAX 255 /*unsigned integer maximum*/
#define CHAR_MIN 0 /*unsigned integer minimum*/

/* signed int properties */
#define INT_MAX 32767 /* signed integer maximum*/
#define INT_MIN (-32767-_C2) /*signed integer minimum*/

/* signed long properties */
#define LONG_MAX 2147483647 /* signed long maximum*/
#define LONG_MIN (-2147483647-_C2) /* signed long minimum*/
...

The values would vary based on whether your system was 16-bit, 32-bit or 64-bit. It allowed you to do this:

int x = INT_MAX;
unsigned int y = UINT_MAX;

…and have code that would compile on a 16-bit or 64-bit system. If you had tried something like this:

unsigned int y = 4294967295; // Max 32-bit value.

…that code would NOT work as expected when compiled on a 16-bit system (like my old CoCo, or an Arduino UNO or the PIC24 processors I use at work).

I learned to use limits.h.
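As a rough illustration of leaning on limits.h instead of hard-coded numbers, here is a small sketch. Both function names are hypothetical, and the only things taken from limits.h are the INT_MAX, INT_MIN and UINT_MAX macros:

```c
#include <limits.h>

// Returns 1 if an unsigned int on this target can hold every 32-bit
// value, 0 if it cannot (for example, on a 16-bit compiler).
int UnsignedIntHolds32Bits (void)
{
#if UINT_MAX >= 0xFFFFFFFF
    return 1;
#else
    return 0;
#endif
}

// Squeeze a long into an int without overflowing, whatever size
// int happens to be on this target.
int ClampToInt (long value)
{
    if (value > INT_MAX)
    {
        return INT_MAX;
    }

    if (value < INT_MIN)
    {
        return INT_MIN;
    }

    return (int)value;
}
```

The same source should build on a 16-bit or 64-bit target, with the limits coming from whatever limits.h that compiler ships.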

But this week, I was working on code that needed to find the highest and lowest values in a 32-bit number range. I had code like this:

uint32_t EarliestSequenceNumber = 4294967295;
uint32_t LatestSequenceNumber = 0;

And that works fine, and should work fine on any system that provides uint32_t, whatever the size of its int. (Though I used hex, since I know 0xffffffff is the max value, and always have to look up or use a calculator to find out the decimal version.)

Had I been using signed integers, I would be doing this:

int32_t LargestSignedInt = 2147483647;

Or I’d use 0x7fffffff.

As I looked at my code, I wondered if C provided similar defines for the stdint.h types.

stdint.h also has stdsizes!

And it does! Since all of this changed/happened after I already “learned” C, I never got the memo about new features being added. Inside stdint.h are also defines like this:

#define INT8_MAX  (127)
#define INT8_MIN (-128)
#define UINT8_MAX (255)

#define INT16_MAX (32767)
#define INT16_MIN (-32768)
#define UINT16_MAX (65535)

#define INT32_MAX (2147483647)
#define INT32_MIN (-2147483648)
#define UINT32_MAX (4294967295)

#define INT64_MAX (9223372036854775807)
#define INT64_MIN (-9223372036854775808)
#define UINT64_MAX (18446744073709551615)

…very similar to what limits.h offers for standard ints, etc. Neat!

Now my code can do:

uint32_t EarliestSequenceNumber = UINT32_MAX;
uint32_t LatestSequenceNumber = 0;

…and that’s the new C thing I learned today.
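For context, here is a sketch of the kind of scan those two variables might be used in. The SequenceRange struct and the FindSequenceRange function are assumptions made up for this example, not code from the project mentioned above:

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical result type for the scan below.
typedef struct
{
    uint32_t earliest;
    uint32_t latest;
} SequenceRange;

// Find the lowest and highest sequence numbers in an array.
SequenceRange FindSequenceRange (const uint32_t *sequence, size_t count)
{
    // Start "earliest" at the largest possible value and "latest" at the
    // smallest, so the first entry examined replaces both.
    SequenceRange range = { UINT32_MAX, 0 };

    for (size_t i = 0; i < count; i++)
    {
        if (sequence[i] < range.earliest)
        {
            range.earliest = sequence[i];
        }

        if (sequence[i] > range.latest)
        {
            range.latest = sequence[i];
        }
    }

    return range;
}
```

With an empty array the function simply returns the starting values (UINT32_MAX and 0), which a caller could treat as "nothing found".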

And it may have even been there when I first learned about stdint.h and I just did not know.

And knowing is half the battle.

Once you go read-only…

…you can never go back.

After being shown that you can declare a global variable, as one does…

int g_globalVariable;

…and then make it be treated as a read-only variable to other files by extern-ing it as a const:

extern int const g_globalVariable;

…of course I wondered what the compiler did if you went the other way:

// main.c
#include <stdio.h>

void function (void);

int const c_Value = 0;

int main()
{
    printf("Hello World\n");
    
    printf ("c_Value: %d\n", c_Value);
    
    function ();
    
    printf ("c_Value: %d\n", c_Value);

    return 0;
}

// function.c
#include <stdio.h>

// Extern as a non-const.
extern int c_Value;

void function ()
{
    c_Value++;
}

Above, main.c contains a global const variable, but function.c tries to extern it as non-const.

But when I run the code…

Hello World
c_Value: 0


...Program finished with exit code 139
Press ENTER to exit console.

…the compiler does not complain (each .c file is compiled separately, so it never sees the mismatched declarations), but we get a crash. Looking at this in a debugger shows more detail:

Program received signal SIGSEGV, Segmentation fault.
0x00005555555551ef in function () at Function.c:11
11 c_Value++;

I am unfamiliar with the inner workings of whatever compiler this Online C Compiler is using, but I suspect I’d see similar results doing this on any system with memory protection, since a const global normally ends up in read-only memory. (Writing to an object that was defined const is undefined behavior in C, so the compiler and OS are free to do anything here.) Go back to the early days (like OS-9 on a 6809 computer, or even on a 68000 without an MMU) and … maybe it just allows it and it modifies something it shouldn’t?

We can file this away in the “don’t do this” category.

Until next time…

Modifying read-only const variables in C

This is a cool trick I just learned from commenter Sean Patrick Conner in a previous post.

If you want to have variables globally available, but want to have some control over how they are set, you can limit the variables to be static to a file containing “get” and “set” functions:

static int S_Number = 0;

void SetNumber (int number)
{
    S_Number = number;
}

int GetNumber (void)
{
    return S_Number;
}

This allows you to add range checking or other things that might make sense:

void SetPowerLevel (int powerLevel)
{
    // Only accept values in the range 0-100.
    if ((powerLevel >= 0) && (powerLevel <= 100))
    {
        S_PowerLevel = powerLevel;
    }
}

Using functions to get and set variables adds extra code, and also slows down access to those variables since the code has to jump into a function each time you want to read or change the variable.

The benefit of adding range checking may be worth the extra code/speed, but just reading a variable has no reason to need that overhead.

Thus, Sean’s tip…

Variables declared globally in a file cannot be accessed anywhere else unless you use “extern” to declare them in any file that wants to use them. You might declare some globals in globals.c like this:

// Globals.c
int g_number;

…but trying to access “g_number” anywhere else will not work. You either need to add:

extern int g_number;

…in any file that wants access to it, or, better, make something like globals.h that contains all your extern references:

// Globals.h
extern int g_number;

Now any file that needs access to the globals can just include “globals.h” and use them:

#include <stdio.h>

#include "globals.h"

void function (void)
{
    printf ("Number: %d\n", g_number);
}

That was not Sean’s tip.

Sean mentioned something that makes sense, but I do not think I’d ever tried: The extern can contain the “const” keyword, even if the definition of the variable does not!

This means you could have a global variable like above, but in globals.h do this:

// Globals.h
extern int const g_number;

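Now any file that includes “globals.h” has access to g_number as a read-only variable. The compiler will refuse to build any line that tries to modify it, except in globals.c, where it was actually defined non-const. (globals.c must not include this header itself, or the compiler will complain about conflicting declarations of g_number. Strictly speaking, the C standard expects every declaration of an object to use a compatible type, so this trick relies on how compilers and linkers behave in practice rather than on a guarantee.)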

Thus, you could access this variable as fast as any global, but not modify it. For that, you’d need a set routine:

// Globals.c
int c_number; // c_ to indicate it is const, which it really isn't.

// Set functions
void SetNumber (int number)
{
c_number = number;
}

Now other code can include “globals.h” and have read-only access to the variable directly, but can only set it by going through the set function, which could enforce data validation or other rules — something just setting it directly could not.

#include <stdio.h>

#include "Globals.h"

int main(int argc, char **argv)
{
    printf ("Number: %d\n", c_number);

    SetNumber (42);

    printf ("Number: %d\n", c_number);

    return 0;
}

That seems quite obvious now that I have been shown it. But I’ve never tried it. I have made plenty of Get/Set routines over the years (often to deal with making variable access thread-safe), but I guess it never dawned on me that, when not dealing with thread-safe variables, I could have direct read-only access to a variable, but still modify it through a function.

Global or static?

One interesting benefit is that any other code that needed direct access to this variable (for speed reasons or whatever) could just add its own extern rather than using the include “Globals.h”:

// Do this myself so I can modify it
extern int c_number;

void MyCode (void)
{
// It's my variable and I can do what I want with it!
c_number = 100;
}

By using the global, it opens up that as a possibility.

And since functions are used to set them, they could also exist to initialize them.

// Globals.c
// Declared as non-const, but named with "c_" to indicate the rest of the
// code cannot modify it.
int c_number;

// Init functions
void InitGlobals (void)
{
    c_number = 42;
}

// Set functions.
void SetNumber (int number)
{
    c_number = number;
}

// Globals.h

// Extern as a const so it is a read-only.
extern int const c_number;

// Prototypes
void InitGlobals (void);

void SetNumber (int number);

// main.c
#include <stdio.h>

#include "Globals.h"

int main()
{
    InitGlobals ();

    printf ("c_number = %d\n", c_number);

    // This won't work.
    //c_number = 100;

    SetNumber (100);

    printf ("c_number = %d\n", c_number);

    return 0;
}

Spiffy.

I had thought about using static to prevent the “extern” trick from working, but realize if you did that, there would be no read-only access outside of that file and a get function would be needed. And we already knew how to do that.
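There is a middle ground worth sketching, though it is a different technique from Sean’s tip and costs one pointer dereference per read: keep the variable static, and export a const pointer to const data as a read-only “view” of it. The names S_Number, g_NumberView and the 0-100 range are assumptions for illustration only:

```c
// Globals.c (sketch)

// Only code in this file can write to the variable.
static int S_Number = 0;

// Read-only view for everyone else. Globals.h would declare it as:
//     extern const int * const g_NumberView;
const int * const g_NumberView = &S_Number;

// Set function with range checking.
// Returns 1 if the value was accepted, 0 if it was rejected.
int SetNumber (int number)
{
    if ((number < 0) || (number > 100))
    {
        return 0;
    }

    S_Number = number;

    return 1;
}
```

Other files would read the value with *g_NumberView. Because the variable itself is static, adding a rogue “extern int S_Number;” elsewhere will not link, and the declarations all agree on the type.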

I love learning new techniques like this. The code I maintain in my day job has TONS of globals for various reasons, and often has duplicate code to do range checking and such. I could see using something like this to clean all of that up and still retain speed when accessing the variables.

Got any C tricks? Comment away…

const-ant confusion in C, revisited.

I do not know why this has confused me so much over the years. BING CoPilot (aka ChatGPT) explains it so clearly I do not know how I ever misunderstood it.

But I am getting ahead of myself.

Back in 2017, I wrote a bit about const in C. A comment made by Sean Patrick Conner on a recent post made me revisit the topic of const in 2024.

If you use const, you make a variable that the compiler will not allow to be changed. It becomes read-only.

int normalVariable = 42;
const int constVariable = 42;

normalVariable = 100; // This will work.

constVariable = 100; // This will not work.

When you try to compile, you will get this error:

error: assignment of read-only variable ‘constVariable’

That is super simple.

But let me make one more point-er…

But for pointers, it is a bit different. You can declare a pointer and change it, like this:

char *ptr = 0x0;

ptr = (char*)0x100;

And if you did not want the pointer to change, you might try adding const like this:

const char *ptr = 0x0;

ptr = (char*)0x100;

…but you would find that it compiles just fine, and you can still modify the pointer.

In the case of pointers, a “const” at the start applies to what the pointer points to, not to the pointer itself. Consider this:

uint8_t buffer[10];

// Normal pointer.
uint8_t *normalPtr = &buffer[0];

// Modify what it points to.
normalPtr[0] = 0xff;

// Modify the pointer itself.
normalPtr++;

Above, without using const, you can change the data that normalPtr points to (inside the buffer) as well as the pointer itself.

But when you add const…

// Pointer to constant data.
const uint8_t *constPtr1 = &buffer[0];
// Or it can be written like this:
// uint8_t const *constPtr1 = &buffer[0];

// You can NOT modify the data the pointer points to:
constPtr1[1] = 1; // error: assignment of read-only location ‘*(constPtr1 + 1)’

// But you can modify the pointer itself:
constPtr1++;

Some of my longstanding confusion came from where you put “const” on the line. In this case, “const uint8_t *ptr” is the same as “uint8_t const *ptr”. Because reasons?

Since using const before or after the pointer data type means “you can’t modify what this points to”, you have to use const in a different place if you want the pointer itself to not be changeable:

// Constant pointer to data.
// We can modify the data the pointer points to, but
// not the pointer itself.
uint8_t * const constPtr3 = &buffer[0];

constPtr3[3] = 3;

// But this will not work:
constPtr3++; // error: increment of read-only variable ‘constPtr3’

And if you want to make it so you cannot modify the pointer AND the data it points to, you use two consts:

// Constant pointer to constant data.

// We can NOT modify the data the pointer points to, or
// the pointer itself.
const uint8_t * const constPtr4 = &buffer[0];

// Neither of these will work:
constPtr4[4] = 4; // error: assignment of read-only location ‘*(constPtr4 + 4)’

constPtr4++; // error: increment of read-only variable ‘constPtr4’

Totally not confusing.

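The pattern is that “const” makes whatever follows it read-only. (The more formal rule of thumb is that const applies to whatever is immediately to its left, and only applies to its right when nothing is to its left.) You can do an integer variable both ways, as well: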

const int constVariable = 42;

int const constVariable = 42;

Because reasons.

The cdecl: C gibberish ↔ English webpage will explain this and show them both to be the same:

const int constVariable
declare constVariable as const int

int const constVariable
declare constVariable as const int

Since both of those are the same, “const char *” and “char const *” should be the same, too.

const char *ptr
declare ptr as pointer to const char

char const *ptr
declare ptr as pointer to const char

However, when you place the const after the * (directly in front of the variable name), it no longer applies to the data being pointed to but to the pointer variable itself:

char * const ptr
declare ptr as const pointer to char

Above, the pointer is constant, but not what it points to. Adding the second const:

const char * const ptr
declare ptr as const pointer to const char

char const * const ptr
declare ptr as const pointer to const char

…makes both the pointer and what it points to read-only.
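To pull the four combinations together, here is a compact sketch that compiles as written, with the lines the compiler would reject left commented out. The function name and the buffer contents are made up for the example:

```c
#include <stdint.h>

// Exercise all four pointer/const combinations on small local buffers.
// Returns a sum so the result of the allowed writes can be observed.
uint8_t ConstPointerForms (void)
{
    uint8_t buffer[4] = { 1, 2, 3, 4 };
    uint8_t other[4] = { 9, 9, 9, 9 };

    uint8_t *p1 = buffer;              // pointer and data both writable
    const uint8_t *p2 = buffer;        // data read-only, pointer writable
    uint8_t * const p3 = buffer;       // pointer read-only, data writable
    const uint8_t * const p4 = other;  // pointer and data both read-only

    p1[0] = 10;      // OK: write through p1
    p1++;            // OK: p1 now points at buffer[1]

    p2++;            // OK: p2 now points at buffer[1]
    // p2[0] = 0;    // error: assignment of read-only location

    p3[3] = 40;      // OK: write through p3
    // p3++;         // error: increment of read-only variable

    // p4[0] = 0;    // error: assignment of read-only location
    // p4++;         // error: increment of read-only variable

    return (uint8_t)(*p1 + *p2 + p3[0] + p3[3] + p4[0]);
}
```

Uncommenting any of the marked lines should produce errors like the ones quoted earlier.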

Why do I care?

You probably don’t. However, any time you pass a buffer in to a function that is NOT supposed to modify it, you should make sure that buffer is read-only. (That was more or less the point of my 2017 post.)

#include <stdio.h>
#include <string.h>

void function (char *bufferPtr, size_t bufferSize)
{
    // I can modify this!
    bufferPtr[0] = 42;
}

int main()
{
    char buffer[80];
    
    strncpy (buffer, "Hello, world!", sizeof(buffer));
    
    printf ("%s\n", buffer);
    
    function (buffer, sizeof(buffer));
    
    printf ("%s\n", buffer);


    return 0;
}

When I run that, it will print “Hello, world!” and then print “*ello, world!”

If we do not want the function to be able to modify/corrupt the buffer (easily), adding const solves that:

#include <stdio.h>
#include <string.h>

void function (const char *bufferPtr, size_t bufferSize)
{
    // I can NOT modify this! The next line is now a compile error:
    // "assignment of read-only location ‘*bufferPtr’"
    bufferPtr[0] = 42;
}

int main()
{
    char buffer[80];
    
    strncpy (buffer, "Hello, world!", sizeof(buffer));
    
    printf ("%s\n", buffer);
    
    function (buffer, sizeof(buffer));
    
    printf ("%s\n", buffer);

    return 0;
}

But, because the pointer itself was not protected with const, inside the routine it could modify the pointer:

#include <stdio.h>
#include <string.h>

void function (const char *bufferPtr, size_t bufferSize)
{
    // I can NOT modify this!
    //bufferPtr[0] = 42;
    
    while (*bufferPtr != '\0')
    {
        printf ("%02x ", *bufferPtr);
        
        bufferPtr++; // Increment the pointer
    }
    
    printf ("\n");
}

int main()
{
    char buffer[80];
    
    strncpy (buffer, "Hello, world!", sizeof(buffer));
    
    printf ("%s\n", buffer);
    
    function (buffer, sizeof(buffer));
    
    printf ("%s\n", buffer);

    return 0;
}

In that example, the pointer is passed in and can be changed. But since it was passed in by value, what gets changed is the function’s own temporary copy. It works the same way as passing in a plain variable: the function can modify its copy without affecting the variable that was passed in:

#include <stdio.h>

void numberTest (int number)
{
    printf ("%d -> ", number);
    
    number++;

    printf ("%d\n", number);
}

int main()
{
    int number = 42;
    
    printf ("Before function: %d\n", number);
    
    numberTest (number);
    
    printf ("After function: %d\n", number);

    return 0;
}

Because of that temporary nature, I don’t see any reason to restrict the pointer to be read-only. Any changes made to it within the function will be to a copy of the pointer.

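Be careful with const placement here. Writing const twice before the * does not make the pointer const: both qualifiers apply to the char (GCC warns about a duplicate ‘const’), so the pointer below can still be modified. A truly const pointer parameter would be “const char * const bufferPtr”, and with that declaration the bufferPtr++ line would fail to compile:

void function (const char const *bufferPtr, size_t bufferSize)
{
    // I can NOT modify this!
    //bufferPtr[0] = 42;

    while (*bufferPtr != '\0')
    {
        printf ("%02x ", *bufferPtr);

        bufferPtr++; // Increment the pointer
    }

    printf ("\n");
}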

Offhand, I cannot think of any reason you would want to pass a pointer in to a function and then not let the function use that pointer by changing it. Maybe there are some? Leave a comment…

The moral of the story is…

The important takeaway is to always use const when you are passing in a buffer you do not want to be modified by the function. And leave it out when you DO want the buffer modified:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

// Uppercase string in buffer.
void function (char *bufferPtr, size_t bufferSize)
{
    while ((*bufferPtr != '\0') && (bufferSize > 0))
    {
        *bufferPtr = (char)toupper((unsigned char)*bufferPtr);
        
        bufferPtr++; // Increment the pointer
        bufferSize--; // Decrement how many bytes left
    }
}

int main()
{
    char buffer[80];
    
    strncpy (buffer, "Hello, world!", sizeof(buffer));
    
    printf ("%s\n", buffer);
    
    function (buffer, sizeof(buffer));
    
    printf ("%s\n", buffer);

    return 0;
}

And if you pass that a non-modifiable string (like a real read-only constant string stored in program space or ROM or whatever), you might have a different issue to deal with. In the case of the PIC24 compiler I use, it flat out won’t let you pass in a constant string like this:

function ("CCS PIC compiler will not allow this", 80);

They have a special compiler setting which will generate code to copy any string literals into RAM before calling the function (at the tradeoff of extra code space, CPU time, and memory):

#device PASS_STRINGS=IN_RAM

But I digress. This was just about const.

Oddly, when I do the same thing in the GDB online Debugger, it happily does it. I don’t know why — surely it’s not modifying program space? Perhaps it is copying the string in to RAM behind the scenes, much like the CCS compiler can do. Or perhaps it is blindly writing to program space and there is no exception/memory protection stopping it.

Well, it crashes if I run the same code on a Windows machine using the Code::Blocks IDE (GCC compiler). Modifying a string literal is undefined behavior in C, so either outcome is allowed.

One more thing…

You could, of course, try to cheat. Inside the function that is passed a const you can make a non-const and just assign it:

// Uppercase string in buffer.
void function (const char *bufferPtr, size_t bufferSize)
{
    char *ptr = bufferPtr;

    while ((*ptr != '\0') && (bufferSize > 0))
    {
        *ptr = toupper(*ptr);

        putchar (*ptr);

        ptr++; // Increment the pointer
        bufferSize--; // Decrement how many bytes left
    }

    putchar ('\n');
}

This will work if your compiler is not set to warn you about it. On GCC, mine will compile, but will emit a warning:

main.c: In function ‘function’:
main.c:16:17: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
16 | char *ptr = bufferPtr;

For programmers who ignore compiler warnings, you now have code that can corrupt/modify memory that was designed not to be touched. So keep those warnings cranked up and pay attention to them if your code is important.
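If a function really does need an uppercased version of data it was handed as const, one honest alternative to cheating is to write the result into a separate buffer supplied by the caller. This is only a sketch; the name UppercaseCopy and its parameters are assumptions, not something from the code above:

```c
#include <ctype.h>
#include <stddef.h>

// Copy src into dest as uppercase, never writing more than destSize
// bytes (including the terminating 0). The source is left untouched.
// Returns the number of characters written, not counting the 0.
size_t UppercaseCopy (const char *src, char *dest, size_t destSize)
{
    size_t i = 0;

    if (destSize == 0)
    {
        return 0; // No room for even the terminating 0.
    }

    while ((src[i] != '\0') && (i < destSize - 1))
    {
        // toupper expects an unsigned char value (or EOF).
        dest[i] = (char)toupper ((unsigned char)src[i]);
        i++;
    }

    dest[i] = '\0';

    return i;
}
```

This keeps the const promise intact and compiles without the discarded-qualifiers warning, at the cost of a second buffer.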

Comment away. I learn so much from all of you.

C coding standard recommendations?

Only one of the programming jobs I have had used a coding standard. Their standard, created in-house, is more or less the standard I follow today. It includes things like:

  • Prefix global variables with g_
  • Prefix static variables with s_ (for local statics) or S_ (for global statics)

It also required the use of braces, which I have blogged about before, even in single-line instances such as:

if (fault == true)
{
    BlinkScaryRedLight();
}

Many of these took me a bit to get used to because they are different than how I do things. Long after that job, I have adopted many/most of that standard in my own personal style due to accepting the logic behind it.

I thought I’d ask here: Are there any good “widely accepted” C coding standards out there you would recommend? Adopting something widely used might make code easier for a new hire to adapt to, versus “now I have to learn yet another way to format my braces and name my variables.”

Comments appreciated.

C and the case of the missing globals…

Even though I first started learning C using a K&R compiler on a 1980s 8-bit home computer, I still feel like I barely know the language. I learn something new regularly, usually by seeing code someone else wrote.

There are a few common “rules” in C programming that most C programmers I know agree with:

  • Do not use goto, even though it is an intentionally supported part of the language.
  • Do not use globals.

I have seen many cases for both. In the case of goto, I have seen code that would otherwise be very convoluted with nested braces and comparisons solved simply by jumping out of the block of code with a goto. I still can’t bring myself to use goto in C, even though as I type this I feel like I actually did at some point. (Do I get a pass on using that, since it was a silly experiment where I was porting BASIC — which uses GOTO — to C, as literally as possible?)

But I digress…

A case for globals – laziness

Often, globals are used out of sheer laziness. Like, suppose you have a function that does something and you don’t want to have to update every use of it to deal with a parameter. I am guilty of this when I needed to make a function more flexible, and did not have time to go update every instance and use of the function to pass in a variable:

void InitializeCommunications ()
{
     InitI2C (g_Kbps);
}

In that case, there would be some global (I put a “g_” in front of the variable name so it would be easy to spot as a global later) containing a baud rate, and any place that called that function could use it. Changing the global would make subsequent calls to the function use the new baud rate.

Bad example, but it is what it is.

A case for globals – speed

I have also resorted to using globals to speed things up. One project I worked on had dozens of windows (“panels”) and the original programmer had created a lookup function to return a panel’s handle based on a passed-in #define value:

int GetHandle (int panelID)
{
   int panelHandle = -1;

   switch (panelID)
   {
      case MAIN_PANEL:
         panelHandle = xxxx;
         break;

      case MAIN_OPTIONS:
         panelHandle = xxxx;
         break;
...etc...

Every function that used them would get the handle first by calling that routine:

handle = GetHandle (MAIN_PANEL);

SetPanelColor (handle, COLOR_BLUE); // or whatever

As the program grew, more and more panels were added, and it would take more and more time to look up panels at the bottom of the list. As an optimization I just decided to make all the panel handles global, so any use could just be:

SetPanelColor (g_MainPanel, COLOR_BLUE); // or whatever

This quick-and-dirty change ended up having about a 10% reduction in CPU usage — this thing uses a ton of panel accesses! And it was pretty quick and simple to do.
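Another option, not what was done in that project, would be to keep the lookup function but back it with an array indexed by panel ID, so every lookup costs the same no matter where the panel sits in the list. This sketch assumes the panel IDs are small consecutive integers; the enum values, the table and RegisterHandle are all hypothetical:

```c
// Hypothetical panel IDs, assumed to be consecutive starting at 0.
enum
{
    MAIN_PANEL,
    MAIN_OPTIONS,
    PANEL_COUNT
};

// One slot per panel ID. -1 means "no handle registered yet".
static int s_panelHandles[PANEL_COUNT] = { -1, -1 };

// Record the handle for a panel (for example, right after creating it).
void RegisterHandle (int panelID, int handle)
{
    if ((panelID >= 0) && (panelID < PANEL_COUNT))
    {
        s_panelHandles[panelID] = handle;
    }
}

// Constant-time lookup. Returns -1 for unknown or unregistered IDs.
int GetHandle (int panelID)
{
    if ((panelID < 0) || (panelID >= PANEL_COUNT))
    {
        return -1;
    }

    return s_panelHandles[panelID];
}
```

Whether this beats plain globals would need measuring on the actual target, since it still pays for a function call and a bounds check.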

Desperate times.

An alternative to globals

The main replacement I see for globals are structures, declared during startup, then passed around by pointer. I’ve seen these called “context” or “runtime” structures. For example, some code I work on creates a big structure of “things” and then any place that needs one of those things accesses it:

InitI2C (runTime.baudRate);

But as you might guess, “runTime” is a global structure so any part of the code could access it (or manipulate it, or mess it up). The main benefit I see of making things a global structure is you have to know what you are doing. If you had globals like this:

// Globals
int index = 0;
int baudRate = 0;

…you might be surprised if you tried to use a local variable “index” or “baudRate” and got it confused with the global. (I actually ran into a bug where there was a global named simply “index” and there was some code that had meant to have a local variable called “index” but forgot to declare it, thus it was always screwing with the global index which was used elsewhere in the code. This was a simple accident that caused a lot of weird problems before it was identified and fixed.)

Prepending something like “g_index” at least makes it clear you are using a global, so you could have a local “index” and not risk messing up the global “g_index”.

To me, using that global runtime structure is just a slower way to do that, since in embedded compilers I have tested, accessing a global something like “foo.x” is slower than just accessing a global “x”. I have also seen it take more code space, and I had to remove all such references in one tightly constrained product to save just enough bytes to add some needed new code.

Yes, I have run into many situations where a tiny bit of extra memory space or a tiny bit of extra code space made the difference between getting something done, or not.

A cleaner approach?

Ideally, code could pass around a “context” structure, and then nothing could ever access it without specifically being handed it. Consider this:

int main ()
{
   int status = SUCCESS;

   // Allocate our context:
   RunTimeStruct runTime;

   ...

   status = BeginProgram (&runTime);

   return status;
}

int BeginProgram (RunTimeStruct *runTime)
{
    int status = SUCCESS;

    InitializeCommunications (runTime->baudRate);

    status = DoSomething (runTime);

    return status;
}

The idea seems to be that once you had the runTime structure, you could pass in specific elements to a function (such as the baud rate), or pass along the entire context for routines that needed full access.

This feels like a nice approach to me since passing one pointer in is fast, and it still offers protection when you decide to pass in just one (or a few) specific items to a function. No code can legally touch those variables if it doesn’t have the context structure.
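A filled-in version of that idea might look like the sketch below. Everything in it is assumed for illustration: the fields in RunTimeStruct, the SUCCESS and FAILURE values, and the stand-in bodies of InitializeCommunications and DoSomething:

```c
// Hypothetical status codes.
#define SUCCESS 0
#define FAILURE (-1)

// Hypothetical context structure.
typedef struct
{
    int baudRate;
    int messagesSent;
} RunTimeStruct;

// Only needs one item, so it is handed just that item and cannot
// touch anything else in the context.
int InitializeCommunications (int baudRate)
{
    return (baudRate > 0) ? SUCCESS : FAILURE;
}

// Needs full access, so it is handed the whole context.
int DoSomething (RunTimeStruct *runTime)
{
    runTime->messagesSent++;

    return SUCCESS;
}

int BeginProgram (RunTimeStruct *runTime)
{
    int status = InitializeCommunications (runTime->baudRate);

    if (status == SUCCESS)
    {
        status = DoSomething (runTime);
    }

    return status;
}
```

A caller would declare a RunTimeStruct (in main, for instance), fill in baudRate, and pass its address to BeginProgram.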

But what about globals that aren’t globals?

And now the point of this article. Something I learned from this project was an interesting use of “globals” that were not globals. There were functions that declared static structures, and would return the address of the structure:

RunTimeStruct *GetRunTimeData (void)
{
   static RunTimeStruct runTimeData;

   return &runTimeData;
}

Now any place that needed it could just ask for it:

RunTimeStruct *runTime = GetRunTimeData (); 

runTime->baudRate = 300;

This seems like a hybrid approach. You can never accidentally use them, like you might with just a global “int index” or whatever, but if you did need them, you could get to them without needing a context passed in. It seems like a good compromise between safety and laziness.
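Two details make this pattern work: a static local keeps its storage for the life of the program, and it starts out zero-initialized. A self-contained sketch, with an assumed one-field RunTimeStruct:

```c
// Hypothetical structure; the real one would hold many more fields.
typedef struct
{
    int baudRate;
} RunTimeStruct;

RunTimeStruct *GetRunTimeData (void)
{
    // Static storage: created once, zero-initialized, and still valid
    // after the function returns, so returning its address is safe.
    static RunTimeStruct runTimeData;

    return &runTimeData;
}
```

Every caller gets the same address back, so a value stored through one call to GetRunTimeData() is visible through the next.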

It also means those functions could easily be adapted to return blocks of thread-safe variables, with a “Release” function at the end. (This is actually how the thread-safe variables work in the LabWindows/CVI environment I use at my day job.)

RunTimeStruct *runTime = GetRunTimeData (); 

runTime->baudRate = 300;

ReleaseRunTimeData ();

What do you do?

Since I like learning, I thought I’d write this up and ask you what YOU do. Show me your superior method, and why it is superior. I’ve seen so many different approaches to passing data around, so share yours in a comment.

Until next time…

C and concatenating strings…

Imagine running across a block of C code that is meant to create a comma separated list of items like this:

2024/10/18,12:30:06,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...
2024/10/18,12:30:07,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...
2024/10/18,12:30:08,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...
2024/10/18,12:30:09,100,0.00,0,0,0,902.0,902.0,928.0,31.75,0,0,100 ...

And the code that does it was modular, so new items could be inserted anywhere, easily. This is quite flexible:

snprintf (buffer, BUFFER_LENGTH, "%u", value1);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");

snprintf (buffer, BUFFER_LENGTH, "%u", value2);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");

snprintf (buffer, BUFFER_LENGTH, "%.1f", value3);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");

snprintf (buffer, BUFFER_LENGTH, "%.1f", value4);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");

snprintf is used to format the variable into a temporary buffer, then strcat is used to append that temporary buffer to the end of the line output buffer that will be written to the file. Then strcat is used again to append a comma… and so on.

Let’s ignore the use of the “unsafe” strcat, which can trash memory past the end of a buffer if the NUL (“\0”) zero byte is not found or the destination is too small. We’ll just say strncat exists for a reason and can prevent buffer overruns crashing a system.

Many C programmers never think about what goes on “behind the scenes” when a function is called. It just does what it does. Only if you are on a memory constrained system may you care about how large the code is, and only on a slow system may you care about how slow the code is. As an embedded C programmer, I care about both since my systems are often slow and memory constrained.

strcat does what?

The C string functions rely on finding a zero byte at the end of the string. strlen, for example, starts at the beginning of the string then counts until it finds a 0, and returns that count as the size:

size_t stringLength = strlen (someString);

And strcat would do something similar, starting at the address of the string passed in, then moving forward until a zero is found, then copying the new string bytes there up until it finds a zero byte at the end of the string to be copied. (Or, in the case of strncat, it might stop before the zero if a max length is reached.)
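The counting loop described above can be sketched in a few lines. This is a simple illustration with a made-up name (myStrlen), not how any particular C library implements strlen:

```c
#include <stddef.h>

// Count characters from the start of the string until the 0 byte.
// The 0 itself is not included in the count.
size_t myStrlen (const char *str)
{
    size_t length = 0;

    while (str[length] != '\0')
    {
        length++;
    }

    return length;
}
```

The work grows with the length of the string, which matters once the same long string is scanned over and over.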

I have previously written about these string functions, including showing implementations of “safe” functions that are missing from the standard C libraries. See my earlier article series about these C string functions. And this one.

But what does strcat look like? I had assumed it might be implemented like this:

char *myStrcat (char *dest, const char *src)
{
    char *start = dest;

    // Scan forward to find the 0 at the end of the dest string.
    while (*dest != 0)
    {
        dest++;
    }

    // Copy src string to the end, including its terminating 0.
    while ((*dest = *src) != 0)
    {
        dest++;
        src++;
    }

    // Like the real strcat, return the start of the destination.
    return start;
}

That is a very simple way to do it.

But why re-invent the wheel? I looked to see what GCC does (the code lives in the GNU C library, glibc), and its generic implementation of strcat makes use of strlen and strcpy:

/* Append SRC on the end of DEST.  */
char *
STRCAT (char *dest, const char *src)
{
strcpy (dest + strlen (dest), src);
return dest;
}

When you use strcat on GCC, it does the following:

  1. Call strlen() to count all the characters in the destination string up to the 0.
  2. Call strcpy() to copy the source string to the destination string address PLUS the length calculated in step 1.
  3. …and strcpy() is doing its own thing where it copies characters until it finds the 0 at the end of the source string.

Reusing code is efficient. If I were to write a strcat that worked like GCC, it would be different than my quick-and-dirty implementation above.

This is slow, isn’t it?

In the code I was looking at…

snprintf (buffer, BUFFER_LENGTH, "%u", value1);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");

…there is a lot going on. First, snprintf does a lot of work to convert the variable into string characters. Next, strcat calls strlen on the outputLineBuffer, then calls strcpy. Finally, strcat calls strlen on the outputLineBuffer again, then calls strcpy to copy over the comma character.

That is a lot of counting from the start of the destination string to the end, and each step along the way is more work since there are more characters to scan past. Suppose you were going to write out five five-digit numbers:

11111,22222,33333,44444,55555

The first number is quick because nothing is in the destination string yet so strcat has nothing to count. “11111” is copied. Then, there is a scan of five characters to get to the end, then the comma is copied.

For the second number, strcat has to scan past SIX characters (“11111,”) and then the process continues.

Then the third number has to scan past TWELVE (“11111,22222,”) characters.

Each entry gets progressively slower and slower as the string gets longer and longer.

Can we make it faster?

If things were set in stone, you could do this all with one snprintf like this:

snprintf (buffer, BUFFER_LENGTH, "%u,%u,%.1f,%.1f\n", value1, value2, value3, value4);

Since printf is being used to format all the values in to a buffer, then doing the whole string with one call to snprintf may be the smallest and fastest way to do this.

But if you are dealing with something with dozens of values, when you go in later to add one in the middle, there is great room for error if you get off by a comma or a variable in the parameter list somewhere.

I suspect this is why the code I am seeing was written the way it was. It makes adding something in the middle as easy as adding three new lines:

snprintf (buffer, BUFFER_LENGTH, "%u", newValue);
strcat (outputLineBuffer, buffer);
strcat (outputLineBuffer, ",");

Had fifty parameters been inside some long printf, there is a much greater chance of error when updating the code later to add new fields. The “Clean Code” philosophy says we spend more time maintaining and updating and fixing code than we do writing it, so writing it “simple and easy to understand” initially can be a huge time savings. (Spend a bit more time up front, save much time later.)

So since I want to leave it as-is, this is my suggestion which will cut the string copies in half: just put the comma in the printf:

snprintf (buffer, BUFFER_LENGTH, "%u,", value1);
strcat (outputLineBuffer, buffer);

snprintf (buffer, BUFFER_LENGTH, "%u,", value2);
strcat (outputLineBuffer, buffer);

snprintf (buffer, BUFFER_LENGTH, "%.1f,", value3);
strcat (outputLineBuffer, buffer);

snprintf (buffer, BUFFER_LENGTH, "%.1f,", value4);
strcat (outputLineBuffer, buffer);

And that is a very simple way to reduce the times the computer has to spend starting at the front of the string and counting every character forward until a 0 is found. For fifty parameters, instead of doing that scan 100 times, now we only do it 50.

And that is a nice savings of CPU time, and also saves some code space by eliminating all the extra calls to strcat.

You know better, don’t you?

But I bet some of you have an even better and/or simpler way to do this.

Comment away…

printf portability problems persist… possibly.

TL;DR – You all probably already knew this, but I just learned about inttypes.h. (Well, not actually “just”; I found out about it last year for a different reason, but I re-learned about it now for this issue…)

I was today years old when I learned that there was a solution to a bothersome warning that most C programmers probably never face: printing ints or longs in code that will compile on 16-bit or 32/64-bit systems.

For example, this code works fine on my 16-bit PIC24 compiler and a 32/64-bit compiler:

int x = 42;
printf ("X is %d\n", x);

long y = 42;
printf ("Y is %ld\n", y);

This is because “%d” represents an “int”, whatever that is on the system (16-bit, 32-bit or 64-bit), and “%ld” represents a “long int”, whatever that is on the system.

On my 16-bit PIC24 compiler, “int” is 16-bits and “long int” is 32-bits.

On my PC compiler “int” is 32-bits, and “long int” is 64-bits.

But int isn’t portable, is int?

As far as I recall, the C standard says an int is “at least 16-bits.” If you want to represent a 16-bit value in any compliant ANSI-C code, you can use int. It may be using 32 or 64 bits (or more?), but it will at least hold 16-bits.

What if you need to represent 32 bits? This code works fine on my PC compiler, but would not work as expected on my 16-bit system:

unsigned int value = 0xaabbaabb;

printf ("value: %u (0x%x) - ", value, value);

for (int bit = 31; bit >= 0; bit--)
{
    if ( (value & (1u<<bit)) == 0)
    {
        printf ("0");
    }
    else
    {
        printf ("1");
    }
}
printf ("\n");

On a 16-bit system, an “unsigned int” only holds 16-bits, so the results will not be what one would expect. (A good compiler might even warn you about that, if you have warnings enabled… which you should.)

stdint.h, anyone?

In my embedded world, writing generic ANSI-C code is not always optimal. If we must have 32-bits, using “long int” works on my current system, but what if that code gets ported to a 32-bit ARM processor later? On that machine, “int” becomes 32-bits, and “long” might be 64-bits.

Having too many bits is not as much of an issue as not having enough, but the stdint.h header file solves this by letting us request what we actually want to use. For example:

#include <stdio.h>
#include <stdint.h> // added

int main()
{
    uint32_t value = 0xaabbaabb; // changed
    
    printf ("value: %u (0x%x) - ", value, value);
    
    for (int bit = 31; bit >= 0; bit--)
    {
        if ( (value & ((uint32_t)1<<bit)) == 0) // shift a 32-bit 1, not an int
        {
            printf ("0");
        }
        else
        {
            printf ("1");
        }
    }
    printf ("\n");

    return 0;
}

Now we have code that works on a 16-bit system as well as a 32/64-bit system.

Or do we?

There is a problem, which I never knew the solution to until recently.

printf ("value: %u (0x%x) - ", value, value);

That line will compile without warnings on my PC compiler, but I get a warning on my 16-bit compiler. On a 16-bit compiler, “%u” is for printing an “unsigned int”, as is “%x”. But on that compiler, the “uint32_t” represents a 32-bit value. Normal 16-bit compilers would probably call this an “unsigned long”, but my PIC24 compiler has its own internal variable types, so I see this in stdint.h:

typedef unsigned int32 uint32_t;

On the Arduino IDE, it looks more normal:

typedef unsigned long int uint32_t;

And a “good” compiler (with warnings enabled) should alert you that you are trying to print a variable larger than the “%u” or “%x” handles.

So while this works fine on my 32-bit compiler…

// For my 32/64-bit system:
uint32_t value32 = 42;
printf ("%u", value32);

…it gives a warning on the 16-bit ones. To make it compile on the 16-bit compiler, I change it to use “%lu” like this:

// For my 16-bit system:
uint32_t value32 = 42;
printf ("%lu", value32);

…but then that code will generate a compiler warning on my 32/64-bit system ;-)

There are some #ifdefs you can use to detect architecture, or make your own using sizeof() and such, that can make code that compiles without warnings, but C already solved this for us.

Hello, inttypes.h! Where have you been all my C-life?

On a whim, I asked ChatGPT about this the other day and it showed me define/macros that are in inttypes.h that take care of this.

If you want to print a 32-bit value, instead of using “%u” (on a 32/64-bit system) or “%lu” (on a 16-bit system), you can use PRIu32, which represents whatever print code is needed to print a “u” that is 32-bits. On a 16-bit compiler, the header file might define it as:

#define PRIu32 "lu"

Instead of this…

uint32_t value = 42;
printf ("value is %u\n", value);

…you do this:

uint32_t value = 42;
printf ("value is %" PRIu32 "\n", value);

Because the C compiler joins adjacent string literals into one, that ends up creating:

printf ("value is %lu\n", value); // %lu

But on a 32/64-bit compiler, that same header file might represent it as:

#define PRIu32 "u"

Thus, writing that same code using this define would produce this on the 32/64-bit system:

printf ("value is %u\n", value); // %u

Tada! Warnings eliminated.

And now I realize I have used this before, for a different reason:

uintptr_t
PRIxPTR

If you try to print the address of something, like this:

void *ptr = 0x1234;
printf ("ptr is 0x%x\n", ptr);

…you should get a compiler warning similar to this:

warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘void *’ [-Wformat=]

%x is for printing an “unsigned int”, and ptr is a “void *”. Over the years, I made this go away by casting:

printf ("ptr is 0x%x\n", (unsigned int)ptr);

But on my 64-bit compiler, a pointer is 64 bits and “unsigned int” is only 32 bits, so the pointer does not fit in the cast and I still get a warning. There, I would have to cast to a “long int” and use “%lx”.

To make that go away, last year I learned about using PRIxPTR to represent the printf code for printing a pointer as hex:

printf ("pointer is 0x%" PRIxPTR "\n",

On my 16-bit compiler, it is:

#define PRIxPTR "lx"

This is because pointers are 32-bit on a PIC24 (even though an “int” on that same system is 16-bits).

On the 32/64-bit compiler (GNU-C in this case), it changes depending on whether the system is 64-bit:

#ifdef _WIN64
...
#define PRIxPTR "I64x" // 64-bit mode
...
#else
...
#define PRIxPTR "x" // 32-bit mode
...
#endif

I64 is something new to me since I never write 64-bit code, but clearly this shows there is some extended printf formatting for 64-bit values, versus just using “%x” for the default int size (32-bits) and “%lx” for the long size.

Instead of casting to an “(unsigned int)” or “(unsigned long int)” before printing, there is a special “uintptr_t” type that will be “whatever size a pointer is.”

This gives me a warning:

printf ("ptr is 0x%" PRIxPTR "\n", (unsigned int)ptr);

warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘unsigned int’ [-Wformat=]

But I can simply change the casting of the pointer:

printf ("ptr is 0x%" PRIxPTR "\n", (uintptr_t)ptr);

You may have also noticed I still have a warning when declaring the pointer with a value:

void *ptr = 0x1234;

warning: initialization of ‘void *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]

Getting rid of this is as simple as making sure the value is cast to a “void *”:

void *ptr = (void*)0x1234;

This is what happens when you learn C on a K&R compiler in the late 1980s and go to sleep for a while without keeping up with all the subsequent standards, including one from 2023 that I just found out about while typing this up!

Per BING CoPilot…

  • C89/C90 (ANSI X3.159-1989): The first standard for the C programming language, published by ANSI in 1989 and later adopted by ISO as ISO/IEC 9899:1990.
  • C95 (ISO/IEC 9899/AMD1:1995): A normative amendment to the original standard, adding support for international character sets.
  • C99 (ISO/IEC 9899:1999): Introduced several new features, including inline functions, variable-length arrays, and new data types like long long int.
  • C11 (ISO/IEC 9899:2011): Added features like multi-threading support, improved Unicode support, and type-generic macros.
  • C17 (ISO/IEC 9899:2018): A bug-fix release that addressed defects in the C11 standard without introducing new features.
  • C23 (ISO/IEC 9899:2023): The latest standard, which includes various improvements and new features to keep the language modern and efficient.

The more you know…

Though, I assume all the younguns that grew up in the ANSI-C world already know this. I grew up when you had to write functions like this:

/* Function definition */
int add(x, y)
int x, y;
{
return x + y;
}

Now to get myself in the habit of never using “%u”, “%d”, etc. when using stdint.h types…

Until then…

Splitting up strings in C source code.

When printing out multiple lines of text in C, it is common to see code like this:

printf ("+--------------------+\n");
printf ("| Welcome to my BBS! |\n");
printf ("+--------------------+\n");
printf ("| C)hat    G)oodbye  |\n");
printf ("| E)mail   H)elp     |\n");
printf ("+--------------------+\n");

That looks okay, but is calling a function for each line. You could just as easily combine multiple lines and embed the “\n” new line escape code in one long string.

printf ("+--------------------+\n| Welcome to my BBS! |\n+--------------------+\n| C)hat    G)oodbye  |\n| E)mail   H)elp     |\n+--------------------+\n");

Not only does it make the code a bit smaller (no overhead of making the printf call multiple times), it should be a bit faster since it removes the overhead of going in and out of a function.

But man is that ugly.

At some point, I learned about the automatic string concatenation that C does: adjacent string literals are joined into one by the compiler, after the preprocessor has run. That allows you to break up quoted lines like this:

const char *message = "This is a very long message that is too wide for "
    "my source code editor so I split it up into separate lines.\n";

“Back in the day” if you had C code that went to the next line, you were supposed to put a \ at the end of the line.

if ((something == true) && \
    (somethingElse == false) && \
    (somethingCompletelyDifferent == banana))
{

…but modern compilers do not seem to care about source code line length, so you can usually do this:

printf ("+--------------------+\n"
        "| Welcome to my BBS! |\n"
        "+--------------------+\n"
        "| C)hat    G)oodbye  |\n"
        "| E)mail   H)elp     |\n"
        "+--------------------+\n");

That looks odd if you aren’t aware of it, but makes for efficient code that is easy to read.

However, not all compilers are created equal. A previous job used a compiler that did not allow constant strings any longer than 80 characters! If you did something like this, it would not compile:

printf ("12345678901234567890123456789012345678901234567890123456789012345678901234567890x");

I had to contact their support to have them explain the weird error it gave me. On that compiler, trying to do this would also fail:

printf ("1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890"
        "1234567890x");

But that is not important to the story. I just mention it to explain that my background as an embedded C programmer has me limited, often, by sub-standard C compilers that do not support all the greatness you might get on a PC/Mac compiler.

These days, I tend to break all my multi-line prints up like that, so the source code resembles the output:

printf ("This is the first line.\n"
        "\n"
        "And we skipped a line above and below.\n"
        "\n"
        "The end.\n");

I know that may look odd, but it visually indicates that there will be a skipped line between those lines of text, where this does not:

printf ("This is the first line.\n\n"
        "And we skipped a line above and below.\n\n"
        "The end.\n");

Do any of you do this?

And, while today any monitor will display more than 80 columns, printers still default to this 80 column text. Sure, you can downsize the font (but the older I get, the less I want to read small print). Some coding standards I have worked under want source code lines to be under 80 characters, which does make doing a printout code review much easier.

And this led me to breaking up long lines like this…

printf ("This is a very long line that is too long for our "
        "80 character printout\n");

That code would print one line of text, but the source is short enough to fit within the 80 column width preferred by that coding standard.

And here is why I hate it…

I have split lines up like this in the past, and created issues when I later tried to find where in the code some message was generated. For example, if I wanted to find “This is a very long line that is too long for our 80 character printout” and searched for that full string, it would not show up. It does not exist in the source code. It has a break in between.

Even a search for “our 80 character” would come up empty because of the break.

And that’s the downside of what I just presented, and why you may not want to do it that way.

Thank you for coming to my presentation.

Fantastic C buffers and where to find them.

In my early days of learning C on the Microware OS-9 C compiler running on a Radio Shack Color Computer, I learned about buffers.

char buffer[80];

I recall writing a “line input” routine back then which was based on one I had written in BASIC and then later BASIC09 (for OS-9).

Thirty-plus years later, I find I still end up creating that code again for various projects. Here is a line input routine I wrote for an Arduino project some years ago:

LEDSign/LineInput.ino at master · allenhuffman/LEDSign (github.com)

Or this version, ported to run on a PIC24 using the CCS compiler:

https://www.ccsinfo.com/forum/viewtopic.php?t=58430

That routine looks like this:

byte lineInput(char *buffer, size_t bufsize);

In my code, I could have an input buffer, and call that function to let the user type stuff in to it:

char buffer[80];

len = lineInput (buffer, 80); // 80 is max buffer size

Though, when I first learned this, I was always passing in the address of the buffer, like this:

len = lineInput (&buffer, 80); // 80 is max buffer size

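Both work and produce the same memory location, though “&buffer” is technically a pointer to the whole array rather than to a char, so a modern compiler may warn about it. Meanwhile, for other variable types, it is quite different: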

int x;

function (x);
function (&x);

I think this may be why one of my former employers had a coding standard that specified passing buffers like this:

len = lineInput (&buffer[0], 80); // 80 is max buffer size

By writing it out as “&buffer[0]” you can read it as “the address of the first byte in this buffer.” And that does seem much more clear than “buffer” or “&buffer”. Without more context, these don’t tell you what you need to know:

process (&in);
process (in);

Without looking up what “in” is, we might assume it is some numeric type. The first version passes the address in, so it can be modified, while the second version passes the value in, so if it is modified by the function, it won’t affect the variable outside of that function.

But had I seen…

process (&in[0]);

…I would immediately think that “in” is some kind of array of objects – char? int? floats? – and whatever they were, the function was getting the address of where that array was located in memory.

So thank you, C, for giving us multiple ways to do the same thing — and requiring programmers to know that these are all the same:

#include <stdio.h>

void showAddress (void *ptr)
{
    printf ("ptr = %p\n", ptr);
}

int main()
{
    char buffer[80];
    
    showAddress (buffer);
    
    showAddress (&buffer);
    
    showAddress (&buffer[0]);

    return 0;
}

How do you handle buffers? What is your favorite?

Comments welcome…