
Old C dog, new C tricks part 5: inline prototypes?

See Also: part 1, part 2, part 3, part 4 and part 5.

This post is a departure from the others in this series: I am most certainly not going to be using this “trick” I just learned.

Background

Recently in my day job, I was doing a code review and came across something I was certain was not legal C code. At the very least, I was confident that line would not compile without a warning. Yet the developer said he did not see one.

The bit of code was supposed to be calling a function that returns a populated structure. Consider this silly example:

#include <stdio.h>

// typedefs
typedef struct {
    unsigned int major;
    unsigned int minor;
    unsigned int patch;
} VersionStruct;

// prototypes
VersionStruct GetVersion (void);

// main
int main()
{
    VersionStruct foo;
    
    foo = GetVersion ();
    
    printf ("Version %u.%u.%u\n", foo.major, foo.minor, foo.patch);

    return 0;
}

// functions
VersionStruct GetVersion ()
{
    VersionStruct ver;
    
    ver.major = 1;
    ver.minor = 0;
    ver.patch = 42;
    
    return ver;
}

But in the code under review, the call to the function was incomplete. There was no variable to receive the return value, and there was even a “void” inside the parentheses. It looked something like this:

int main()
{
    VersionStruct GetVersion (void);

    return 0;
}

I took one look at that and said “no way that’s working.” But there had been no compiler warning.

So off I went to the Online GDB Compiler to type up a quick example.

And it built without warning.

Well, maybe the default is to ignore this warning… So I added “-Wall” and “-Wextra” to the build flags. That should catch it :)

And it built without warning.

“How can this work? It looks like a prototype in the middle of a function!” I asked.

Yes, Virginia. You can have inline prototypes.

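A brief bit of searching told me that, yes, inline prototypes were a thing. (The more formal name is a function declaration at block scope.) It also explained the code review mystery: that line in main() was not calling GetVersion() at all. It was only declaring it, so the function never ran and the compiler had nothing to complain about.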

This should give a compiler warning:

#include <stdio.h>

int main()
{
    function ();

    return 0;
}

void function (void)
{
    printf ("Inside function.\n");
}

When I built that, I received two compiler warnings:

main.c: At top level:
main.c:10:6: warning: conflicting types for ‘function’; have ‘void(void)’
10 | void function (void)
| ^~~~~~~~
main.c:5:5: note: previous implicit declaration of ‘function’ with type ‘void(void)’
5 | function ();
| ^~~~~~~~

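The first warning is not about the missing prototype, but about “conflicting types”. In C89, a function called without any prior declaration is implicitly assumed to return an int. (C99 removed that rule, though compilers have long accepted such calls with just a warning.) So when the compiler reaches my void definition, it no longer matches what it had assumed.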

Had I written function() like this…

int function (void)
{
    printf ("Inside function.\n");
    return 0;
}

…I’d see only one warning, and a different one:

main.c: In function ‘main’:
main.c:5:5: warning: implicit declaration of function ‘function’ [-Wimplicit-function-declaration]
5 | function ();
| ^~~~~~~~

In the first example, the compiler assumes what the function should look like (one that returns an int), then finds a definition that does not match. It warns me that I am not defining the function the way the implied prototype says it should be. Sorta.

In the second, my function matches the implied prototype, so the “conflicting types” warning goes away, but the real “implicit declaration” warning is given.

Going back to the original “void” code, I can add an inline prototype in main() to make these all go away:

#include <stdio.h>

int main()
{
    void function(void); // Inline prototype?
    
    function ();

    return 0;
}

void function (void)
{
    printf ("Inside function.\n");
}

I had no idea that was allowed.

I have no idea why one would do that. BUT, I suppose if you wanted to call one function without including a header file that has its prototype, you could just stick the prototype right before you call the function…

But why would you want to do that?

I learned this is possible. I do not think I want to ever do this. Am I missing some great benefit for being able to have a prototype inside a function like this? Is there some “clean code” recommendation that might actually say this is useful?

“It wouldn’t be in there if it didn’t have a reason.”

Let me know what you know in the comments. Until next time…

Old C dog, new C tricks part 4: no more passing buffers?

In the previous installment, I rambled on about the difference between “char *line;” and “char line[];”. The first is a “pointer to char(s)” and the second is “an array of char(s)”. But when you pass them into a function, both are treated as a pointer to those chars.

One “benefit” of using “line[]” is that you can use sizeof(line) on it and get the byte count of the array. This is faster than using strlen(), since the compiler works out sizeof() at build time instead of scanning the string at run time.

But if you pass it into a function, all you have is a pointer, so strlen() is what you have to use.

While you can’t pass an “array of char” into a function as an array of char, you can pass a structure that contains an “array of char” and sizeof() will work on that:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h>
typedef struct
{
    char buffer[80];
} MyStruct;
void function (MyStruct test)
{
    printf ("sizeof(test.buffer) = %zu\n", sizeof(test.buffer));
}
int main(void)
{
    MyStruct test;
    
    strncpy (test.buffer, "This is a test", sizeof(test.buffer));
    
    function (test);
    return EXIT_SUCCESS;
}

You may notice that this passes a copy of the structure in, but stay with me for a moment.

If you have a function that is supposed to copy data into a buffer:

#define VERSION_STRING "1.0.42b-alpha"
void GetVersion (char *buffer)
{
    if (NULL != buffer)
    {
        strcpy (buffer, VERSION_STRING);
    }
}

…you can easily have a buffer overrun problem if the function writes more data than is available in the caller’s buffer. Because of this potential problem, I add a size parameter to such functions:

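void GetVersion (char *buffer, size_t bufferSize)
{
    if ((NULL != buffer) && (bufferSize > 0))
    {
        // Copy up to bufferSize-1 bytes, then make sure the string is
        // 0-terminated (strncpy() does not add the 0 if it runs out of room).
        strncpy (buffer, VERSION_STRING, bufferSize - 1);
        buffer[bufferSize - 1] = '\0';
    }
}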

As long as the caller passes the correct parameters, this is safe:

char buffer[20];
GetVersion (buffer, 20);

But the caller could still screw up:

char buffer[20];
GetVersion (buffer, 200); // oops, one too many zeros

But if you use a structure, it is impossible for the caller to mess it up (though, of course, they could mess up the structure on their side before calling your function). Compiler type checking will flag it if the wrong data type is passed in. The “buffer” will always be the “buffer.” No chance of a “bad pointer” or “buffer overrun” crashing the program.

To allow the buffer inside the structure to be modified, pass it in by reference:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h>
#define VERSION_STRING "1.0.42b-alpha"
typedef struct
{
    char buffer[80];
} MyStruct;
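void GetVersion (MyStruct *test)
{
    // Copy at most sizeof(buffer)-1 bytes, then make sure it ends with a 0.
    strncpy (test->buffer, VERSION_STRING, sizeof(test->buffer) - 1);
    test->buffer[sizeof(test->buffer) - 1] = '\0';
}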
int main(void)
{
    MyStruct test;
    
    GetVersion (&test);
    
    printf ("Version: %s\n", test.buffer);
    return EXIT_SUCCESS;
}

Using this approach, you can safely pass a “buffer” into functions, and they can use sizeof() on the buffer to ensure they do not overwrite anything.

But wait, there’s more…

It is pretty easy for a function to get out of control if you are trying to get back more than one thing. If you just want an “int”, that’s easy…

int GetCounter ()
{
    static int s_count = 0;
    return s_count++;
}

But if you wanted to get the major, minor, patch and build version, you end up passing in ints by reference to get something like this:

void GetVersion (int *major, int *minor, int *patch, int *build)
{
   if (NULL != major)
   {
      *major = MAJOR_VERSION;
   }
   if (NULL != minor)
   {
      *minor = MINOR_VERSION;
   }
   if (NULL != patch)
   {
      *patch = PATCH_VERSION;
   }
   if (NULL != build)
   {
      *build = BUILD_VERSION;
   }
}

Of course, anytime pointers are involved, the caller could pass in the wrong pointer and things could get screwed up. Plus, look at all those NULL checks to make sure each pointer isn’t 0. (They do not help if a pointer is pointing to some random location in memory.) Returning a structure instead avoids all of that:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#define MAJOR_VERSION 1
#define MINOR_VERSION 0
#define PATCH_VERSION 0
#define BUILD_VERSION 42
typedef struct
{
    int major;
    int minor;
    int patch;
    int build;
} VersionStruct;
VersionStruct GetVersion ()
{
    VersionStruct ver;
    
    ver.major = MAJOR_VERSION;
    ver.minor = MINOR_VERSION;
    ver.patch = PATCH_VERSION;
    ver.build = BUILD_VERSION;
    
    return ver;
}
int main(void)
{
    VersionStruct ver;
    
    ver = GetVersion ();
    
    printf ("Version: %d.%d.%d.%d\n",
        ver.major, ver.minor, ver.patch, ver.build);
    return EXIT_SUCCESS;
}

If you are concerned about the overhead of passing structures around, you can pass them by reference (a pointer), and the compiler should still catch it if the wrong pointer type is passed in:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#define MAJOR_VERSION 1
#define MINOR_VERSION 0
#define PATCH_VERSION 0
#define BUILD_VERSION 42
typedef struct
{
    int major;
    int minor;
    int patch;
    int build;
} VersionStruct;
void GetVersion (VersionStruct *ver)
{
    if (NULL != ver)
    {
        ver->major = MAJOR_VERSION;
        ver->minor = MINOR_VERSION;
        ver->patch = PATCH_VERSION;
        ver->build = BUILD_VERSION;
    }
}
int main(void)
{
    VersionStruct ver;
    
    GetVersion (&ver);
    
    printf ("Version: %d.%d.%d.%d\n",
        ver.major, ver.minor, ver.patch, ver.build);
    return EXIT_SUCCESS;
}

However, when dealing with pointers, there is always some risk. While the compiler will catch passing in the wrong structure pointer, there are still ways the caller can screw it up. For instance, void pointers:

int main(void)
{
    void *nothing = (void*)0x1234;
    
    GetVersion (nothing);
    return EXIT_SUCCESS;
}

Yep. Crash.

...Program finished with exit code 139
Press ENTER to exit console.

Give someone access to a function in your DLL and they might find a way to crash the program as simply as using a void pointer.

It is a bit trickier when you pass the full structure:

typedef struct
{
    int x;
} BogusStruct;
int main(void)
{
    BogusStruct ver;
    
    ver = GetVersion ();
    
    return EXIT_SUCCESS;
}

Compiler don’t like:

main.c: In function ‘main’:
main.c:38:11: error: incompatible types when assigning to type ‘BogusStruct’ from type ‘VersionStruct’
38 | ver = GetVersion ();
| ^~~~~~~~~~
main.c:36:17: warning: variable ‘ver’ set but not used [-Wunused-but-set-variable]
36 | BogusStruct ver;
| ^~~

And you can’t really cast your way around it like this:

int main(void)
{
    BogusStruct ver;

    (VersionStruct)ver = GetVersion ();

    return EXIT_SUCCESS;
}

Compiler don’t like:

main.c: In function ‘main’:
main.c:38:5: error: conversion to non-scalar type requested
38 | (VersionStruct)ver = GetVersion ();
| ^

Though maybe you could cast it if it was passed in as a parameter:

void ShowVersion (VersionStruct ver)
{
    printf ("Version: %d.%d.%d.%d\n",
        ver.major, ver.minor, ver.patch, ver.build);
}
int main(void)
{
    BogusStruct ver;
    
    ShowVersion ((VersionStruct)ver);
    
    return EXIT_SUCCESS;
}

Compiler still don’t like:

main.c: In function ‘main’:
main.c:44:5: error: conversion to non-scalar type requested
44 | ShowVersion ((VersionStruct)ver);
| ^~~~~~~~~~~

Hmm. Is there a way to screw this up? Let me know in the comments.

Until then…

Old C dog, new C tricks part 3: char *line vs char line[]

You, of course, already knew this. But I learn from your comments, so please leave some. Thanks!

This may be the next “old dog, new trick” I adopt, too.

When I started learning C in the late 1980s, I had a compiler manual (not very useful for learning the language) and a Pocket C Reference book — both for pre-ANSI K&R C. I may have had another “big” C book, but I mostly remember using the Pocket C Book.

Looking back at some of my early code, I find I was declaring “fixed” strings like this:

char version[5]="0.00"; /* Version number... */

Odd. Did I really count the bytes (plus 0 at the end) for every string like that? Not always. I found this one:

char filename[28]="cocofest3.map";

…but I think I remember why 28. In the OS-9/6809 operating system, directory entries were 32 bytes. The first 28 were the filename (yep, back in the 80s there were operating systems with filenames longer than FILENAME.EXT), and then three at the end were the LSN (logical sector number) where the File ID sector was. (More or less accurate.)

I also found arrays:

int *days[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

But why is that an int??? That must have been a typo/bug. In other places, I did it more correctly:

char *items[] = { /* employee info prompt thingies */
   "Employee :",
   "Min/Week :",
   "Max/Week :",
   "Min/Shift:",
   "Max/Shift:"
};

At some point in my C history, I just started using pointers to strings like this:

char *version = "1.0.42b-delta";

I guess I got tired of [brackets]. I mean, “they work the same way”, don’t they?

void function (char *line)
{
    if (NULL != line)
    {
        printf ("Line: '%s'\n", line);
    }
}

…and…

void function (char line[])
{
    if (NULL != line)
    {
        printf ("Line: '%s'\n", line);
    }
}

…both end up with line pointing to the start of wherever the bytes of that string are in memory. I’ve seen main() declared both ways, too:

int main (int argc, char *argv[] )

…and…

int main ( int argc, char **argv )

For years, I’ve been doing it the second way, but all my early code was *argv[] so I suspect that is how I learned it from my early K&R C books.

I have no idea why I changed, or when, but probably in the mid-to-late 1990s. I started working for Microware Systems Corporation in Des Moines, Iowa in 1995. This was the first place I used an ANSI-C compiler. In code samples from the training courses I taught, some used “*argv[]” but ones I wrote used “**argv”.

Does it matter?

Not really. But let’s talk about it anyway…

There was a comment left on one of my articles last year that pointed out something I had not considered: sizeof

If you have “char *line” you cannot use sizeof() to give you anything but the size of the pointer (“sizeof(line)”) or the size of a character (or whatever data type is used) that it points to (“sizeof(*line)”).

If you have “char line[]”, you can get the size of the array (number of characters, in this case) or the size of one of the elements in it:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
int main(void)
{
    char *line1 = "1234567890";
    
    char line2[] = "1234567890";
    
    printf ("sizeof(line1)  = %zu\n", sizeof(line1));
    printf ("sizeof(*line1) = %zu\n", sizeof(*line1));
    printf ("\n");
    printf ("sizeof(line2)  = %zu\n", sizeof(line2));
    printf ("sizeof(*line2) = %zu\n", sizeof(*line2));
    return EXIT_SUCCESS;
}

This produces:

sizeof(line1)  = 8  <- size of a 64-bit pointer
sizeof(*line1) = 1 <- size of a char

sizeof(line2) = 11 <- size of the character array
sizeof(*line2) = 1 <- size of a char

I cannot remember ever using sizeof() on a string constant. You may recall I was surprised it worked when I learned about it a few months ago.

But, now that I am aware, I think I may start moving myself back to where I started and using the [brackets] when I have constant strings. Using sizeof() in the program just embeds a constant value, while strlen() is a function that walks through each byte looking for the end zero, thus adding more code space and more execution time.

If I wanted to copy some constant string into a buffer, I could try these two approaches:

// Copy message into buffer.
char buffer[80];
char *line1 = "This is a message.";
strncpy (buffer, line1, strlen(line1) + 1); // strlen, +1 for the 0 at the end
printf ("Buffer: '%s'\n", buffer);
char line2[] = "This is a message.";
strncpy (buffer, line2, sizeof(line2)); // sizeof already includes the 0
printf ("Buffer: '%s'\n", buffer);

And the results are the same:

Buffer: 'This is a message.'
Buffer: 'This is a message.'

I would use the second version since using sizeof(line2) avoids the overhead of strlen() scanning through each byte in the string looking for the end zero.

NOTE: As was pointed out in the comments, strlen() returns the number of characters up to the zero. “Hello” is a strlen() of 5. But sizeof() is the full array of characters including the 0 at the end, so “Hello” would have a sizeof() of 6.

char line[] = "1234567890";
printf ("strlen(line) = %zu\n", strlen(line));
printf ("sizeof(line) = %zu\n", sizeof(line));

This prints:

strlen(line) = 10
sizeof(line) = 11

If you wanted them to be the same, it would be “sizeof(line)-1”.

It’s all fun and games until you pass a parameter…

This “benefit” of sizeof() is not useful if you are passing the string into a function. There, it just ends up as a pointer to wherever the string is stored:

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h>
void function1 (char *line)
{
    printf ("function1():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}
void function2 (char line[])
{
    printf ("function2():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}
int main(void)
{
    char *line1 = "1234567890";
    printf ("Line 1: '%s'\n", line1);
    function1 (line1);
    function2 (line1);
    
    printf ("\n");
    char line2[] = "1234567890";
    printf ("Line 2: '%s'\n", line2);
    function1 (line2);
    function2 (line2);
    return EXIT_SUCCESS;
}

Above, I create a “*line” pointer to a string, then pass it into two functions. The first expects a *line as the parameter, and the second expects a line[].

Then I do a “line[]” array and pass it to the same two functions.

The results are the same:

Line 1: '1234567890'
function1():
sizeof(line) = 8
sizeof(*line) = 1
strlen(line) = 10
function2():
sizeof(line) = 8
sizeof(*line) = 1
strlen(line) = 10

Line 2: '1234567890'
function1():
sizeof(line) = 8
sizeof(*line) = 1
strlen(line) = 10
function2():
sizeof(line) = 8
sizeof(*line) = 1
strlen(line) = 10

And, if you use a “good compiler,” you may get a warning about doing sizeof() like this:

main.c: In function ‘function2’:
main.c:16:44: warning: ‘sizeof’ on array function parameter ‘line’ will return size of ‘char *’ [-Wsizeof-array-argument]
   16 |     printf ("sizeof(line)  = %zu\n", sizeof(line));

Notice that warning was from function2(), and not from function1(). This is one difference in using “*line” versus “line[]” in the functions. For function1(), no warning is given:

void function1 (char *line)
{
    printf ("function1():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}

Since the function takes a “pointer to one or more chars”, doing sizeof() on that pointer gives you the size of a pointer. It is what you asked for. The “C gibberish” website says:

declare line as pointer to char

https://cdecl.org/?q=char+*line

But for the second one…

void function2 (char line[])
{
    printf ("function2():\n");
    printf ("sizeof(line)  = %zu\n", sizeof(line));
    printf ("sizeof(*line) = %zu\n", sizeof(*line));
    printf ("strlen(line)  = %zu\n", strlen(line));
}

…a warning is given about “sizeof(line)” because it cannot tell us the size of a line[] array. The array became a pointer to the character memory when it was passed into the function, but the compiler warns anyway because the parameter was written as “line[]”:

declare line as array of char

https://cdecl.org/?q=char+line%5B%5D

Taking sizeof() of an “array of char” is valid. But this one was passed into the function, and even though the parameter was written as line[], it arrives as a pointer to the data. I guess this is one of those “I’m sorry, Dave. I’m afraid I can’t do that” moments ;-)

Is this useful? It certainly will let you use sizeof() instead of strlen() on a string if you have direct access to the string variable. But passing strings into functions? Not so much. (Or am I mistaken?)

But I do think I am going to try to go back to using “line[]” for my string declarations. I like retro.

Until next time…

Old C dog, new C tricks part 2: i won’t use i


When I learned BASIC, manuals always showed the FOR/NEXT loop using the variable “I”:

FOR I=1 TO 100:NEXT I

“I” had no idea why this was back then, but I have since been told there was some history with a programming language that had certain letters reserved for certain features, such as I for loops.

But is this true? A quick Google search produced an “AI summary”:

The convention of using “i” as the loop counter variable originates from the mathematical notation where “i,” “j,” and “k” are commonly used as indices or subscripts to represent integers in sequences or arrays. This practice was adopted early in computer science and has persisted due to its conciseness and familiarity among programmers. While any valid variable name could technically be used, “i” serves as a readily recognizable and easily understood placeholder for the loop index, especially in simple iterations. In nested loops, “j” and “k” are conventionally used for the inner loop counters.

– Google AI Summary of my search result.

But that’s not important right now…

I suppose I had a bit of O.C.D. in me, because starting with “I” seemed weird.

I would do my loops starting with “A”.

FOR A=1 TO 100:NEXT A

When variables needed to mean something, I’d use the one or two-character variable name that made sense — NM$ for Name string, UC for User Count, etc. But for loops and other things, I’d use A, B, C, D, etc.

C books were similar, showing “i” for loops:

for (int i=0; i<100; i++)
{
    ...stuff...
}

For some reason, I always used “i” in C rather than a, b, c… My nested loops looked just like the AI summary described:

for (int i=0; i<10; i++)
{
    for (int j=0; j<10; j++)
    {
        for (int k=0; k<10; k++)
        {
            ....stuff....
        }
    }
}

It was at a previous job that another embedded programmer said something to me that changed this forever.

“You can’t search for i…”

– paraphrased quote from embedded programmer I used to work with.

Today there are modern editors that let you specify how a search works (whole word or partial, case sensitive or not, even regular expressions), but you can never depend on having access to those tools. Some jobs are very restrictive about the software you are allowed to install on your work computer. Some simply don’t allow any installs by employees: you get the set of preconfigured apps for the position (Microsoft Office for some roles, compilers and such for others, etc.) and that might be it.

He told me he used “idx” for loops, rather than “i”. And that was enough to change my C coding habits instantly. Ever since, when I do a loop, it’s like this…

for (idx=0; idx<100; idx++)
{
    ...stuff...
}

And when I’m looping through things of a given type, I do things like:

for (int cardIdx=0; cardIdx<52; cardIdx++)
{
    ...stuff...
}

Who says you can’t teach an old dog new tricks?

What do you use for your loops? And why?

Comments, if you have them…

Build a (marginally) better malloc and free

This is a dumb one, but maybe someone else will find it useful.

I have been working on some C code that uses dynamically allocated linked lists. There are index structures and record structures and individual elements of different kinds in the records, all malloc()’d and then (hopefully) free()’d at the end.

Since my background is low-resource embedded systems (one system has a mere 8K of RAM), I have never really done much with malloc(). In fact, some of the environments I have worked in did not even provide a malloc()/free(). And this is probably a good thing. Without an OS watching over you, any memory that is allocated but never properly freed (a “memory leak”) can be big trouble for an embedded system meant to “run forever without a reboot.”

What I am writing right now is for a PC, and my understanding is if you allocate a bunch of memory then exit, Windows will clean it up for you:

#include <stdlib.h> // for malloc()

int main()
{
    char *ptr = malloc (65535); // Allocate (almost) 64K

    // And now just exit without freeing it.
    return 0;
}

But no one should be writing code like that intentionally. This is the same mentality that has people who throw their trash on the ground because “someone else will clean it up for me.” Just because the operating system will clean up after your program exits doesn’t mean you should rely on it to clean up after your mistakes.

But I digress.

In my program, I was wondering if I was getting it right. Debug “printf” messages can only go so far in showing what is going on. Did I free every last record? Did all the elements inside the records get freed as well?

I have no idea.

MAUI to the rescue!

Then a memory popped into my head. When I worked for Microware Systems Corp., we had a New Media division that worked on digital TV set-top boxes. (Yes, Virginia, I saw a demo of streaming video-on-demand with pause, fast forward and rewind back in the summer of 1995. But that’s a story for another day…)

The D.A.V.I.D. product (Digital Audio Video Interactive Decoder) used various APIs to handle things like networking, MPEG video decoding, sound, and graphics. The graphics API was called M.A.U.I. (Multimedia Application User Interface).

MAUI had various APIs such as GFX (graphics device), DRW (drawing API), ANM (animation), CDB (configuration database?), BLT (a blitter) and more. There was even a MEM (memory) API that was a special way to allocate and free memory.

I did not understand much of this at the time, beyond the entry level stuff I sometimes taught in a training course.

But the memory API had some interesting features. The manual introduced it as follows:

“Efficient memory management is a requirement for any graphical environment. Graphical environments tend to be very dynamic when it comes to allocating and freeing memory segments. Therefore, unless an application takes specific steps to avoid it, memory fragmentation can become a serious problem.

The Shaded Memory API provides the facilities applications (and other APIs) required to manage multiple pools of memory.

– Microware MAUI manual, 2000

One interesting feature of this API was the ability to detect memory overflows and underflows. For example, if you allocate 50 bytes and get a pointer to that memory, then copy 60 bytes into it, you have overflowed that memory by 10 bytes (a “buffer overrun”). Likewise, if the pointer started at 0x1000 in memory, and you wrote to memory before that address, that would be a buffer underflow.

The manual describes it as follows:

To print the list of overflows/underflows call mem_list_overflows(). When a shade is created with the check overflows option true, safe areas are created at the beginning and the end of the segment. If these safe areas are overwritten, the overflow/underflow situation is reported by mem_list_overflows().

– Microware MAUI manual, 2000

This gave me an idea on how to verify I was allocating and freeing everything properly. I could make my own malloc() and free() wrappers that tracked how much memory was allocated and freed, and have a function that returned the current amount. Check it at startup and it should be zero. Check it after all the allocations and it should have some number. Free all the memory and it should be back at zero. (Malloc already tracks all of this internally, but standard C does not give us a way to get to that information.)

Simple!

Sounds simple!

At first, you might think it could be as simple as something like this:

static int S_memAllocated = 0;

void *MyMalloc (size_t size)
{
    S_memAllocated += size;

    return malloc (size);
}

Simple! But, when it comes time to free(), there is no way to tell how big that memory block is. All free() gets is a pointer.

Sounds almost simple!

To solve this problem, we can simply store the size of the memory allocated in the block of allocated memory. When it comes time to free, the size of that block will be contained in it.

To do this, if the user wanted to malloc(100) to get 100 bytes, you would allocate 100 + the size of an integer. You would then copy an integer containing the size of this allocated segment into the first bytes of the block (and increment the memory counter by that amount). After that, the pointer returned to the user should be after that copied integer. Like this:

malloc (100 + sizeof(int));
+---+----------------------+
|int| the user's 100 bytes |
+---+----------------------+
^
|_ return this location

When this memory is free()’d, the passed-in pointer would be moved back to the start of the integer. Those bytes could be copied into an int (so you know how much to subtract from the counter) and then the block free()’d.

Sounds sorta simple?

Here is what I quickly came up with…

// MyMalloc.h
#ifndef MYMALLOC_H_INCLUDED
#define MYMALLOC_H_INCLUDED
size_t GetSizeAllocated (void);
void *MyMalloc (size_t size);
void MyFree (void *ptr);
#endif // MYMALLOC_H_INCLUDED

// MyMalloc.c
#include <stdlib.h> // for malloc()/free();
#include <string.h> // for memcpy()

#include "MyMalloc.h"

static size_t S_bytesAllocated = 0;

size_t GetSizeAllocated (void)
{
    return S_bytesAllocated;
}

void *MyMalloc (size_t size)
{
    // Allocate room for a "size_t" plus user's requested bytes.
    void *ptr = malloc (sizeof(size) + size);
    
    if (NULL != ptr)
    {
        // Add this amount.
        S_bytesAllocated = S_bytesAllocated + size;
        
        // Copy size into start of memory.
        memcpy (ptr, &size, sizeof (size));

        // Move pointer past the size.
        ptr = ((char*)ptr + sizeof (size));
    }

    return ptr;
}

void MyFree (void *ptr)
{
    if (NULL != ptr)
    {
        size_t size = 0;

        // Move pointer back to the size.
        ptr = ((char*)ptr - sizeof (size));
        
        // Copy out size.
        memcpy (&size, ptr, sizeof(size));

        // Subtract this amount.
        S_bytesAllocated = S_bytesAllocated - size;
        
        // Release the memory.
        free (ptr);
    }
}

Then, as a test, I wrote this program that randomly allocates ‘x’ blocks of memory of random sizes… then frees all those blocks.

#include <stdio.h>
#include <stdlib.h>

#include "MyMalloc.h"

#define NUM_ALLOCATIONS     100
#define LARGEST_ALLOCATION  1024

int main()
{
    char *ptr[NUM_ALLOCATIONS];
    
    printf ("Memory Allocated: %zu\n", GetSizeAllocated());

    // Allocate    
    for (int idx=0; idx<NUM_ALLOCATIONS; idx++)
    {
        ptr[idx] = MyMalloc ( rand() % LARGEST_ALLOCATION + 1);
        
    }

    printf ("Memory Allocated: %zu\n", GetSizeAllocated());

    // Free    
    for (int idx=0; idx<NUM_ALLOCATIONS; idx++)
    {
        MyFree (ptr[idx]);
    }

    printf ("Memory Allocated: %zu\n", GetSizeAllocated());

    return EXIT_SUCCESS;
}

When I run this, I see the memory count before the allocation, after the allocation, then after the free.

Memory Allocated: 0
Memory Allocated: 45464
Memory Allocated: 0

Since it is randomly choosing sizes, the number in the middle may* be different when you run it.

I then plugged this code into my program (I did a search/replace of malloc->MyMalloc and free->MyFree) and added the same memory prints at the start, after allocation, and after freeing.

And it worked: the count went back to zero after everything was freed. Whew! I guess I did not need to spend time writing MyMalloc() or this post after all.

But I had fun doing it.

Additional thoughts…

Thinking back to the MAUI memory API, extra code could be added to put a pattern at the start and end of the block. A function could be written to verify that the block still had those patterns intact, else it could report a buffer overflow or underflow.

Also, I chose “size_t” for this example just to match the parameter that malloc() takes. But if you knew you would never be allocating more than 255 bytes at a time, you could store the size in the buffer as a uint8_t instead. Or, if you knew 65535 bytes was your upper limit, use a uint16_t. On a 64-bit system, where size_t is 8 bytes, this would save 7 (or 6) bytes at the start of each malloc’d buffer, though the pointer handed back would then no longer be suitably aligned for every type.

But why would you want to do that? If you were on a PC, you wouldn’t need to worry about a few extra bytes per allocation. And if you were on a memory-constrained embedded system, you probably shouldn’t be doing dynamic memory allocations anyway! (But if you did, maybe uint8_t would be more than enough.)

I suppose there are plenty of enhanced memory allocation routines in existence that do really useful and fancy things. Feel free to share any suggestions in the comments.

Until next time…

Bonus Tip

If you want to integrate this code in your program without having to change all the “malloc” and “free” instances, try this:

// Other headers
#include "MyMalloc.h"
#define malloc MyMalloc
#define free MyFree

That will cause the C preprocessor to replace instances of “malloc” and “free” in your code with “MyMalloc” and “MyFree”, so it will compile referencing those functions instead.


* Or you may see the same number over and over again each time you run it. But that’s a story for another time…

Old C dog, new C tricks part 1: NULL != ptr

See Also: part 1, part 2, part 3, part 4 and part 5.

Updates:

  • 2025-02-19 – “new information has come to light!”

As someone who learned C back in the late 1980s, I am constantly surprised by all the “new” things I learn about this language. Back then, it was a K&R-era compiler, so there were no prototypes, and functions looked like this:

main(argc,argv)
int argc;          /* argc = # of arguments on command line */
char *argv[];      /* argv[1-?] = arguments */
{
    ...stuff...
} 

…and this…

MallocError(wpath)
int wpath;
{
   ShutDown(wpath);
   fputs("\nFATAL ERROR:  Towel was unable to allocate necessary memory to process\n",stderr);
   fputs(  "              this directory.\n",stderr);
   sleep(0);
   exit(0);
}

Today’s article is not about how old I am, but about something I just started doing, and wish I had done long ago.

Yoda would be happy…

When I learned to program BASIC, I learned how to compare a variable:

IF A=42 THEN PRINT "DON'T PANIC!"

When I learned C, the thing I had to get used to was double equals “==” for compare and single equal “=” for assignment:

int a = 42;

if (a == 42)
{
    printf ("Don't Panic!\n");
}

This, of course, leads to a common mistake that I have stumbled on many, many times over the past decades: Sometimes a programmer misses one of those equals:

if (a = 42)
{
    printf ("Don't Panic!\n");
}

This will cause the code to always enter that section and run it, regardless of what you think “a” is set to. Why? Because it is basically saying “if a can be set to 42, then…”


Or does it?

Normally, I wait for a follow up to discuss corrections and additional details I learn from the comments, but this one deserves an immediate revision. Aidan Hall left this tidbit:

It’s even worse than what you suggest! Assignment expressions evaluate to the value that was assigned (on the RHS), so this if block wouldn’t run:

if (a = 0) {
    puts("zeroed");
}

– Aidan Hall

I had mistakenly thought it was testing the result of “can a be assigned” and assuming this would always be true. I did not realize it was the value of the assignment that was used. Wowza. Thanks, Aidan! And now back to the original content…


By leaving out that second equal, it now becomes an assignment. It might as well be saying:

if (1)
{
    a = 42;
    printf ("Don't Panic!\n");
}

I have caught this type of thing in code I have worked on at several jobs. And, I’ve caught it in code I wrote as well. Even recently…

But Yoda would be proud. Smarter programmers already figured out that you can write those comparisons backwards, like this:

if (42 == a)
{
    printf ("Don't Panic!\n");
}

The first time I ever saw that was at a former job, in code from a team over in India. I thought this was very odd, and wondered if it was a convention in that country, similar to how in America we would write “$5” for five dollars, but in Europe it might be “5 €” for five euros.

Honestly, as backwards as that looks to me, phonetically it makes more sense when you read it ;-)

And don’t get me started on America’s Month/Day/Year and how confusing OS-9’s “backwards” time of Year/Month/Day was… but I quickly adopted that, since you can sort dates that way, but not in the “normal” way.

But I digress…

By reversing these comparisons, you now eliminate the possibility of forgetting an equal. This won’t give an error (but a good compiler might give a warning):

if (a = 42)

…but this cannot be compiled:

if (42 = a)

When I started working on some new code this past weekend, I just decided to start doing things that way. It quickly becomes second nature:

if (NULL != ptr)
{
}

if (false == status)
{
}

But it still looks weird.

Now to fire up that old 1980s compiler and see if that was even possible back then…

Until next time…

Early 1980s BBSes and spinning cursors.

There is a whole generation that has no idea how much cool stuff folks did with text and backspace.

One of my favorites was the “spinning cursor.” Thanks to slow speeds of 300 baud modems, you could get some interesting effects by printing a letter, then printing a character like a slash (“/”), then a backspace, then a dash (“-“), then a backspace, then a backslash (“\”), then a backspace, then a vertical bar (“|”) or exclamation mark (“!”) if your system did not have the vertical bar. Then a backspace and the next letter of the message.

Apparently I got nostalgic about this effect some time ago. I just found this “Spinning Cursor” C project I wrote on the Online GDB compiler:

https://onlinegdb.com/56zozL_gRp

Go there and you can RUN the project and see it in all its glory…

C program build date and time.

This one is just for fun, though I suppose I might not be the only one on the planet that ever needed to do this…

At my day job, I have a board that has a realtime clock, but no battery backup to retain the time. During startup, the host system sends the board the current PC date and time (actually, it sends it in GMT, I believe, so looking at logs captured in different parts of the world will be easier — GMT is GMT anywhere on the planet ;-)

On startup, the board wants to log some things, but does not yet know the time. It had been using a hard-coded default time (like 1/1/2000). I wondered if the C compiler build date and time could be used to at least set the time based on when the firmware-in-use was compiled.

A quick chat with Bing’s AI (ChatGPT) and some experiments to make what it gave me far less bulky provided me with this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    // Initialize time to when this firmware was built.
    const char *c_months = "JanFebMarAprMayJunJulAugSepOctNovDec";
    const char *found = NULL;
    char monthStr[4];
    int year = 0;
    int month = 0;
    int day = 0;
    int hour = 0;
    int minute = 0;
    int second = 0;

    // __DATE__ is "Mmm dd yyyy" (dd is space-padded, e.g. "Feb  9 2025").
    strncpy(monthStr, __DATE__, 3);
    monthStr[3] = '\0';
    found = strstr(c_months, monthStr);
    if (NULL != found)
    {
        month = (int)((found - c_months) / 3) + 1;
    }

    day = atoi (&__DATE__[4]);    // atoi() skips the leading space
    year = atoi (&__DATE__[7]);

    printf ("%04d-%02d-%02d\n", year, month, day);

    // __TIME__ is "hh:mm:ss"; atoi() stops at each ':'
    hour = atoi (&__TIME__[0]);
    minute = atoi (&__TIME__[3]);
    second = atoi (&__TIME__[6]);

    printf ("%02d:%02d:%02d\n", hour, minute, second);

    return 0;
}

This works by taking the compiler-generated macros of “__DATE__” and “__TIME__” and parsing out the values we want so they can be passed to a realtime clock routine or whatever.

In my case, this is not the code I am using since our embedded compiler handled __DATE__ in a different format. (It uses “dd-Mmm-yy” for some reason, while the C standard specifies “Mmm dd yyyy”.) But the concept is similar.

Of course, as soon as I tested this, I found another issue. My board would power up and set its clock to the build date (which is in Central time), and then, when the system connected, a new date/time would be sent in GMT, which is 5 or 6 hours different depending on daylight saving time, setting the clock back in time.

This makes log entries a bit odd ;-) but that’s a problem for another day.

Until then…

When a+b+c is not the same as b+a+c plus the Barr coding standard

DISCLAIMER: Not all compilers are created equal. Different compilers may achieve the same result, but may take different steps to get there. Optimizers and code generators can do wonderful things. Thus, if you want to leave a comment and say “compiler XYZ does not do that,” that is fine, but that is not the point of this. This is for those “other” compilers you don’t use, the ones that are not so clever…

During my embedded C programming career, I have been taught some interesting optimizations. Most of these are things I would never consider on a modern C compiler running on a system that has ample memory and CPU resources. But when you are on a microcontroller with 4K of RAM or 16K of program storage, sometimes you have to do things oddly to make it fit, or, if the CPU is slow, to make it run fast enough.

True, False, or Not True or Not False?

Consider this:

bool flag = false;

if (flag)
{
    // Do something
}

An “if” like this will be looking for a true result. Now, one compiler I work with has its own “TRUE” and “FALSE”, in uppercase, which all their code uses. Why? Maybe because they originated before the stdbool.h header file was added to the C standard and defined an official “true” and “false” in lowercase. Fortunately, they currently provide a stdbool.h which will undefine the uppercase ones (if the compiler is set to NON-case-sensitive; yep, by default “foo” and “FOO” and “else” and “Else” are processed the same) and define lowercase ones:

#if !getenv("CASE")
// remove TRUE and FALSE added by CCS's device .h file, only if
// compiler has case sensitivty off.

#if defined(TRUE)
#undef TRUE
#endif

#if defined(FALSE)
#undef FALSE
#endif
#endif

typedef int1 bool;
#define true 1
#define false 0
#define __bool_true_false_are_defined

With 0 representing false and 1 representing true, the “if” works: anything that is not 0 is treated as true. In a normal compiler:

if (0)
{
    printf ("This will not print.\n");
}

if (1)
{
    printf ("This will print\n");
}

if (42)
{
    printf ("This will print\n");
}

On my Radio Shack Color Computer’s 6809 microprocessor, I expect such an “if” test compiles into assembly code that represents something like “branch if not zero”. I would expect every CPU has a similar instruction.

So checking for true (not 0) should be as fast as checking for false (0), assuming there is a similar instruction for “branch if zero.”

However, what if the CPU uses a different number of instruction cycles for a “branch if zero” versus “branch if not zero”? If that were the case, these might have different execution speeds:

if (flag == true)
{
    // Do something...
}

if (flag == false)
{
    // Do something...
}

But that seems unlikely, and is not the point of this post. (If you are aware of any CPU where this would be the case, please leave a comment.)

Some company coding standards I have used said to never use just “if (x)” but instead write out what it actually means. While you and I are experts and clearly know what the “if (x)” does, as should any programmer who knows programming, what if they don’t? In that case “if (x == true)” and “if (x == false)” are impossible to misunderstand, and should generate the same code as “if (x)” and “if (!x)”.

Right?

But suppose you used a crappy “C-like” compiler, and it had a “test for zero” which is used for “if (flag == false)” but used something dumb like “compare against a number” when you did “if (flag == true)” or “if (flag)”… Like, the compiler saw a check for 0 and knew it could efficiently do that… but if it was not zero, it did a compare against a number, resulting in something like…

load #1 into some accumulator
compare register holding "flag" against accumulator
branch if equal (or if not equal)

That can generate some extra code each and every time you check for “true”, so checking for “not false” might save a few bytes every time.

Because of that, I often just default to doing this:

if (flag != false)
{
    // Do something...
}

And this looks stupid. But might save enough bytes to make something compile that otherwise would not fit.

Hopefully you have never had to work in such a constrained environment with such a crappy C-like compiler.

The good news is, by changing to doing this, it works the same on “real” compilers but “might” make smaller or faster code on bad compilers.

But I digress…

Adding it all up…

I really wanted to write this about something I had never considered:

#define HEADER_LENGTH 5
#define CRC_LENGTH 2

unsigned int messageSize = HEADER_LENGTH + payloadLength + CRC_LENGTH;

If the message protocol uses a format like “[HEADER][PAYLOAD][CRC]”, writing out the C code like that makes it easy to visualize what the message bytes look like.

The compiler would be seeing that code as:

unsigned int messageSize = 5 + payloadLength + 2;

A compiler might be doing…

  • Set messageSize to 5
  • Add payloadLength to messageSize
  • Add 2 to messageSize

But if you grouped the #define values together:

unsigned int messageSize = HEADER_LENGTH + CRC_LENGTH + payloadLength;

A good compiler might be changing that to:

unsigned int messageSize = 5 + 2 + payloadLength;
...
unsigned int messageSize = 7 + payloadLength;

…which results in:

  • Set messageSize to 7
  • Add payloadLength to messageSize

And if you deal with hundreds of messages where this might be calculated, that savings can really add up.

I would hope a real/smart compiler might be able to detect this and optimize the constants together … but I know this is not guaranteed to be the case.

The best thing about standards…

And as a bonus, earlier I posted asking about C coding standards trying to find one my employer could adopt, instead of rolling our own. Bing CoPilot led me to a few, including this one specifically for embedded C:

Embedded C Coding Standard | Barr Group

This “Barr C” standard has many things I have already forced myself to start doing, and it does look promising. You can buy the standard as a paperback book for $6 on Amazon, or download it free as a PDF. I plan to go through it and see what all it discusses.

One thing I like about the approach is that it gives a reason for each rule it presents. For example, braces:

Rules:

a. Braces shall always surround the blocks of code (a.k.a., compound statements), following if, else, switch, while, do, and for statements; single statements and empty statements following these keywords shall also always be surrounded by braces.

b. Each left brace ({) shall appear by itself on the line below the start of the block it opens. The corresponding right brace (}) shall appear by itself in the same position the appropriate number of lines later in the file.

Reasoning:

There is considerable risk associated with the presence of empty statements and single statements that are not surrounded by braces. Code constructs like this are often associated with bugs when nearby code is changed or commented out. This risk is entirely eliminated by the consistent use of braces. The placement of the left brace on the following line allows for easy visual checking for the corresponding right brace.

barr_c_coding_standard_2018.pdf

When I started learning C back in the late 1980s, it was the pre-ANSI K&R C. Thus, I learned C the way the books I had showed it:

if (something) {
    // Do something
} else {
    // Do something else
}

The placement of the “{” on the first line seems to be referred to as “line saver” in some of the code editors I use. It was at a job where their standard says “line them up so you can see what goes to what” that I had to change my style:

if (something)
{
    // Do something
}
else
{
    // Do something else
}

Now each code block has its start brace and end brace in the same column, making it much easier to spot, rather than having to look at the ends of lines or some characters into a line.

I hated that at first, but now I am used to it.

I also used to do things like this:

if (something)
    DoSomething();
else
    DoSomethingElse();

Somewhere on this site, I have written about this at least once or twice. This breaks when someone adds something without thinking about the braces:

if (something)
    DoSomething();
    WriteToLog(); // added this
else
    DoSomethingElse();

Without the braces, trying to compile this would at least give an error:

main.c: In function ‘main’:
main.c:31:5: error: ‘else’ without a previous ‘if’
   31 |     else
      |     ^~~~

BUT, if you did not have the else…

if (something)
    DoSomething();
    WriteToLog();

That code might “look” good, but running it would do something if the case was true, but would then ALWAYS write to the log… Because C is seeing it like this:

if (something)
{
    DoSomething();
}

WriteToLog();

And I have now seen a modern programmer, brought up on scripting languages that use indentation rather than braces, make this mistake working on C code they were not really familiar with.

But I digress. Again.

More to come when my book arrives and I start reading through it. Unless someone presents me a better alternative, I think this one may suffice. The book is cheap, it can be downloaded free (so it is searchable) and the items I have spot checked seemed reasonable.

If you have ever worked with the Barr-C coding standard, I’d love to hear your thoughts in the comments.

Until then…

C has its limits. If you know where to look.

Thank you, Bing Copilot (ChatGPT), for giving me another “thing I just learned” to blog about.

In the early days of “K&R C”, things were quite a bit different. C was not nearly as portable as it is today. While the ANSI-C standard helped quite a bit, once it became a standard, there were still issues when moving C code from machines of different architectures — for example:

int x;

What is x? According to the C standard, an “int” is “at least 16 bits.” On my Radio Shack Color Computer, an int was 16 bits (-32768 to 32767, or 0 to 65535 if unsigned). I expect on my friend’s Commodore Amiga, the int was 32 bits, though I really don’t know. And even when you “know”, assuming that to be the case is a “bad thing.”

I used a K&R C compiler on my CoCo, and later on my 68000-based MM/1 computer. That is when I became aware that an “int” was different. Code that worked on my CoCo would port fine to the MM/1, since it was written assuming an int was 16-bits. But trying to port anything from the MM/1 to the CoCo was problematic if the code had assumed an int was 32-bits.

When I got a job at Microware in 1995, I saw my first ANSI-C compiler: Ultra C. To deal with “what size is an int” issues, Microware created their own header file, types.h, which included their definitions for variables of specific sizes:

u_int32 x;
int32 y;

All the OS library calls were prototyped to use these special types, though if you knew an “unsigned long” was the same as a “u_int32” or a “short” was the same as an “int16”, you could still use those.

But probably shouldn’t.

In those years, I saw other compilers do similar things, such as “U32 x;” and “I16 y”. I expect there were many variations of folks trying to solve this problem.

Some years later, I used the GCC compiler for the first time and learned that the C standard (as of C99) now had its own types.h, called stdint.h. That gave us things like:

uint32_t x;
int32_t y;

It was easy to adopt these new standard definitions, and I have tried to use them ever since.

I was also introduced to the defines in limits.h that specify the largest (and smallest) value that will fit in an “int” or “long” on a system:

...
#define CHAR_MAX 255 /*unsigned integer maximum*/
#define CHAR_MIN 0 /*unsigned integer minimum*/

/* signed int properties */
#define INT_MAX 32767 /* signed integer maximum*/
#define INT_MIN (-32767-_C2) /*signed integer minimum*/

/* signed long properties */
#define LONG_MAX 2147483647 /* signed long maximum*/
#define LONG_MIN (-2147483647-_C2) /* signed long minimum*/
...

The values would vary based on whether your system was 16-bit, 32-bit or 64-bit. It allowed you to do this:

int x = INT_MAX;
unsigned int y = UINT_MAX;

…and have code that would compile on a 16-bit or 64-bit system. If you had tried something like this:

unsigned int y = 4294967295; // Max 32-bit value.

…that code would NOT work as expected when compiled on a 16-bit system (like my old CoCo, or an Arduino UNO or the PIC24 processors I use at work).

I learned to use limits.h.

But this week, I was working on code that needed to find the highest and lowest values in a 32-bit number range. I had code like this:

uint32_t EarliestSequenceNumber = 4294967295;
uint32_t LatestSequenceNumber = 0;

And that works fine, and should work fine on any system where an int can hold a 32-bit value. (Though I used hex, since I know 0xffffffff is the max value, and always have to look up or use a calculator to find out the decimal version.)

Had I been using signed integers, I would be doing this:

int32_t LargestSignedInt = 2147483647;

Or I’d use 0x7fffffff.

As I looked at my code, I wondered if C provided similar defines for the stdint.h types.

stdint.h also has stdsizes!

And it does! Since all of this changed/happened after I already “learned” C, I never got the memo about new features being added. Inside stdint.h are also defines like this:

#define INT8_MAX  (127)
#define INT8_MIN  (-128)
#define UINT8_MAX (255)

#define INT16_MAX  (32767)
#define INT16_MIN  (-32768)
#define UINT16_MAX (65535)

/* Written as -MAX - 1 because 2147483648 itself does not fit in a 32-bit int. */
#define INT32_MAX  (2147483647)
#define INT32_MIN  (-2147483647 - 1)
#define UINT32_MAX (4294967295U)

#define INT64_MAX  (9223372036854775807LL)
#define INT64_MIN  (-9223372036854775807LL - 1)
#define UINT64_MAX (18446744073709551615ULL)

…very similar to what limits.h offers for standard ints, etc. Neat!

Now modern code can do:

uint32_t EarliestSequenceNumber = UINT32_MAX;
uint32_t LatestSequenceNumber = 0;

…and that’s the new C thing I learned today.

And it may have even been there when I first learned about stdint.h and I just did not know.

And knowing is half the battle.