Category Archives: C Programming

DIRVERT.EXE: Windows 95, long file names, and MS-DOS 8.3 filenames.

Today we set the wayback machine (kids, ask your grandparents about Rocky and Bullwinkle) to the year 1996.

But first, I digress.

Some of this will be wrong, but I hope it is at least “directionally accurate.” Feel free to correct me (or nitpick, if you prefer) in the comments.

In 1981, IBM entered the personal computer market with the original IBM-PC. You know the one. It had a cassette port on the back and booted into a Microsoft BASIC. If you were rich enough, you had one with a floppy disk drive and used PC-DOS.

PC-DOS was “created” by Microsoft (kids, ask your parents about how Microsoft made a deal with IBM to create a disk operating system for its new computer, then bought an existing disk operating system and sold it to IBM). A bit later, Microsoft sold PC-DOS under the name MS-DOS for use with non-IBM PC clones.

Thank you for coming to my computer history talk.

8.3: The number that defined a (computer) generation

PC-DOS and MS-DOS used an 8-character filename with a 3-character extension: FILENAME.BAS, FILENAME.TXT, FILENAME.EXE, etc. This matched the filename convention used by the Radio Shack Color Computer’s Disk Extended Color BASIC (aka RS-DOS), though before PC-DOS, Microsoft used “BIN” as the extension for a binary rather than the “EXE” and “COM” used on PC-DOS machines. (Maybe BIN was Microsoft’s choice, and EXE was the choice of the original designers of what became known as PC-DOS? I’d look it up, but it has no relevance to the topic at hand.)

Microsoft Windows started out as basically a graphical program that ran from MS-DOS and provided mouse support and a way to run programs. PCs still booted into MS-DOS and then “ran” the Windows program.

Windows 95 and “LFNs”

Windows 95 was a huge change. It altered how “multitasking” worked and, for the first time, allowed long filenames on a PC. Instead of being limited to 8.3, filenames could now be up to 255 characters long, with mIxEd CaSe. From the wiki, I see this is called LFN:

https://en.wikipedia.org/wiki/Long_filename

If you created a long filename from Windows 95, it also created a backwards-compatible 8.3 filename for legacy DOS programs to use. “LongFileName.txt” would be written out as “LONGFI~1.TXT”: the first 6 characters were used, followed by a tilde and a number.

If you made an 8.3 filename under Windows 95, it would (mostly) work like you expect. Creating “FILENAME.TXT” resulted in “FILENAME.TXT” (though, for some reason, it was shown as “Filename.txt” in Windows 95 even if you typed it in all uppercase)…

However, some filenames got a “~1” they did not need. The “~1” should only have been necessary if another file in the directory had the same first 8 characters and 3-character extension.

At the time, I was using an MS-DOS C compiler called Power-C. It was not LFN-aware, so it only dealt with the 8.3 short filenames. A file like “INDEX.HTML” would become “INDEX~1.HTM” since it had a 4-character extension. But it could have just become “INDEX.HTM” for the short name (provided no other filename conflicted with it).

I got tired of dealing with “~1” in my website filenames and figured out that I could rename them to make “INDEX~1.HTM” show up as “INDEX.HTM” (while the long filename was still “INDEX.HTML”):

  • Long filename “index.html” shows up as “INDEX~1.HTM”. rename index~1.htm index.htm
  • New filename is “index.htm”. rename index.htm index.html
  • New new filename is “index.html” for long filenames, and “index.htm” for short ;)

Today, I bet the wiki entry explains why it worked that way, but back in 1996 it was just a frustrating mystery.

DIRVERT.EXE

Of course I automated this process… I wrote DIRVERT.EXE, a simple command line program that would loop through a directory and RENAME things to remove the “~1” where possible. Here is what the comments said:

Windows 95 DOS directory converter thingie filter. Takes a Win 95 directory and renames the 8.3 extensions like they should be… Maybe.

The problem: Pre-Win95 DOS programs don’t know how to deal with long filenames, which end up as “FILENA~1.TXT” to the DOS side. The only time this should be needed is _if_ the DOS name contains spaces or is a duplicate of another truncated filename. ie, “filename1.txt” could be shown as “FILENAME.TXT” but if a “filename2.txt” exists in the same dir, it would have to be “FILENA~2.TXT”. There is no reason for files such as “index.html” to have to be “INDEX~1.HTM”, so this program tries to fix those by renaming them.

To test: “copy con filename.txt”. If you then rename this to a long name whose first 8.3 characters are the same, the DOS name WILL stay the same. ie, “FILENAME.TXT … filenamethatislonger.txt_sothere” in the DIR listing. If you rename and change any of the characters, such as renaming to “filename.text”, it will appear as “FILENA~1.TEX” even though it _could_ be “FILENAME.TEX” just fine. This is, in my opinion, a bug in the approach Microsoft took to the filenames. In the above example, this filter will do two renames to try to fix it back. First, it finds the short version of the Win95 name (ie, “filename.tex”) and it renames the DOS name to that. Then it renames it back to the long version. As long as there are no duplicates (those will generate rename errors and just be left alone), this should work just fine. For example, assume:

INDEX~1.HTM – index.html <- current name

this filter does: rename index~1.htm index.htm

INDEX.HTM – index.htm <- new name

then it does: rename index.htm index.html

INDEX.HTM – index.html <- final name, ~1 now gone :)

This won’t work 100% of the time. If there was also a file in there called “index.html2”, it would try to make it “index.htm”, which would fail since there is already one. So, it would be left alone. Also, any files that might need a space in them (AFILE.TXT - “a file.txt”) cannot be fixed either. BUT, it seems that most anything else can. Note that when you run this, it may be normal to see some errors returned from failed renames. No big deal.

Since this is just a filter, it poses no threat of damaging the directory other than renaming files, which is the only shell command it forks.

I used this program for years – right up until the end of my PC era (I switched full-time to a Macintosh in 2001). It is pointless and useless today, but I thought I’d share it. Here it is on GitHub, along with various other “stupid” utilities I wrote for my own use:

https://github.com/allenhuffman/oldcstuff/tree/main/DIRVERT

It appears I kept updating it until 1998.

As I look at it today, I see it had a conditional compile for OS-9000, which was Microware’s OS-9 RTOS that ran on x86 PC hardware. I am unsure why this would have even been useful — perhaps for dealing with MS-DOS disks when reading them under OS-9000 using the PCF (PC File System) file manager? Ah, the things I have forgotten…

These were fun times. Until next time…

/*---------------------------------------------------------------------------|
| Dirvert V1.08 by Allen Huffman (allenh@pobox.com)                          |
| Copyright (C) 1996,97 by Sub-Etha Software                                 |
|----------------------------------------------------------------------------|
| Syntax: dir {directory} | dirvert -z                                       |
|         dirvert {directory}                                                |
|         dirvert -z < {dirfile.txt}                                         |
| Usage : Win95 directory long filename fixer (and filter).                  |
| Opts  : -? or /? = display this message                                    |
|         -L or /L = lowercase directory filenames.                          |
|         -Z or /Z = read directory output from standard input (filter).     |
|----------------------------------------------------------------------------|
| NOTE:  It seems this does not work under OSR2!  Any ideas why it doesn't?  |
|                                                                            |
|       Windows 95 DOS directory converter thingie filter.  Takes a Win 95   |
| directory and renames the 8.3 extensions like they should be...  Maybe.    |
|                                                                            |
| The problem:  Pre-Win95 DOS programs don't know how to deal with long      |
| filenames, which end up as "FILENA~1.TXT" to the DOS side.  The only time  |
| this should be needed is _if_ the DOS name contains spaces or is a         |
| duplicate of another truncated filename.  ie, "filename1.txt" could be     |
| shown as "FILENAME.TXT" but if a "filename2.txt" exists in the same dir,   |
| it would have to be "FILENA~2.TXT".  There is no reason for files such as  |
| "index.html" to have to be "INDEX~1.HTM", so this program tries to fix     |
| those by renaming them.                                                    |
|                                                                            |
| To test:  "copy con filename.txt".  If you then rename this to a long      |
| name with the same first 8.3 characters the same, the DOS name WILL stay   |
| the same.  ie, "FILENAME.TXT ... filenamethatislonger.txt_sothere" in the  |
| DIR listing.  If you rename and change any of the characters, such as      |
| renaming to "filename.text" it will appear as "FILENA~1.TEX" even though   |
| it _could_ be "FILENAME.TEX" just fine.  This is, in my opinion, a bug in  |
| the approach Microsoft took to the filenames.  In the above example, this  |
| filter will do two renames to try to fix it back.  First, it finds the     |
| short version of the Win95 name (ie, "filename.tex") and it renames the    |
| DOS name to that.  Then it renames it back to the long version.  As long   |
| as there are no duplicates (those will generate rename errors and just be  |
| left alone), this should work just fine.  For example, assume:             |
|                                                                            |
| INDEX~1.HTM - index.html     <- current name                               |
|      this filter does:   rename index~1.htm index.htm                      |
| INDEX.HTM   - index.htm      <- new name                                   |
|      then it does:       rename index.htm index.html                       |
| INDEX.HTM   - index.html     <- final name, ~1 now gone :)                 |
|                                                                            |
| This won't work 100% of the time.  If there was also a file in there that  |
| was called "index.html2", it would try to make it "index.htm" which would  |
| fail since there is already one.  So, it would be left alone.  Also, any   |
| files that might need a space in them ( AFILE.TXT --- "a file.txt") can    |
| not be fixed either.  BUT, it seems that most anything else can.  Note     |
| that when you run this, it may be normal to see some errors return from    |
| failed renames.  No big deal.                                              |
|                                                                            |
| Since this is just a filter, it poses no threat to damaging the directory  |
| other than renaming files, which is the only shell command it forks.       |
|                                                                            |
| COMPILING NOTE:  This source defaults to compile in "debug mode" unless    |
| you pass in/define "NODEBUG", ie "cc -dNODEBUG dirver.c" to generate the   |
| actual "working" version.                                                  |
|----------------------------------------------------------------------------|
| Ed #    Date      What Happened                                       Who  |
|  --   --------    ------------------------------------------------    ---  |
|  01   96/09/25    Created                                             ach  |
|  02   96/10/06    Bug fix for filenames with spaces in them           ach  |
|  03   96/10/09    Fixed function style mistakes (see comments)        tc   |
|                   Added proper include file for string functions      tc   |
|                   Fixed no command line option bug                    tc   |
|  04   96/10/09    Fixed "fix" to no command line option bug to make        |
|                   it work again like it did in the first place :)     ach  |
|  05   96/10/23    Debug updates, startup output, and bacon.           ach  |
|  06   96/12/19    Updated comments, "-z" or "auto" operation.         ach  |
|  07   97/08/20    Added "-l" lowercase mode.                          ach  |
|  08   98/01/29    Use "stdout" instead of "stderr" now for DOS.       ach  |
|---------------------------------------------------------------------------*/
 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
 
void usage(void);
 
#define MAXLEN 80 /* max dir line length to accept */
#define WINLEN 80 /* won't work on win names longer than this :) */

#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif
 
int main( int argc, char *argv[] )
{
    char    line[MAXLEN];
    char    dosname[13];
    char    newname[13];
    char    win95name[WINLEN];
    char    cmd[160];
    int     count = 0;
    int     i,j;
    char    *ptr;
    char    doit = FALSE;
    char    lowerit = FALSE;

    for ( i=1 ; i<argc ; i++ ) {
        if ( argv[i][0] == '-' || argv[i][0] == '/' ) { /* option found! */
            switch( toupper(argv[i][1]) ) {
                case '?': /* request for help? */
                case 'H':
                    usage();        /* usage() calls exit(), so no fall-through */
                case 'L':
                    lowerit = TRUE;
                    break;
                case 'Z': /* filter mode? */
                    doit = TRUE;
                    break;
                default:
                    fputs( "\nUnrecognized option: ", stdout );
                    fputs( argv[i], stdout );
                    fputs( "\n", stdout );
                    usage();
            }
        } else { /* must be a directory path? */
            strcpy( cmd, "dir " );
            strcat( cmd, argv[i] );
#ifdef _OS9000
            strcat( cmd, " ! " );
#else
            strcat( cmd, " | " );
#endif
            strcat( cmd, argv[0] );
            strcat( cmd, " -z" );
            if ( lowerit == TRUE ) strcat( cmd, " -l" );
#ifndef NODEBUG
            fprintf( stdout, "--- fork: %s\n", cmd );
#endif
            i = system( cmd ); /* forgive me, father, for I have sinned... */
            return( i );    /* get outta here */
        }
    }
    if ( doit != TRUE ) { /* if we aren't in auto mode, we can bail */
        usage();
    }

    fputs( "dirvert:  Processing directory output.\n", stdout );
 
    /*--------------------------------------------------|
    | Read as many lines as we can from standard input. |
    |--------------------------------------------------*/
    while( fgets( line, sizeof(line), stdin ) != NULL ) { /* stop at EOF */
 
        if ( strlen(line)<44 ) continue;        /* ignore short lines */
 
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */
 
        /*----------------------------------------------------|
        | Check to see if the line contains filename entries. |
        |----------------------------------------------------*/
        if (( line[39] == ':' ) && ( line[15] != '<' )) {

            if ( lowerit != TRUE ) { /* skip bad names only if not lowerit mode */
                i = strcspn( line, "~" );           /* extended name? */
                if ( i == strlen(line) ) continue;  /* no, so skip */
            }
 
            /*--------------------------------|
            | Build existing DOS name string. |
            |--------------------------------*/
            i = strcspn( line, " " );           /* find first space */
            strncpy( dosname, line, i );        /* copy up to the space */
            dosname[i] = '\0';                  /* NULL terminate */
            strcat( dosname, "." );             /* append "." */
            strncat( dosname, line+9, 3 );      /* append extension */

            /*---------------------------|
            | Build the new name string. |
            |---------------------------*/
            ptr = line + 44;                    /* point to long filename */
            strncpy( win95name, ptr, WINLEN );  /* copy it over */

            if ( lowerit == TRUE ) {
                for ( i=0; i<(int)strlen(win95name) ; i++ ) {
                    win95name[i] = (char)tolower( (unsigned char)win95name[i] );
                }
            }

            i = strcspn( ptr, "." );            /* find extension */
            if ( i > 8 ) {                      /* extension past first 8? */
                j = 8;                          /* yes, crop there */
            } else {
                j = i;                          /* no, crop at extension */
            }
            strncpy( newname, ptr, j );         /* copy filename */
            newname[j] = '\0';                  /* NULL terminate */
 
            strcat( newname, "." );             /* append "." */
 
            if ( i < strlen(ptr) ) {            /* if it has an extension, */
                strncat( newname, ptr+i+1, 3 ); /* append it */
            }
#ifndef NODEBUG
            fprintf( stdout, "DOSname : %s\n", dosname );
            fprintf( stdout, "WINname : %s\n", win95name );
            fprintf( stdout, "NEWname : %s\n", newname );
#endif
 
/* this check could and should be done earlier somehow since none of this
   needs to be done if the resulting 8.3 filename is still invalid for DOS,
   such as containing spaces */
 
            /* if new name contains a space, it's not valid so ignore */
            if ( strcspn( newname, " " ) != strlen(newname) ) {
#ifndef NODEBUG
                fprintf( stdout, "--- This file cannot be changed.\n\n");
#endif
                continue;
            }
 
            /*--------------------------|
            | Fork the rename commands. |
            |--------------------------*/
            strcpy( cmd, "rename " );           /* build command line */
            strcat( cmd, dosname );
            strcat( cmd, " " );
            strcat( cmd, newname );
#ifndef NODEBUG
            fprintf( stdout, "--- fork: %s\n", cmd );
#else
            system( cmd ); /* forgive me, father, for I have sinned... */
#endif
            strcpy( cmd, "rename ");            /* build command line */
            strcat( cmd, newname );
            strcat( cmd, " \x22" );
            strcat( cmd, win95name );
            strcat( cmd, "\x22" );
#ifndef NODEBUG
            fprintf( stdout, "          %s\n\n", cmd );
#else
            system( cmd );
#endif
            count++;                            /* increment counter :) */
        }
    }
#ifndef NODEBUG
    fprintf( stdout, "%d directory entries processed.\n", count );
#endif
 
    return 0;
}
 
void usage(void)
{
    fputs( "\nDirvert V1.08 by Allen Huffman (allenh@pobox.com)\n", stdout );
    fputs( "Copyright (C) 1996-1998 by Sub-Etha Software\n\n", stdout );

    fputs( "Syntax: dir {directory} | dirvert -z\n", stdout );
    fputs( "        dirvert {directory}\n", stdout );
    fputs( "        dirvert -z < {dirfile.txt}\n", stdout );
    fputs( "Usage : Win95 directory long filename fixer (and filter).\n", stdout );
    fputs( "Opts  : -? or /? = display this message.\n", stdout );
    fputs( "        -L or /L = lowercase directory filenames.\n", stdout );
    fputs( "        -Z or /Z = read directory output from standard input (filter).\n", stdout );
#ifndef NODEBUG
    fputs( "\n*** DEBUG VERSION ***\n", stdout );
#endif

    exit(0);
}

C DEBUG printf macros and while(0) loops?

Trigger warning: This post will show some examples that “just work” and “just work fine” but may not be correct. I am posting this because I’d like to hear how you do this. Show us all a much better way.


I was today years old when I “learned” (possibly incorrectly) something (possibly) wrong that I have seen “always” done in the C debug print macros I’ve encountered.

Debug Macros: Basic

To enable or disable debugging printf output, I often see macros like this:

#if defined(DEBUG_ENABLED)
    #define DEBUG_PRINTF(...) printf(__VA_ARGS__)
#else
    #define DEBUG_PRINTF(...)
#endif

This allows debug printfs to appear in code when debug is enabled or not exist at all when debugging is not enabled:

DEBUG_PRINTF ("Starting up system...\n");

Debug Macros: Intermediate

There is a fancier version of this macro. Instead of just using a #define DEBUG_ENABLED (present means enabled), the code uses a #define set to a number. That number is used to tell when to include the debug print:

#if (DEBUG_LEVEL > 1)
    DEBUG_PRINTF ("tlv_parse_ptr (0x%x, %u, 0x%x)\r\n",
                  (unsigned int)((uintptr_t)p_buf),
                  buf_size, (unsigned int)(uintptr_t)p_tlv_table);
#endif

If you “#define DEBUG_LEVEL 1”, the above code will not be part of the program. If the DEBUG_LEVEL is defined to be greater than 1, it will.

That also “just works,” but makes the source code messier with all those extra “#if (DEBUG_LEVEL > x)” and “#endif” lines.

Debug Macros: Advanced

This leads to the even fancier version which builds the level check into the macro itself. Those get used like:

DEBUG_PRINTF (2, "This will only print when DEBUG_LEVEL is greater than 2.\n");

The downside of that approach is that a #define macro cannot contain a preprocessor #if to check another #define macro inside of it. You can NOT do this:

#define DEBUG_PRINTF(level, ...) \
    #if (level < DEBUG_LEVEL)     \
        printf (__VA_ARGS__);    \
    #endif

Instead, the macro must have actual C code with an “if” and such, which embeds that extra code into your program:

#define DEBUG_PRINTF(level, ...) \
    if (level < DEBUG_LEVEL)     \
    {                            \
        printf(__VA_ARGS__);     \
    }

The problem with this is that even with debugging disabled (where you may not want the bulk of printf included in your program) it will still include all that macro C code in your program. With DEBUG_LEVEL enabled:

#define DEBUG_LEVEL 1
DEBUG_PRINTF(0, "Starting system up...\n")

…you get:

if (0 < 1)
{
    printf ("Starting system up...\n");
}

But if you do not want debugging, and set the debug level to 0, you’d get this:

#define DEBUG_LEVEL 0
DEBUG_PRINTF(0, "Starting system up...\n")

if (0 < 0)
{
    printf ("Starting system up...\n");
}

That is more like using a syslog() library that is always present, even when you log messages below the level that actually goes to the system log. But the point of using macros is to leave the code out entirely when you don’t want it.

A bit more work is needed, which leads to something like this:

#define DEBUG_LEVEL 1

#if (DEBUG_LEVEL > 0)
    #define DEBUG_PRINTF(level, ...) \
        if (level < DEBUG_LEVEL)     \
        {                            \
            printf(__VA_ARGS__);     \
        }
#else
    #define DEBUG_PRINTF(level, ...)
#endif

Now we get what we want when we want it, and get nothing when we don’t.

But that’s really not important to this story…

The above macros are things that “just work” which may explain why I see things like that often.

But today I learned they are actually not fine. Our good robot friend Copilot “taught” me that the macro above should really look like this, with the code wrapped in something like a do/while:

#define DEBUG_LEVEL 1

#if (DEBUG_LEVEL > 0)
    #define DEBUG_PRINTF(level, ...) \
        do {                         \
            if (level < DEBUG_LEVEL) \
            {                        \
                printf(__VA_ARGS__); \
            }                        \
        } while (0)
#else
    #define DEBUG_PRINTF(level, ...) \
        do { } while (0)
#endif

I found it odd that it suggested an empty “do / while(0)” and had to research this a bit. You may already know this, but since I had never seen it, I was unaware of the issue(s). Here is what Copilot says about this:

  1. It forces the macro to behave like ONE statement
  2. It makes the trailing semicolon safe
  3. It prevents `if/else` breakage
  4. It prevents partial execution
  5. It compiles away to nothing when disabled

I could argue with the robot about #5, since I have certainly used compilers that were happy to leave in unused variables and functions without even a warning about them. ;-)

Of that list, #3 is really the only one I think applies to this specific debug macro. As Copilot points out, this example will not work with the non-do/while version of the macro:

if (flag)
    DEBUG_PRINTF(1, "msg");
else
    do_other_thing();

That would translate into this:

if (flag)
    if (1 < DEBUG_LEVEL)
    {
        printf("msg");
    }
    ;
else
    do_other_thing();

And that won’t compile: the stray “;” ends the outer if statement, so the else has no if to belong to.

However, the coding standards where I have worked all forbid if/else usage like that. Without having curly braces around the statements, someone might later introduce a bug like this:

if (flag)
    DEBUG_PRINTF(1, "msg");
else
    do_other_thing();
    this_will_always_run();

…because without the braces, the compiler is basically seeing the code like this:

if (flag) DEBUG_PRINTF(1, "msg"); else do_other_thing();
this_will_always_run();

The curly braces ensure the code runs the statements as intended.

If you use that original “bad” macro with the same code but add braces:

if (flag)
{
    DEBUG_PRINTF(1, "msg");
}
else
{
    do_other_thing();
}

It looks like this, and works fine:

if (flag)
{
    if (1 < DEBUG_LEVEL)
    {
        printf("msg");
    }
    ;
}
else
{
    do_other_thing();
}

But adding the “useless” do/while loop (or similar logic) around the macro makes it work with the non-curly-brace version too:

if (flag)
    do {
        if (1 < DEBUG_LEVEL)
        {
            printf("msg");
        }
    } while (0)
    ;
else
    do_other_thing();

Try it yourself.

Here is a sample you can test in a web browser:

https://onlinegdb.com/KpTEGVsg7

Or, here is the code:

#include <stdbool.h>
#include <stdio.h>

#define DEBUG_LEVEL 1

#if (DEBUG_LEVEL > 0)
    #define DEBUG_PRINTF(level, ...) \
        do {                         \
            if (level < DEBUG_LEVEL) \
            {                        \
                printf(__VA_ARGS__); \
            }                        \
        } while (0)
#else
    #define DEBUG_PRINTF(level, ...) \
        do { } while (0)
#endif

void do_other_thing () { return; }

int main()
{
    bool flag = true;

    if (flag)
        DEBUG_PRINTF (0, "Hello\n");
    else
        do_other_thing();

    return 0;
}

Here’s where you come in…

What have you seen? I know modern PC/Linux/Mac environments don’t care about a few extra K of code space, so far fancier debug logging is likely common. But in the embedded space, have you run into macros like the ones I have, or do you always see the “do / while(0)” ones or something else?

Comment away…

Until next time…

My C compiler is not like your C compiler.

I have worked with a variety of C compilers at my day jobs over the years. Although they try to be “real” compilers, they often have limitations that would drive “real” C programmers crazy.

Like, one tool I used did not allow constant strings longer than 80 characters. The error it gave was unhelpful, but eventually support was able to figure out what was going on.

Imagine my surprise to find that this would simply not build:

char *msg = "123456789012345678901234567890123456789012345678901234567890123456789012345678901";

…but removing one character allowed it to build just fine.

Another compiler did not allow constants to be passed as arguments to a function. You could not do this:

strcpy (buffer, "Hello"); // This wouldn't work.

…and had to do this instead:

char *temp = "Hello";
strcpy (buffer, temp); // This is the way.

And that was just the start of it. In that world, the “const” keyword was basically off-limits, at least the way you might be used to using it. A function that is not supposed to mess with a buffer passed in might look like this:

void function (const char *buffer_ptr, size_t buffer_size)
{
    // stuff...
}

…but since you cannot pass const data into functions on that compiler, that code will not compile.

And if you are trying to share code between a modern system (like a PC) and this compiler, removing “const” means the modern compiler, with strict warnings turned on (for example, GCC with -Wwrite-strings), will warn when you pass a “constant string” into that function, because the parameter is now a “char *” instead of a “const char *” and the call discards that “const” protection.

My career is fun. Let’s just say I use “#ifdef” more than most C programmers ever have to…

If you don’t have to deal with this type of stuff, I hope you appreciate that. Embedded programmers have to do so much more work with so many fewer resources to get something done ;-)

Until next time…

A bug that almost made me quit my career.

…and there have been many like this.

I try to write gooder code, I really do. I am happy to own up to my own bugs, but when someone else’s bug wastes hours or even days of my time, I am … not so happy.

And when “someone else” is whoever wrote the compiler I am using, and it created a bug that defies logic… well, that’s when I blog about it.

Consider this:

unsigned int type = 1;

printf ("Is %u == %u? ", p_tlv_table[table_entry].type, type);

if (p_tlv_table[table_entry].type == type)
{
    printf ("Yes.\r\n");
}
else
{
    printf ("No.\r\n");
}

The table is a structure that contains sets of Type-Length-Value numbers, but that is not important. What is important is that the debug output prints the following:

Is 1 == 1? No.

What if I told you that 1 == 1 is not true?

At the time of the printf, the value of “type” in the table at this entry is 1. It prints 1. It is 1.

And, at the time of the printf, the value of “type” is 1. It prints 1. It is 1.

Yet … the simple “if” fails, reporting that 1 does not equal 1.

This, my friends, is an S.C.T. – Strange Compiler Thing. (I just made that up, but I have been using S.W.T – Strange Window Thing – for way too many years.)

The entries in the table are defined like this:

typedef struct
{
   uint8_t  type;
   uint8_t  length;
   uint16_t offset; // or uint8_t if no struct is > 255 bytes.
} tlv_offset_entry_t;

So “type” from that table entry is a U8, while “type” in the local variable is an unsigned int. But that should not matter. Test code like this in the same program works as you would expect:

unsigned int val1 = 1;
uint8_t      val2 = 1;

printf ("%u == %u is %d\r\n", val1, val2, (val1 == val2));

That prints “1 == 1 is 1”, with the final 1 indicating that “val1 == val2” was true. They are equal.

So that is not the issue. I mean, you couldn’t do much if you can’t compare variables of different sizes like that.

And if I compare the same structure outside of the function it was passed into, the comparison works as expected.

A compiler quirk strikes again.

And when you can’t check “if (1 == 1)”, this is a good one ;-)

Welcome to my world. (Within minutes of my finally getting the code down to an easily reproducible case, the company was already working with me to identify why this happens. They rock.)

POKE versus PRINT

See Also: part 1 and part 2 (coming soon).

This is a followup to a recent post I made about making a string in Color BASIC contain data from the text screen memory.

ASCII me no questions…

Color BASIC deals with a version of ASCII where specific numbers represent specific characters/letters:

https://en.wikipedia.org/wiki/ASCII

On the old school 8-bit home computers, not all of them used ASCII. Commodore used a variation called PETSCII, and the Atari 8-bits used ATASCII. While the trick discussed in this article might work on other systems that have a VARPTR or similar command, this discussion will be specifically about the character set in the Radio Shack Color Computer.

ASCII 65 is the uppercase letter ‘A’

PRINT CHR$(65)
A

If you POKE the value of 65 to the first position on the 32×16 text screen (location 1024), you will also see an uppercase A.

POKE 1024,65

However, the embedded font data in the MC6847 VDG video generator chip does not follow ASCII for all of its characters. For example, CHR$(0) to CHR$(31) are non-printable characters. On the CoCo, two of them do something special: CHR$(8) will print a backspace and CHR$(13) will print an ENTER:

PRINT "HELLO";CHR$(8);"THERE";CHR$(13);"HOWDY"
HELLTHERE
HOWDY

It would have been nice if the CoCo could have done a beep for CHR$(7) like Apple 2s did, or clear the screen with CHR$(12) like many other systems did, but those are the only two that do anything other than “print nothing” on the CoCo.

If you POKE around a bit…

While you will not see anything if you PRINT those characters, if you POKE those values to the screen memory you will see something. For example, you could POKE characters 0 to 31 to the first row of the 32 column text screen like this:

FOR A=0 TO 31:POKE 1024+A,A:NEXT

The character set in the video chip has 0-31 representing reverse video characters “@” (AT sign) to “<-” (left arrow). We can expand that loop to POKE the first 128 characters onto the video screen:

FOR A=0 TO 127:POKE 1024+A,A:NEXT

But for PRINTing the ASCII characters, we have already established that nothing shows up for characters 0-31, while things do PRINT for 32-127:

FOR A=32 TO 127:PRINT CHR$(A);:NEXT

I put together this sloppy program that will show the differences, 32 characters at a time, of what you get when you PRINT the character values versus POKE the character values:

0 'POKEPRNT.BAS
10 CLS

20 PRINT@0,"PRINT 0-31:"
30 FOR A=0 TO 31:PRINT CHR$(A);:NEXT
40 PRINT@64,"POKE 0-31:"
50 FOR A=0 TO 31:POKE 1120+A,A:NEXT

60 PRINT@128,"PRINT 32-63:"
70 FOR A=32 TO 63:PRINT CHR$(A);:NEXT
80 PRINT@192,"POKE 32-63:"
90 FOR A=0 TO 31:POKE 1248+A,32+A:NEXT

100 PRINT@256,"PRINT 64-95:"
110 FOR A=64 TO 95:PRINT CHR$(A);:NEXT
120 PRINT@320,"POKE 64-95:"
130 FOR A=0 TO 31:POKE 1376+A,64+A:NEXT

140 PRINT@384,"PRINT 96-127:"
150 FOR A=96 TO 127:PRINT CHR$(A);:NEXT
160 PRINT@448,"POKE 96-127:"
170 FOR A=0 TO 31:POKE 1504+A,96+A:NEXT

999 GOTO 999

Looking at this, you can see only the characters 64-95 match between PRINT and POKE.

This means that the “copy screen to a string” concept from my earlier post doesn’t really do what we might expect. It does copy the data, but if we PRINT it back, we do not get back exactly what we started with.

This is the same thing that would happen if you tried to build a string by using PEEK from screen memory. This example prints stuff on the first line of the screen, then builds a string made up of characters using the PEEK values of that first line:

0 'PEEK2STR.BAS
10 CLS
20 PRINT "HELLO, WORLD! THIS IS A TEST."
30 FOR A=1024 TO 1024+31
40 A$=A$+CHR$(PEEK(A))
50 NEXT
60 PRINT "PEEKED STRING:"
70 PRINT A$

And running that shows this awfulness…

Yuck!

But that’s okay, since there is not much use in copying TEXT data and then putting it back with PRINT. PRINT is fast, and we can easily PRINT that text data. Sure, there could be benefits if the stuff being PRINTed is doing calculations and such to generate the output, but this trick won’t help there.

However, the semi graphics characters (128-255) are the same between PRINT and POKE.

0 'POKEPRT2.BAS
10 CLS
20 FOR A=0 TO 255
30 PRINT@A,CHR$(A);
40 POKE 1280+A,A
50 NEXT
60 GOTO 60

The top half is the PRINT CHR$ and the bottom half is the POKE:

Since there is no way on the CoCo to type those semi graphics characters into a string (pity, the later MC-10 could do this), we are forced to PRINT them like this:

PRINT CHR$(128);CHR$(128);CHR$(128)

That would print three black blocks. To speed things up, we could pre-generate a string of those three black blocks then we can PRINT that string very fast later:

A$=CHR$(128)+CHR$(128)+CHR$(128)
PRINT A$

And now you know why I chose to do a “splash screen” example for my demo in part 1. I initially tried it using the TEXT characters and quickly remembered why that can’t work (as explained here).

But it’s still a neat trick.

Bonus: This is stupid

For dumb fun, here is a program that makes A$ be whatever is on the first 32 character line of the screen.

0 'DUMBSTRN.BAS
10 A$="":A=VARPTR(A$):POKEA,32:POKEA+2,4:POKEA+3,0

When you RUN that, doing a PRINT A$ will show a 32 character line that is whatever was on the first line of the screen. If you do a “CLS” to clear the screen and show “OK” on the top line, then PRINT A$, you will see “OK” followed by 30 reverse @ symbols, which is CHR$(96) — but in video memory, a 96 is an empty block (space).

And with that, I’m going to stop now.

Until next time…

I may never use a #define in C again. Except when I need to.

I have had a few glorious weeks of C coding at my day job. It is some of the best C code I have ever written, and I am quite proud of it.

During this project, I learned something new. In the past, I would commonly use #define for values such as:

#define NONE 0
#define WARNING 1
#define ERROR 2

There might be some code somewhere that made use of those values:

void HandleError (int error)
{
   switch (error)
   {
      case NONE:
         // All fine.
         break;

      case WARNING:
         // Handle warning.
         break;

      case ERROR:
         // Handle error.
         break;

      default:
         // Handle unknown error.
         break;
   }
}

When you use #define like that, the C pre-processor will just do simple substitutions before compiling the code — “NONE” will become “0”. Because of this, those labels mean nothing special to the compiler — they are just numbers.

But, I had learned you could use an enum and ensure a function only allows passing the enum rather than “whatever int”:

typedef enum
{
   NONE = 0,
   WARNING = 1,
   ERROR = 2
} ErrorEnum;

Now the routine can take an “ErrorEnum” as a parameter type.

void HandleError (ErrorEnum error)
{
   switch (error)
   {
      case NONE:
         // All fine.
         break;

      case WARNING:
         // Handle warning.
         break;

      case ERROR:
         // Handle error.
         break;

      default:
         // Handle unknown error.
         break;
   }
}

This looks nicer, since it is clear what is being passed in (whatever “ErrorEnum” is). It does not necessarily give any extra compiler warnings if you pass in an int instead, since an enum is just an int, by default.

In C, an enum is a user-defined data type that represents a set of named values. By default, the underlying data type of an enum is int, but it can be explicitly specified to be any integral type using a colon : followed by the desired type.

– ChatGPT

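(One correction to that quote: the colon syntax for choosing an enum’s underlying type comes from C++, and was only added to C in C23.) Thus, the compiler I am using for this test (Visual Studio, building a C program) does not complain if I do something like this: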

int error = 1;
HandleError (error);

However, I just learned of an interesting benefit to using an enum for a switch/case. The compiler can warn you if you don’t have a case for all the items in the enum!

typedef enum {
	NONE,
	WARNING,
	ERROR
} ErrorEnum;

void HandleError(ErrorEnum error)
{
	switch (error)
	{
	case NONE:
		// All fine.
		break;

	case WARNING:
		// Handle warning.
		break;

	// No case for ERROR, so the compiler can warn about it.
	}
}

Building that with warnings enabled reports:

enumerator 'ERROR' in switch of enum 'ErrorEnum' is not handled

How neat! I actually ran into this when I had added some extra error types, but a routine that converted them to a text string did not have cases for the new errors.

const char *GetErrorString (ErrorEnum error)
{
   const char *errorStringPtr = "Unknown";

   switch (error)
   {
      case NONE:
         errorStringPtr = "None";
         break;

      // Oops: no cases for WARNING or ERROR.
   }

   return errorStringPtr;
}

That is a useful warning, and it found a bug/oversight.

Because of this, I will never use #defines for things like this again. Not only does a data type look more obvious in the code, the compiler can give you extra warnings.

Until next time…

Parsing message bytes in to a C structure (and back)

In a recent article, I discussed methods to copy variables (or structure elements) into a packed buffer, as well as parse buffer bytes back into variables.

Doing it manually was clunky. Using my GetPutData routines made it easier to be clunky.

But today, we’ll make it super simple and far less clunky.

Defining the message bytes

Let’s start with a simple message. It will be represented by a C structure that we’d use in the program. For example, here are the elements that might be part of a “Set Date and Time” message:

typedef struct
{
    uint16_t year;
    uint8_t month;
    uint8_t day;
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
    uint8_t timeZone;
    bool isDST;
} SetDateTimeMessageStruct;

The packed message would be nine bytes and look like this:

[ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 ]
  \  /    |   |   |   |   |   |   |
  year   mon day hr  min sec tz  dst

To automate this process, a lookup table is created that represents the size of each element and the offset to where it is located within the structure.

offsetof()

Inside stddef.h is a macro that is used to calculate where in the structure’s memory a particular element exists. It is a relative offset from wherever the structure variable starts in memory. Here is an example showing offsetof():

#include <stdio.h>
#include <stdlib.h>  // for EXIT_SUCCESS
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>  // for offsetof()

typedef struct
{
    uint8_t     a;
    uint16_t    b;
    bool        c;
    uint32_t    d;
    float       e;
    double      f;
} MyStruct;

int main()
{
    MyStruct test;
    
    printf ("sizeof(test) = %zu\n", sizeof(test));
    
    printf ("a is %zu bytes at offset %zu\n",
            sizeof(test.a), offsetof(MyStruct, a));
    printf ("b is %zu bytes at offset %zu\n",
            sizeof(test.b), offsetof(MyStruct, b));
    printf ("c is %zu bytes at offset %zu\n",
            sizeof(test.c), offsetof(MyStruct, c));
    printf ("d is %zu bytes at offset %zu\n",
            sizeof(test.d), offsetof(MyStruct, d));
    printf ("e is %zu bytes at offset %zu\n",
            sizeof(test.e), offsetof(MyStruct, e));
    printf ("f is %zu bytes at offset %zu\n",
            sizeof(test.f), offsetof(MyStruct, f));

    return EXIT_SUCCESS;
}

Try it: https://onlinegdb.com/GDvhi9TLq

It produces the following output:

sizeof(test) = 24
a is 1 bytes at offset 0
b is 2 bytes at offset 2
c is 1 bytes at offset 4
d is 4 bytes at offset 8
e is 4 bytes at offset 12
f is 8 bytes at offset 16

This is an example of how elements in a structure may be padded so that each one starts at an offset that suits its alignment requirement (here, a multiple of its own size). ‘a’ is one byte starting at offset 0, but ‘b’ doesn’t start at 1. As a 16-bit value, it has to skip a byte and start at offset 2. There are more skipped bytes after ‘c’ at offset 4: since the next value is a 32-bit value, it skips three bytes and starts at offset 8.

If we just wrote the structure memory out (in this architecture), it would look like this:

[ 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16|17|18|19|20|21|22|23]
  |     |     |           |           |           |
  a     b--b  c           d--d--d--d  e--e--e--e  f--f--f--f--f--f--f--f

That would waste 4 bytes in that 24-byte message. Plus, if the message were parsed by a system that had different padding/alignment requirements, it wouldn’t be able to read the buffer back into its structure variable and get the correct data.

Using offsetof() to determine where in a structure an element exists, a lookup table could be created. A C function could be written to use that table to know where to copy the data. Here’s a rough concept:

typedef struct
{
    size_t size;   // size of structure element
    size_t offset; // offset from start of structure to that element
} ElementOffsetStruct;

MyStruct test;

ElementOffsetStruct testTable[] =
{
    { sizeof(test.a), offsetof(MyStruct, a) },
    { sizeof(test.b), offsetof(MyStruct, b) },
    { sizeof(test.c), offsetof(MyStruct, c) },
    { sizeof(test.d), offsetof(MyStruct, d) },
    { sizeof(test.e), offsetof(MyStruct, e) },
    { sizeof(test.f), offsetof(MyStruct, f) },
    { 0, 0 } // end of table
};

for (int idx=0; testTable[idx].size != 0; idx++)
{
    printf ("? is %zu bytes at offset %zu\n",
            testTable[idx].size, testTable[idx].offset);
}

Knowing where items are in the structure allows the specific element bytes to be copied to a buffer. Something like this:

uint8_t buffer[80];
unsigned int offset;

offset = 0;
for (int idx=0; testTable[idx].size != 0; idx++)
{
    printf ("Copy %zu bytes at structure offset %zu to buffer offset %u\n",
            testTable[idx].size, testTable[idx].offset, offset);

    memcpy (buffer + offset,
        (uint8_t*)&test + testTable[idx].offset,
        testTable[idx].size);
        
    offset = offset + testTable[idx].size; 
}

“And it’s just that easy.” The following test program dumps the memory contents of the C structure “test”, copies its elements out to a buffer using this code, and then dumps the memory contents of the buffer.

#include <stdio.h>
#include <stdlib.h>  // for EXIT_SUCCESS
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>  // for offsetof()
#include <string.h>  // for memcpy()

typedef struct
{
    uint8_t     a;
    uint16_t    b;
    bool        c;
    uint32_t    d;
    float       e;
    double      f;
} MyStruct;

typedef struct
{
    size_t size;   // size of structure element
    size_t offset; // offset from start of structure to that element
} ElementOffsetStruct;

int main()
{
    MyStruct test = { 0x11, 0x2222, true, 0x33333333, 42.42, 42.42 };
    
    ElementOffsetStruct testTable[] =
    {
        { sizeof(test.a), offsetof(MyStruct, a) },
        { sizeof(test.b), offsetof(MyStruct, b) },
        { sizeof(test.c), offsetof(MyStruct, c) },
        { sizeof(test.d), offsetof(MyStruct, d) },
        { sizeof(test.e), offsetof(MyStruct, e) },
        { sizeof(test.f), offsetof(MyStruct, f) },
        { 0, 0 } // end of table
    };
    
    for (int idx=0; idx<sizeof(test); idx++)
    {
        printf ("%02x ", ((uint8_t*)&test)[idx]);
    }
    printf ("\n");

    uint8_t buffer[80];
    unsigned int offset;
    
    offset = 0;
    for (int idx=0; testTable[idx].size != 0; idx++)
    {
        printf ("Copy %zu bytes at structure offset %zu to buffer offset %u\n",
                testTable[idx].size, testTable[idx].offset, offset);
    
        memcpy (buffer + offset,
            (uint8_t*)&test + testTable[idx].offset,
            testTable[idx].size);
            
        offset = offset + testTable[idx].size; 
    }

    for (int idx=0; idx<offset; idx++)
    {
        printf ("%02x ", buffer[idx]);
    }
    printf ("\n");

    return EXIT_SUCCESS;
}

I get the following output:

11 00 22 22 01 00 00 00 33 33 33 33 14 ae 29 42 f6 28 5c 8f c2 35 45 40 
Copy 1 bytes at structure offset 0 to buffer offset 0
Copy 2 bytes at structure offset 2 to buffer offset 1
Copy 1 bytes at structure offset 4 to buffer offset 3
Copy 4 bytes at structure offset 8 to buffer offset 4
Copy 4 bytes at structure offset 12 to buffer offset 8
Copy 8 bytes at structure offset 16 to buffer offset 12
11 22 22 01 33 33 33 33 14 ae 29 42 f6 28 5c 8f c2 35 45 40 

As I discovered when I figured this out, there are a few gotchas. First, the sample code here requires the structure variable to exist. This is because the sizeof operator needs an expression (or a type) to work on, and you cannot name a member through the structure type alone:

sizeof(test.a)

I asked ChatGPT about this, and it provided me a clever workaround that worked on the structure itself rather than a variable:

sizeof(((MyStruct*)0)->a)

Looking at that, I think I get how it works. First, it is taking a memory location of 0 and casting that to be a pointer to a MyStruct. No such structure exists at memory 0, but you can cast 0 to any kind of pointer you want. (Actually reading through a pointer that points to 0 would be bad. Here, nothing is ever read: sizeof only looks at the type of its operand and does not evaluate it.)

Then, it gets the size of the “a” element from the structure we pretend is located at memory location 0. Let’s break that apart a bit:

sizeof(                 )
       (            )->a
        (MyStruct*)0

And this is enough to allow creating a table based on a structure, and not on a variable declared using that structure.

typedef struct
{
    uint8_t u8;
    uint32_t u32;
} SmallStruct;

ElementOffsetStruct smallTable[] =
{
    { sizeof(((SmallStruct*)0)->u8), offsetof(SmallStruct, u8) },
    { sizeof(((SmallStruct*)0)->u32), offsetof(SmallStruct, u32) },
    { 0, 0 }
};

This would allow an easy way to define the structure and a table of offsets for later use. I’d probably make the table structure const since it won’t need to change.

With this framework figured out, a reverse function could be done that would copy bytes from a buffer into the correct offset inside the structure.

unsigned int offset;
    
offset = 0;
for (int idx=0; testTable[idx].size != 0; idx++)
{
    printf ("Copy %zu bytes at buffer offset %u to structure offset %zu\n",
            testTable[idx].size, offset, testTable[idx].offset);
    
    memcpy ((uint8_t*)&test + testTable[idx].offset,
            buffer + offset,
            testTable[idx].size);
            
    offset = offset + testTable[idx].size; 
}

And now, by creating a structure and a table of size/offsets, any structure can be written to a buffer, or decoded from a buffer.

Too. Much. Typing.

For the final touch, macros can be made to simplify building the table.

#define ENTRY(s, v) { sizeof(((s*)0)->v), offsetof(s, v) }
#define ENTRYEND    {0, 0}

Now, instead of having to type all that gibberish out, the table can be simplified:

ElementOffsetStruct testTable[] =
{
    ENTRY(MyStruct, a),
    ENTRY(MyStruct, b),
    ENTRY(MyStruct, c),
    ENTRY(MyStruct, d),
    ENTRY(MyStruct, e),
    ENTRY(MyStruct, f),
    ENTRYEND
};

And this is just what I have posted to my GitHub:

https://github.com/allenhuffman/StructureToFromBuffer

You can find a short README there that explains the two functions and how to use them. It greatly simplifies the process of creating message buffers or parsing them into C structures that can easily be used in a program.

If you try it and find it useful, please let me know.

Until then…

strcpy versus strncpy and -Wstringop-truncation

I had originally envisioned this post to be another “Old C dog, new C tricks” article, but it really is more of a “how long has it done this? did I just never read the manual?” article.

In C, strcpy() is used to copy a C string (series of character bytes with a 0 byte marking the end of the string) to a new buffer. Some years ago, I wrote a series about the dangers of strcpy() and other functions that look for that 0 byte at the end. I later wrote a follow-up that discussed strcat().

char buffer[10];

strcpy (buffer, "Hello");

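Since strcpy() has no idea how big the destination buffer is (it just keeps copying until it finds that 0 byte), something like this can crash your program: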

char buffer[10];

strcpy (buffer, "This will cause a buffer overrun.");

strncpy(), on the other hand, takes a third parameter which is the maximum number of characters to copy. Once it reaches that number, it stops copying. If you are copying to a 20-byte buffer, setting this value to 20 will ensure you do not overwrite that 20 byte buffer.

char buffer[20];

strncpy (buffer, "Hello", sizeof(buffer));

But there’s a problem… While strcpy() will blindly copy the source to the destination and then add the terminating 0 byte (without any care or concern for how much room is available at the destination), strncpy() will only add the terminating 0 if the source is shorter than the maximum size value.

This part I was aware of. In my earlier articles, I suggested this as a workaround to ensure the destination is always 0 terminated, even if the source string is as large or larger than the destination buffer:

char buffer[10];

strncpy (buffer, "1234567890", sizeof(buffer)-1); // subtract one
buffer[sizeof(buffer)-1] = '\0'; // 0 terminate

Above, sizeof(buffer) is 10. The copy would copy up to 9 bytes, so it would copy over “123456789”. Then it would place the zero in the buffer at position 9 (bytes 0 through 9 make up the ten bytes).

Problem solved.

But that’s not important to this post.

Instead, there is another behavioral difference I wanted to mention, one that you likely already know. I must have known this and just forgotten. Surely I knew it. It is even explained in the C resource I use when looking up parameters to various C functions:

strncpy() will copy the bytes of the source buffer, and then pad the destination with zeros up until the maximum size value.

#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 32

void HexDump (const char *label, void *dataPtr, size_t dataSize);

int main()
{
    char buffer1[BUFFER_SIZE] = { 0 };
    char buffer2[BUFFER_SIZE] = { 0 };

    memset (buffer1, 0xff, sizeof(buffer1));
    memset (buffer2, 0xff, sizeof(buffer2));
    
    HexDump ("buffer1", buffer1, sizeof(buffer1));
    HexDump ("buffer2", buffer2, sizeof(buffer2));

    strcpy (buffer1, "Hello");
    strncpy (buffer2, "Hello", sizeof(buffer2));

    HexDump ("buffer1", buffer1, sizeof(buffer1));
    HexDump ("buffer2", buffer2, sizeof(buffer2));
    
    return 0;
}

void HexDump (const char *label, void *dataPtr, size_t dataSize)
{
    printf ("--- %s ---\n", label);
    for (unsigned int idx=0; idx<dataSize; idx++)
    {
        printf ("%02x ", ((unsigned char*)dataPtr)[idx]);
        if (idx % 16 == 15) printf ("\n");
    }
    printf ("\n");
}

Above, this program makes two buffers and sets all the bytes in them to 0xff (just so we can see the results more easily).

It then uses strcpy() to copy a string to the first buffer (which will copy up to and including the 0 byte at the end of the source string), and strncpy() to copy the string to a second buffer, with the maximum size specified as the destination buffer size.

Here is the output:

--- buffer1 ---
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

--- buffer2 ---
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

--- buffer1 ---
48 65 6c 6c 6f 00 ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

--- buffer2 ---
48 65 6c 6c 6f 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The first “buffer1” and “buffer2” show the buffers initialized with 0xff bytes.

The second “buffer1” shows the result of strcpy() – the five characters of “Hello” copied over, with a 0 byte added.

The second “buffer2” shows the same string copied using strncpy() with the size set to 32, the size of the buffer. You can see it copied the five characters of “Hello” and then filled the rest (up to the size) with 0 bytes.

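Was this always the case with strncpy(), or did this get enhanced in later versions of C? As far as I can tell, it has always been this way: the zero padding is already described in the original ANSI C (C89) standard. I see this is fully documented at places like cplusplus.com: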

Copy characters from string

Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.

No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow).

– cplusplus.com entry on strncpy()

Why bring this up now?

The only reason I bring this up today is because I saw a new compiler warning I had never seen before recently.

warning: ‘strncpy’ output truncated before terminating nul copying X bytes from a string of the same length [-Wstringop-truncation]

That was the first time I’d ever seen this warning. It is a useful warning, since it informs you that the destination string will NOT be 0 terminated:

char buffer[10];
strncpy (buffer, "1234567890", 10);

And that reminded me of that strncpy() behavior, and made me change how I was using it in my program.

I also saw this variation:

warning: 'strncpy' output truncated copying X bytes from a string of length Y [-Wstringop-truncation]

This warning popped up when I had a source buffer that was longer than the destination. That should be fine, since strncpy() is told how much to copy over, max. I was puzzled why this would even be a warning. I mean, you specifically put the number in there that says “copy up to X bytes.”

I find it odd that the first message (exactly length) lets you know you don’t have a 0 terminated destination buffer, but the second just says “hey, we couldn’t copy as much as you request.”

Anyway, I found it interesting.

Until next time…

BARR-C coding standard and switch and case

At my day job (I just made a new category for these posts) we have been working on an official coding standard to use for our software projects. When I was hired five years ago, I was given a three-page document about “Software Best Practices” that served as a casual guide to how code should be formatted and how functions and variables should be named. From looking at the million+ lines of code I maintain, it is clear that some items were adhered to, while others were ignored completely (thankfully; I disliked the suggested variable naming).

BARR-C focuses on embedded C programming and, unlike the coding standards I have seen at other jobs, it focuses on bug reduction rather than making the code pretty. I purchased a physical copy of the book, but you can download a PDF of it for free:

https://barrgroup.com/sites/default/files/barr_c_coding_standard_2018.pdf

While I consider myself quite stubborn or stuck in my ways, I can flip on a dime if presented compelling new information. The BARR-C standard is making me rethink a few things. Here is one example, which I share with you to get your take on it.

switch and case

// The way I have been using
switch (color)
{
    case RED:
        stop();
        break;

    case YELLOW:
        slow();
        break;

    case GREEN:
        speed();
        break;

    default:
        break;
}

I am very used to seeing the “case” labels indented inside the switch braces, with the statements indented past the “case”. But one of the editors I work with constantly moves the case labels back to where they line up with the switch:

// The way one of my editors wants me to type it:
switch (color)
{
case RED:
    stop();
    break;

default:
    break;
}

I certainly don’t like the idea of code being at the same level as the braces. This seems like an odd default to me. Have you seen this elsewhere? It probably has a specific name.

And recently, I ran into something new that has been added by a later C standard than any I have used. It does not allow variables to be declared inside the switch! That seemed odd, since–at some point–the C standard moved away from “all variables must be declared at the top of the function” to “yeah, wherever you want, it’s fine.”

void function ()
{
    int counter; // We used to have to do this only here.

    // ...some lines later...

    while (active)
    {
        int counter = 0; // But now we can do it here.

        // ...stuff...
    }
}

Today, I prefer “variables used just in this bit of code” to be declared around that code so it is much easier to see what that variable is used for. Of course, this wouldn’t matter if every function was small enough to completely fit on the screen at the time. Sadly, I always seem to work with legacy functions that are hundreds of lines long.

But I digress…

There is something new (to me) that now disallows declaring variables in the case. I have seen this done (and still do it myself) for many years. To make it work, you need an extra set of curly braces which causes switch/cases that look like this:

switch (color)
{
    case RED:
    {
        int x; // Now works because braces.
        // ...stuff...
        break;
    }

    default:
        break;
}

I suppose this has advantages. It makes a variable that only exists inside the braces of that case (C calls this block scope). Each case could make its own “x” and use it, if it wanted to, I suppose.

Side Note: Of course I had to try this out. Indeed, by default, this works without a peep from the compiler, but if you enable the proper warnings you will at least get “warning: declaration of ‘x’ shadows a previous local [-Wshadow]”.

#include <stdio.h>

int main()
{
    int x = 42;
    
    switch (x)
    {
        case 1:
        {
            int x = 1;
            printf ("x = %d\n", x);
            break;
        }
        
        case 2:
        {
            int x = 2;
            printf ("x = %d\n", x);
            break;
        }
        
        default:
            printf ("x = %d\n", x);
            break;
    }

    return 0;  
}

But that’s not important to this story… The BARR-C standard gives me a new formatting option, and a reason why I might want to use it. It lines up the “case” and “break” together:

switch (err)
{
    case ERR_A:
        ...
    break;

    case ERR_B:
        ...
        // Also perform the steps for ERR_C.

    case ERR_C:
        ...
    break;

    default:
        ...
    break;
}

I have never encountered the case/default and its break lined up like that before. It looks odd to me, and feels wrong. But the reason for this is given:

Reasoning: C’s switch statements are powerful constructs, but prone to errors such as omitted break statements and unhandled cases. By aligning the case labels with their break statements it is easier to spot a missing break.

– Embedded C Coding Standard, Michael Barr

Interesting. On a number of occasions (including again recently), I have found a bug where a break was missing, or where it had been accidentally backspaced up onto the end of a comment on the line above:

    case GIVE_UP:
        // Give up and return.break;

    default:
        // Never surrender!!!
        break;

This is a trivial example, but if there had actually been lines of code there, you’d have to look at the last line of each case to verify a break was there. But if you line up the case/break like this…

    case GIVE_UP:
        // Give up and return.break;

    default:
        // Never surrender!!!
    break;

…you can immediately notice a problem where the “patterns don’t match,” which our brains seem to notice easier.

Using curly braces would not make a missing break stand out — in fact, it might make you assume it is all good because you see the closing brace there.

So I kinda like it.

Even if I hate it.

What say you? Comments if you got ’em.

Side Note 2: Since I originally typed this in, I have now fully converted to this formatting look. And, it has already helped me spot an issue like the one I mentioned earlier — without me having to scrounge line by line through the code trying to figure out what is going on. Nice.

Until next time…

C and and and iso646.h

During some research for my day job, I was pointed to a utility written by Guillaume Dargaud that converts LabWindows/CVI user interface files (.uir) over to HTML so they can run in a web browser instead of as a Windows UI app. The program, written in C, can be found here:

https://www.gdargaud.net/Hack/PanelToCgi.html

https://gitlab.com/dargaud/PanelToForm

But that is not really what this post is about. Instead, it is about seeing lines in his code (originally created in 2000) like this:

Active = Mode!=VAL_INDICATOR and Visible and !Dimmed;

if (Fmin<-1e30 and Fmax>1e30) {

if (Dmin<-1e30 and Dmax>1e30) {

That looked more like Visual BASIC or something than C to me. “and”? Shouldn’t that be “&&” in C for a logical and?

I mentioned this in the forum post where I learned of this utility, and received this response:

“It’s been standard C since the introduction of #include <iso646.h> in the 90s. I always found it more readable this way.”

– gdargaud, via https://forums.ni.com/t5/LabWindows-CVI/LabWindows-UI-to-ncurses-Linux-macOS-terminal/m-p/4472028

iso646.h

This one is new to me. Wikipedia has a page about it:

https://en.wikipedia.org/wiki/C_alternative_tokens

Here is what that header file looks like on the Microchip MPLAB PIC24 compiler:

#ifndef _ISO646_H
#define _ISO646_H

#ifndef __cplusplus

#define and &&
#define and_eq &=
#define bitand &
#define bitor |
#define compl ~
#define not !
#define not_eq !=
#define or ||
#define or_eq |=
#define xor ^
#define xor_eq ^=

#endif

#endif

Have you ever seen this used in the wild? I was surprised to see it even supported in a non-mainstream compiler, like that one.

Looking at the list, this line could have been changed further:

Active = Mode not_eq VAL_INDICATOR and Visible and not Dimmed;

And indeed, that works:

#include <stdio.h>
#include <iso646.h>

// #define and    &&
// #define and_eq &=
// #define bitand &
// #define bitor  |
// #define compl  ~
// #define not    !
// #define not_eq !=
// #define or     ||
// #define or_eq  |=
// #define xor    ^
// #define xor_eq ^=

#define VAL_INDICATOR 1

int main()
{
    int Active = 0;

    int Mode = (VAL_INDICATOR + 1);
    int Visible = 1; // Visible
    int Dimmed = 1; // Dimmed
    
    printf ("Mode: %d  Visible: %d  Dimmed: %d\n", Mode, Visible, Dimmed);
    
    Active = Mode not_eq VAL_INDICATOR and Visible and not Dimmed;
    
    printf ("Active: %d\n", Active);
    
    Mode = (VAL_INDICATOR + 1);
    Visible = 1; // Visible
    Dimmed = 0; // NOT Dimmed
    
    printf ("Mode: %d  Visible: %d  Dimmed: %d\n", Mode, Visible, Dimmed);
    
    Active = Mode not_eq VAL_INDICATOR and Visible and not Dimmed;
    
    printf ("Active: %d\n", Active);

    return 0;
}

Coding standards such as the BARR-C Embedded C Coding Standard specifically say not to do things like this, since it makes the code harder to figure out: the reader has to go look up what those defines are really set to. Imagine how the code might look fine, but be completely wrong, if a define were messed up ;-)

Have you seen this? Do you do this? Leave a comment…