Category Archives: C Programming

C and size_t and 64-bits

Do what I say and nobody gets hurt!

At my day job, I work on a Windows application that supervises high power solid state microwave generators. (The actual controlling part is done by multiple embedded PIC24-based boards, which is a good thing considering the issues Windows has given us over the years.)

At some point, we switched from building a 32-bit version of this application to a 64-bit version. The compiler started complaining about various things dealing with “ints” which were not 64-bits, so the engineer put in #ifdef code like this:

#ifdef _NI_mswin64_
    unsigned __int64 length = 0;
#else
    unsigned int length = 0;
#endif

That took care of the warnings since it would now use either a native “int” or a 64-bit int, depending on the target.

Today I ran across this and wondered why C wasn’t just taking care of things. A C library routine that returns an “int” should always expect an int, whether that int is 16-bits (like on Arduino), 32-bits or 64-bits on the system. I decided to look in to this, and saw the culprits were things like this:

length = strlen (pathName);

Surely if strlen() returned an int, it should not need to be changed to an “unsigned __int64” to work.

And indeed, C already does take care of this, if you do what it tells you to do. strlen does NOT return an int:

size_t strlen ( const char * str );

size_t is a special C data type that is “whatever type of number it needs to be.” And by simply changing all the #ifdef’d code to actually use the data type the C library call specifies, all the errors go away and the #ifdefs can be removed.

size_t length = 0;

A better, more stricter compiler might have complained about using an “int” to catch something coming back as “size_t.”

Oh wait. It did. We just chose to solve it a different way.

Until next time…

Redundant C variable initialization for redundant reasons.

The top article on this site for the past 5 or so years has been a simple C tidbit about splitting 16-bit values in to 8-bit values. Because of this, I continue to drop small C things here in case they might help when someone stumbles upon them.

Today, I’ll mention some redundant, useless code I always try to add.,,

I seem to recall that older specifications for the C programming language did not guarantee variables would be initialized to 0. I am not even sure if the current specification defines this, since one of the compilers I use at work has a specific proprietary override to enable this behavior.

You might find that this code prints non-zero on certain systems:

int i;

printf ("i = %d\n", i);

Likewise, trying to print a buffer that has not been initialized might produce non-empty data:

char message[32];

printf ("Message: '%s'\n", message);

Because of this, it’s a good habit to always initialize variables with at least something:

int i=0;

char message[42];
...
memset (message, 0x0, sizeof(message));

Likewise, when setting variables in code, it is also a good idea to always set an expected result and NOT rely on any previous initialization. For example:

int result = -1;

if (something == 1)
{
    result = 10;
}
else if (something == 2)
{
    result = 42;
}
else
{
    result = -1;
}

Above, you can clearly see that in the case none of the something values are met, it defaults to setting “result” to the same value it was just initialized to.

This is just redundant, wasteful code.

And you should always do it, unless you absolutely positively need those extra bytes of code space.

It is quite possible that at some point this code could be copy/pasted elsewhere, without the initialization. On first compile, the coder sees the undeclared “result” and just adds “int result;” at the top of the function. If the final else with “result = -1;” wasn’t there, the results could be unexpected.

The reverse of this is also true. If you know you are coding so you ALWAYS return a value and never rely on initialized defaults, it would be safe to just do “int result;” at the top of this code. But, many modern compilers will warn you of “possibly initialized variables.”

Because of this, I always try to initialize any variable (sometimes to a value I know it won’t ever use, to aid in debugging — “why did I suddenly get 42 back from this function? Oh, my code must not be running…”).

And I always try to have a redundant default “else” or whatever to set it, instead of relying on “always try.”

Maybe two “always tries” make a “did”?

Until next time…

while and if and braces … oh my.

Another day, another issue with a C compiler of questionable quality…

Consider this bit of C code, which lives in an infinite main loop and is designed to do something every now and then (based on the status of a variable being toggled by a timer interrupt):

while (timeToDoSomething == true)
{
    timeToDoSomething = false;

    // Do something.
}

The program in question was trying to Do Something every 25 ms. A timer interrupt was toggling a boolean to true. The main loop would check that flag and if it were true it would set it back to false then handle whatever it was it was supposed to handle.

While this would have worked with “while”, it would really be better as an “if” — especially if the code to handle whatever it was supposed to handle took longer than 25ms causing the loop to get stuck.

Thus, it was changed to an “if”, but a typo left the old while still in the code:

//
while (timeToDoSomething == true)
if (timeToDoSomething == true)
{
    timeToDoSomething = false;

    // Do something.
}

Since things were taking longer than 25ms, the new code was still getting stuck in that loop — and that’s when the while (which was supposed to be commented out) was noticed.

The while without braces or a semicolon after it generated no compiler warning. That seemed wrong, but even GCC with full error reporting won’t show a warning.

Because C is … C.

Curly braces! Foiled again.

In C, it is common to see code formatted using whitespace like this:

if (a == 1)
    printf("One!\n");

That is fine, since it is really just doing this:

if (a == 1) printf("One!\n");

…but is considered poor coding style these days because many programmers seem to be used to languages where indention actually means something — as opposed to C, where whitespace is whitespace. Thus, you frequently find bugs where someone has added more code like this:

if (a == 1)
    printf("One!\n");
    DoSomething ();
    printf("Done.\n");

Above, it feels like it should execute three things any time a is 1, but to C, it really looks like this:

if (a == 1) printf("One!\n");
DoSomething ();
printf("Done.\n");

Thus, modern coding standards often say to always use curly braces even if there is just one thing after the if:

if (a == 1)
{
    printf("One!\n");
}

With the braces in place, adding more statements within the braces would work as expected:

if (a == 1)
{
    printf("One!\n");
    doSomething ();
    printf("Done.\n");
}

This is something that was drilled in to my brain at a position I had many years ago, and it makes great sense. And, the same thing should be said about using while. But while has it’s own quirks. Consider these two examples:

// This way:
while (1);

// That way:
while (1)
{
}

They do the same thing. One uses a semicolon to mark the end of the stuff to do, and other uses curly braces around the stuff to do. That’s the key to the code at the start of this post:

while (timeToDoSomething == true)
if (timeToDoSomething == true)
{
    timeToDoSomething = false;

    // Do something.
}

Just like you could do…

while (timeToDoSomething == true) printf("I am doing something");

…you could also write it as…

while (timeToDoSomething == true)
{
    printf("I am doing something");
}

So when the “if” got added after the “while”, it was legit code, as if the user was trying to do this:

while (timeToDoSomething == true)
{
    if (timeToDoSomething == true)
    {
        timeToDoSomething = false;

        // Do something.
    }
}

Since while can be followed by braces or a statement, it can also be followed by a statement using just braces.

The compiler can’t easily warn about needing a brace, since it is not required to have braces. But if it braces were required, that would catch the issues mentioned here with if and while blocks.

Code that looks like it should at least generate a warning is completely valid and legal C code, and that same code can be formatted in a way that makes it clear(er):

while (timeToDoSomething == true)
    if (timeToDoSomething == true)
    {
        timeToDoSomething = false;

        // Do something.
    }

Whitespace makes things look pretty, but lack of it can also make things look wrong. Or correct when they aren’t.

I suppose the soapbox message of today is just to use braces. That wouldn’t have caught this particular typo (forgetting to comment something out), but its probably still good practice…

Until next time…

Short circuiting of C comparisons

In a language like C, you often have multiple ways to accomplish the same thing. In general, the method you use shouldn’t matter if the end result is the same.

For example:

// This way:

if (a == 1)
{
   function1();
}
else if (a == 2)
{
   function2();
}
else
{
   unknown();
}

// Versus that way:

switch (a)
{
   case 1:
      function1();
      break;

   case 2:
      function2();
      break;

   default:
      unknown();
      break;
}

Both of those do the same thing, though the code they generate to do the same thing may be different.

We might not care which one we use unless we are needing to optimize for code space, memory usage or execution speed.

Optimizing for these things can be done by trail-and-error testing, but there is no guarantee that the method that worked best on the Arduino 16-bit processor and GCC compiler will be the same for a 64-BIT ARM processor and CLANG compiler.

If you ever do make a choice like this, just be sure to leave a comment explaining why you did it in case your code ever gets ported to a different architecture or compiler.

Short circuiting

Many things in C are compiler-specific, and are not part of the C standard. Some compilers are very smart and do amazing optimizations, while others may be very dump and do everything very literally. Here is an example of something I encountered during my day job that may or may not help others.

I had code that was intended to adjust power levels, unless any of the four power generators hit a maximum level. It looked like this:

// Version 1: Do nothing if power limit exceeded.
if ((Power1 > Power1Max) ||
    (Power2 > Power2Max) ||
    (Power3 > Power3Max) ||
    (Power4 > Power4Max))
{
   // Max power hit. Do nothing.
}
else
{
   increasePower ();
}

Having conditionals lead to nothing seems odd. Wouldn’t it make more sense to check to see if we can do the thing we want to do?

// Version 2: Do something is power limit not exceeded.
if ((Power1 < Power1Max) &&
    (Power2 < Power2Max) &&
    (Power3 < Power3Max) &&
    (Power4 < Power4Max))
{
   increasePower ();
}

That looks much nicer. Are there any advantages to one versus the other?

For Version 1, the use of “OR” lets the compiler stop checking the moment any of those conditions is met. If Power1 is NOT above the limit, it then checks to see if Power2 is above the limit. If it is, we are done. We already know that one of these items is above, so no need to check the others. This works great for simple logic like this.

For Version 2, the use of “AND” requires all conditions to be met. If we check Power1 and it is below the limit, we then and check Power2. If that one is NOT below, we are done. We know there is no need to check any of the others.

Those sure look the same to me, and Version 2 seems easier to read.

The first example is basically saying “here is why we won’t do something” while the second example is “here is why we WILL do something.”

Does it matter?

To be continued…

16-bits don’t always add up.

Consider this simple program:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    uint16_t    val1;
    uint16_t    val2;
    uint32_t    result;

    val1 = 40000;
    val2 = 50000;

    result = val1 + val2;

    printf ("%u + %u = %u\n", val1, val2, result);

    return EXIT_SUCCESS;
}

What will it print?

On my Windows PC, I see the following:

40000 + 50000 = 90000

…but if I convert the printf() and run the same code on an Arduino:

void setup() {
  // put your setup code here, to run once:
  Serial.begin(9600);

    uint16_t    val1;
    uint16_t    val2;
    uint32_t    result;

    val1 = 40000;
    val2 = 50000;

    result = val1 + val2;

    //printf ("%u + %u = %u\n", val1, val2, result);
    Serial.print(val1);
    Serial.print(" + ");
    Serial.print(val2);
    Serial.print(" = ");
    Serial.println(result);
}

void loop() {
  // put your main code here, to run repeatedly:

}

This gives me:

40000 + 50000 = 24464

…and this was the source of a bug I introduced and fixed at my day job recently.

Tha’s wrong, int’it?

I tend to write alot of code using the GCC compiler since I can work out and test the logic much quicker than repeatedly building and uploading to our target hardware. Because of that, I had “fully working” code that was incorrect for our 16-bit PIC24 processor.

In this case, the addition of “val1 + val2” is being done using native integer types. On the PC, those are 32-bit values. On the PIC24 (and Arduino, shown above), they are 16-bit values.

A 16-bit value can represent 65536 values in the range of 0-65535. If you were to have a value of 65535 and add 1 to it, on a 16-bit variable it would roll over and the result would be 0. In my example, 40000 + 50000 was rolling over 65535 and producing 24464 (which is 90000 – 65536).

You can see this happen using the Windows calculator. By default, it uses DWORD (double word – 32-bit) values. You can do the addition just fine:

You see that 40,000 + 50,000 results in 90,000, which is 0x15F90 in hex. That 0x1xxxx at the start is the rollover. If you switch the calculator in to WORD mode you see it gets truncated and the 0x1xxxx at the start goes away, leaving the 16-bit result:

Can we fix it?

The solution is very simple. In C, any time there is addition which might result in a value larger than the native int type (if you know it), you simply cast the two values being added to a larger data type, such as a 32-bit uint32_t:

void setup() {
  // put your setup code here, to run once:
  Serial.begin(9600);

    uint16_t    val1;
    uint16_t    val2;
    uint32_t    result;

    val1 = 40000;
    val2 = 50000;

    // Without casting (native int types):
    result = val1 + val2;

    //printf ("%u + %u = %u\n", val1, val2, result);
    Serial.print(val1);
    Serial.print(" + ");
    Serial.print(val2);
    Serial.print(" = ");
    Serial.println(result);

    // Wish casting:
    result = (uint32_t)val1 + (uint32_t)val2;

    Serial.print(val1);
    Serial.print(" + ");
    Serial.print(val2);
    Serial.print(" = ");
    Serial.println(result);
}

void loop() {
  // put your main code here, to run repeatedly:

}

Above, I added a second block of code that does the same add, but casting each of the val1 and val2 variables to 32-bit values. This ensures they will not roll over since even the max values of 65535 + 65535 will fit in a 32-bit variable.

The result:

40000 + 50000 = 24464
40000 + 50000 = 90000

Since I know adding any two 16-bit values can be larger than what a 16-bit value can hold (i.e., “1 + 1” is fine, as is “65000 + 535”, but larger values present a rollover problem), it is good practice to just always cast upwards. That way, the code works as intended, whether the native int of the compiler is 16-bits or 32-bits.

As my introduction of this bug “yet again” shows, it is a hard habit to get in to.

Until next time…

I hate floating point.

See also: the 902.1 incident of 2020.

Once again, oddness from floating point values took me down a rabbit hole trying to understand why something was not working as I expected.

Earlier, I had stumbled upon one of the magic values that a 32-bit floating point value cannot represent in C. Instead of 902.1, a float will give you 902.099976… Close, but it caused me issues due to how we were doing some math conversions.

float value = 902.1;
printf ("value = %f\n", value);

To work around this, I switched these values to double precision floating point values and now 902.1 shows up as 902.1:

double value = 902.1;
printf ("value = %f\n", value);

That example will indeed show 902.100000.

This extra precision ended up causing a different issue. Consider this simple code, which took a value in kilowatts and converted it to watts, then converted that to a signed integer.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    double kw = 64.60;
    double watts = kw * 1000;

    printf ("kw   : %f\n", kw);

    printf ("watts: %f\n", watts);

    printf ("int32: %d\n", (int32_t)watts);

    return EXIT_SUCCESS;
}

That looks simple enough, but the output shows it is not:

kw   : 64.600000
watts: 64600.000000
int32: 64599

Er… what? 64.6 multiplied by 1000 displayed as 64600.00000 so that all looks good, but when converted to a signed 32-bit integer, it turned in to 64599. “Oh no, not again…”

I was amused that, by converting these values to float instead of double it worked as I expected:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    float kw = 64.60;
    float watts = kw * 1000;

    printf ("kw   : %f\n", kw);

    printf ("watts: %f\n", watts);

    printf ("int32: %d\n", (int32_t)watts);

    return EXIT_SUCCESS;
}
kw   : 64.599998
watts: 64600.000000
int32: 64600

Apparently, whatever extra precision I was gaining from using double in this case was adding enough extra precision to throw off the conversion to integer.

I don’t know why. But at least I have a workaround.

Until next (floating point problem) time…

C structures and padding and sizeof

This came up at my day job when two programmers were trying to get a block of data to be the size both expected it to be. Consider this example:

typedef struct
{
    uint8_t     byte1;  // 1
    uint16_t    word1;  // 2
    uint8_t     byte2;  // 1
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct1;

The above structure represents three 8-bit byte values and two 16-bit word values for a total of 7 bytes.

However, if you were to run this code in GCC for Windows, and print the sizeof() that structure, you would see it returns 10:

sizeof(MyStruct1) = 10

This is due to the compiler padding variables so they all start on a 16-bit boundary.

The expected data storage in memory feels like it should be:

[..|..|..|..|..|..|..] = 7 bytes
 |  |  |  |  |  |  |
 |  |  |  |  \  /  byte3
 |  |  |  |  word2
 |   \ /  byte2
 |  word1
 byte1

But, using GCC on a Windows 10 machine shows that each value is stored on a 16-bit boundary, leaving unused padding bytes after the 8-bit values:

[..|xx|..|..|..|xx|..|..|..|xx] = 10 bytes
 |     |  |  |     |  |  |
 |     |  |  |     \  /  byte3
 |     |  |  |     word2
 |      \ /  byte2
 |     word1
 byte1

As you can see, three extra bytes were added to the “blob” of memory that contains this structure. This is being done so each element starts on an even-byte address (0, 2, 4, etc.). Some processors require this, but if you were using one that allowed odd-byte access, you would likely get a sizeof() 7.

Do not rely on processor architecture

To create portable C, you must not rely on the behavior of how things work on your environment. The same can/will could produce different results on a different environment.

See also: sizeof() matters, where I demonstrated a simple example of using “int” and how it was quite different on a 16-bit Arduino versus a 32/64-bit PC.

Make it smaller

One easy thing to do to reduce wasted memory in structures is to try to group the 8-bit values together. Using the earlier structure example, by simple changing the ordering of values, we can reduce the amount of memory it uses:

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint8_t     byte3;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
                        // 7 bytes
} MyStruct2;

On a Windows 10 GCC compiler, this will produce:

sizeof(MyStruct1) = 8

It is still not the 7 bytes we might expect, but at least the waste is less. In memory, it looks like this:

[..|..|..|xx|..|..|..|..] = 8 bytes
 |  |  |     |  |  \  /
 |  |  |     \  /  word2
 |  |  |     word1
 |  |  byte3
 |  byte2
 byte1

You can see an extra byte of padding being added after the third 8-bit value. Just out of curiosity, I moved the third byte to the end of the structure like this:

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct3;

…but that also produced 8. I believe it is just adding an extra byte of padding at the end (which doesn’t seem necessary, but perhaps memory must be reserved on even byte boundaries and this just marks that byte as used so the next bit of memory would start after it).

[..|..|..|..|..|..|..|xx] = 8 bytes
 |  |  |  |  |  |  |
 |  |  |  |  \  /  byte3
 |  |  \  /  word2
 |  |  word1
 |  byte2
 byte1

Because you cannot ensure how a structure ends up in memory without knowing how the compiler works, it is best to simply not rely or expect a structure to be “packed” with all the bytes aligned like the code. You also cannot expect the memory usage is just the values contained in the structure.

I do frequently see programmers attempt to massage the structure by adding in padding values, such as:

typedef struct
{
    uint8_t     byte1;    // 1
    uint8_t     padding1; // 1

    uint16_t    word1;    // 2

    uint8_t     byte2;    // 1
    uint8_t     padding2; // 1

    uint16_t    word2;    // 2

    uint8_t     byte3;    // 1
    uint8_t     padding3; // 1
                          // 10 bytes
} MyPaddedStruct1;

At least on a system that aligns values to 16-bits, the structure now matches what we actually get. But what if you used a processor where everything was aligned to 32-bits?

It is always best to not assume. Code written for an Arduino one day (with 16-bit integers) may be ported to a 32-bit Raspberry Pi Pico at some point, and not work as intended.

Here’s some sample code to try. You would have to change the printfs to Serial.println() and change how it prints the sizeof() values, but then you could see what it does on a 16-bit Arduino UNO versus a 32-bit PC or other system.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct
{
    uint8_t     byte1;  // 1
    uint16_t    word1;  // 2
    uint8_t     byte2;  // 1
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct1;

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint8_t     byte3;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
                        // 7 bytes
} MyStruct2;

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct3;

int main()
{
    printf ("sizeof(MyStruct1) = %u\n", (unsigned int)sizeof(MyStruct1));
    printf ("sizeof(MyStruct2) = %u\n", (unsigned int)sizeof(MyStruct2));
    printf ("sizeof(MyStruct3) = %u\n", (unsigned int)sizeof(MyStruct3));

    return EXIT_SUCCESS;
}

Until next time…

Arduino Serial output C macros

Here is a quickie.

In Arduino, instead of being able to use things like printf() and puchar(), console output is done by using the Serial library routines. It provides functions such as:

Serial.print();
Serial.println();
Serial.write();

These do not handle any character formatting like printf() does, but they can print strings, characters or numeric values in different formats. Where you might do something like:

int answer = 42;
printf("The answer is %d\r\n", answer);

…the Arduino version would need to be:

int answer = 42;
Serial.print("This answer is ");
Serial.print(answer);
Serial.println();

To handle printf-style formatting, you can us sprintf() to write the formatted string to a buffer, then use Serial.print() to output that. I found this blog post describing it.

I recently began porting my Arduino Telnet routine over to standard C to use on some PIC24 hardware I have at work. I decided I should revisit my Telnet code and try to make it portable, so the code could be built for Arduino or standard C. This would mean abstracting all console output, since printf() is not used on the Arduino.

I quickly came up with these Arduino-named macros I could use on C:

#include <stdio.h>
#include <stdlib.h>

#define SERIAL_PRINT(s)     printf(s)
#define SERIAL_PRINTLN(s)   printf(s"\r\n")
#define SERIAL_WRITE(c)     putchar(c)

int main()
{
    SERIAL_PRINT("1. This is a line");
    SERIAL_PRINTLN();
    SERIAL_PRINTLN();

    SERIAL_PRINTLN("2. This is a second line.");

    SERIAL_PRINT("3. This is a character:");
    SERIAL_WRITE('x');
    SERIAL_PRINTLN();

    SERIAL_PRINTLN("done.");

    return EXIT_SUCCESS;
}

Ignoring the Serial.begin() setup code that Arduino requires, this would let me replace console output in the program with these macros. For C, it would use the macros as defined above. For Arduino, it would be something like…

#define SERIAL_PRINT(s)     Serial.print(s)
#define SERIAL_PRINTLN(s)   Serial.println(s)
#define SERIAL_WRITE(c)     Serial.write(c)

By using output macros like that, my code would still look familiar to Arduino folks, but build on a standard C environment (for the most part).

This isn’t the most efficient way to do it, since Arduino code like this…

  Serial.print("[");
  Serial.print(val);
  Serial.println("]");

…would be one printf() in C:

printf ("[%d]\n", val);

But, if I wanted to keep code portable, C can certainly do three separate printf()s to do the same output as Arduino, so we code for the lowest level output.

One thing I don’t do, yet, is handle porting things like:

Serial.print(val, HEX);

On Arduino, that outputs the val variable in HEX. I’m not quite sure how I’d make a portable macro for that, unless I did something like:

#define SERIAL_PRINT_HEX(v) Serial.print(v, HEX)

#define SERIAL_PRINT_HEX(v) printf("%x, v)

That would let me do:

SERIAL_PRINT("[");
SERIAL_PRINT_HEX(val);
SERIAL_PRINTLN("]");

I expect to add more macros as-needed when I port code over. This may be less efficient, but it’s easier to make Arduino-style console output code work on C than the other way around.

Cheers…

C: (too) many happy returns…

Here’s another quick C thing…

One of the jobs I had used a pretty complete coding style guide for C. One of the things they insisted on was only one “return” in any function that returns values. For example:

int function(int x)
{
   if SOMETHING
   {
      return 100;
   }
   else SOMETHING ELSE
   {
      return 200;
   }
   else
   {
      return 0;
   }
}

The above function returns values 100, 200 or 0 based on the input (1, 2 or anything else). It has three different places where a value is returned. This saves code, compared to doing it like this:

int function(int x)
{
   int value;

   if SOMETHING
   {
      value = 100;
   }
   else if SOMETHING ELSE
   {
      value = 200;
   }
   else
   {
      value = 0;
   }

   return value;
}

Above, you see we use a variable, and then have three places where it could be set, and then we return that value in one spot at the end of the function. This probably generates larger code and would take longer to run than the top example.

But if you can afford those extra bytes and clock cycles, it is a much better way to do this — at least form a maintenance and debugging standpoint.

I have accepted this, but only today did I run in to a situation where this approach would have saved me some time and frustration. In my case, I was encounter a compiler warning about a function not returning a value where it was defined to return a value. I looked and confirmed the function was indeed returning a value. What was going on?

The problem was that it used multiple returns, and did something like this:

int function(int x)
{
   int value;

   if (!ValueIsValid(x)) return;

   if SOMETHING
   {
      value = 100;
   }
   else if >OMETHING ELSE
   {
      value = 200;
   }
   else
   {
      value = 0;
   }

   return value;
}

Somewhere in the program was a check that just did a “return” and the compiler was seeing that, but my eyes were looking at the lower portion of the program where a value was clearly being returned.

I am guessing the function originally did not return a value, and when a return value was added later, that initial “return;” was not corrected, leaving a compiler warning. This warning may have been in the code for a long time and was simply left alone because someone couldn’t figure it out (my situation) or wasn’t concerned about compiler warnings.

Today, the warning bugged me enough that I did a deep dive through the function, line-by-line, trying to figure out what was going on. And I found it. A simple correction could have been this:

int function(int x)
{
   int value;

   if (!ValueIsValid(x)) return 0; // FIXED: Add missing return value.

   if SOMETHING
   {
      value = 100;
   }
   else if SOMETHING ELSE
   {
      value = 200;
   }
   else
   {
      value = 0;
   }

   return value;
}

That resolved the compiler warning, but still left two spots where a value was returned, so I ended up doing something like this:

int function(int x)
{
   int value;

   if (ValueIsValid(x) == true) // do this if valid
   {
      if SOMETHING
      {
         value = 100;
      }
      else SOMETHING ELSE
      {
         value = 200;
      }
      else
      {
         value = 0;
      }
   }
   else // Not valid
   {
      value = 0;
   }

   return value;
}

Now there is only one place the function returns, and it only processeses things if the initial value appears valid.

I will sleep better at night.

I sleep on a soap box.

char versus C versus C#

Updates:

  • 2020-07-16 – Added a few additional notes.

I am mostly a simple C programmer, but I do touch a bit of C# at my day job.

If you don’t think about what is going on behind the scenes, languages like Java and C# are quite fun to work with. For instance, if I was pulling bytes out of a buffer in C, I’d have to write all the code manually. For example:

Buffer: [ A | B | C | D | D | E | E | E | E ]

Let’s say I wanted to pull three one-byte values out of the buffer (A, B and C), followed by a two-byte value (DD), and a four byte value (EEEE). There are many ways to do this, but one lazy way (which breaks if the data is written on a system with different endianness to how data is stored) might be:

#include <stdint.h>

uint8_t a, b, c;
uint16_t d;
uint32_t e;

a = Buffer[0];
b = Buffer[1];
c = Buffer[2];
memcpy (&d, &Buffer[3], 2);
memcpy (&e, &Buffer[5], 4);

There is much to critique about this example, but this post is not about being “safe” or “portable.” It is just an example.

In C#, I assume there are many ways to achieve this, but the one I was exposed to used a BitConverter class that can pull bytes from a buffer (array of bytes) and load them in to a variable. I think it would look something like this:

UInt8 a, b, c;
UInt16 d;
UInt32 e;

a = Buffer[0];
b = Buffer[1];
c = Buffer[2];
d = BitConverter.ToInt16(Buffer, 3);
e = BitConverter.ToInt32(Buffer, 5);

…or something like that. I found this code in something new I am working on. It makes sense, but I wondered why some bytes were being copied directly (a, b and c) and others went through BitConverter. Does BitConverter not have a byte copy?

I checked the BitConverter page and saw there was no ToUInt8 method, but there was a ToChar method. In C, “char” is signed, representing -127 to 128. If we wanted a byte, we’d really want an “unsigned char” (0-255), and I did not see a ToUChar method. Was that why the author did not use it?

Here’s where I learned something new…

The description of ToChar says it “Returns a Unicode character converted from two bytes“.

Two bytes? Unicode can represent more characters than normal 8-bit ASCII, so it looks like a C# char is not the same as a C char. Indeed, checking the Char page confirms it is a 16-bit value.

I’m glad I read the fine manual before trying to “fix” this code like this:

Char a, b, c;
UInt16 d;
UInt32 e;

// The ToChar will not work as intended!
a = BitConverter.ToChar(Buffer, 0); //Buffer[0];
b = BitConverter.ToChar(Buffer, 1); //Buffer[1];
c = BitConverter.ToChar(Buffer, 2); //Buffer[2];
d = BitConverter.ToInt16(Buffer, 3);
e = BitConverter.ToInt32(Buffer, 5);

For a C programmer experimenting in C# code, that might look fine, but had I just tried it without reading the manual first, I’d have been puzzled why it was not copying the values I expected from the buffer.

Making assumptions about languages, even simple things like a data type “char”, can cause problems.

Thank you, manual.