When a+b+c is not the same as b+a+c plus the Barr coding standard

DISCLAIMER: Not all compilers are created equal. Different compilers may achieve the same result but take different steps to get there. Optimizers and code generators can do wonderful things. So if you want to leave a comment saying “compiler XYZ does not do that,” that is fine, but that is not the point of this post. This is for those “other” compilers you don’t use, the ones that do not behave that way…

During my embedded C programming career, I have been taught some interesting optimizations. Most of them are things I would never consider with a modern C compiler on a system that has ample memory and CPU resources. But when you are on a microcontroller with 4K of RAM or 16K of program storage, sometimes you have to do things oddly to make the code fit or, if the CPU is slow, to make it run fast enough.

True, False, or Not True or Not False?

Consider this:

bool flag = false;

if (flag)
{
    // Do something
}

An “if” like this checks for a true (nonzero) result. Now, one compiler I work with has its own “TRUE” and “FALSE”, in uppercase, which all of its code uses. Why? Maybe because it originated before the stdbool.h header was added to the C standard (in C99) with official lowercase “true” and “false”. Fortunately, it now provides a stdbool.h that undefines the uppercase versions when the compiler is set to be case-insensitive (yep, by default “foo” and “FOO”, and “else” and “Else”, are processed the same) and then defines lowercase ones:

#if !getenv("CASE")
// remove TRUE and FALSE added by CCS's device .h file, only if
// compiler has case sensitivity off.

#if defined(TRUE)
#undef TRUE
#endif

#if defined(FALSE)
#undef FALSE
#endif
#endif

typedef int1 bool;
#define true 1
#define false 0
#define __bool_true_false_are_defined

With 0 representing false and 1 representing true, the “if” works as expected: in C, any value that is not 0 counts as true. In a normal compiler:

if (0)
{
    printf ("This will not print.\n");
}

if (1)
{
    printf ("This will print\n");
}

if (42)
{
    printf ("This will print\n");
}

On my Radio Shack Color Computer’s 6809 microprocessor, I would expect such an “if” test to compile into assembly code that represents something like “branch if not zero”. I would expect most CPUs to have a similar instruction.

So checking for true (not 0) should be as fast as checking for false (0), assuming there is a similar instruction for “branch if zero.”

However, what if the CPU uses a different number of instruction cycles for a “branch if zero” versus “branch if not zero”? If that were the case, these might have different execution speeds:

if (flag == true)
{
    // Do something...
}

if (flag == false)
{
    // Do something...
}

But that seems unlikely, and is not the point of this post. (If you are aware of any CPU where this would be the case, please leave a comment.)

Some company coding standards I have used say never to write just “if (x)”, but instead to write out what it actually means. You and I clearly know what “if (x)” does, as should anyone who knows C, but what about someone who doesn’t? For them, “if (x == true)” and “if (x == false)” are impossible to misunderstand, and, as long as x really is a bool holding only 0 or 1, they should generate the same code as “if (x)” and “if (!x)”.

Right?

But suppose you used a crappy “C-like” compiler that had a “test for zero” it used for “if (flag == false)”, but did something dumb like “compare against a number” when you wrote “if (flag == true)” or “if (flag)”. That is, the compiler saw a check for 0 and knew how to do that efficiently, but for anything else it compared against a number, resulting in something like…

load #1 into some accumulator
compare register holding "flag" against accumulator
branch if equal (or if not equal)

That can generate some extra code each and every time you check for “true”, so checking for “not false” might save a few bytes every time.

Because of that, I often just default to doing this:

if (flag != false)
{
    // Do something...
}

This looks silly, but it might save enough bytes to make something fit that otherwise would not.

Hopefully you have never had to work in such a constrained environment with such a crappy C-like compiler.

The good news is that this works the same on “real” compilers, but “might” produce smaller or faster code on bad compilers.

But I digress…

Adding it all up…

I really wanted to write this about something I had never considered:

#define HEADER_LENGTH 5
#define CRC_LENGTH 2

unsigned int messageSize = HEADER_LENGTH + payloadLength + CRC_LENGTH;

If the message protocol uses a format like “[HEADER][PAYLOAD][CRC]”, writing out the C code like that makes it easy to visualize what the message bytes look like.

The compiler would be seeing that code as:

unsigned int messageSize = 5 + payloadLength + 2;

A compiler might be doing…

  • Set messageSize to 5
  • Add payloadLength to messageSize
  • Add 2 to messageSize

But if you grouped the #define values together:

unsigned int messageSize = HEADER_LENGTH + CRC_LENGTH + payloadLength;

A good compiler might be changing that to:

unsigned int messageSize = 5 + 2 + payloadLength;
...
unsigned int messageSize = 7 + payloadLength;

…which results in:

  • Set messageSize to 7
  • Add payloadLength to messageSize

And if you deal with hundreds of messages where this is calculated, those savings can really add up.

I would hope a real/smart compiler might be able to detect this and optimize the constants together … but I know this is not guaranteed to be the case.

The best thing about standards…

And as a bonus, earlier I posted asking about C coding standards trying to find one my employer could adopt, instead of rolling our own. Bing CoPilot led me to a few, including this one specifically for embedded C:

Embedded C Coding Standard | Barr Group

This “Barr C” standard covers many things I have already forced myself to start doing, and it does look promising. You can buy a paperback copy of the standard for $6 on Amazon, or download it free as a PDF. I plan to go through it and see what all it covers.

One thing I like about this approach is that it gives a reason for each rule it presents. For example, braces:

Rules:

a. Braces shall always surround the blocks of code (a.k.a., compound statements), following if, else, switch, while, do, and for statements; single statements and empty statements following these keywords shall also always be surrounded by braces.

b. Each left brace ({) shall appear by itself on the line below the start of the block it opens. The corresponding right brace (}) shall appear by itself in the same position the appropriate number of lines later in the file.

Reasoning:

There is considerable risk associated with the presence of empty statements and single statements that are not surrounded by braces. Code constructs like this are often associated with bugs when nearby code is changed or commented out. This risk is entirely eliminated by the consistent use of braces. The placement of the left brace on the following line allows for easy visual checking for the corresponding right brace.

barr_c_coding_standard_2018.pdf

When I started learning C back in the late 1980s, it was pre-ANSI K&R C, so I learned C the way my books showed it:

if (something) {
    // Do something
} else {
    // Do something else
}

Putting the “{” on the same line as the statement seems to be called “line saver” in some of the code editors I use. It was at a job whose standard said “line them up so you can see what goes to what” that I had to change my style:

if (something)
{
    // Do something
}
else
{
    // Do something else
}

Now the opening and closing braces of each code block sit in the same column, making blocks much easier to spot than when you have to look at the ends of lines or partway into a line.

I hated that at first, but now I am used to it.

I also used to do things like this:

if (something)
    DoSomething();
else
    DoSomethingElse();

Somewhere on this site, I have written about this at least once or twice. This breaks when someone adds something without thinking about the braces:

if (something)
    DoSomething();
    WriteToLog(); // added this
else
    DoSomethingElse();

Without the braces, trying to compile this would at least give an error:

main.c: In function ‘main’:
main.c:31:5: error: ‘else’ without a previous ‘if’
   31 |     else
      |     ^~~~

BUT, if you did not have the else…

if (something)
    DoSomething();
    WriteToLog();

That code might “look” right, but when run, it would call DoSomething() only if the condition was true, and then ALWAYS write to the log, because C sees it like this:

if (something)
{
    DoSomething();
}

WriteToLog();

And I have now seen a modern programmer, brought up on scripting languages that use indentation rather than braces to mark blocks, make this exact mistake in C code they were not very familiar with.

But I digress. Again.

More to come when my book arrives and I start reading through it. Unless someone presents me with a better alternative, I think this one may suffice. The book is cheap, the PDF can be downloaded free (so it is searchable), and the items I have spot-checked seem reasonable.

If you have ever worked with the Barr-C coding standard, I’d love to hear your thoughts in the comments.

Until then…

4 thoughts on “When a+b+c is not the same as b+a+c plus the Barr coding standard”

  1. MiaM

    Re adding three things:
    A potential problem is if some values are negative and some positive. The order affects the risk of an overflow or underflow. (This obviously also applies to subtraction).

    (I can’t remember the details but the C standard defines evaluation order. Obviously that can be ignored when adding constants with each other, unless there is some mechanism that traps overflows that need to be able to trace exactly what caused the overflow. Don’t know if any such overflow tests happen in any C compiler though. It would for sure be easy to add a branch if overflow after each addition, “only” wasting say 1-2 cycles or so).

    Re coding standard:
    I think that we nowadays should have editors that display the code with indentations conforming to how the code is arranged, rather than require the coder to do correct formatting.

    But also: I saw a discussion elsewhere about line length and that obviously led to indentations and whatnot. I think that an editor should use background color rather than indentation levels to show which parts of the code belongs together and whatnot. Apparently there is a plugin (presumably to vscode) that automatically colors the empty space to the left of the first character in different colors depending on the indentation level. But I think this can be extended so that narrow vertical lines connect rows that belong together. In your case the if statement, the else statement and any code at the same level before the if statement and after what else executes should have the same color, and also be joined by a vertical line (with the same color), and to the right of that vertical line there would be two “blobs” with another color for the “then” part of the if statement, and for the “else” part. (I know that there isn’t any “then” keyword in C, but you get what I mean).

    1. Allen Huffman Post author

      It is surprising to me that I have yet to use an editor at a job that takes care of formatting automatically. A few have had functions to reformat, but you’d think it could just do it as you type and enforce it.

      The color coding based on indentation level would take me some getting used to.

      I’ll have to think through what you said about negative values and order of operation.

  2. James Jones

    MiaM is right about a + b + c. I’d also mention that for signed integer types, C says the results of over/underflow are undefined, so the program could do anything. (The standard joke back in the day was that if that happened, the program could launch missiles or order pizza.) unsigned will just wrap around. Also, floating point is, as Douglas Adams would say, almost totally unlike the real numbers, letting you get into even stranger situations.

    C originated the notion of “truthy” and “falsy” values by not initially having a boolean type and by not requiring explicit comparisons. Comparisons have an integer value, which at least allows this for former BASIC programmers:

    int sgn(int i) {
        return (i > 0) - (i < 0);
    }

    I was surprised to find that converting a value to _Bool will in fact give you a 0 or a 1… but sizeof(_Bool) == 1 and, given C’s sleazily-typed nature,

    bool my_bool;

    *((char *) &my_bool) = 42;

    will mean

    ((my_bool == false) || (my_bool == true)) == false

    is true under the “always explicitly compare bools” coding standard, surprising anyone expecting the Law of the Excluded Middle to hold.

  3. MiaM

    @James
    as someone who started out with Basic, it feels a bit weird that C comparison operators return 1 rather than -1 for true, as at least Microsoft 8-bit Basic does. Sure, 1 might feel more obvious to anyone who has never done anything with assembler, but having true set all bits (and false obviously be zero) makes it a bit easier to use the various conditional branch and test instructions. Think of things like the BIT instruction on the 6502, which sets two flags from the two uppermost bits of the operand (and also sets the zero flag according to the result of ANDing the operand with the accumulator, without storing anything except the flags).

