1/27/2023: Hello, everyone. This page continues to be one of the most-viewed pages on my site. Can you leave a comment and tell me what led you here? Thanks! -Allen
Recently in my day job, I came across some C code that just felt inefficient. It was code that appeared to take a 16-bit integer and split the high and low bytes into two 8-bit integers. In all my years of C coding, I had never seen it done this way, so obviously it must be wrong.
NOTE: In this example, I am using modern C99 definitions for 8-bit and 16-bit unsigned values. “int” may be different on different systems (it only has to be “at least” 16 bits per the C standard; on the Arduino it is 16 bits, and on my PC it is 32 bits).
uint8_t bytes[2];
uint16_t value;
value = 0x1234;
bytes[0] = *((uint8_t*)&(value)+1); //high byte (0x12)
bytes[1] = *((uint8_t*)&(value)+0); //low byte (0x34)
This code just felt bad to me because I had previously seen how much larger a program becomes when you are accessing structure elements like “foo.a” repeatedly in code. Each access was a bit larger, so if you used it more than a few times in a block of code you were better off putting it in a temporary variable like “temp = foo.a” and using “temp” over and over. Surely all this “address of” and math (+1) would be generating something like that, right?
Traditionally, the way I always see this done is using bit shifting and bitwise AND:
uint8_t bytes[2];
uint16_t value;
value = 0x1234;
bytes[0] = value >> 8; // high byte (0x12)
bytes[1] = value & 0x00FF; // low byte (0x34)
Above, bytes[0] starts out with the 16-bit value and shifts it right 8 bits. That turns 0x1234 into 0x0012 (the 0x34 falls off the end) so you can set bytes[0] to 0x12.
bytes[1] uses bitwise AND to keep only the rightmost 8 bits, turning 0x1234 into 0x0034.
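To try the shift-and-mask steps described above on other values, they can be wrapped in a small function. This is only a sketch: the name split_u16 is made up for illustration and is not from the code I was looking at. The explicit casts just make the narrowing from 16 bits to 8 bits visible.

```c
#include <stdint.h>

/* Hypothetical helper (illustrative name): split a 16-bit value into
 * its high and low bytes using only shift and mask, so the result
 * does not depend on how the processor stores the value in memory. */
static void split_u16(uint16_t value, uint8_t bytes[2])
{
    bytes[0] = (uint8_t)(value >> 8);     /* 0xABCD -> 0x00AB -> 0xAB */
    bytes[1] = (uint8_t)(value & 0x00FF); /* 0xABCD -> 0x00CD -> 0xCD */
}
```

Calling split_u16(0x1234, bytes) leaves 0x12 in bytes[0] and 0x34 in bytes[1], the same as the snippet above.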
I did a quick test on an Arduino, and was surprised to see that the first example compiled into 512 bytes, and the second (using bit shift) was 516. I had expected a simple AND and bit shift to be smaller, but apparently, on this processor/compiler, getting a byte from an address was smaller. (I did not test to see which one used more clock cycles, and did no experiments with compiler optimizations.)
On a Windows PC under GNU C, the first compiled to 784 bytes, and the second to 800. Interesting.
I ran across this code in a project targeting the Texas Instruments MSP430 processor. The MSP430 Launchpad is very Arduino-like, and previous developers had to do many tricks to get the most out of the limited RAM, flash and CPU cycles of these small devices.
I do not know if I can get in the habit of doing my integer splits this way, but perhaps I should retrain myself since this does appear incrementally better.
Update: Timing tests (using millis() on Arduino, and clock() on PC) show that it is also faster.
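For anyone who wants to repeat the PC timing, here is a rough sketch of how a clock()-based comparison could be set up. It is not the exact test I ran: the function name time_split, the iteration count parameter, and the use of volatile to keep the loops from being optimized away are all assumptions made for illustration.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical benchmark helper (illustrative, not the original test).
 * Runs one of the two split methods 'iterations' times, copies the
 * final bytes to 'out', and returns the elapsed clock() ticks. */
static clock_t time_split(int use_pointer, unsigned long iterations,
                          uint8_t out[2])
{
    /* volatile so the compiler cannot remove the loads and stores */
    volatile uint16_t value = 0x1234;
    volatile uint8_t bytes[2] = { 0, 0 };
    unsigned long i;
    clock_t start = clock();

    if (use_pointer) {
        /* Address method: which byte is "high" depends on endianness */
        for (i = 0; i < iterations; i++) {
            bytes[0] = *((volatile uint8_t *)&value + 1);
            bytes[1] = *((volatile uint8_t *)&value + 0);
        }
    } else {
        /* Shift/AND method: same result on any byte order */
        for (i = 0; i < iterations; i++) {
            bytes[0] = (uint8_t)(value >> 8);
            bytes[1] = (uint8_t)(value & 0x00FF);
        }
    }

    out[0] = bytes[0];
    out[1] = bytes[1];
    return clock() - start;
}
```

Dividing the returned tick count by CLOCKS_PER_SEC gives seconds. The iteration count needs to be large enough for the difference to rise above the resolution of clock(), and results will vary with compiler and optimization settings.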
Here is my full Arduino test program. Note the use of “volatile” variable types. This prevents the compiler from optimizing them out (since they are never used unless you uncomment the prints to display them).
#define OURWAY
void setup() {
volatile char bytes[2];
volatile uint16_t value;
//Serial.begin(9600);
value = 0x1234;
#ifdef OURWAY
// 512 bytes:
bytes[0] = *((uint8_t*)&(value)+1); //high byte (0x12)
bytes[1] = *((uint8_t*)&(value)+0); //low byte (0x34)
#else
// 516 bytes:
bytes[0] = value >> 8; // high byte (0x12)
bytes[1] = value & 0x00FF; // low byte (0x34)
#endif
//Serial.println(bytes[0], HEX); // 0x12
//Serial.println(bytes[1], HEX); // 0x34
}
void loop() {
// put your main code here, to run repeatedly:
}
The method using addresses is not portable. You would at least need to #ifdef based on the endianness of your processor.
Also, it’s hard to say which would be faster or smaller from such a small sample of code. You’ve disabled some optimization by declaring the variable volatile. For example, on x86 it’s possible that 0x1234 could be loaded into %eax and then value >> 8 could be directly accessed as %ah and value & 0xff as %al. No extra operations needed at all. This might be harder for the compiler to recognize with the address approach. (It might not be allowed to keep the variable in a register since its address is taken; I’d have to read the spec to be sure.) Or, if you’re dealing with a constant, the compiler could recognize that fact and treat the individual bytes as constants also.
I’d recommend that you stick with your old, portable method.
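The endianness #ifdef suggested in the comment above might look something like the following sketch. It assumes the __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__ macros that GCC and Clang predefine; other compilers may not provide them, in which case this version falls back to shift and AND. The helper names high_byte and low_byte and the two offset macros are made up for illustration.

```c
#include <stdint.h>

/* Assumption: GCC/Clang-style predefined byte-order macros.
 * If they are missing, no offsets are defined and the portable
 * shift/AND fallback below is used instead. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define HIGH_BYTE_OFFSET 1 /* little-endian: low byte stored first */
#define LOW_BYTE_OFFSET  0
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define HIGH_BYTE_OFFSET 0 /* big-endian: high byte stored first */
#define LOW_BYTE_OFFSET  1
#endif

/* Hypothetical helper: return the high byte of a 16-bit value. */
static uint8_t high_byte(uint16_t value)
{
#ifdef HIGH_BYTE_OFFSET
    return *((const uint8_t *)&value + HIGH_BYTE_OFFSET);
#else
    return (uint8_t)(value >> 8);
#endif
}

/* Hypothetical helper: return the low byte of a 16-bit value. */
static uint8_t low_byte(uint16_t value)
{
#ifdef LOW_BYTE_OFFSET
    return *((const uint8_t *)&value + LOW_BYTE_OFFSET);
#else
    return (uint8_t)(value & 0x00FF);
#endif
}
```

With value = 0x1234, high_byte(value) should give 0x12 and low_byte(value) should give 0x34 on either byte order. Whether the address version still ends up smaller or faster once it is wrapped like this would need to be measured on each target.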
Hey Greg, long time no hear! The coders that came before me went to this approach, no doubt, due to the very constrained memory on the system I am working with. In my other tests, volatile was not used (I printed the variables, forcing them to stay around). Benchmarking the Arduino showed it 68% faster, and on the PC/x86 it was not quite that much of a jump, but still faster.
I would wonder if some architectures might not allow accessing the odd-byte and would generate more code.
While I try to write as strict-ANSI as I can, in constrained systems (like Arduino, or the MSP430 at work) I have been turning to things like this. That code, versus the bitshift/AND, saved 144 bytes. Considering my project has already required me to add various compression methods to even fit in the remaining space, every byte counts.
I am just surprised it seems to generate smaller and faster code on the two test systems. I am looking for one where it is slower and/or larger. I wish I had Ultra C set up to see how it deals with it. (I have tested GNU C and some MSP430 compiler that is proprietary.)
The code to avoid gratuitous SEX in Ultra C would turn foo & 0xff into (unsigned char) foo, so it would, if foo were in a d register, do a mov.b to get the LSB. I’m not sure whether we made the x86 back end notice getting the MSB of an unsigned short. If we did, I’m pretty sure that if value were in ax, it would just mov al and then mov ah, about as good as it gets.