Updates:
- 2022-08-30 – Corrected a major bug in the Get8BitHexStringPtr() routine.
“Here we go again…”
Last week I ran out of ROM space in a work project. For each code addition, I have to do some size optimization elsewhere in the program. Some things I tried actually made the program larger. For example, we have some status bits that get set in two different structures. The code will do it like this:
shortStatus.faults |= FAULT_BIT; longStatus.faults |= FAULT_BIT;
We have code like that in dozens of places. One of the things I had done earlier was to change that in to a function. This was primarily so I could have common code set fault bits (since each of the four different boards I work with had a different name for its status structures). It was also to reduce the amount of lines in the code and make what they were doing more clear (“clean code”).
void setFault (uint8_t faultBit) { shortStatus.faults |= faultBit; longStatus.faults |= faultBit; }
During a round of optimizing last week, I noticed that the overhead of calling that function was larger than just doing it manually. I could switch back and save a few bytes every time it was used, but since I still wanted to maintain “clean code”, I decided to make a macro instead of the function. Now I can still do:
setFault (FAULT_BIT);
…but under the hood it’s really doing a macro instead:
#define setFault(faultBit) { shortStatus.faults |= faultbit; longStatus.faults |= faultBit; }
Now I get what I wanted (a “function”) but retain the code size savings of in-lining each instance.
I also thought that doing something like this might be smaller:
shortStatus.faults |= FAULT_BIT; longStatus.faults = shortStatus.faults;
…but from looking at the PIC24 assembly code, that’s much larger. I did end up using it in large blocks of code that conditionally decided which fault bit to set, and then I sync the long status at the end. As long as the overhead of “this = that” is less than the overhead of multiple inline instructions it was worth doing.
And keep in mind, this is because I am 100% out of ROM. Saving 4 bytes here, and 20 bytes there means the difference between being able to build or not.
Formatting Output
One of the reasons for the “code bloat” was adding support for an LCD display. The panel, an LCD2004, hooks up to I2C vie a PCF8574 I2C I/O chip. I wrote just the routines needed for the minimal functionality required: Initialize, Clear Screen, Position Cursor, and Write String.
The full libraries (there are many) for Arduino are so large by comparison, so often it makes more sense to spend the time to “roll your own” than port what someone else has already done. (This also means I do not have to worry about any licensing restrictions for using open source code.)
I created a simple function like:
LCDWriteDataString (0, 0, "This is my message.");
The two numbers are the X and Y (or Column and Row) of where to display the text on the 20×4 LCD screen.
But, I was quickly reminded that the PIC architecture doesn’t support passing constant string data due to “reasons”. (Harvard architecture, for those who know.)
To make it work, you had to do something like:
const char *msg = "This is my message"; LCDWriteDataString (0, 0, msg);
…or…
chr buffer[19]; memcpy (buffer, "This is my message"); LCDWriteDataString (0, 0, msg);
…or, using the CCS compiler tools, add this to make the compiler take care of it for you:
#device PASS_STRINGS=IN_RAM
Initially I did that so I could get on with the task at had, but as I ran out of ROM space, I revisited this to see which approach was smaller.
From looking at the assembly generated by the CCS compiler, I could tell that “PASS_STRINGS=IN_RAM” generated quite a bit of extra code. Passing in a constant string pointer was much smaller.
So that’s what I did. And development continued…
Then I ran out of ROM yet again. Since I had some strings that needed formatted output, I was using sprintf(). I knew that sprintf() was large, so I thought I could create my own that only did what I needed:
char buffer[21]; sprintf (buffer, "CF:%02x C:%02x T:%02x V:%02x", faults, current, temp, volts); LCDWriteDataString (0, 0, buffer); char buffer[21]; sprintf (buffer, "Fwd: %u", watts); LCDWriteDataString (0, 1, buffer);
In my particular example, all I was doing is printing out an 8-bit value as HEX, and printing out a 16-bit value as a decimal number. I did not need any of the other baggage sprintf() was bringing when I started using it.
I came out with these quick and dirty routines:
char GetHexDigit(uint8_t nibble) { char hexChar; nibble = (nibble & 0x0f); if (nibble <= 9) { hexChar = '0'; } else { hexChar = 'A'-10; } return (hexChar + nibble); } char *Get8BitHexStringPtr (uint8_t value) { static char hexString[3]; hexString[0] = GetHexDigit(value >> 4); hexString[1] = GetHexDigit(value & 0x0f); hexString[2] = '\0'; // NIL terminate return hexString; }
The above routine maintains a static character buffer of 3 bytes. Two for the HEX digits, and the third for a NIL terminator (0). I chose to do it this way rather than having the user pass in a buffer pointer since the more parameters you pass, the larger the function call gets. The downside is those 3 bytes of variable storage are reserved forever, so if I was also out of RAM, I might rethink this approach.
I can now use it like this:
const char *msgC = " C:"; // used by strcat() const char *msgT = " T:"; // used by strcat() const char *msgV = " V:"; // used by strcat() char buffer[20]; strcpy (buffer, "CF:"); // allows constants strcat (buffer, Get8BitHexStringPtr(faults)); strcat (buffer, msgC); strcat (buffer, Get8BitHexStringPtr(current)); strcat (buffer, msgT); strcat (buffer, Get8BitHexStringPtr(temp)); strcat (buffer, msgV); strcat (buffer, Get8BitHexStringPtr(volts)); LCDWriteDataString (0, 1, buffer);
If you are wondering why I do a strcpy() with a constant string, then use const pointers for strcat(), that is due to a limitation of the compiler I am using. Their implementation of strcpy() specifically supports string constants. Their implementation of strcat() does NOT, requiring me to jump through more hoops to make this work.
Even with all that extra code, it still ends up being smaller than linking in sprintf().
And, for printing out a 16-bit value in decimal, I am sure there is a clever way to do that, but this is what I did:
char *Get16BitDecStringPtr (uint16_t value) { static char decString[6]; uint16_t temp = 10000; int pos = 0; memset (decString, '0', sizeof(decString)); while (value > 0) { while (value >= temp) { decString[pos]++; value = value - temp; } pos++; temp = temp / 10; } decString[5] = '\0'; // NIL terminate return decString; }
Since I know the value is limited to what 16-bits can old, I know the max value possible is 65535.
I initialize my five-digit string with “00000”. I start with a temporary value of 10000. If the users value is larger than that, I decrement it by that amount and increase the first digit in the string (so “0” goes to “1”). I repeat until the user value has been decremented to be less than 10000.
Then I divide that temporary value by 10, so 10000 becomes 1000. I move my position to the next character in the output string and the process repeats.
Eventually I’ve subtracted all the 10000s, 1000s, 100s, 10s and 1s that I can, leaving me with a string of five digits (“00000” to “65535”).
I am sure there is a better way, and I am open to it if it generates SMALLER code. :)
And that’s my tale of today… I needed some extra ROM space, so I got rid of sprintf() and rolled my own routines for the two specific types of output I needed.
But this is barely scratching the surface of the things I’ve been doing this week to save a few bytes here or there. I’d like to revisit this subject in the future.
Until next time…
Might not be smaller, but I’m curious about what this would do…
#include <stdlib.h>
char *Get16BitDecStringPtr(uint16_t value)
{
static char decString[6];
char *scan = &decString[6];
div_t div_result;
*--scan = '\0';
for (; value > 0; value = div_result.quot) {
div_result = div(value, 10);
*--scan = div_result.rem;
}
/* or maybe memset(decString, '0', scan - decString); */
while (scan > decString) {
*--scan = '0';
}
return decString;
}
I checked this out, and the code grew by about 30 bytes. I suspect it may be from div(), which wasn’t being used in the code before so it had to be added.
From looking at the code, where each “line” is a PIC24 instruction, it looks like the original was 27 or so, and the update was 23, so it is smaller without the div() added.
And of course that should be div_result.rem + ‘0’…
Next time I’ll know better than to use tabs… :(
Looks like comments allow using less-than code greater-than around blocks of code!
int main()
{
test();
}
But not sure what tabs would do, even with that.
If you need to optimize more, maybe you can put the string in RAM and set (only) the data bytes just before printing it? Initializing strings in RAM probably isn’t free, but should be cheaper than repeated strcat()’ing.
E.g.
char msg[] = “CF:xx C:xx T:xx V:%xx”;
msg[4] = ‘0’ + (faults >> 4);
msg[5] = ‘0’ + (faults & 0x0f);
msg[9] = ‘0’ + (current >> 4);
msg[10] = ‘0’ + (current & 0x0f);
/* … */
… where you can replace the hardcoded offsets with sizeof() on fragments like “CF:” (using a #define to ensure you can repeat the fragment in the “template” and the sizeof() calculations.)
That’s a great idea. I’ll have to check the overhead on the PIC24 setting those values manually in five spots versus the routine.
If it was larger, I could change my routine to elimiate the static string, and just have it put the 2 ASCII bytes where the user specifies, like:
char *msg = “CF:xx C:xx T:xx V:xx”;
function (faults, &msg[3]);
function (current, &msg[8]);
function (temp, &msg[13]);
function (voltage, &msg[18]);
I am guessing that on my PIC tools, the overhead of the function call would be more code space. A #define macro could inline it like you suggest:
#define function(v,ptr) { *ptr=(v >> 4); *(ptr+1)=(v & 0xf0); }
…or something like that. That probably wouldn’t be portable, since it wouldn’t work on processors that can’t do odd byte access, so using your offset would solve that:
#define function(v,o) { ptr[o]=(v >>4); ptr[o+1]=(v & 0xf0); }
…perhaps.
Great thoughts!
Good call.
strcpy (buffer, “EF:”);
strcat (buffer, Get8BitHexStringPtr(faults));
…to something like…
char buffer[6] = “EF:xx”;
buffer[3] = ‘0’+(faults >> 4);
buffer[4] = ‘0’+(faults & 0x0f);
…does save code space. I had used my call in five places, and changing it out like that goes from 21748 ROM used to 21616 ROM used, freeing up 132 bytes.
We missed something. Since the ASCII digits aren’t in the order of 0-9, A-F, just taking a ‘0’ and adding the nibble value to it did not work for anything beyond 0-9. :(
My test values were not setting high enough numbers to notice this. I guess the check gets a bit larger with something like:
nibble = (value >> 4);
base = ‘0’ + nibble;
if (nibble > 9) base += XXX; // whatever gets us from ASCII 0 to ASCII A
msg[0] = base;
But that’s two extra if/adds for each digit. Still probably smaller.
Pingback: Links 30/08/2022: LastPass Cracked Again, Tor Browser 11.5.2 | Techrights