Category Archives: Programming

Tackling the Logiker 2022 Vintage Computing Christmas Challenge – part 1

See also: part 1, part 2, part 3, part 4, part 5, part 6 and part 7.

Here we go again! Over in the Facebook Color Computer group, David M. shared a link to this year’s Vintage Computing Christmas Challenge from Logiker. Although I did not submit an entry, I did play with last year’s challenge on my TRS-80 Color Computer.

Last year, it was this:

This year, the challenge is a bit more challenging. Per the challenge website, here is sample code for Commodore:

 10 print"{clear}"
 20 print""
 30 print""
 40 print""
 50 print"               *       *"
 60 print"               **     **"
 70 print"               ***   ***"
 80 print"               **** ****"
 90 print"           *****************"
100 print"            ***************"
110 print"             *************"
120 print"              ***********"
130 print"               *********"
140 print"              ***********"
150 print"             *************"
160 print"            ***************"
170 print"           *****************"
180 print"               **** ****"
190 print"               ***   ***"
200 print"               **     **"
210 print"               *       *"
220 goto 220

Starting with that unoptimized version, I will adjust it so it is properly centered on the CoCo 1/2/3’s 32-column screen.

 10 CLS
 50 PRINT"           *       *"
 60 PRINT"           **     **"
 70 PRINT"           ***   ***"
 80 PRINT"           **** ****"
 90 PRINT"       *****************"
100 PRINT"        ***************"
110 PRINT"         *************"
120 PRINT"          ***********"
130 PRINT"           *********"
140 PRINT"          ***********"
150 PRINT"         *************"
160 PRINT"        ***************"
170 PRINT"       *****************"
180 PRINT"           **** ****"
190 PRINT"           ***   ***"
200 PRINT"           **     **"
210 PRINT"           *       *"
220 GOTO 220

Unfortunately, this design is 17 rows tall, and the CoCo’s standard display is only 16. It won’t fit:

We should still be able to enter the challenge by having the program print this pattern, even if it scrolls off the screen a bit. To get one extra line there, we can get rid of the line feed at the end of the final PRINT statement in line 210 by adding a semi-colon to the end:

210 PRINT"           *       *";

And so it begins…

The goal is to make this as small as possible. There were many ways to approach last year’s Christmas tree challenge, and you can read about the results and a follow-up with suggestions folks gave to save a byte or two.

A simple thing is to remove the spaces at the front and replace them with the TAB() command:

 10 CLS
 50 PRINTTAB(7)"    *       *"
 60 PRINTTAB(7)"    **     **"
 70 PRINTTAB(7)"    ***   ***"
 80 PRINTTAB(7)"    **** ****"
 90 PRINTTAB(7)"*****************"
100 PRINTTAB(7)" ***************"
110 PRINTTAB(7)"  *************"
120 PRINTTAB(7)"   ***********"
130 PRINTTAB(7)"    *********"
140 PRINTTAB(7)"   ***********"
150 PRINTTAB(7)"  *************"
160 PRINTTAB(7)" ***************"
170 PRINTTAB(7)"*****************"
180 PRINTTAB(7)"    **** ****"
190 PRINTTAB(7)"    ***   ***"
200 PRINTTAB(7)"    **     **"
210 PRINTTAB(7)"    *       *";
220 GOTO 220

Although this only looks like it saves a character per line (“TAB(7)” versus seven spaces), the code itself will be smaller since the TAB command tokenizes down to one (or maybe two?) bytes.

Also, the closing quote is not needed if it is the last thing on a line, so those could be removed:

 50 PRINTTAB(7)"    *       *
 60 PRINTTAB(7)"    **     **
 70 PRINTTAB(7)"    ***   ***

That would save one byte per line.

But each line number consumes five bytes on its own, so a better way to save space is to pack statements together; each line you eliminate saves five bytes. The result is pretty unreadable, but let’s do it anyway:

10 CLS:PRINTTAB(7)"    *       *":PRINTTAB(7)"    **     **":PRINTTAB(7)"    ***   ***":PRINTTAB(7)"    **** ****":PRINTTAB(7)"*****************":PRINTTAB(7)" ***************":PRINTTAB(7)"  *************":PRINTTAB(7)"   ***********"
130 PRINTTAB(7)"    *********":PRINTTAB(7)"   ***********":PRINTTAB(7)"  *************":PRINTTAB(7)" ***************":PRINTTAB(7)"*****************":PRINTTAB(7)"    **** ****":PRINTTAB(7)"    ***   ***":PRINTTAB(7)"    **     **"
210 PRINTTAB(7)"    *       *";
220 GOTO 220

That’s quite the unreadable mess!

This could still be made better. The lines above were kept within the size of the input buffer, but when you enter a line, BASIC compresses it (tokenizing keywords like PRINT, TAB and GOTO) so it takes less space in memory. You can then sometimes EDIT the line, use X to extend to the end, and type a few more characters.

That may or may not be allowed for the Logiker challenge. And since I want to provide code here that you could copy and then load into an emulator, I’ll keep it to the limit of what you could actually type in.

In the next installment, I’ll see if my brane can figure out a way to generate this code using program logic rather than brute-force PRINT statements.

Until then…

Reversing bits in C

In my day job, we have a device that needs data sent to it with the bits reversed. For example, if we were sending an 8-bit value of 128, that bit pattern is 10000000. The device expects the high bit first so we’d send it 00000001.

In one system, we do an 8-bit bit reversal using a lookup table. I suppose that one needed it to be really fast.

In another (using a faster PIC24 chip with more RAM, flash and CPU speed), we do it with a simple C routine that was easy to understand.
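
For reference, a simple loop-based version looks something like the sketch below. This is just an illustration of the general idea (names and all), not our actual firmware code:

#include <stdint.h>

// Reverse the bits in an 8-bit value the straightforward way:
// peel the low bit off the input and shift it into the output.
uint8_t ReverseBits8 (uint8_t value)
{
    uint8_t reversed = 0;

    for (int bit = 0; bit < 8; bit++)
    {
        reversed = (reversed << 1) | (value & 1); // take the low bit of value
        value = value >> 1;                       // move to the next bit
    }

    return reversed;
}

Calling ReverseBits8(128) returns 1, matching the 10000000 to 00000001 example above.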

I suppose this breaks down to four main approaches to take:

  • Smallest Code Size – for when ROM/flash is at a premium, even if the code is a confusing mess.
  • Smallest Memory Usage – for when RAM is at a premium, even if the code is a confusing mess.
  • Fastest – for when speed is the most important thing, even if the code is a confusing mess.
  • Clean Code – easiest to understand and maintain, for when you don’t want code to be a confusing mess.

In our system, which is made up of multiple independent boards with their own CPUs and firmware, we do indeed have some places where code size is most important (because we are out of room), and other places where speed is most important.

When I noticed we did it two different ways, I wondered if there might be even more approaches we could consider.

I did a quick search on “fastest way to reverse bits in C” and found a variety of resources, and wanted to point out this fun one:

https://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious

That section of the lengthy article lists a number of methods to reverse bits. Two of them make use of systems that support 64-bit math and do it with just one line of C code (though I honestly have no understanding of how they work).
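
For the curious, here is the single-byte multiply-and-modulus version wrapped into a runnable snippet. I am quoting the expression from that page rather than claiming to understand it, so treat the page itself as the authoritative source:

#include <stdio.h>

int main (void)
{
    unsigned char b = 0x80; // 10000000, the example value from earlier

    // From the bit hacks page: reverse an 8-bit byte with one 64-bit
    // multiply and one modulus. See the link above for the explanation.
    b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;

    printf ("%02X\n", b); // should print 01

    return 0;
}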

Just in case you ever need to do this, I hope this pointer is useful to you.

Happy reading!

Color BASIC and string concatenation

The C programming language has a few standard library functions that deal with strings, namely strcpy() (string copy) and strcat() (string concatenate).

Microsoft BASIC has similar string manipulation features built into the language. In C, to copy an 8-character “string” into a buffer (an array of chars), you would do something like this:

char buffer1[80];

strcpy(buffer1, "12345678");

printf("%s\n", buffer1);

In BASIC, you do not need to allocate space for individual strings. Color BASIC allows doing whatever you want with a string provided it is 255 characters or less, and provided the total string space is large enough. By default, Color BASIC reserves 200 bytes for string storage. If you wanted to strcpy() “12345678” to a string variable, you would just do:

BUFFER1$="12345678"

Yes, that’s legal, but Color BASIC only recognizes the first two characters of a string name, so in reality, that is just like doing:

BU$="12345678"

If you need more than the default 200 bytes, the CLEAR command will reserve more string storage. For example, “CLEAR 500” or “CLEAR 10000”.

“CLEAR 500” would let you have five 100 character strings, or 500 one character strings.

And, keep in mind, strings stored in PROGRAM MEMORY do not use this space. For example, if you reduced string space to only 9 bytes, then tried to make a 10-byte string directly from the OK prompt:

CLEAR 9
A$="1234567890"
?OS ERROR

The wonderful “?OS ERROR” (Out of String Space).

BUT, if strings are declared inside PROGRAM data, BASIC references them from within your program instead of string memory:

5 CLEAR 9
10 A$="1234567890"
20 B$="1234567890"
30 C$="1234567890"
40 D$="1234567890"
50 E$="1234567890"

Yes, that actually works. If you would like to know more, please see my String Theory series.

But I digress…

The other common C string function is strcat(), which appends a string at the end of another:

char buffer1[80];

strcpy(buffer1, "12345678");
strcat(buffer1, "plus this");

printf("%s\n", buffer1);

That code would COPY “12345678” into the buffer1 memory, then concatenate “plus this” to the end of it, printing out “12345678plus this”.

In BASIC, string concatenation is done by adding the strings together, such as:

A$="12345678"
A$=A$+"plus this"

PRINT A$

Color BASIC allows strings to be up to 255 characters long, and no more. Exceed that limit and you get the wonderful “?LS ERROR” (Length of String).

Make it Bigger!

In something I am writing, I started out with an 8 character string, and wanted to duplicate it until it was 64 characters long. I did it like this:

10 A$="12345678"
20 A$=A$+A$+A$+A$+A$+A$+A$+A$

In C, if you tried to strcat() a buffer onto itself, it would not behave like that one-line BASIC statement. (Strictly speaking, calling strcat() with overlapping source and destination is undefined behavior in C, so treat this as an illustration only.)

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buffer1[80];
    
    // buffer1$ = "12345678"
    strcpy(buffer1, "12345678");
    // Result: buffer1="12345678"
    printf ("%s\n", buffer1);
    
    // buffer1$ = buffer1$ + buffer1$
    strcat(buffer1, buffer1);
    // Result: buffer1$="1234567812345678"
    printf ("%s\n", buffer1);

    // buffer1$ = buffer1$ + buffer1$    
    strcat(buffer1, buffer1);
    // Result: buffer1$="12345678123456781234567812345678"
    printf ("%s\n", buffer1);

    return EXIT_SUCCESS;
}

As you can see, each strcat() copies all of the previous buffer to the end of the buffer, doubling the size each time.

The same thing happens if you do it step by step in BASIC:

A$="12345678"
REM RESULT: A$="12345678"

A$=A$+A$
REM RESULT: A$="1234567812345678"

A$=A$+A$
REM RESULT: A$="12345678123456781234567812345678"

But, saying “A$=A$+A$+A$+A$” is not the same as saying “A$=A$+A$ followed by A$=A$+A$”

For example, if you add the string to itself in three separate statements, you double the string size at each step:

5 CLEAR 500
10 A$="12345678"
20 A$=A$+A$
30 A$=A$+A$
40 A$=A$+A$
50 PRINT A$

Above creates a 64 character string (8 added to 8 to make 16, then 16 added to 16 to make 32, then 32 added to 32 to make 64).

BUT, if you had done the six adds on one line:

5 CLEAR 500
10 A$="12345678"
20 A$=A$+A$ + A$+A$ + A$+A$
50 PRINT A$

…you would get a 48 character string (six copies of the 8-character string).

In C, using strcat(buffer, buffer) with the same buffer has a doubling effect each time, just like A$=A$+A$ does in BASIC each time.

And, adding a bunch of strings together like…

A$=A$+A$+A$+A$+A$+A$+A$+A$ '64 characters

…could also be done in three doubling steps like this:

A$=A$+A$:A$=A$+A$:A$=A$+A$ ' 64 characters

Two different ways to concatenate strings together to make a longer string. Which one should we use?

Must … Benchmark …

In my code, I just added my 8 character A$ up 8 times to make a 64 character string. Then I started thinking about it. And here we go…

Using my standard benchmark program, we will declare an 8 character string then add it together 8 times to make a 64 character string. Over and over and over, and time the results.

0 REM strcat1.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 A$="12345678"
40 A$=A$+A$+A$+A$+A$+A$+A$+A$
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

This produces an average of 1235.

Now we switch to the doubling three times approach:

0 REM strcat2.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 A$="12345678"
40 A$=A$+A$:A$=A$+A$:A$=A$+A$
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

This drops the time down to 888!

Doubling the string in three separate statements, versus adding it together eight times on one line, is a significant speed improvement.

Based on what I learned when exploring string theory (and being shocked when I realized how MID$, LEFT$ and RIGHT$ worked), I believe every time you do a string add, there is a new string created:

“A$=A$+A$+A$+A$+A$+A$+A$+A$” creates eight strings along the way.

“A$=A$+A$:A$=A$+A$:A$=A$+A$” creates six.

No wonder it is faster.

Looks like I need to go rewrite my experiment.

Until next time…

Checksums and zeros and XMODEM and randomness.

A year or two ago, I ran across some C code at my day job that finally got me to do an experiment…

When I was first using a modem to dial in to BBSes, it was strictly a text-only interface. No pictures. No downloads. Just messages. (Heck, a physical bulletin board would at least let you put pictures on it! Maybe whoever came up with the term BBS was just forward thinking?)

The first program I ever had that sent a program over the modem was DFT (direct file transfer). It was magic.

Later, I got one that used a protocol known as XMODEM. It seemed like warp speed compared to DFT!

XMODEM would send a series of bytes, followed by a checksum of those bytes, then the other end would calculate a checksum over the received bytes and compare. If they matched, it went on to the next series of bytes… If it did not, it would resend those bytes.

Very simple. And, believe it or not, checksums are still being used by modern programmers today, even though newer methods have been created (such as CRC).

Checking the sum…

A checksum is simply the value you get when you add up all the bytes of some data. Checksum values are normally not floating point; they are held in a fixed-size integer, so they are limited to a fixed range. For example, an 8-bit checksum (one byte) can hold a value of 0 to 255, and a 16-bit checksum (two bytes) can hold a value of 0 to 65535. Since the sum of the data can easily exceed that range, especially with an 8-bit checksum, the value just rolls over.

For example, if the current checksum calculated value is 250 for an 8-bit checksum, and the next byte being counted is a 10, the checksum would be 250+10, but that exceeds what a byte can hold. The value just rolls over, like this:

250 + 10: 251, 252, 253, 254, 255, 0, 1, 2, 3, 4

Thus, the checksum after adding that 10 is now 4.

Here is a simple 8-bit checksum routine for strings in Color BASIC:

0 REM CHKSUM8.BAS
10 INPUT "STRING";A$
20 GOSUB 100
30 PRINT "CHECKSUM IS";CK
40 GOTO 10

100 REM 8-BIT CHECKSUM ON A$
110 CK=0
120 FOR A=1 TO LEN(A$)
130 CK=CK+ASC(MID$(A$,A,1))
140 IF CK>255 THEN CK=CK-256
150 NEXT
160 RETURN

Line 140 is what handles the rollover. If we had a checksum of 250 and the next byte was a 10, it would be 260. That line detects it and subtracts 256, making it 4, just like the count-up example above. (Since the range starts at 0, a full wrap is 256 values.)
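
For comparison, here is roughly the same idea in C. This is just a sketch (not code from anything I ship); with a uint8_t accumulator the rollover happens automatically, because unsigned arithmetic wraps modulo 256:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

// 8-bit checksum: add up the bytes and let the uint8_t wrap (mod 256).
uint8_t Checksum8 (const uint8_t *data, size_t length)
{
    uint8_t checksum = 0;

    for (size_t i = 0; i < length; i++)
    {
        checksum += data[i]; // wraps automatically at 256
    }

    return checksum;
}

int main (void)
{
    const char *text = "HELLO";

    // 72+69+76+76+79 = 372, which wraps to 116.
    printf ("Checksum is %u\n", (unsigned)Checksum8 ((const uint8_t *)text, strlen (text)));

    return 0;
}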

The goal of a checksum is to verify data and make sure it hasn’t been corrupted. You send the data and checksum. The receiver passes the data through a checksum routine, then compares what it calculated with the checksum that was sent with the message. If they do not match, the data has something wrong with it. If they do match, the data is less likely to have something wrong with it.

Double checking the sum.

One of the problems with just adding (summing) up the data bytes is that two swapped bytes would still create the same checksum. For example “HELLO” would have the same checksum as “HLLEO”. Same bytes. Same values added. Same checksum.

A good 8-bit checksum.

However, if one byte got changed, the checksum would catch that.

A bad 8-bit checksum.

It would be quite a coincidence if two data bytes got swapped during transfer, but I still wouldn’t use a checksum on anything where lives were at stake if it processed a bad message because the checksum didn’t catch it ;-)

Another problem is that once the value rolls over, a long message and a short message can end up with the same checksum. In the case of an 8-bit checksum, and data bytes that range from 0-255, a 255 byte followed by a 1 byte would roll over to 0. A checksum of no data would also be 0. Not good.

Checking the sum: Extreme edition

A 16-bit or 32-bit checksum would just be a larger number, reducing how often it could roll over.

For a 16-bit value, ranging from 0-65535, you could hold up to 257 bytes of value 255 before it would roll over:

255 * 257 = 65535

But if the data were 258 bytes of value 255, it would roll over:

255 * 258 = 65790 -> rolls over to 254.

Thus, a 258-byte message of all 255s would have the same checksum as a 1-byte message containing a 254.

To update the Color BASIC program for a 16-bit checksum, change line 140 to:

140 IF CK>65535 THEN CK=CK-65536

Conclusion

Obviously, an 8-bit checksum is rather useless, but if a checksum is all you can do, at least use a 16-bit checksum. If you were using the checksum for data packets larger than 257 bytes, maybe a 48-bit checksum would be better.

Or just use a CRC. They are much better and catch things like bytes being out of order.

But I have no idea how I’d write one in BASIC.
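
In C, though, a bitwise CRC-16 is not a lot of code. Here is a generic sketch using the polynomial and initial value commonly associated with XMODEM-CRC (0x1021, starting from 0); it is meant as an illustration, not a drop-in XMODEM implementation:

#include <stdint.h>
#include <stddef.h>

// Bitwise CRC-16, polynomial 0x1021, initial value 0 (the XMODEM-CRC flavor).
// Crc16((const uint8_t *)"123456789", 9) should return 0x31C3.
uint16_t Crc16 (const uint8_t *data, size_t length)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < length; i++)
    {
        crc ^= (uint16_t)data[i] << 8; // bring the next byte into the high bits

        for (int bit = 0; bit < 8; bit++)
        {
            if (crc & 0x8000)
            {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            }
            else
            {
                crc = (uint16_t)(crc << 1);
            }
        }
    }

    return crc;
}

Because every bit of every byte gets folded through the polynomial, swapping two bytes changes the result, which a plain sum cannot detect.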

One more thing…

I almost forgot what prompted me to write this. I found some code that would flag an error if the checksum value was 0. When I first saw that, I thought “but 0 can be a valid checksum!”

For example, if there were enough data bytes to make the value roll over from 65535 back to 0, that 0 would be a valid checksum. To avoid large data that legitimately adds up to 0 being flagged as bad, I added a small check to the 16-bit checksum validation code:

if ((checksum == 0) && (datasize < 258)) // Don't bother doing this.
{
    // checksum appears invalid.
}
else if (checksum != dataChecksum)
{
    // checksum did not match.
}
else
{
    // guess it must be okay, then! Maybe...
}

But, what about a buffer full of 00s? The checksum would also be zero, which would be valid.

Conclusion: Don’t error check for a 0 checksum.

Better yet, use something better than a checksum…

Until next time…

3X+1 in C#

Anyone remember when I was writing articles about the 3X+1 math problem? And then I wrote about it some more? And even more?

Apparently, I had planned to write about it even more than that. I found this unpublished source code. Writing a version in Color BASIC wasn’t enough. Apparently I tried writing it in C#, too:

// 3X+1

using System;
					
public class Program
{
	public static void Main()
	{
		while (true)
		{
			Int32 x = 0;

			Console.WriteLine();
			Console.WriteLine("Enter number:");

			x = Int32.Parse(Console.ReadLine());
			
			while (true)
			{
				Console.Write(x);
				Console.Write(" ");
				
				if (x == 1) break;
				
				if ((x & 1) == 1) // Odd
				{
					x = x * 3 + 1;
				}
				else // Even
				{
					x = x / 2;
				}
			}
		}
	}
}

So, if you’re into that kind of thing (C# is available for Windows, Mac OS X and Linux), you can give that a try and tell me how I should have written it.

Until next time…

When there’s not enough room for sprintf…

Updates:

  • 2022-08-30 – Corrected a major bug in the Get8BitHexStringPtr() routine.

“Here we go again…”

Last week I ran out of ROM space in a work project. For each code addition, I have to do some size optimization elsewhere in the program. Some things I tried actually made the program larger. For example, we have some status bits that get set in two different structures. The code will do it like this:

shortStatus.faults |= FAULT_BIT;
longStatus.faults |= FAULT_BIT;

We have code like that in dozens of places. One of the things I had done earlier was to change that into a function. This was primarily so common code could set fault bits (since each of the four different boards I work with has a different name for its status structures). It was also to reduce the number of lines in the code and make what they were doing more clear (“clean code”).

void setFault (uint8_t faultBit)
{
    shortStatus.faults |= faultBit;
    longStatus.faults |= faultBit;
}

During a round of optimizing last week, I noticed that the overhead of calling that function was larger than just doing it manually. I could switch back and save a few bytes every time it was used, but since I still wanted to maintain “clean code”, I decided to make a macro instead of the function. Now I can still do:

setFault (FAULT_BIT);

…but under the hood it’s really doing a macro instead:

#define setFault(faultBit) { shortStatus.faults |= faultBit; longStatus.faults |= faultBit; }

Now I get what I wanted (a “function”) but retain the code size savings of in-lining each instance.
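
One small refinement worth considering (a general C macro habit, not something this particular code needed) is wrapping the body in do { } while (0) so the macro behaves like a single statement, even after an if with no braces:

#define setFault(faultBit)                    \
    do                                        \
    {                                         \
        shortStatus.faults |= (faultBit);     \
        longStatus.faults  |= (faultBit);     \
    } while (0)

With that form, something like “if (error) setFault (FAULT_BIT); else …” still compiles and behaves the way it reads.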

I also thought that doing something like this might be smaller:

shortStatus.faults |= FAULT_BIT;
longStatus.faults = shortStatus.faults;

…but from looking at the PIC24 assembly code, that’s much larger. I did end up using it in large blocks of code that conditionally decided which fault bit to set, and then I sync the long status at the end. As long as the overhead of “this = that” is less than the overhead of multiple inline instructions it was worth doing.

And keep in mind, this is because I am 100% out of ROM. Saving 4 bytes here, and 20 bytes there means the difference between being able to build or not.

Formatting Output

One of the reasons for the “code bloat” was adding support for an LCD display. The panel, an LCD2004, hooks up to I2C via a PCF8574 I2C I/O chip. I wrote just the routines needed for the minimal functionality required: Initialize, Clear Screen, Position Cursor, and Write String.

The full libraries available for Arduino (there are many) are so large by comparison that it often makes more sense to spend the time to “roll your own” than to port what someone else has already done. (This also means I do not have to worry about any licensing restrictions for using open source code.)

I created a simple function like:

LCDWriteDataString (0, 0, "This is my message.");

The two numbers are the X and Y (or Column and Row) of where to display the text on the 20×4 LCD screen.

But, I was quickly reminded that the PIC architecture doesn’t support passing constant string data due to “reasons”. (Harvard architecture, for those who know.)

To make it work, you had to do something like:

const char *msg = "This is my message";
LCDWriteDataString (0, 0, msg);

…or…

char buffer[19];
memcpy (buffer, "This is my message", sizeof(buffer));
LCDWriteDataString (0, 0, buffer);

…or, using the CCS compiler tools, add this to make the compiler take care of it for you:

#device PASS_STRINGS=IN_RAM

Initially I did that so I could get on with the task at hand, but as I ran out of ROM space, I revisited this to see which approach was smaller.

From looking at the assembly generated by the CCS compiler, I could tell that “PASS_STRINGS=IN_RAM” generated quite a bit of extra code. Passing in a constant string pointer was much smaller.

So that’s what I did. And development continued…

Then I ran out of ROM yet again. Since I had some strings that needed formatted output, I was using sprintf(). I knew that sprintf() was large, so I thought I could create my own that only did what I needed:

char buffer[21];
sprintf (buffer, "CF:%02x C:%02x T:%02x V:%02x", faults, current, temp, volts);
LCDWriteDataString (0, 0, buffer);

char buffer[21];
sprintf (buffer, "Fwd: %u", watts);
LCDWriteDataString (0, 1, buffer);

In my particular example, all I was doing was printing out an 8-bit value as HEX, and printing out a 16-bit value as a decimal number. I did not need any of the other baggage sprintf() brings along with it.

I came out with these quick and dirty routines:

char GetHexDigit(uint8_t nibble)
{
  char hexChar;

  nibble = (nibble & 0x0f);

  if (nibble <= 9)
  {
    hexChar = '0';
  }
  else
  {
    hexChar = 'A'-10;
  }

  return (hexChar + nibble);
}

char *Get8BitHexStringPtr (uint8_t value)
{
    static char hexString[3];

    hexString[0] = GetHexDigit(value >> 4);
    hexString[1] = GetHexDigit(value & 0x0f);
    hexString[2] = '\0'; // NIL terminate

    return hexString;
}

The above routine maintains a static character buffer of 3 bytes. Two for the HEX digits, and the third for a NIL terminator (0). I chose to do it this way rather than having the user pass in a buffer pointer since the more parameters you pass, the larger the function call gets. The downside is those 3 bytes of variable storage are reserved forever, so if I was also out of RAM, I might rethink this approach.
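
If RAM ever did become the tighter constraint, one option would be a variant that takes a caller-supplied buffer instead. This is just a sketch of that tradeoff, using a hypothetical Get8BitHexString() name and the GetHexDigit() routine above:

// Hypothetical variant: the caller provides a buffer of at least 3 bytes,
// so no static storage is reserved, at the cost of passing one more parameter.
void Get8BitHexString (uint8_t value, char *outBuffer)
{
    outBuffer[0] = GetHexDigit(value >> 4);
    outBuffer[1] = GetHexDigit(value & 0x0f);
    outBuffer[2] = '\0'; // NIL terminate
}

For the rest of this post, though, I will stick with the static-buffer version.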

I can now use it like this:

const char *msgC = " C:"; // used by strcat()
const char *msgT = " T:"; // used by strcat()
const char *msgV = " V:"; // used by strcat()

char buffer[21];

strcpy (buffer, "CF:"); // allows constants
strcat (buffer, Get8BitHexStringPtr(faults));
strcat (buffer, msgC);
strcat (buffer, Get8BitHexStringPtr(current));
strcat (buffer, msgT);
strcat (buffer, Get8BitHexStringPtr(temp));
strcat (buffer, msgV);
strcat (buffer, Get8BitHexStringPtr(volts));

LCDWriteDataString (0, 1, buffer);

If you are wondering why I do a strcpy() with a constant string, then use const pointers for strcat(), that is due to a limitation of the compiler I am using. Their implementation of strcpy() specifically supports string constants. Their implementation of strcat() does NOT, requiring me to jump through more hoops to make this work.

Even with all that extra code, it still ends up being smaller than linking in sprintf().

And, for printing out a 16-bit value in decimal, I am sure there is a clever way to do that, but this is what I did:

char *Get16BitDecStringPtr (uint16_t value)
{
    static char decString[6];
    uint16_t temp = 10000;
    int pos = 0;

    memset (decString, '0', sizeof(decString));

    while (value > 0)
    {
        while (value >= temp)
        {
            decString[pos]++;
            value = value - temp;
        }

        pos++;
        temp = temp / 10;
    }

    decString[5] = '\0'; // NIL terminate

    return decString;
}

Since I know the value is limited to what 16 bits can hold, I know the max value possible is 65535.

I initialize my five-digit string with “00000”. I start with a temporary value of 10000. If the user’s value is larger than that, I subtract that amount from it and increase the first digit in the string (so “0” goes to “1”). I repeat until the user’s value has been reduced to less than 10000.

Then I divide that temporary value by 10, so 10000 becomes 1000. I move my position to the next character in the output string and the process repeats.

Eventually I’ve subtracted all the 10000s, 1000s, 100s, 10s and 1s that I can, leaving me with a string of five digits (“00000” to “65535”).

I am sure there is a better way, and I am open to it if it generates SMALLER code. :)
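
For comparison, a more conventional divide-and-modulo version might look like the sketch below (the Get16BitDecStringPtr2() name is mine). Whether it generates smaller code than the subtraction loop depends entirely on how the compiler handles 16-bit division on this target:

#include <stdint.h>

// Build the digits right-to-left using divide and modulo.
char *Get16BitDecStringPtr2 (uint16_t value)
{
    static char decString[6];
    int pos;

    decString[5] = '\0'; // NIL terminate

    for (pos = 4; pos >= 0; pos--)
    {
        decString[pos] = (char)('0' + (value % 10)); // lowest remaining digit
        value = value / 10;
    }

    return decString;
}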

And that’s my tale of today… I needed some extra ROM space, so I got rid of sprintf() and rolled my own routines for the two specific types of output I needed.

But this is barely scratching the surface of the things I’ve been doing this week to save a few bytes here or there. I’d like to revisit this subject in the future.

Until next time…

C and size_t and 64-bits

Do what I say and nobody gets hurt!

At my day job, I work on a Windows application that supervises high power solid state microwave generators. (The actual controlling part is done by multiple embedded PIC24-based boards, which is a good thing considering the issues Windows has given us over the years.)

At some point, we switched from building a 32-bit version of this application to a 64-bit version. The compiler started complaining about various things dealing with “ints” which were not 64-bits, so the engineer put in #ifdef code like this:

#ifdef _NI_mswin64_
    unsigned __int64 length = 0;
#else
    unsigned int length = 0;
#endif

That took care of the warnings since it would now use either a native “int” or a 64-bit int, depending on the target.

Today I ran across this and wondered why C wasn’t just taking care of things. Code that receives the return value of a C library routine should be able to use a plain int, whether that int is 16 bits (like on Arduino), 32 bits, or 64 bits on the system. I decided to look into this, and saw the culprits were things like this:

length = strlen (pathName);

Surely if strlen() returned an int, it should not need to be changed to an “unsigned __int64” to work.

And indeed, C already does take care of this, if you do what it tells you to do. strlen does NOT return an int:

size_t strlen ( const char * str );

size_t is a standard C type: an unsigned integer wide enough to hold the size of any object on that platform, whatever width that needs to be. And by simply changing all the #ifdef’d code to actually use the data type the C library call specifies, all the errors go away and the #ifdefs can be removed.

size_t length = 0;
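
A related note of mine (not something from the original code): if these lengths also get printed, the C99 %zu format specifier handles size_t portably, assuming the runtime library supports it, which avoids yet another per-platform #ifdef:

#include <stdio.h>
#include <string.h>

int main (void)
{
    const char *pathName = "some/example/path"; // hypothetical value
    size_t length = strlen (pathName);

    // %zu is the standard (C99) format specifier for size_t.
    printf ("Length is %zu\n", length);

    return 0;
}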

A better, stricter compiler might have complained about using an “int” to catch something coming back as a “size_t.”

Oh wait. It did. We just chose to solve it a different way.

Until next time…

More CoCo MC6847 VDG chip “draw black” challenge responses.

See also: challenge, responses, and more responses.

Today Sebastian Tepper submitted a solution to the “draw black” challenge. He wrote:

I think this is much faster and avoids unnecessary SETs. Instruction 100 will do the POKE only once per character block.

– Sebastian Tepper, 7/5/2022

The routine he presented (seen in lines 100 and 101) looked like this:

10 CLS
20 FOR A=0 TO 31
30 X=A:Y=A:GOSUB 100
40 NEXT
50 GOTO 50
100 IF POINT(X,Y)<0 THEN POKE 1024+Y*16+X/2,143
101 RESET(X,Y):RETURN

It did meet the criteria of the challenge, correctly drawing a diagonal line from (0,0) down to (31,31) on the screen. And, it was fast.

POINT() will return -1 if the location is not a graphics character. On the standard CLS screen, the screen is filled with character 96 — a space. (That’s the value you use to POKE to the screen, but when printing, it would be CHR$(32) instead.) His code simply figures out which screen character block contains the target pixel, and POKEs it to 143 (a fully-lit graphics block) before RESETting the pixel to black.

So I immediately tried to break it. I wondered what would happen if it was setting two pixels next to each other in the same block. What would RESET do?

I added a few lines to the original test program so it drew the diagonal line in both directions PLUS drew a box (with no overlapping corners). My intent was to make it draw a horizontal line on an even pixel row and on an odd pixel row, and do the same for verticals. It looks like this (and the original article has been updated):

10 CLS
20 FOR A=0 TO 15

30 X=A:Y=A:GOSUB 100
31 X=15-A:Y=16+A:GOSUB 100

32 X=40+A:Y=7:GOSUB 100
33 X=40+A:Y=24:GOSUB 100

34 X=39:Y=8+A:GOSUB 100
35 X=56:Y=8+A:GOSUB 100

40 NEXT
50 GOTO 50

And this did break Sebastian’s routine… and he immediately fixed it:

100 IF POINT(X,Y)<0 THEN POKE 1024+INT(Y/2)*32+INT(X/2),143
101 RESET(X,Y):RETURN

I haven’t looked closely at what changed, but I see it calculates the character memory location by dividing Y by two (using INT() so there is no fractional part, so 15 becomes 7 rather than 7.5), multiplying that by 32 to get the start of the character row, and then adding half of X. (There are half as many screen character blocks in each direction as there are SET/RESET pixels.)

And it works. And it works well — all cases are satisfied.

And if that wasn’t enough, some optimizations came next:

And for maximum speed you could change line 100 from:

100 IF POINT(X,Y)<0 THEN POKE 1024+INT(Y/2)*32+INT(X/2),143

to:

100 IFPOINT(X,Y)<.THEN POKE&H400+INT(Y/2)*&H20+INT(X/2),&H8F

To time the difference, I added these extra lines:

15 TIMER=0

and:

45 PRINT TIMER

This lowers execution time from 188 to 163 timer units, i.e., down to 87% of the original time.

– Sebastian Tepper, 7/5/2022

Any time I see TIMER in the mix, I get giddy.

Spaces had been removed, 0 was changed to . (which BASIC sees as a much faster-to-parse zero), and decimal values were changed to base-16 hex values.

Also, in doing speed tests about the number format I verified that using hexadecimal numbers was more convenient only when the numbers in question have two or more digits.

– Sebastian Tepper, 7/5/2022

Awesome!

Perhaps a final improvement could be to change the constants to variables set to those values: the screen memory location (1024/&H400), the row multiplier (32/&H20), and the character value (143/&H8F). Looking up a variable, provided there are not too many others ahead of it in the variable list, can be even faster than parsing a numeric constant.

Using the timer value of 163 as our speed to beat, first I removed that extra space after THEN just to see if it mattered. No change.

Then I declared three new variables, and used DIM to put them in the order I wanted them (the A in the FOR/NEXT loop initially being the last):

11 DIM S,C,W,A
12 S=1024:W=32:C=143
...
100 IFPOINT(X,Y)<.THENPOKES+INT(Y/2)*W+INT(X/2),C
101 RESET(X,Y):RETURN

No change. I still got 163. So I moved A to the start. A is used more than any other variable, so maybe that will help:

11 DIM A,S,C,W

No change — still 163.

Are there any other optimizations we could try? Let us know in the comments.

Thank you for this contribution, Sebastian. I admire your attention to speed.

Until next time…

Redundant C variable initialization for redundant reasons.

The top article on this site for the past 5 or so years has been a simple C tidbit about splitting 16-bit values into 8-bit values. Because of this, I continue to drop small C things here in case they might help when someone stumbles upon them.

Today, I’ll mention some redundant, useless code I always try to add…

The C language does not guarantee that local (automatic) variables are initialized to 0; only variables with static storage duration are guaranteed to start out zeroed. One of the compilers I use at work even has a specific proprietary override to enable zero-initialization everywhere.

You might find that this code prints non-zero on certain systems:

int i;

printf ("i = %d\n", i);

Likewise, trying to print a buffer that has not been initialized might produce non-empty data:

char message[32];

printf ("Message: '%s'\n", message);

Because of this, it’s a good habit to always initialize variables with at least something:

int i=0;

char message[42];
...
memset (message, 0x0, sizeof(message));

Likewise, when setting variables in code, it is also a good idea to always set an expected result and NOT rely on any previous initialization. For example:

int result = -1;

if (something == 1)
{
    result = 10;
}
else if (something == 2)
{
    result = 42;
}
else
{
    result = -1;
}

Above, you can clearly see that when none of the something values match, the final else sets “result” to the same value it was just initialized to.

This is just redundant, wasteful code.

And you should always do it, unless you absolutely positively need those extra bytes of code space.

It is quite possible that at some point this code could be copy/pasted elsewhere, without the initialization. On first compile, the coder sees the undeclared “result” and just adds “int result;” at the top of the function. If the final else with “result = -1;” wasn’t there, the results could be unexpected.

The reverse of this is also true. If you know you are coding so you ALWAYS return a value and never rely on initialized defaults, it would be safe to just do “int result;” at the top of this code. But, many modern compilers will warn you about “possibly uninitialized variables.”

Because of this, I always try to initialize any variable (sometimes to a value I know it won’t ever use, to aid in debugging — “why did I suddenly get 42 back from this function? Oh, my code must not be running…”).

And I always try to have a redundant default “else” or whatever to set it, instead of relying on “always try.”

Maybe two “always tries” make a “did”?

Until next time…