Category Archives: Programming

I hate floating point.

See also: the 902.1 incident of 2020.

Once again, oddness from floating point values took me down a rabbit hole trying to understand why something was not working as I expected.

Earlier, I had stumbled upon one of the magic values that a 32-bit floating point value cannot represent in C. Instead of 902.1, a float will give you 902.099976… Close, but it caused me issues due to how we were doing some math conversions.

float value = 902.1;
printf ("value = %f\n", value);

To work around this, I switched these values to double precision floating point values and now 902.1 shows up as 902.1:

double value = 902.1;
printf ("value = %f\n", value);

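That example will indeed show 902.100000. (A double cannot store 902.1 exactly either, but its error is far smaller than the six decimal places %f prints, so the value rounds to 902.100000 for display.)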

This extra precision ended up causing a different issue. Consider this simple code, which took a value in kilowatts and converted it to watts, then converted that to a signed integer.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    double kw = 64.60;
    double watts = kw * 1000;

    printf ("kw   : %f\n", kw);

    printf ("watts: %f\n", watts);

    printf ("int32: %d\n", (int32_t)watts);

    return EXIT_SUCCESS;
}

That looks simple enough, but the output shows it is not:

kw   : 64.600000
watts: 64600.000000
int32: 64599

Er… what? 64.6 multiplied by 1000 displayed as 64600.000000, so that all looks good, but when converted to a signed 32-bit integer, it turned into 64599. “Oh no, not again…”

I was amused that changing these values from double to float made it work as I expected:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    float kw = 64.60;
    float watts = kw * 1000;

    printf ("kw   : %f\n", kw);

    printf ("watts: %f\n", watts);

    printf ("int32: %d\n", (int32_t)watts);

    return EXIT_SUCCESS;
}

That version prints:

kw   : 64.599998
watts: 64600.000000
int32: 64600

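The extra precision is exactly what exposes the problem. Neither type can store 64.6 exactly. As a double, 64.6 multiplied by 1000 lands a hair below 64600 (about 64599.99999999999). The %f format rounds that up to 64600.000000 for display, but the cast to int32_t simply discards the fraction, leaving 64599. As a float, the product is not exact either, but the nearest representable float happens to be exactly 64600, so the cast gives the answer I expected.

So switching back to float is a workaround that works by luck for this value, not a fix. Rounding to the nearest integer before converting is the more dependable approach.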

Until next (floating point problem) time…

C structures and padding and sizeof

This came up at my day job when two programmers were trying to get a block of data to be the size both expected it to be. Consider this example:

typedef struct
{
    uint8_t     byte1;  // 1
    uint16_t    word1;  // 2
    uint8_t     byte2;  // 1
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct1;

The above structure represents three 8-bit byte values and two 16-bit word values for a total of 7 bytes.

However, if you were to compile this code with GCC for Windows and print the sizeof() of that structure, you would see it returns 10:

sizeof(MyStruct1) = 10

This is due to the compiler inserting padding so that each 16-bit value starts on a 16-bit boundary.

The expected data storage in memory feels like it should be:

[..|..|..|..|..|..|..] = 7 bytes
 |  |  |  |  |  |  |
 |  |  |  |  \  /  byte3
 |  |  |  |  word2
 |   \ /  byte2
 |  word1
 byte1

But, using GCC on a Windows 10 machine shows that each value is stored on a 16-bit boundary, leaving unused padding bytes after the 8-bit values:

[..|xx|..|..|..|xx|..|..|..|xx] = 10 bytes
 |     |  |  |     |  |  |
 |     |  |  |     \  /  byte3
 |     |  |  |     word2
 |      \ /  byte2
 |     word1
 byte1

As you can see, three extra bytes were added to the “blob” of memory that contains this structure. This is being done so each 16-bit value starts on an even-byte address (0, 2, 4, etc.). Some processors require this. A compiler targeting a processor with no alignment requirement, such as the 8-bit AVR in an Arduino UNO, would likely give you a sizeof() of 7.

Do not rely on processor architecture

To create portable C, you must not rely on how things behave in your environment. The same code can, and sometimes will, produce different results in a different environment.

See also: sizeof() matters, where I demonstrated a simple example of using “int” and how it was quite different on a 16-bit Arduino versus a 32/64-bit PC.

Make it smaller

One easy thing to do to reduce wasted memory in structures is to try to group the 8-bit values together. Using the earlier structure example, simply by changing the ordering of values, we can reduce the amount of memory it uses:

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint8_t     byte3;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
                        // 7 bytes
} MyStruct2;

On a Windows 10 GCC compiler, this will produce:

sizeof(MyStruct2) = 8

It is still not the 7 bytes we might expect, but at least the waste is less. In memory, it looks like this:

[..|..|..|xx|..|..|..|..] = 8 bytes
 |  |  |     |  |  \  /
 |  |  |     \  /  word2
 |  |  |     word1
 |  |  byte3
 |  byte2
 byte1

You can see an extra byte of padding being added after the third 8-bit value. Just out of curiosity, I moved the third byte to the end of the structure like this:

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct3;

…but that also produced 8. It is adding an extra byte of padding at the end. That may not seem necessary, but the size of a structure is rounded up to a multiple of its strictest member alignment (2 bytes here) so that, in an array of these structures, the 16-bit members of every element still land on even addresses.

[..|..|..|..|..|..|..|xx] = 8 bytes
 |  |  |  |  |  |  |
 |  |  |  |  \  /  byte3
 |  |  \  /  word2
 |  |  word1
 |  byte2
 byte1

Because you cannot ensure how a structure ends up in memory without knowing how the compiler works, it is best to simply not rely on or expect a structure to be “packed” with all the bytes aligned like the code. You also cannot expect the memory used to be just the sum of the values contained in the structure.

I do frequently see programmers attempt to massage the structure by adding in padding values, such as:

typedef struct
{
    uint8_t     byte1;    // 1
    uint8_t     padding1; // 1

    uint16_t    word1;    // 2

    uint8_t     byte2;    // 1
    uint8_t     padding2; // 1

    uint16_t    word2;    // 2

    uint8_t     byte3;    // 1
    uint8_t     padding3; // 1
                          // 10 bytes
} MyPaddedStruct1;

At least on a system that aligns values to 16 bits, the structure now matches what we actually get. But what if you used a processor where everything was aligned to 32 bits?

It is always best to not assume. Code written for an Arduino one day (with 16-bit integers) may be ported to a 32-bit Raspberry Pi Pico at some point, and not work as intended.

Here’s some sample code to try. You would have to change the printfs to Serial.println() and change how it prints the sizeof() values, but then you could see what it does on an Arduino UNO (an 8-bit processor with a 16-bit int) versus a 32-bit PC or other system.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct
{
    uint8_t     byte1;  // 1
    uint16_t    word1;  // 2
    uint8_t     byte2;  // 1
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct1;

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint8_t     byte3;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
                        // 7 bytes
} MyStruct2;

typedef struct
{
    uint8_t     byte1;  // 1
    uint8_t     byte2;  // 1
    uint16_t    word1;  // 2
    uint16_t    word2;  // 2
    uint8_t     byte3;  // 1
                        // 7 bytes
} MyStruct3;

int main()
{
    printf ("sizeof(MyStruct1) = %u\n", (unsigned int)sizeof(MyStruct1));
    printf ("sizeof(MyStruct2) = %u\n", (unsigned int)sizeof(MyStruct2));
    printf ("sizeof(MyStruct3) = %u\n", (unsigned int)sizeof(MyStruct3));

    return EXIT_SUCCESS;
}

Until next time…

Color BASIC Attract Screen – part 4

See also: part 1, part 2, part 3, part 4, unrelated, part 5 and part 6.

Beyond removing some spaces and a REM statement, here is the smallest I have been able to get my “attract” program:

10 ' ATTRACT4.BAS
20 FOR I=0 TO 3:READ L(I),LD(I),CL(I),CD(I):NEXT:Z=143:CLS 0:PRINT @268,"ATTRACT!";
30 Z=Z+16:IF Z>255 THEN Z=143
40 FOR I=0 TO 3:POKE L(I),Z:L(I)=L(I)+LD(I):FOR C=0 TO 3:IF L(I)=CL(C) THEN LD(I)=CD(C)
50 NEXT:NEXT:GOTO 30
60 DATA 1024,1,1024,1,1047,1,1055,32,1535,-1,1535,-1,1512,-1,1504,-32

(We could reduce it by one line by sticking the DATA statement on the end of line 50, now that I look at it.)

Let’s rewind and look at the original, which used individual variables for each of the moving color blocks:

10 ' ATTRACT.BAS
20 A=1024:B=A+23:C=1535:D=C-23:Z=143
30 AD=1:BD=1:CD=-1:DD=-1
40 CLS 0:PRINT @268,"ATTRACT!";
50 POKE A,Z:POKE B,Z:POKE C,Z:POKE D,Z
60 Z=Z+16:IF Z>255 THEN Z=143
70 A=A+AD
80 IF A=1055 THEN AD=32
90 IF A=1535 THEN AD=-1
100 IF A=1504 THEN AD=-32
110 IF A=1024 THEN AD=1
120 ' 
130 B=B+BD
140 IF B=1055 THEN BD=32
150 IF B=1535 THEN BD=-1
160 IF B=1504 THEN BD=-32
170 IF B=1024 THEN BD=1
180 ' 
190 C=C+CD
200 IF C=1055 THEN CD=32
210 IF C=1535 THEN CD=-1
220 IF C=1504 THEN CD=-32
230 IF C=1024 THEN CD=1
240 ' 
250 D=D+DD
260 IF D=1055 THEN DD=32
270 IF D=1535 THEN DD=-1
280 IF D=1504 THEN DD=-32
290 IF D=1024 THEN DD=1
300 GOTO 50

This was then converted to use an array:

10 ' ATTRACT2.BAS
20 L(0)=1024:L(1)=1024+23:L(2)=1535:L(3)=1535-23
30 Z=143
40 CL(0)=1024:CD(0)=1
50 CL(1)=1055:CD(1)=32
60 CL(2)=1535:CD(2)=-1
70 CL(3)=1504:CD(3)=-32
80 CLS 0:PRINT @268,"ATTRACT!";
90 LD(0)=1:LD(1)=1:LD(2)=-1:LD(3)=-1
100 FOR I=0 TO 3:POKE L(I),Z:NEXT
110 Z=Z+16:IF Z>255 THEN Z=143
120 FOR I=0 TO 3:L(I)=L(I)+LD(I):NEXT
130 FOR L=0 TO 3
140 FOR C=0 TO 3
150 IF L(L)=CL(C) THEN LD(L)=CD(C)
160 NEXT
170 NEXT
180 GOTO 100

And then it was converted to use READ/DATA instead of hard-coding values:

10 ' ATTRACT3.BAS
20 FOR I=0 TO 3
30 READ L(I),LD(I),CL(I),CD(I)
40 NEXT
50 Z=143
60 CLS 0:PRINT @268,"ATTRACT!";
70 Z=Z+16:IF Z>255 THEN Z=143
80 FOR I=0 TO 3
90 POKE L(I),Z
100 L(I)=L(I)+LD(I)
110 FOR C=0 TO 3
120 IF L(I)=CL(C) THEN LD(I)=CD(C)
130 NEXT
140 NEXT
150 GOTO 70
160 ' L,LD,CL,CD
170 DATA 1024,1,1024,1
180 DATA 1047,1,1055,32
190 DATA 1535,-1,1535,-1
200 DATA 1512,-1,1504,-32

Shuffling code around is fun.

But it’s still really slow.

10 PRINT “FASTER”

There are other ways to do similar effects, such as with strings. We could make a string that contained a repeating series of the color block characters, like this:

FOR I=0 TO 7:B$=B$+CHR$(143+16*I):NEXT

Then we could duplicate that 8-character string a few times until we had a string that was twice the length of the 32 column screen:

B$=B$+B$+B$+B$+B$+B$+B$+B$

Then we could make the entire thing move by printing the MID$ of it, like this:

FOR I=1 TO 32
PRINT@0,MID$(B$,33-I,32);
PRINT@480,MID$(B$,I,31);
NEXT

We print one section @0 for the top line, and the other @480 for the bottom line. Unfortunately, using PRINT instead of POKE means if we ever print on the bottom right location, the screen would scroll, so the bottom right block has to be left un-printed (thus, printing 31 characters for the bottom line instead of the full 32). This bothers me so apparently I do have O.C.D. Maybe we can fix that later.

But, it gives the advantage of scrolling ALL the blocks, and is super fast. Check it out:

10 ' ATTRACT5.BAS
20 CLS 0:PRINT @268,"ATTRACT!";
30 FOR I=0 TO 7:B$=B$+CHR$(143+16*I):NEXT
40 B$=B$+B$+B$+B$+B$+B$+B$+B$
50 FOR I=1 TO 32
60 PRINT@0,MID$(B$,33-I,32);
70 PRINT@480,MID$(B$,I,31);
80 NEXT:GOTO 50

That’s not bad, but only gives the top and bottom rows (minus that bottom right location). But, it’s fast!

ATTRACT5.BAS

Since the order of the colors is the same on the top and bottom, we’d really need to reverse the bottom characters to make it look like it’s rotating rather than just scrolling in opposite directions. Let’s tweak that:

10 ' ATTRACT6.BAS
20 CLS 0:PRINT @268,"ATTRACT!";
30 FOR I=0 TO 7:B$=B$+CHR$(143+16*I)
35 R$=R$+CHR$(255-16*I):NEXT
40 B$=B$+B$+B$+B$+B$+B$+B$+B$
45 R$=R$+R$+R$+R$+R$+R$+R$+R$
50 FOR I=1 TO 32
60 PRINT@0,MID$(B$,33-I,32);
70 PRINT@480,MID$(R$,I,31);
80 NEXT:GOTO 50

That’s a bit better. But getting the sides to work is a bit more work and it will slow things down quite a bit. But let’s try anyway.

Initially, I tried scanning down the sides of the string using MID$, like this:

FOR J=1 TO 14
PRINT@480-32*J,MID$(R$,39-J+I,1);
PRINT@31+32*J,MID$(R$,33-J+I,1);
NEXT

But that was very, very slow. You could see it “paint” the sides. Each time you use MID$, a new string is created (with data copied from the first string). That’s a bunch of memory shuffling just for one character.

Then I thought, since I can’t get the speed up from a horizontal string being PRINTed, it was probably faster to just use CHR$().

I tried that, and it was still too slow.

Benchmark Digression: POKE vs PRINT

This led me back to an earlier benchmark discussion… Since I cannot get any benefit of using PRINT for a vertical column of characters, I could switch to the faster POKE method. This would also allow me to fill that bottom right character block. My O.C.D. approves.

To prove this to myself, again, I did two quick benchmarks — one using PRINT@ and the other using POKE.

0 ' LRBENCH1.BAS
1 ' 4745
10 C=143+16
20 TIMER=0:FOR A=1 TO 1000
30 FOR P=1024 TO 1535 STEP 32
40 POKEP,C
50 NEXT
60 NEXT:PRINT TIMER

0 ' LRBENCH2.BAS
1 ' 6013
10 C=143+16
20 TIMER=0:FOR A=1 TO 1000
30 FOR P=0 TO 511 STEP 32
40 PRINT@P,CHR$(C);
50 NEXT
60 NEXT:PRINT TIMER

Line 1 has the time that it printed for me in the Xroar emulator.

POKE will be the way.

However, there is still a problem: Math.

It just doesn’t add up…

The CoCo screen is 32×16. There are 8 colors. That means those 8 colors can repeat four times along the top of the screen, and four times along the bottom, leaving only 14 on each side going vertical. 32+32+14+14 is 92, which is not evenly divisible by our 8 colors. If we represent them as numbers, they would look like this:

If you start at the top left corner and go across, repeating 12345678 over and over, you end up back at the top left on 4. We have three colors that won’t fit. This means even if I had a nice fast routine for rotating the colors, they would not be evenly balanced using this format.

However…

…if I leave out the four corners, we get 88, and that divides just fine by our 8 colors!

Thus, the actual O.C.D.-compliant border I want to go for would look like this:

The only problem is … how can this be done fast in BASIC?

To be continued…

Bonus: Show Your Work

Here are the stupid BASIC programs I wrote to make the previous four screens:

0 ' border1.bas
10 CLS:C=113:L=1024
20 ' RIGHT
30 L=1024:D=1:T=31:GOSUB 110
40 ' DOWN
50 L=1087:D=32:T=13:GOSUB 110
60 ' LEFT
70 L=1535:D=-1:T=31:GOSUB 110
80 ' UP
90 L=1472:D=-32:T=13:GOSUB 110
100 GOTO 100
110 ' L=LOC, D=DELTA, T=TIMES
120 POKE L,C
130 C=C+1:IF C>120 THEN C=113
140 IF T=0 THEN RETURN
150 L=L+D:IF L>1023 THEN IF L<1536 THEN 170
160 L=L-D:SOUND 200,1
170 T=T-1:GOTO 120

0 ' border2.bas
10 CLS:C=113:L=1024
20 ' RIGHT
30 L=1025:D=1:T=29:GOSUB 110
40 ' DOWN
50 L=1087:D=32:T=13:GOSUB 110
60 ' LEFT
70 L=1534:D=-1:T=29:GOSUB 110
80 ' UP
90 L=1472:D=-32:T=13:GOSUB 110
100 GOTO 100
110 ' L=LOC, D=DELTA, T=TIMES
120 POKE L,C
130 C=C+1:IF C>120 THEN C=113
140 IF T=0 THEN RETURN
150 L=L+D:IF L>1023 THEN IF L<1536 THEN 170
160 L=L-D:SOUND 200,1
170 T=T-1:GOTO 120

0 ' border3.bas
10 CLS 0:C=143:L=1024
20 ' RIGHT
30 L=1025:D=1:T=29:GOSUB 110
40 ' DOWN
50 L=1087:D=32:T=13:GOSUB 110
60 ' LEFT
70 L=1534:D=-1:T=29:GOSUB 110
80 ' UP
90 L=1472:D=-32:T=13:GOSUB 110
100 GOTO 100
110 ' L=LOC, D=DELTA, T=TIMES
120 POKE L,C
130 C=C+16:IF C>255 THEN C=143
140 IF T=0 THEN RETURN
150 L=L+D:IF L>1023 THEN IF L<1536 THEN 170
160 L=L-D:SOUND 200,1
170 T=T-1:GOTO 120

Revisiting 10 PRINT RACER

A while back I ported 8-Bit Show and Tell’s “10 PRINT RACER” from Commodore PET to CoCo. I tried to make it a literal port, keeping the code as close as I could to the original. I did, however, mention a few things that could make it faster, taking advantage of things like Extended Color BASIC’s hex values (&H2 is faster to parse than 2, for instance).

The other day, MiaM left a comment on the original article:

It might be faster to use A=ASC(INKEY$) and IF A=4 instead of IF A$=CHR$(4)

– MiaM

Intriguing. In the original Commodore version, the direction was read by using GET A$, and I simply converted that over to A$=INKEY$ for Color BASIC. Here is a look at Robin’s Commodore PET original:

1 REM 10 PRINT RACER: 8-BIT SHOW & TELL
5 R$="":PRINT"{CLR}INIT:";:FORX=1TO75:M$=CHR$(205.5+RND(.)):R$=R$+M$:PRINTM$;:NEXT
10 PRINT"{CLR}":C=20:R=13:W=15:D=0:S=32768
20 L=0:FORZ=0TO1STEP0:X=RND(.)*10
30 IFX<4THENR=R-1:IFR<1THENR=1
40 IFX>6THENR=R+1:IFR+W>37THENR=37-W
50 RN=RND(.)*35+1:PRINTMID$(R$,RN,R);SPC(W);MID$(R$,RN,39-R-W)
60 D=D+1:L=L+1:IFL>49THENL=0:W=W-1:IFW<3THENW=3
70 IFD<25THENNEXT
75 GETA$:IFA$="4"THENC=C-1
80 IFA$="6"THENC=C+1
90 P=PEEK(S+C):IFP<>32THEN200
100 POKES+C,42:NEXT
200 PRINTSPC(17)"CRASH!":IFD>HTHENH=D
205 PRINT,"SCORE:"D"  HIGH:"H
210 FORX=1TO2000:NEXT:POKE158,0
220 GETA$:IFA$=""THEN220
230 GOTO10

And here is my Color BASIC conversion:

0 ' 10 PRINT RACER
1 ' BY WWW.8BITSHOWANDTELL.COM
2 '
3 ' PORTED FROM PET TO COCO
4 ' BY SUBETHASOFTWARE.COM
5 R$="":CLS:PRINT"INIT:";:FORX=1TO75:M$=CHR$(47+45*(RND(2)-1)):R$=R$+M$:PRINTM$;:NEXT
6 S$=STRING$(32," ")
10 CLS:C=16:R=10:W=12:D=0:S=1024
20 L=0:FORZ=0TO1STEP0:X=RND(.)*10
30 IFX<4THENR=R-1:IFR<1THENR=1
40 IFX>5THENR=R+1:IFR+W>29THENR=29-W
50 RN=RND(.)*28+1:PRINTMID$(R$,RN,R);MID$(S$,1,W);MID$(R$,RN,31-R-W)
60 D=D+1:L=L+1:IFL>49THENL=0:W=W-1:IFW<3THENW=3
70 IFD<16THENNEXT
75 A$=INKEY$:IFA$=CHR$(8)THENC=C-1
80 IFA$=CHR$(9)THENC=C+1
90 P=PEEK(S+C):IFP<>96THEN200
100 POKES+C,106:NEXT
200 PRINTTAB(13)"CRASH!":IFD>H THENH=D
205 PRINTTAB(6)"SCORE:"D"  HIGH:"H
210 FORX=1TO2000:NEXT:A$=INKEY$
220 A$=INKEY$:IFA$=""THEN220
230 GOTO10

The block of code MiaM refers to is this:

75 GETA$:IFA$="4"THENC=C-1
80 IFA$="6"THENC=C+1

75 A$=INKEY$:IFA$=CHR$(8)THENC=C-1
80 IFA$=CHR$(9)THENC=C+1

On the Commodore PET, without arrow keys, it used “4” and “6” on the numeric keypad for Left and Right. On the CoCo, I changed that to the Left Arrow key and the Right Arrow key.

The Commodore PET has much less work to do looking for A$="4" versus A$=CHR$(8) on the CoCo (due to all the parsing). I could have made the CoCo use letter keys like “A” for left and “S” for right to get similar performance.

But what MiaM suggests may be faster. Instead of comparing strings like A$=CHR$(8), the suggestion is to use BASIC’s ASC() keyword to return the numeric value of the character, then compare a numeric value rather than a string compare.

Which is faster? A one character string compare, or ASC() and a number compare?

Let’s find out.

Comparing a String to a String

For this, I dug out my old BENCH.BAS benchmarking code and inserted the first method I wanted to test — the way the Commodore PET did it:

5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 A$=INKEY$:IF A$="4" THEN REM

70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

Comparing A$ to a quoted value in this loop produces 515.

Comparing a String to a CHR$

My conversion changed this to comparing to a CHR$(8) value, like this:

0 REM ascvsstringcompare.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 A$=INKEY$:IF A$=CHR$(8) THEN REM

70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

This produces a slower 628. No surprise, due to having to parse CHR$() and the number. I could easily speed up the CoCo port by using quoted characters like “A” for Left and “S” for Right.

But I really wanted to use the arrow keys.

ASC and you shall receive…

The new suggestion is to use ASC. ASC will convert a character to its ASCII value (or PETSCII on a Commodore, I would suppose). For example:

PRINT ASC("A")
65

The cool suggestion was to try using INKEY$ as the parameter inside of ASC(), and skipping the use of a variable entirely. Unfortunately, when I tried it, I received:

?FC ERROR

Function Call error. Because, if no key is pressed, INKEY$ returns nothing, which I suppose would be like trying to do:

PRINT ASC("")

We have been able to use INKEY$ directly in other functions, such as INSTR (looking up a character inside a string), and that works even when passing in “”:

PRINT INSTR("","ABCDE")
0

But ASC() won’t work without a character, at least not in Color BASIC. And, even if we used A$=INKEY$, we can’t pass A$ in to ASC() if it is empty (no key pressed) which means we’d need an extra check like:

30 A$=INKEY$:IF A$<>"" THEN IF ASC(A$)=4 THEN ..

The more parsing, the slower. This produced 539, which isn’t as slow as I expected. It’s slower than doing IF A$="4" but faster than IF A$=CHR$(8). Thus, it would be faster in my CoCo port than my original.

This did give me another thing to try. ASC() allows you to pass in a string that contains more than one character, but it only acts upon the first letter. You can do this:

PRINT ASC("ALLEN TRIED THIS")
65

This means I could always pad the return of INKEY$ with another character so it would either be whatever key the user pressed, or my other character if nothing was pressed. Like this:

30 IF ASC(INKEY$+".")=8 THEN REM

If no key has been pressed, this would try to parse ""+".", and give me the ASCII of ".".

If a key had been pressed, this would parse that character (like “4.” if I pressed a 4).

As I learned when I first started my benchmarking BASIC series, string manipulation is slow. Very slow. So I expected this to be very slow.

To my surprise, it returns 520! Just a smidge slower than the original IF A$="4" string compare! I’m actually quite surprised.

Now, in the actual 10 PRINT RACER game, which is doing lots of string manipulations to generate the game maze, this could end up being much slower if it had to move around other larger strings. But, still worth a shot.

Thank you, MiaM! Neat idea, even if Color BASIC wouldn’t let me do it the cool way you suggested.

Until next time…

Bonus

Numbers versus string compares:

30 IF Z=4 THEN REM

That gives me 350. Even though decimal values are much slower to parse than HEX values, they are still faster than strings.

But, in pure Color BASIC, there is no way to get input from a keypress to a number other than ASC. BUT, you could PEEK some BASIC RAM value that is the key being held down, and do it that way (which is something I have discussed earlier).

Any more ideas?

10 PRINT big maze in Color BASIC – part 4

See also: part 1, part 2, part 3 and part 4.

My “big maze” program prints 2×2 character blocks along the bottom of the screen until it gets to the bottom right of the screen. Then the screen scrolls (an extra PRINT adds a second line) and the process resets and repeats.

After William Astle provided some optimizations, it dawned on me that there was another thing we could try. Here is the code in question (removing unneeded lines and adjusting the GOTO as appropriate):

70 P=448
100 P=P+2:IF P>479 THEN PRINT:GOTO 70
110 GOTO 100

Then William suggested changing the logic as follows:

70 PRINT:P=448
...
100 P=P+2:IF P>479 THEN 70
110 GOTO 100

That was a very subtle change that could double (or more, or less) the speed just by not needing to parse over “PRINT:GOTO 70” every time P was NOT greater than 479 (which is most of the time in that loop).

This made me think that perhaps instead of checking for greater than 479 we could adjust the logic and check for less than 480. Something like this, perhaps:

70 PRINT:P=448
...
100 P=P+2:IF P<480 THEN 100
110 GOTO 70

There’s really no reason for this to be any different speed, is there? GOTO (“THEN”) 100 still has to start at the top and move forward, the same as GOTO (“THEN”) 70 would.

But, in the first case, it quickly skips “THEN 70” to hit the “GOTO 100” below, every time the value is not greater than 479.

In the second, every time the value is LESS than 480 it returns to 100 (go to top of program and search forward).

Should it matter?

Here is the logic isolated.

5 TIMER=0
10 PRINT:P=0
20 P=P+1:IF P<1001 THEN 20
30 GOTO 50
50 PRINT TIMER

And here is the other version:

5 TIMER=0
10 PRINT:P=0
20 P=P+1:IF P>1000 THEN 50
30 GOTO 20
50 PRINT TIMER

I have adjusted it to reset the TIMER at the start, and count from 0 to 1000. In each case, instead of repeating forever, it ends, printing the TIMER.

The first version looks odd because in the real version, line 30 would be “GOTO 10” to reset P and continue.

The second version would have line 20 end with “THEN 10” to reset and continue.

I just wanted to make them as close as possible.

Which one is faster?

Until next time…

10 PRINT big maze in Color BASIC – part 3

See also: part 1, part 2, part 3 and part 4.

When we last left off, it had been so long since I did any BASIC programming that I found myself wondering why these two sections of BASIC did not perform as I expected:

0 'bigmazebench.bas
100 P=0:TIMER=0:A=0
110 P=0
120 P=P+2:IF P>479 THEN PRINT:GOTO 110
120 A=A+1:IF A >1000 THEN 150
140 GOTO 120
150 PRINT TIMER

200 P=0:TIMER=0:A=0
210 PRINT:P=0
220 P=P+2:IF P>479 THEN 210
230 A=A+1:IF A>1000 THEN 250
240 GOTO 220
250 PRINT TIMER

William Astle once again saw the obvious (just not obvious to me at the time)…

If you have both versions in the same program, the “backwards” jumps will be slower the later in the program they are because they have to do a sequential scan of the program from the beginning to find the correct line number. If you have been running them in the same program, try separating them and running them independently.

– William Astle

Well, duh. Of course. When the block of code starting at line 200 runs, the GOTO 220 has to start at the top of the program and seek past every line to find 220. That is much slower than the GOTO 120, which only has a few lines to scan past. Normally my benchmark program is inside a FOR/NEXT loop so there is no line seeking and it runs at the same speed regardless of line number location…

So let’s try them one at a time. I loaded the program and deleted the line 0 comment, and lines 200 and up (DEL 0 and DEL 200-):

100 P=0:TIMER=0:A=0
110 P=0
120 P=P+2:IF P>479 THEN PRINT:GOTO 110
120 A=A+1:IF A >1000 THEN 150
140 GOTO 120
150 PRINT TIMER

This gives me 762.

Then, loading it again, and deleting everything up to 200 (“DEL -199”):

200 P=0:TIMER=0:A=0
210 PRINT:P=0
220 P=P+2:IF P>479 THEN 210
230 A=A+1:IF A>1000 THEN 250
240 GOTO 220
250 PRINT TIMER

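That gives me 1394!

TIMER counts up, so the larger number is the slower run: on its own, the second block still takes almost twice as long as the first. One caution before reading too much into that. The first listing, as shown, has two lines numbered 120, and BASIC keeps only the last line entered for a given number. If it was loaded exactly as listed, the P=P+2 line was replaced and that block only ran the A counter, which would explain most of the gap. William’s point still stands: with the PRINT moved to the destination line, BASIC has less to skip past after THEN whenever P>479 is not true.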

Nicely done, William.

Until next time…

10 PRINT big maze in Color BASIC – part 2

See also: part 1, part 2, part 3 and part 4.

Previously, I presented this Color BASIC program:

0 ' BIGMAZE.BAS
10 C=2
20 B$=CHR$(128)
30 L$=CHR$(128+16*C+9)
40 R$=CHR$(128+16*C+6)
50 M$(0,0)=B$+R$:M$(0,1)=R$+B$
60 M$(1,0)=L$+B$:M$(1,1)=B$+L$
70 P=512-32*2
80 M=RND(2)-1
90 PRINT@P,M$(M,0);:PRINT@P+32,M$(M,1);
100 P=P+2:IF P>479 THEN PRINT:GOTO 70
110 GOTO 80

Running it produces this:

4×4 Maze

And I ended with “Make it smaller. Make it faster.”

William Astle commented:

Welp, everything up to line 60 can be mashed into a single line. Since it’s all setup and none of it is performance critical, you can dispense with the variables and just set the M$ array directly. More typing, but it keeps the variable table smaller. Or define P right at the start so it’s the first variable in the table which would give you a speedup all on its own since the lookups to find P will be faster.

I suspect that if you put the PRINT statement from line 100 at, say, the start of line 70 and have “THEN 70” instead of “THEN PRINT:GOTO 70”, you might get a bit of a performance gain there, especially in the false case where that gives a handful fewer bytes to skip over.

There might be some sort of trick involving FOR/NEXT that can be used to improve the main loop but I think the overhead of setting up a FOR loop will be more than the saving in this case, especially if the setup lines are combined into a single program line.

On a side note, and this won’t improve the speed any, you could put a DIM M(1,1) at the start to avoid the implied DIM M(10,10). That saves a bit of memory, though I don’t think that’s even an issue for this program even on a 4K machine. But it is 585 bytes nevertheless.

– William Astle

Let’s start with the “everything up to line 60” part, which gives us this:

10 M$(0,0)=CHR$(128)+CHR$(166):M$(0,1)=CHR$(166)+CHR$(128):M$(1,0)=CHR$(169)+CHR$(128):M$(1,1)=CHR$(128)+CHR$(169)

If I compare that to the original, it’s a few fewer characters to type:

10 C=2:B$=CHR$(128):L$=CHR$(128+16*C+9):R$=CHR$(128+16*C+6):M$(0,0)=B$+R$:M$(0,1)=R$+B$:M$(1,0)=L$+B$:M$(1,1)=B$+L$

It loses the ability to change the color of the maze (easily), but it saves three string variables (B$ for blank block, R$ for right block, and L$ for left block) and one numeric variable (C for color). Definitely lower RAM use, and I am sure it saves code space too, since BASIC can’t tokenize “128+16*C+6” (10 bytes), which is replaced by “166” (3 bytes).

Combining the rest of the lines, where possible, and moving the PRINT (so line 100 has a bit less to parse through to get to the end of that line when P>479 is not true) results in:

0 ' BIGMAZE2.BAS - William Astle
10 M$(0,0)=CHR$(128)+CHR$(166):M$(0,1)=CHR$(166)+CHR$(128):M$(1,0)=CHR$(169)+CHR$(128):M$(1,1)=CHR$(128)+CHR$(169)
70 PRINT:P=448
80 M=RND(2)-1:PRINT@P,M$(M,0);:PRINT@P+32,M$(M,1);:P=P+2:IFP>479THEN70
110 GOTO80

On my simulated CoCo, removing the REM statements, then loading the original version and doing “? MEM” showed 8256. Doing the same to the second version shows 8307 — saving 51 bytes of program space. I did not measure what the saving in string and variable memory would be, but that would be even more. Great win.

Since the difference was mostly in the setup of the variables, they should run at the same speed — or will they? Let’s quickly test William’s suggestion of moving the PRINT so the IF statement doesn’t have to parse the end of the line:

0 'bigmazebench.bas
100 P=0:TIMER=0:A=0
110 P=0
120 P=P+2:IF P>479 THEN PRINT:GOTO 110
120 A=A+1:IF A >1000 THEN 150
140 GOTO 120
150 PRINT TIMER

200 P=0:TIMER=0:A=0
210 PRINT:P=0
220 P=P+2:IF P>479 THEN 210
230 A=A+1:IF A>1000 THEN 250
240 GOTO 220
250 PRINT TIMER

Not very elegant, but it should do the job. Since I could not easily use a FOR/NEXT loop for the counter, I used A and a check in line 130 or 230 to exit the test.

This prints 771 for the first one, and 1414 for the second one.

This is not what I would have expected. I must be doing something wrong, because I agree with William that…

IF P>479 THEN PRINT:GOTO 210

…should be slower every time P is NOT greater than 479, compared to:

IF P>479 THEN 210

In the first example, each time P is not greater than 479, BASIC should still have to skip everything past the THEN looking for either ELSE or the end of the line. It should be scanning past a PRINT and GOTO token, then the number 210.

In the second example, it should only have to skip the number 210.

I think I did something wrong.

What am I missing?

To be continued…

3X+1 in C#

For my day job, I do embedded C programming for PIC24 compilers and some Windows C programming in something called LabWindows. Lately, I’ve been touching some C# stuff, so I decided to revisit last night’s 3X+1 program by converting it to C#.

You can compile and run it online here: https://www.onlinegdb.com/online_csharp_compiler

// 3X+1

using System;
					
public class Program
{
	public static void Main()
	{
		while (true)
		{
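			Int32 x = 0;

			Console.WriteLine();
			Console.Write("STARTING NUMBER? ");

			// Stop at end of input; ask again on anything that is not a
			// whole number of 1 or more (0 would loop forever below).
			string line = Console.ReadLine();
			if (line == null) break;
			if (!Int32.TryParse(line, out x) || x < 1) continue;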
			
			while (true)
			{
				Console.Write(x);
				Console.Write(" ");
				
				if (x == 1) break;
				
				if ((x & 1) == 1) // Odd
				{
					x = x * 3 + 1;
				}
				else // Even
				{
					x = x / 2;
				}
			}
		}
	}
}

Arduino Serial output C macros

Here is a quickie.

In Arduino, instead of being able to use things like printf() and putchar(), console output is done by using the Serial library routines. It provides functions such as:

Serial.print();
Serial.println();
Serial.write();

These do not handle any character formatting like printf() does, but they can print strings, characters or numeric values in different formats. Where you might do something like:

int answer = 42;
printf("The answer is %d\r\n", answer);

…the Arduino version would need to be:

int answer = 42;
Serial.print("The answer is ");
Serial.print(answer);
Serial.println();

To handle printf-style formatting, you can use sprintf() to write the formatted string to a buffer, then use Serial.print() to output that. I found this blog post describing it.
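A minimal sketch of that buffer approach in standard C might look like the following. The helper names format_answer() and console_print() and the 32-byte buffer in the usage note are assumptions made up for illustration; on Arduino, the finished buffer would be handed to Serial.print() instead of fputs(). It uses snprintf() rather than sprintf() so a too-small buffer is reported instead of overrun.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the message into a caller-supplied buffer.
 * Returns the number of characters written (not counting the terminator),
 * or -1 if formatting failed or the buffer was too small. */
int format_answer(char *buffer, size_t size, int answer)
{
    int length = snprintf(buffer, size, "The answer is %d\r\n", answer);

    if ((length < 0) || ((size_t)length >= size))
    {
        length = -1; /* Error, or the output was truncated. */
    }

    return length;
}

/* Hypothetical helper: stand-in for Serial.print(buffer) on Arduino. */
void console_print(const char *text)
{
    fputs(text, stdout);
}
```

Typical use would be to declare something like char buffer[32], call format_answer(buffer, sizeof(buffer), 42), and pass the buffer to console_print() only if the result is not -1.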

I recently began porting my Arduino Telnet routine over to standard C to use on some PIC24 hardware I have at work. I decided I should revisit my Telnet code and try to make it portable, so the code could be built for Arduino or standard C. This would mean abstracting all console output, since printf() is not used on the Arduino.

I quickly came up with these Arduino-named macros I could use in C:

#include <stdio.h>
#include <stdlib.h>

#define SERIAL_PRINT(s)     printf(s)
#define SERIAL_PRINTLN(s)   printf(s"\r\n")
#define SERIAL_WRITE(c)     putchar(c)

int main()
{
    SERIAL_PRINT("1. This is a line");
    SERIAL_PRINTLN();
    SERIAL_PRINTLN();

    SERIAL_PRINTLN("2. This is a second line.");

    SERIAL_PRINT("3. This is a character:");
    SERIAL_WRITE('x');
    SERIAL_PRINTLN();

    SERIAL_PRINTLN("done.");

    return EXIT_SUCCESS;
}

Ignoring the Serial.begin() setup code that Arduino requires, this would let me replace console output in the program with these macros. For C, it would use the macros as defined above. For Arduino, it would be something like…

#define SERIAL_PRINT(s)     Serial.print(s)
#define SERIAL_PRINTLN(s)   Serial.println(s)
#define SERIAL_WRITE(c)     Serial.write(c)

By using output macros like that, my code would still look familiar to Arduino folks, but build on a standard C environment (for the most part).
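One way to keep both sets of definitions in a single place is a small header that picks between them at compile time. This is only a sketch: it assumes the Arduino build defines the ARDUINO preprocessor symbol, and the file name console_output.h is made up for illustration.

```c
/* console_output.h (hypothetical file name) */
#ifndef CONSOLE_OUTPUT_H
#define CONSOLE_OUTPUT_H

#ifdef ARDUINO
/* Arduino build: pass straight through to the Serial library. */
#define SERIAL_PRINT(s)     Serial.print(s)
#define SERIAL_PRINTLN(s)   Serial.println(s)
#define SERIAL_WRITE(c)     Serial.write(c)
#else
/* Standard C build: map onto stdio.
 * SERIAL_PRINTLN relies on string literal concatenation, so its
 * argument must be a string literal (or left empty).
 * Both print macros use the argument as the printf format string,
 * so a literal '%' has to be written as "%%". */
#include <stdio.h>

#define SERIAL_PRINT(s)     printf(s)
#define SERIAL_PRINTLN(s)   printf(s "\r\n")
#define SERIAL_WRITE(c)     putchar(c)
#endif

#endif /* CONSOLE_OUTPUT_H */
```

The telnet code would then include this one header and never mention printf() or Serial directly. The string-literal restriction on the C side is the price of keeping the macros this small.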

This isn’t the most efficient way to do it, since Arduino code like this…

  Serial.print("[");
  Serial.print(val);
  Serial.println("]");

…would be one printf() in C:

printf ("[%d]\r\n", val);

But, if I wanted to keep code portable, C can certainly do three separate printf()s to do the same output as Arduino, so we code for the lowest level output.

One thing I don’t do, yet, is handle porting things like:

Serial.print(val, HEX);

On Arduino, that outputs the val variable in HEX. I’m not quite sure how I’d make a portable macro for that, unless I did something like:

#define SERIAL_PRINT_HEX(v) Serial.print(v, HEX)

#define SERIAL_PRINT_HEX(v) printf("%x", v)

That would let me do:

SERIAL_PRINT("[");
SERIAL_PRINT_HEX(val);
SERIAL_PRINTLN("]");
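On the standard C side, the hex macro could be sketched a little more defensively as follows. Two details are assumptions worth checking against the target: Arduino's Serial.print(val, HEX) prints uppercase digits with no leading zeros, so "%X" is a closer match than "%x", and the cast assumes the value fits in an unsigned int (negative values may print at a different width than on Arduino).

```c
#include <stdio.h>

/* Standard C stand-in for Serial.print(v, HEX).
 * "%X" gives uppercase hex digits with no leading zeros.
 * The cast keeps the argument type matched to the "%X" conversion. */
#define SERIAL_PRINT_HEX(v) printf("%X", (unsigned int)(v))
```

With this definition, SERIAL_PRINT_HEX(255) writes FF to the console.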

I expect to add more macros as needed when I port code over. This may be less efficient, but it’s easier to make Arduino-style console output code work in C than the other way around.

Cheers…

C: (too) many happy returns…

Here’s another quick C thing…

One of the jobs I had used a pretty complete coding style guide for C. One of the things they insisted on was only one “return” in any function that returns values. For example:

int function(int x)
{
   if SOMETHING
   {
      return 100;
   }
   else if SOMETHING ELSE
   {
      return 200;
   }
   else
   {
      return 0;
   }
}

The above function returns values 100, 200 or 0 based on the input (1, 2 or anything else). It has three different places where a value is returned. This saves code, compared to doing it like this:

int function(int x)
{
   int value;

   if SOMETHING
   {
      value = 100;
   }
   else if SOMETHING ELSE
   {
      value = 200;
   }
   else
   {
      value = 0;
   }

   return value;
}

Above, you see we use a variable, and then have three places where it could be set, and then we return that value in one spot at the end of the function. This probably generates larger code and would take longer to run than the top example.

But if you can afford those extra bytes and clock cycles, it is a much better way to do this, at least from a maintenance and debugging standpoint.

I have accepted this, but only today did I run into a situation where this approach would have saved me some time and frustration. In my case, I was encountering a compiler warning about a function not returning a value where it was defined to return a value. I looked and confirmed the function was indeed returning a value. What was going on?

The problem was that it used multiple returns, and did something like this:

int function(int x)
{
   int value;

   if (!ValueIsValid(x)) return;

   if SOMETHING
   {
      value = 100;
   }
   else if SOMETHING ELSE
   {
      value = 200;
   }
   else
   {
      value = 0;
   }

   return value;
}

Somewhere in the function was a check that just did a “return” and the compiler was seeing that, but my eyes were looking at the lower portion of the function where a value was clearly being returned.

I am guessing the function originally did not return a value, and when a return value was added later, that initial “return;” was not corrected, leaving a compiler warning. This warning may have been in the code for a long time and was simply left alone because someone couldn’t figure it out (my situation) or wasn’t concerned about compiler warnings.

Today, the warning bugged me enough that I did a deep dive through the function, line-by-line, trying to figure out what was going on. And I found it. A simple correction could have been this:

int function(int x)
{
   int value;

   if (!ValueIsValid(x)) return 0; // FIXED: Add missing return value.

   if SOMETHING
   {
      value = 100;
   }
   else if SOMETHING ELSE
   {
      value = 200;
   }
   else
   {
      value = 0;
   }

   return value;
}

That resolved the compiler warning, but still left two spots where a value was returned, so I ended up doing something like this:

int function(int x)
{
   int value;

   if (ValueIsValid(x) == true) // do this if valid
   {
      if SOMETHING
      {
         value = 100;
      }
      else if SOMETHING ELSE
      {
         value = 200;
      }
      else
      {
         value = 0;
      }
   }
   else // Not valid
   {
      value = 0;
   }

   return value;
}

Now there is only one place the function returns, and it only processes things if the initial value appears valid.
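For anyone who wants to compile it, here is a runnable sketch of that final shape. The SOMETHING placeholders are filled in with the 1 and 2 inputs mentioned earlier, and this ValueIsValid() is a made-up stand-in that just rejects negative numbers, since the real check is not shown here.

```c
#include <stdbool.h>

/* Hypothetical validity check; the real one is not shown. */
bool ValueIsValid(int x)
{
    return (x >= 0);
}

int function(int x)
{
    int value;

    if (ValueIsValid(x) == true) /* Only process valid input. */
    {
        if (x == 1)
        {
            value = 100;
        }
        else if (x == 2)
        {
            value = 200;
        }
        else
        {
            value = 0;
        }
    }
    else /* Not valid. */
    {
        value = 0;
    }

    return value; /* The only exit point. */
}
```

Because every path assigns value before the single return, there is no bare “return;” left to hide near the top of the function.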

I will sleep better at night.

I sleep on a soap box.