Category Archives: Programming

Tackling the Logiker 2022 Vintage Computing Christmas Challenge – part 5

See also: part 1, part 2, part 3, part 4, part 5, part 6 and part 7.

The challenge continues. From humble beginnings of using PRINT, to fancier methods of encoding the image as a series of spaces and asterisks, we eventually ended up with an even fancier method that used only 1/4 of the image data to represent the entire symmetrical image.

That approach could work for any image that is symmetrical vertically and horizontally, and typically general purpose routines are not as small as custom routines that know what they will be doing.

Knowing what we now know…

WIth that said, looking at this image, there is another shortcut that I missed:

The entire image is centered over one column… This means the amount of spaces on the left is unimportant — we just need to center the following lines:

*       *
**     **
***   ***
**** ****
*****************
***************
*************
***********
*********
***********
*************
***************
*****************
**** ****
***   ***
**     **
*       *

And, since we know it’s symmetrical both vertically and horizontally, we really only need to worry about this:

*    | <- 1 asterisk, 4 spaces
**   | <- 2 asterisks, 3 spaces
***  | <- 3 asterisks, 2 space
**** | <- 4 asterisks, 1 space
*****************| 17
***************| 15
*************| 13
***********| 11
*********| 9

I’m not sure what the pattern is as I type this, but I am expecting there is one. Here is a quick program that prints the rows of the shape using FOR/NEXT loops (uncentered):

0 ' LOGIKER13.BAS
10 FOR I=1 TO 4
20 PRINT STRING$(I,"*");STRING$(1+(4-I)*2," ");STRING$(I,"*")
30 NEXT
40 FOR I=17 TO 9 STEP-2
50 PRINT STRING$(I,"*")
60 NEXT
70 FOR I=11 TO 17 STEP 2
80 PRINT STRING$(I,"*")
90 NEXT
100 FOR I=4 TO 1 STEP-1
110 PRINT STRING$(I,"*");STRING$(1+(4-I)*2," ");STRING$(I,"*")
120 NEXT

If each of those lines were centered, we’d have our shape. Let’s try that by creating a string for the row, and then using the LEN() of that string to know how to center it using TAB().

0 ' LOGIKER14.BAS
10 FOR I=1 TO 4
20 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*")
25 PRINT TAB(16-LEN(A$)/2);A$
30 NEXT
40 FOR I=17 TO 9 STEP-2
50 A$=STRING$(I,"*")
55 PRINT TAB(16-LEN(A$)/2);A$
60 NEXT
70 FOR I=11 TO 17 STEP 2
80 A$=STRING$(I,"*")
85 PRINT TAB(16-LEN(A$)/2);A$
90 NEXT
100 FOR I=4 TO 1 STEP-1
110 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*")
115 PRINT TAB(16-LEN(A$)/2);A$
120 NEXT
130 GOTO 130

That produces our desired shape (though it does leave a blank line at the end, which our original version avoided by having a semi-colon on the PRINT and just breaking lines when we went to the next one).

The first thing I see it that the centering code on line 25, 55, 85 and 115 is the same. Subroutine!

0 ' LOGIKER15.BAS
10 FOR I=1 TO 4
20 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*")
25 GOSUB 150
30 NEXT
40 FOR I=17 TO 9 STEP-2
50 A$=STRING$(I,"*")
55 GOSUB 150
60 NEXT
70 FOR I=11 TO 17 STEP 2
80 A$=STRING$(I,"*")
85 GOSUB 150
90 NEXT
100 FOR I=4 TO 1 STEP-1
110 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*")
115 GOSUB 150
120 NEXT
130 GOTO 130
150 PRINT TAB(16-LEN(A$)/2);A$:RETURN

Next, we see that the string building code for the top and bottom are the same, so 20 and 110 are the same (it’s the value of I that changes how it prints), and then 50 and 80 are the same. Subroutines!

0 ' LOGIKER16.BAS
10 FOR I=1 TO 4
20 GOSUB 200
25 GOSUB 150
30 NEXT
40 FOR I=17 TO 9 STEP-2
50 GOSUB 250
55 GOSUB 150
60 NEXT
70 FOR I=11 TO 17 STEP 2
80 GOSUB 250
85 GOSUB 150
90 NEXT
100 FOR I=4 TO 1 STEP-1
110 GOSUB 200
115 GOSUB 150
120 NEXT
130 GOTO 130
150 PRINT TAB(16-LEN(A$)/2);A$:RETURN
200 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*"):RETURN
250 A$=STRING$(I,"*"):RETURN

Next, I notice the subroutines of 200 and 250 both have the centering PRINT called after them, so maybe we change it up a bit…

0 ' LOGIKER17.BAS
10 FOR I=1 TO 4
20 GOSUB 200
30 NEXT
40 FOR I=17 TO 9 STEP-2
50 GOSUB 250
60 NEXT
70 FOR I=11 TO 17 STEP 2
80 GOSUB 250
90 NEXT
100 FOR I=4 TO 1 STEP-1
110 GOSUB 200
120 NEXT
130 GOTO 130
200 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*"):GOTO 300
250 A$=STRING$(I,"*")
300 PRINT TAB(16-LEN(A$)/2);A$:RETURN

What else? The FOR/NEXT loops are basically all the same, except for the start and end value and the step value… Maybe we could come up with a way to have only one, and feed it those values using DATA statements?

10 FOR I=1 TO 4
...
40 FOR I=17 TO 9 STEP-2
...
70 FOR I=11 TO 17 STEP 2
...
100 FOR I=4 TO 1 STEP-1

500 DATA 1,4,1
510 DATA 17,9,-2
520 DATA 11,17,2
530 DATA 4,1,-1

If they all went to the same GOSUB routine this would be easy, but they don’t. The go 200, 250, 250, 200. We could add a fourth element in the DATA that tells it which routine to go to and “IF X=1 THEN GOSUB Y ELSE GOSUB Z” or something. That adds more code. Perhaps we don’t need the DATA since we know it alternates? Still, we’d have to track it ourselves with an IF or something. For now, let’s just try this:

0 ' LOGIKER18.BAS
10 FOR J=1 TO 4
20 READ A,B,C,D
30 FOR I=A TO B STEP C
40 IF D=0 THEN GOSUB 200 ELSE GOSUB 250
50 NEXT I
60 NEXT J
70 GOTO 70
200 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*"):GOTO 300
250 A$=STRING$(I,"*")
300 PRINT TAB(16-LEN(A$)/2);A$:RETURN
500 DATA 1,4,1,0
510 DATA 17,9,-2,1
520 DATA 11,17,2,1
530 DATA 4,1,-1,0

And that still produces your original shape. But is it any smaller?

In part 4, we had a version that (using my default XRoar emulator running DISK EXTENDED COLOR BASIC) showed 22499 bytes free after loading. This new version shows 22567 bytes free. So yes, it is smaller! And, we can pack those lines and make it even smaller than that. (And NEXT doesn’t near the variable — in fact, using “NEXT I” is slower than just saying “NEXT” so I’ll remove those here as well.)

0 ' LOGIKER19.BAS
10 FOR J=1 TO 4:READ A,B,C,D:FOR I=A TO B STEP C:IF D=0 THEN GOSUB 200 ELSE GOSUB 250
50 NEXT:NEXT
70 GOTO 70
200 A$=STRING$(I,"*")+STRING$(1+(4-I)*2," ")+STRING$(I,"*"):GOTO 300
250 A$=STRING$(I,"*")
300 PRINT TAB(16-LEN(A$)/2);A$:RETURN:DATA 1,4,1,0,17,9,-2,1,11,17,2,1,4,1,-1,0

That version shows me 22609 free, which is even smaller — and we could still make this a bit smaller by getting rid of unnecessary spaces in the code.

Side note: I am being lazy and just showing the BASIC “PRINT MEM” values rather than calculating the actual size of the program. On my configuration, 22823 is how much memory is there on startup. So, 22823-22609 shows that this program is 214 bytes. It uses more memory for the strings when running, but I don’t think that matters for this challenge.

What else can we do to save a few bytes? Well, STRING$() takes two parameters. The first is the count of how many times to repeat the second parameter. The second parameter can be a quoted character like “*”, or a number like 42 (the ASCII value of the asterisk). 42 is one by smaller than “*” so we can do that as well as use 32 (the ASCII value for space) instead of ” “:

200 A$=STRING$(I,42)+STRING$(1+(4-I)*2,32)+STRING$(I,42):GOTO 300
250 A$=STRING$(I,42)

Another thing we know is that in the shape there are always the same number of spaces before the top and bottom sections, so we really don’t need to center it. We could just hard code a PRINT TAB for that instead of building a string and calling a center subroutine:

200 PRINT TAB(11);STRING$(I,42);STRING$(1+(4-I)*2,32);STRING$(I,42)
210 RETURN

The middle section is similar. Since we know the length, we could calculate how many spaces to tab using that number:

250 PRINT TAB(16-I/2);STRING$(I,42)
260 RETURN

And that removes a subroutine, leaving us with this (not line packed yet):

0 ' LOGIKER20.BAS
10 FOR J=1 TO 4
20 READ A,B,C,D
30 FOR I=A TO B STEP C
40 IF D=0 THEN GOSUB 200 ELSE GOSUB 250
50 NEXT
60 NEXT
70 GOTO 70
200 PRINT TAB(11);STRING$(I,42);STRING$(1+(4-I)*2,32);STRING$(I,42)
210 RETURN
250 PRINT TAB(16-I/2);STRING$(I,42)
260 RETURN
500 DATA 1,4,1,0
510 DATA 17,9,-2,1
520 DATA 11,17,2,1
530 DATA 4,1,-1,0

Two FOR/NEXT loops, a READ, an IF, and two PRINT subroutines.

Maybe we don’t need those subroutines, now that we have an “IF” in line 40 that decides what to do?

0 ' LOGIKER21.BAS
10 FOR J=1 TO 4
20 READ A,B,C,D
30 FOR I=A TO B STEP C
40 IF D=0 THEN PRINT TAB(11);STRING$(I,42);STRING$(1+(4-I)*2,32);STRING$(I,42) ELSE PRINT TAB(16-I/2);STRING$(I,42)
50 NEXT
60 NEXT
70 GOTO 70
500 DATA 1,4,1,0
510 DATA 17,9,-2,1
520 DATA 11,17,2,1
530 DATA 4,1,-1,0

That’s an ugly line 40, but it got rid of two GOSUBs and two RETURNS. Plus every line takes up 5 bytes, so by removing two of those (each subroutine must start on its own line) we saved even more.

Not only is this smaller, it’s faster, too, since it doesn’t have to spend time seeking through the program looking for a subroutine line number for each line it prints.

If we pack this version, removing unnecessary spaces and semicolons and such, and remove the “GOTO” loop (the challenge does not require the screen to be cleared, and says it can return to a prompt after running), we get:

0 ' LOGIKER22.BAS (don't include this line)

1 FORJ=1TO4:READA,B,C,D:FORI=A TOB STEPC:IFD=0THENPRINTTAB(11)STRING$(I,42)STRING$(1+(4-I)*2,32)STRING$(I,42)ELSEPRINTTAB(16-I/2)STRING$(I,42)
2 NEXT:NEXT:DATA1,4,1,0,17,9,-2,1,11,17,2,1,4,1,-1,0

151 bytes of BASIC code!

Is there more that can be done? Comment with your ideas! I think I’m out of them, here.

Until next time…

Tackling the Logiker 2022 Vintage Computing Christmas Challenge – part 4

See also: part 1, part 2, part 3, part 4, part 5, part 6 and part 7.

The story so far…

In the beginning Logiker offered up a programming challenge to produce this simple image:

From its humble beginnings as a bunch of PRINT statements, to a DATA driven output routine, we have now ended up with code that can print the top half of the symmetrical image by only having data to describe one quarter of the image.

When we last left off, I had code that would read this data, print the left half, then print the right half, mirroring the same data. It worked well, but only printed the top half of the image.

I showed that code to OpenAI and asked how it would accomplish the task of reversing the data to print the bottom portion:

To make the program print the rows in reverse order after it has finished printing all of the rows, you will need to make some changes to the code. One way to do this would be to add another loop after the main loop that reads the data from the DATA statements and prints the rows in reverse order. You can use the “REVERSE” statement in BASIC to reverse the order of the rows in the DATA statements, and then use the same code as in the main loop to read the data and print the rows. This will cause the program to print the rows in the original order, and then again in reverse order.

– chat.openai.com

I’ve never heard of the REVERSE keyword, and the CoCo’s BASIC doesn’t have it. But if I did have it, this task would be pretty easy. I informed the A.I. that I had no such command, and it said:

If your BASIC does not have a REVERSE command, you can still print the rows in reverse order by using a different approach. One way to do this would be to store the data from the DATA statements in an array, and then loop through the array in reverse order to print the rows. In BASIC, you can use the “DIM” statement to create an array, and the “READ” statement to read data from the DATA statements into the array. You can then use a loop to iterate through the array in reverse order, using the same code as in the main loop to print the rows. This will cause the program to print the rows in the original order, and then again in reverse order.

– chat.openai.com

That’s the approach I was thinking of taking, so… okay!

Side Note: I did ask the A.I. to modify my program to do this, and it gave me something that would have been a good starting point for the rest of this article. I could have then discussed things with the A.I. a bit more and encouraged it to evolve the program until it met the requirements. (One issue with BASIC is that the A.I. inevitably tries to do things from other dialects of BASIC that Color BASIC does not have, such as the REVERSE command.) I hope to demonstrate one of these co-programming sessions in a future article series, provided access to ChatGPT remains available to me.

I changed my routine so it loaded all the DATA values in to a dimension. Since I knew how many entries there would be (27), I could eliminate the need for having a -1 to mark the end of data.

Then, in the PRINT routine, instead READing A, B and C and looking at them, I’d use the D() dimension variable as D(x), D(x+1) and D(x+2) to get to the same three values.

At the end of each line (three numbers, or two numbers followed by a zero), I would increment my index by three so it starts at the next set of numbers in the array.

I used a new variable, M, to indicate the mode (forward or reverse). When it reaches the end of the data (D goes past the 27 entries), it sets that variable to 1, and other code now runs decrementing the index by 3 each time until it reaches the top (0).

Here is my new code:

0 ' LOGIKER11.BAS
10 CLS:DIM D(26)
11 FOR I=0 TO 26:READ D(I):NEXT:D=0:M=0
15 PRINTTAB(6);
20 PRINT STRING$(D(D),32);
25 IF D(D+2)=0 THEN PRINT STRING$(D(D+1)*2-1,42);STRING$(D(D),32);:GOTO 40
30 PRINT STRING$(D(D+1),42);STRING$(D(D+2)*2-1,32);STRING$(D(D+1),42);
40 IF M=0 THEN D=D+3:IF D<27 THEN PRINT:GOTO 15 ELSE M=1
41 IF D>0 THEN PRINT:D=D-3:GOTO 15
50 DATA 5,1,4
60 DATA 5,2,3
70 DATA 5,3,2
80 DATA 5,4,1
90 DATA 1,9,0
100 DATA 2,8,0
110 DATA 3,7,0
120 DATA 4,6,0
130 DATA 5,5,0
220 GOTO 220

OpenAI describes it as follows:

This program is similar to the previous ones, but it uses a slightly different approach to read the data from the DATA statements and print the rows. The program starts by defining an array D to store the data from the DATA statements, and then reads the data into the array. The program then enters a loop that prints the rows of the pattern. In each iteration of the loop, the program prints a row of the pattern by printing a string of spaces, followed by a string of asterisks, and then another string of spaces. The number of spaces and asterisks in each string is determined by the data in the array.

After the program has printed all of the rows in the original order, it enters another loop that prints the rows in reverse order. In each iteration of the loop, the program prints the same row as in the previous loop, but in reverse order. This causes the program to print the rows of the pattern in the original order, and then again in reverse order. However, without more information about the data in the DATA statements, it is not possible to say exactly what the output of the program would look like.

– chat.openai.com

Nicely done, A.I.

When I run this program, I get the desired output!

It is, of course, slower than the original brute-force PRINT version. But, we are going for size. And speaking of size, lets try to pack this down as much as possible and see where we stand:

0 ' LOGIKER12.BAS
10 CLS:DIM D(26):FOR I=0 TO 26:READ D(I):NEXT:D=0:M=0
15 PRINTTAB(6);STRING$(D(D),32);:IF D(D+2)=0 THEN PRINT STRING$(D(D+1)*2-1,42);STRING$(D(D),32);:GOTO 40
30 PRINT STRING$(D(D+1),42);STRING$(D(D+2)*2-1,32);STRING$(D(D+1),42);
40 IF M=0 THEN D=D+3:IF D<27 THEN PRINT:GOTO 15 ELSE M=1
41 IF D>0 THEN PRINT:D=D-3:GOTO 15
50 GOTO 50:DATA 5,1,4,5,2,3,5,3,2,5,4,1,1,9,0,2,8,0,3,7,0,4,6,0,5,5,0

Loading the first version gives me 22431 bytes free. Loading the second gives me 22499 – about 68 bytes smaller! Compare that to the original brute-force PRINT version (which was 22309 free), we have saved 190 bytes so far.

And, we could remove spaces, get rid of the REM comment at the start, and save even more.

But is that enough for the challenge? One thing the challenge says it you are not required to clear the screen, and you can return to an OK/Ready prompt. That means I could remove the CLS and the GOTO loop at the end, saving even more.

But saving even more is not enough. I think there’s a few more things we can do, especially now that I understand what it takes to draw this shape.

Until next time, take a look at what I have done in LOGIKER11.BAS and see what suggestions you can come up with.

Tackling the Logiker 2022 Vintage Computing Christmas Challenge – part 3

See also: part 1, part 2, part 3, part 4, part 5, part 6 and part 7.

So far, we’ve taken a brute force PRINT program and turned it in to a less-brute force program that did the same thing using DATA statements:

0 ' LOGIKER8.BAS
10 CLS
15 CH=32:PRINTTAB(6);
20 READ A:IF A=-1 THEN 220
25 IF A=0 THEN PRINT:GOTO 15
30 PRINT STRING$(A,CH);
35 IF CH=32 THEN CH=42 ELSE CH=32
40 GOTO 20
50 DATA 5,1,7,1,0
60 DATA 5,2,5,2,0
70 DATA 5,3,3,3,0
80 DATA 5,4,1,4,0
90 DATA 1,17,0
100 DATA 2,15,0
110 DATA 3,13,0
120 DATA 4,11,0
130 DATA 5,9,0
140 DATA 4,11,0
150 DATA 3,13,0
160 DATA 2,15,0
170 DATA 1,17,0
180 DATA 5,4,1,4,0
190 DATA 5,3,3,3,0
200 DATA 5,2,5,2,0
210 DATA 5,1,7,1,0
215 DATA -1
220 GOTO 220

All of this in an effort to try to print out this image:

While there are still many BASIC optimizations we could do (removing spaces, combining lines even further, renumbering by 1, etc.), those would apply to any version of the code we create. Instead of doing that, let’s look at some other ways we can represent this data.

Simpsons Atari 2600 did it first.

Let no meme go to waste, I always say.

When the Atari VCS came out in 1977 (you younguns may only know it as the 2600, but it didn’t get that name until 1982 — five years after its release later), it required clever tricks to make games run in only 1K or 2K of ROM and with just 128 bytes of RAM.

The game Adventure was quite the challenge, since it had multiple screens representing different mazes, castles and areas.

Atari’s Greatest Hits on an old iPad

Each screen was represented by only 21 bytes of ROM! If you follow that link, you can read more about my efforts to understand how this worked. Here is an example of how the castle room was represented:

;Castle Definition                                                                                                 
CastleDef:
  .byte $F0,$FE,$15 ;XXXXXXXXXXX X X X      R R R RRRRRRRRRRR                                      
  .byte $30,$03,$1F ;XX        XXXXXXX      RRRRRRR        RR                                      
  .byte $30,$03,$FF ;XX        XXXXXXXXXXRRRRRRRRRR        RR                                      
  .byte $30,$00,$FF ;XX          XXXXXXXXRRRRRRRR          RR                                      
  .byte $30,$00,$3F ;XX          XXXXXX    RRRRRR          RR                                      
  .byte $30,$00,$00 ;XX                                    RR                                      
  .byte $F0,$FF,$0F ;XXXXXXXXXXXXXX            RRRRRRRRRRRRRR   

There are three bytes to represent each line. Three bytes would only be able to represent 24 pixels (8 bits per byte), and the ASCII art shows the screen width is actually 40. Those three bytes cannot represent the entire row of pixels.

In fact, 4-bits of that isn’t used. Each set of three bytes represents halfa row (20 bits out of the 24 the three bytes represent). Look at the first entry:

  .byte $F0,$FE,$15 ;XXXXXXXXXXX X X X      R R R RRRRRRRRRRR   

If you turn those bytes into binary, you get this pattern:

 byte 1  byte 2  byte 3|
--------========--------
111100001111111000010101

The Atari drew the first 8-bits from least significant bit to most. the second 8-bits from most significant to least, then the third from least significant to most. That makes it look like this, matching the ASCII art (skipping the unused 4-bits):

000011111111111010101000
    XXXXXXXXXXX X X X   

To represent a full screen, the Atari had a trick that would mirror or duplicate the other half of the screen. In the case of the castle, the right side was a mirror image. In the case of certain mazes, the data was duplicated.

Looking at our image here, since it is symmetrical, we could certainly use the same trick and only store half of the image.

+----------------+
|           *    |
|           **   |
|           ***  |
|           **** |
|       *********|
|        ********|
|         *******|
|          ******|
|           *****|
|          ******|
|         *******|
|        ********|
|       *********|
|           **** |
|           ***  |
|           **   |
|           *    |
+----------------+

Also, since the top and bottom are also mirror images, we could mirror those, too, and get away with only storing 1/4 of the image:

+----------------+
|           *    |
|           **   |
|           ***  |
|           **** |
|       *********|
|        ********|
|         *******|
|          ******|
|           *****|
+----------------+

Since the image is 17×17 (an odd number, so there is a halfway row and column), we’d actually need to just draw to that halfway row/column, then reverse back through the data.

We should be able to take our existing data and crop it down from this, which represents the full image:

50 DATA 5,1,7,1,0
60 DATA 5,2,5,2,0
70 DATA 5,3,3,3,0
80 DATA 5,4,1,4,0
90 DATA 1,17,0
100 DATA 2,15,0
110 DATA 3,13,0
120 DATA 4,11,0
130 DATA 5,9,0
140 DATA 4,11,0
150 DATA 3,13,0
160 DATA 2,15,0
170 DATA 1,17,0
180 DATA 5,4,1,4,0
190 DATA 5,3,3,3,0
200 DATA 5,2,5,2,0
210 DATA 5,1,7,1,0

…to this, which represents the top left quarter-ish of the image:

50 DATA 5,1,4,0   '     X    '
60 DATA 5,2,3,0   '     XX   '
70 DATA 5,3,2,0   '     XXX  '
80 DATA 5,4,1,0   '     XXXX '
90 DATA 1,9,0     ' XXXXXXXXX'
100 DATA 2,8,0    '  XXXXXXXX'
110 DATA 3,7,0    '   XXXXXXX'
120 DATA 4,6,0    '    XXXXXX'
130 DATA 5,5,0    '     XXXXX'

That represents all the data up to the center row/column, and that seems to be a considerable savings in code space (removing eight lines).

But how do we draw that forward, then in reverse? There is no way to back up when using the READ command, so we’d have to remember what we just did. For a general purpose “compress 1-bit image” routine it would be more complex, but since we know the image we are going to produce, we can make an assumption:

  1. The image never has more than three transitions (space, asterisk, space) in a line.
  2. No line entry has more than 4 numbers total.

Knowing that, we could simply save up to three numbers in variables, so we would print them out A B C and then C B A. We won’t even need the zeros now, since we can read A,B,C and act on them (stopping if C is 0).

Neat!

A quick bit of trial and error gave me this code that will print the top half of the image:

0 ' LOGIKER10.BAS
10 CLS
15 CH=32:PRINTTAB(6);
20 READ A:IF A=-1 THEN 220
21 PRINT STRING$(A,32);
22 READ B,C
25 IF C=0 THEN PRINT STRING$(B*2-1,42);STRING$(A,32):GOTO 15
30 PRINT STRING$(B,42);STRING$(C*2-1,32);STRING$(B,42)
40 GOTO 15
50 DATA 5,1,4
60 DATA 5,2,3
70 DATA 5,3,2
80 DATA 5,4,1
90 DATA 1,9,0
100 DATA 2,8,0
110 DATA 3,7,0
120 DATA 4,6,0
130 DATA 5,5,0
215 DATA -1
220 GOTO 220

It creates this:

I can now say “we are halfway there!”

But now I have another issue to solve. How do I back up? There is no way to READ data in reverse. It looks like I’m going to need to load all those numbers in to memory so I can reverse back through them.

To be continued…

Tackling the Logiker 2022 Vintage Computing Christmas Challenge – part 2

See also: part 1, part 2, part 3, part 4, part 5, part 6 and part 7.

The design is one row taller than will fit on the CoCo’s 32×16 text screen, but it would easily fit on the 40 or 80 column screen of the CoCo 3. For this article, I am going to stick with the standard text screen and just let it scroll one row off the top of the screen. When I have something figured out, it might only require modifying the centering code to display on the 40/80 column screen.

Let there be code!

At this stage, the design is being centered using the TAB command. Putting a “TAB(7)” at the start of each string takes up 3 bytes of programming space. It seems “TAB(” is tokenized, then there is the 3 character, followed by the “)” character. I had thought using PRINT@ might save some space, but the “@” takes a byte, then the screen position numbers follow it, and a comma is required. “PRINT@7,” takes the same amount of code space as “PRINTTAB(7)” so no savings there.

The biggest savings is going to come from eliminating the repeated use of the “* characters in the strings. Since the entire image is made up of spaces or asterisks, it could be represented by data that says how many spaces then how many asterisks then how many spaces, etc.

Here is what the image looks like centered to 32-columns:

+--------------------------------+
|           *       *            |
|           **     **            |
|           ***   ***            |
|           **** ****            |
|       *****************        |
|        ***************         |
|         *************          |
|          ***********           |
|           *********            |
|          ***********           |
|         *************          |
|        ***************         |
|       *****************        |
|           **** ****            |
|           ***   ***            |
|           **     **            |
|           *       *            |
+--------------------------------+

The first line has 11 spaces, 1 asterisks, 7 spaces, then 1 asterisk.

The second line has 11 spaces, 1 asterisks, 5 spaces, then 2 asterisks.

And so on… The first line could be represented in DATA to tell how many repeating groups of spaces and asterisks there are such as 11,1,7,1,0 (using 0, to indicate the end of the line).

For something this small, I could have just counted up the spaces and asterisks myself, but it’s more fun to spend ten minutes writing a BASIC program to do what would have taken me 60 seconds to do manually.

I converted the pattern in to DATA statements and created a program that would scan each line looking for repeated characters. When the character changes, it prints how many of the previous repeating characters it saw and resets the counter.

0 ' LOGIKER5.BAS 
1 ' LC$=LAST CHAR
 2 ' DC=DUPLICATE CHAR COUNT
 10 READ A$:IF A$="END" THEN END
 11 LC$=LEFT$(A$,1):DC=1
 12 FOR I=2 TO LEN(A$)
 13 IF MID$(A$,I,1)=LC$ THEN DC=DC+1 ELSE PRINT DC",";:DC=1:LC$=MID$(A$,I,1)
 14 NEXT
 15 PRINT DC", 0"
 16 LC$=""
 17 GOTO 10
 50 DATA"    *       *"
 60 DATA"    **     **"
 70 DATA"    ***   ***"
 80 DATA"    **** ****"
 90 DATA"*****************"
100 DATA" ***************"
110 DATA"  *************"
120 DATA"   ***********"
130 DATA"    *********"
140 DATA"   ***********"
150 DATA"  *************"
160 DATA" ***************"
170 DATA"*****************"
180 DATA"    **** ****"
190 DATA"    ***   ***"
200 DATA"    **     **"
210 DATA"    *       *"
220 DATA "END"

Rather than explain how the program works, like I often try to do, I just submitted this code to the OpenAI ChatGPT system and let it do the work for me:

This BASIC program reads in a series of strings, one at a time, and outputs the number of times each character appears in the string.

Here is a brief explanation of the code:

  • Lines 10-16: This is the main loop of the program, where each string is read in, processed, and output.
  • Line 11: This line initializes the LC$ variable to the first character of the string and the DC variable to 1.
  • Lines 12-14: This is a FOR loop that iterates through each character of the string.
  • Line 13: This line compares the current character with the last character (stored in LC$). If they are the same, it increments the DC variable. If they are different, it outputs the value of DC and then resets DC to 1 and LC$ to the current character.
  • Line 15: This line outputs the value of DC after the loop has finished processing the string.
  • Lines 50-220: These are the data lines that contain the strings to be processed.
  • Line 220: This is the sentinel value that indicates the end of the data.

Note that this code uses a number of built-in BASIC functions, such as LEFT$, MID$, and LEN, which may not be available in all versions of BASIC. You may need to modify the code to use equivalent functionality if it is not available in the version of BASIC you are using.

– chat.openai.com

Seriously, folks. You have to go play with ChatGPT. It’s amazingly fun.

OpenAI ChatGPT Digression

After showing this program to ChatGPT, I asked “is there a way to do it better?” It responded with:

There are a few ways that this routine could be improved. Here are a few suggestions:

  1. Use a DO loop instead of a GOTO statement to control the main loop. This would make the code easier to read and maintain.
  2. Use a SELECT CASE statement to handle the different cases in line 13 instead of an IF statement. This would make the code more concise and easier to read.
  3. Use a DICTIONARY or MAP data structure to store the counts of each character instead of using separate variables for each character. This would make the code more efficient and allow it to handle an arbitrary number of characters without having to hard-code variables for each one.
  4. Add error-checking to the code to handle cases where the input data is invalid (e.g. empty strings, strings with invalid characters, etc.). This would make the code more robust and prevent it from crashing or producing incorrect results in these cases.

Here is an example of how the code could be modified to incorporate some of these changes:

– chat.openai.com

The code it provided was not something that would work on ancient Microsoft BASIC, but it was interesting to see:

DO
  READ A$
  IF A$="END" THEN EXIT DO

  ' Initialize the character counts to 0
  CHAR_COUNTS = DICTIONARY()

  ' Loop through each character of the string
  FOR i = 1 TO LEN(A$)
    ch = MID$(A$, i, 1)
    IF ch = " " THEN CONTINUE ' Skip whitespace

    ' Increment the count for this character
    IF CHAR_COUNTS.EXISTS(ch) THEN
      CHAR_COUNTS[ch] = CHAR_COUNTS[ch] + 1
    ELSE
      CHAR_COUNTS[ch] = 1
    END IF
  NEXT

  ' Output the counts for each character
  FOR ch = "A" TO "Z"
    PRINT CHAR_COUNTS[ch]; ",";
  NEXT
  PRINT "0"
LOOP

50 DATA"    *       *"
60 DATA"    **     **"
70 DATA"    ***   ***"
80 DATA"    **** ****"
90 DATA"*****************"
100 DATA" ***************"
110 DATA"  *************"
120 DATA"   ***********"
130 DATA"    *********"
140 DATA"   ***********"
150 DATA"  *************"
160 DATA" ***************"
170 DATA"*****************"
180 DATA"    **** ****"
190 DATA"    ***   ***"
200 DATA"    **     **"
210 DATA"    *       *"
220 DATA "END"

I have no idea what version of BASIC that code would work on, and the program it wrote doesn’t really look like it’s doing the same thing, but still neat. In a future article (provided they continue to give us access to this AI), I plan to share some of the interesting things it can do with BASIC programs (including writing them from scratch).

But I digress…

Driving Miss Data

When I run the program, it starts printing out rows of numbers. Since there are seventeen rows of the graphic, it prints out seventeen lines of numbers. I needed to pause the program mid-printing so I can write down the numbers before they scroll off. (A fancier way would have been to have the program write out an ASCII BASIC program with the line numbers and DATA statements already in it which I could load back later. I think I showed how I do that in an earlier article somewhere on this site.)

Now that we have numbers for our DATA statements, a new version of the program could be written to generate the image using them. I will make use of the STRING$ that creates a string of a specific character of a specific length…

0 ' LOGIKER6.BAS
10 CLS
15 CH=32:PRINTTAB(7);
20 READ A:IF A=-1 THEN 220
25 IF A=0 THEN PRINT:GOTO 15
30 PRINT STRING$(A,CH);
35 IF CH=32 THEN CH=42 ELSE CH=32
40 GOTO 20
50 DATA 4,1,7,1,0
60 DATA 4,2,5,2,0
70 DATA 4,3,3,3,0
80 DATA 4,4,1,4,0
90 DATA 17,0
100 DATA 1,15,0
110 DATA 2,13,0
120 DATA 3,11,0
130 DATA 4,9,0
140 DATA 3,11,0
150 DATA 2,13,0
160 DATA 1,15,0
170 DATA 17,0
180 DATA 4,4,1,4,0
190 DATA 4,3,3,3,0
200 DATA 4,2,5,2,0
210 DATA 4,1,7,1,0
215 DATA -1
220 GOTO 220

Obviously those data statements could be combined in to fewer lines, but for this version I wanted them to match the same line number the original PRINT was on. You can easily compare the results:

50 DATA"    *       *"
50 DATA 4,1,7,1,0
 
60 DATA"    **     **"
60 DATA 4,2,5,2,0

70 DATA"    ***   ***"
70 DATA 4,3,3,3,0

80 DATA"    **** ****"
80 DATA 4,4,1,4,0

Before I show you the results, can you see the flaw in my program?

I’ll give you a hint… Line 170.

Close but no cigar

My program assumes each line starts with a space, so the first value will be printed as spaces, then the next value as asterisks, and so on. This causes a problem when it gets to the row that is entirely the asterisk it reads the first number and prints it as spaces, giving me this incorrect result:

I can think of several ways to solve this:

  1. Use a different value other than 0 for “end-of-line” and make 0 mean “nothing to print, just switch to the astrisk”. That would change line 170 to be “DATA 0,17,X” (where “X” is the new end-of-line marker. This would probably require a new bit of IF logic to handle.
  2. Make each group of data two bytes that specifies the character to print, and how many. Printing 17 asterisks would be “17,42”. Printing four spaces would be “4,32”. This would make the program logic simpler, but would double the size of the data. Depending on how much smaller the logic is, this might be a winner. (And I can think of optimizations to that as well, such as using 0 and 1 for the data to print and just printing “32+X*10” so it prints either 32 (if the value is 0) or 42 (if the value is 1). This is normally how I would have started, but I was trying to make the data as small as possible.
  3. I could just encode the leading spaces at the start of each line rather than using TAB(7). By doing this, every line would start with a space. This would work for this specific challenge, but not be flexible for patterns that don’t start with a space.

For now, let’s make a quick change and try #3 by simply adding 7 to the first number in each DATA statement, and adding a 7 to line 170 which is the row that doesn’t have a space at the start. I think it would look like this:

0 ' LOGIKER7.BAS
10 CLS
15 CH=32
20 READ A:IF A=-1 THEN 220
25 IF A=0 THEN PRINT:GOTO 15
30 PRINT STRING$(A,CH);
35 IF CH=32 THEN CH=42 ELSE CH=32
40 GOTO 20
50 DATA 11,1,7,1,0
60 DATA 11,2,5,2,0
70 DATA 11,3,3,3,0
80 DATA 11,4,1,4,0
90 DATA 7,17,0
100 DATA 8,15,0
110 DATA 9,13,0
120 DATA 10,11,0
130 DATA 11,9,0
140 DATA 10,11,0
150 DATA 9,13,0
160 DATA 8,15,0
170 DATA 7,17,0
180 DATA 11,4,1,4,0
190 DATA 11,3,3,3,0
200 DATA 11,2,5,2,0
210 DATA 11,1,7,1,0
215 DATA -1
220 GOTO 220

Running this program produces the desired results! But, it has a drawback:

The data size grew. Not only did we add “7,” (two bytes) to line 170, but eleven other lines went from a 1-digit value to a 2-digit value. This means our data grew by 13 bytes. If we saved 13 bytes in the decoding routine, this is a win. If we did not, it is not an acceptable fix.

When I load the previous version of the program in to the XRoar emulator and PRINT MEM, it shows 22425 free. When I do the same with this version, I get 22416 — less memory free, so a larger program. This is bad, but the previous version is still missing the code to handle that line 170.

Perhaps, instead of adding 7 to each line to center on the screen, each line could just add 1 (so it doesn’t create two-digit values) and we can use TAB(6). That would look like this:

0 ' LOGIKER8.BAS
10 CLS
15 CH=32:PRINTTAB(6);
20 READ A:IF A=-1 THEN 220
25 IF A=0 THEN PRINT:GOTO 15
30 PRINT STRING$(A,CH);
35 IF CH=32 THEN CH=42 ELSE CH=32
40 GOTO 20
50 DATA 5,1,7,1,0
60 DATA 5,2,5,2,0
70 DATA 5,3,3,3,0
80 DATA 5,4,1,4,0
90 DATA 1,17,0
100 DATA 2,15,0
110 DATA 3,13,0
120 DATA 4,11,0
130 DATA 5,9,0
140 DATA 4,11,0
150 DATA 3,13,0
160 DATA 2,15,0
170 DATA 1,17,0
180 DATA 5,4,1,4,0
190 DATA 5,3,3,3,0
200 DATA 5,2,5,2,0
210 DATA 5,1,7,1,0
215 DATA -1
220 GOTO 220

Doing a PRINT MEM on that one shows 22421, so it is four bytes larger than the original, and still smaller than the “add 7” version. Perhaps that is good enough for now?

Combining all the lines to make a smaller program would look like this:

0 ' LOGIKER9.BAS
10 CLS
15 CH=32:PRINTTAB(6);
20 READ A:IF A=-1 THEN 220 ELSE IF A=0 THEN PRINT:GOTO 15
30 PRINT STRING$(A,CH);:IF CH=32 THEN CH=42 ELSE CH=32
40 GOTO 20
50 DATA 5,1,7,1,0,5,2,5,2,0,5,3,3,3,0,5,4,1,4,0,1,17,0,2,15,0,3,13,0,4,11,0,5,9,0,4,11,0,3,13,0,2,15,0,1,17,0,5,4,1,4,0,5,3,3,3,0,5,2,5,2,0,5,1,7,1,0,-1
220 GOTO 220

Better! But we can make it more better.

In the next installment, we will do something that I learned from studying the Atari 2600’s Adventure program…

To be continued…

My first C program for CoCo DISK BASIC.

On this day in history … I built the CMOC compiler and compiled my first C program for non-OS-9 CoCO.

I created this source file:

int main()
{
	char *ptr = 1024;
	while (ptr < 1536) *ptr++ = 128;
	return 0;
}

I compiled it using “cmoc hello.c” and it produces “hello.bin”.

I created a new blank disk image using “decb dskini C.DSK”.

I copied the binary to that disk image using “decb copy hello.bin C.DSK,HELLO.BIN -2”

I booted up the XRoar emulator and mounted that disk image as the first drive.

I did LOADM”HELLO” and then EXEC.

And so it begins…

Tackling the Logiker 2022 Vintage Computing Christmas Challenge – part 1

See also: part 1, part 2, part 3, part 4, part 5, part 6 and part 7.

Here we go again! Over in the Facebook Color Computer group, David M. shared a link to this year’s Vintage Computing Christmas Challenge from Logiker. Although I did not submit an entry, I did play with last year’s challenge on my TRS-80 Color Computer.

Last year, it was this:

This year, the challenge is a bit more challenging. Per the challenge website, here is sample code for Commodore:

 10 print"{clear}"
 20 print""
 30 print""
 40 print""
 50 print"               *       *"
 60 print"               **     **"
 70 print"               ***   ***"
 80 print"               **** ****"
 90 print"           *****************"
100 print"            ***************"
110 print"             *************"
120 print"              ***********"
130 print"               *********"
140 print"              ***********"
150 print"             *************"
160 print"            ***************"
170 print"           *****************"
180 print"               **** ****"
190 print"               ***   ***"
200 print"               **     **"
210 print"               *       *"
220 goto 220

Starting with that un-optimized version, I will change it to work on the CoCo 1/2/3’s 32-column screen by adjusting it to be properly centered on that display.

 10 CLS
 50 PRINT"           *       *"
 60 PRINT"           **     **"
 70 PRINT"           ***   ***"
 80 PRINT"           **** ****"
 90 PRINT"       *****************"
100 PRINT"        ***************"
110 PRINT"         *************"
120 PRINT"          ***********"
130 PRINT"           *********"
140 PRINT"          ***********"
150 PRINT"         *************"
160 PRINT"        ***************"
170 PRINT"       *****************"
180 PRINT"           **** ****"
190 PRINT"           ***   ***"
200 PRINT"           **     **"
210 PRINT"           *       *"
220 GOTO 220

Unfortunately, this design is 17 rows tall, and the CoCo’s standard display is only 16. It won’t fit:

We should still be able to enter the challenge by having the program print this pattern, even if it scrolls off the screen a bit. To get one extra line there, we can get rid of the line feed at the end of the final PRINT statement in line 210 by adding a semi-colon to the end:

210 PRINT"           *       *";

And so it begins…

And so it begins

The goal is to make this as small as possible. There were many ways to approach last year’s Christmas tree challenge, and you can read about the results and a follow-up with suggestions folks gave to save a byte or two.

A simple thing is to remove the spaces at the front and replace them with the TAB() command:

 10 CLS
 50 PRINTTAB(7)"    *       *"
 60 PRINTTAB(7)"    **     **"
 70 PRINTTAB(7)"    ***   ***"
 80 PRINTTAB(7)"    **** ****"
 90 PRINTTAB(7)"*****************"
100 PRINTTAB(7)" ***************"
110 PRINTTAB(7)"  *************"
120 PRINTTAB(7)"   ***********"
130 PRINTTAB(7)"    *********"
140 PRINTTAB(7)"   ***********"
150 PRINTTAB(7)"  *************"
160 PRINTTAB(7)" ***************"
170 PRINTTAB(7)"*****************"
180 PRINTTAB(7)"    **** ****"
190 PRINTTAB(7)"    ***   ***"
200 PRINTTAB(7)"    **     **"
210 PRINTTAB(7)"    *       *";
220 GOTO 220

Although this only looks like it saves a character per line (“TAB(8)” versus “seven spaces”), the code itself will be smaller since the TAB command tokenizes down to one (or maybe two?) bytes.

Also, the ending quote is not needed if it’s the last thing on a line, so they could be removed:

 50 PRINTTAB(7)"    *       *
 60 PRINTTAB(7)"    **     **
 70 PRINTTAB(7)"    ***   ***

That would save one byte per line.

But, each line number consumes 5-bytes on it’s own, so a better way to save space would be to pack the lines together. Each line you eliminate saves five bytes. That would become pretty unreadable though, but let’s do it anyway:

10 CLS:PRINTTAB(7)"    *       *":PRINTTAB(7)"    **     **":PRINTTAB(7)"    ***   ***":PRINTTAB(7)"    **** ****":PRINTTAB(7)"*****************":PRINTTAB(7)" ***************":PRINTTAB(7)"  *************":PRINTTAB(7)"   ***********"
130 PRINTTAB(7)"    *********":PRINTTAB(7)"   ***********":PRINTTAB(7)"  *************":PRINTTAB(7)" ***************":PRINTTAB(7)"*****************":PRINTTAB(7)"    **** ****":PRINTTAB(7)"    ***   ***":PRINTTAB(7)"    **     **"
210 PRINTTAB(7)"    *       *";
220 GOTO 220

That’s quite the unreadable mess!

This could still be made better, since the text lines were kept under the input buffer limitation size, but when you enter that line, BASIC compresses it (tokenizes keywords like PRINT, TAB and GOTO) making it take less space. You can then sometimes EDIT the line, Xtend to the end and type a few more characters.

That may or may not be allowed for the Logiker challenge. And since I want to provide code here you could copy and then load in to an emulator, I’ll keep it to the limit of what you could type in.

In the next installment, I’ll see if my brane can figure out a way to generate this code using program logic rather than brute-force PRINT statements.

Until then…

Reversing bits in C

In my day job, we have a device that needs data sent to it with the bits reversed. For example, if we were sending an 8-bit value of 128, that bit pattern is 10000000. The device expects the high bit first so we’d send it 00000001.

In one system, we do an 8-bit bit reversal using a lookup table. I suppose that one needed it to be really fast.

In another (using a faster PIC24 chip with more RAM, flash and CPU speed), we do it with a simple C routine that was easy to understand.

I suppose this breaks down to four main approaches to take:

  • Smallest Code Size – for when ROM/flash is at a premium, even if the code is a confusingf mess.
  • Smallest Memory Usage – for when RAM is at a premium, even if the code is a confusing mess.
  • Fastest – for when speed is the most important thing, even if the code is a confusing mess.
  • Clean Code – easiest to understand and maintain, for when you don’t want code to be a confusing mess.

In our system, which is made up of multiple independent boards with their own CPUs and firmware, we do indeed have some places where code size is most important (because we are out of room), and other places where speed is most important.

When I noticed we did it two different ways, I wondered if there might be even more approaches we could consider.

I did a quick search on “fastest way to reverse bits in C” and found a variety of resources, and wanted to point out this fun one:

https://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious

At that section of this lengthy article are a number of methods to reverse bits. Two of them make use of systems that support 64-bit math and do it with just one line of C code (though I honestly have no understanding of how they work).

Just in case you ever need to do this, I hope this pointer is useful to you.

Happy reading!

Color BASIC and string concatenation

The C programming language has a few standard library functions that deal with strings, namely strcpy() (string copy) and strcat() (string concatenate).

Microsoft BASIC has similar string manipulation features built in to the language. For example, to copy an 8 character “string” in to a buffer (array of chars):

char buffer1[80];

strcpy(buffer1, "12345678");

printf("%s\n", buffer1);

In BASIC, you do not need to allocate space for individual strings. Color BASIC allows doing whatever you want with a string provided it is 255 characters or less, and provided the total string space is large enough. By default, Color BASIC reserves 200 bytes for string storage. If you wanted to strcpy() “12345678” to a string variable, you would just do:

BUFFER1$="12345678"

Yes, that’s legal, but Color BASIC only recognizes the first two characters of a string name, so in reality, that is just like doing:

BU$="12345678"

If you need more than the default 200 bytes, the CLEAR command will reserve more string storage. For example, “CLEAR 500” or “CLEAR 10000”.

“CLEAR 500” would let you have five 100 character strings, or 500 one character strings.

And, keep in mind, strings stored in PROGRAM MEMORY do not use this space. For example, if you reduced string space to only 9 bytes, then tried to make a 10 byte string direct from BASIC:

CLEAR 9
A$="1234567890"
?OS ERROR

The wonderful “?OS ERROR” (Out of String Space).

BUT, if strings are declared inside PROGRAM data, BASIC references them from within your program instead of string memory:

5 CLEAR 9
10 A$="1234567890"
20 B$="1234567890"
30 C$="1234567890"
40 D$="1234567890"
50 E$="1234567890"

Yes, that actually works. If you would like to know more, please see my String Theory series.

But I digress…

The other common C string function is strcat(), which appends a string at the end of another:

char buffer1[80];

strcpy(buffer1, "12345678");
strcat(buffer1, "plus this");

printf("%s\n", buffer1);

That code would COPY “12345678” in to the buffer1 memory, then concatenate “plus this” to the end of it, printing out “12345678plus this”.

In BASIC, string concatenation is done by adding the strings together, such as:

A$="12345678"
A$=A$+"plus this"

PRINT A$

Color BASIC allows for strings to be up to 255 characters long, and no more:

The wonderful “?LS ERROR” (Length of String).

Make it Bigger!

In something I am writing, I started out with an 8 character string, and wanted to duplicate it until it was 64 characters long. I did it like this:

10 A$="12345678"
20 A$=A$+A$+A$+A$+A$+A$+A$+A$

In C, if you tried to strcat() a buffer on top of the buffer, it would not work like that.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buffer1[80];
    
    // buffer1$ = "12345678"
    strcpy(buffer1, "12345678");
    // Result: buffer1="12345678"
    printf ("%s\n", buffer1);
    
    // buffer1$ = buffer1$ + buffer1$
    strcat(buffer1, buffer1);
    // Result: buffer1$="1234567812345678"
    printf ("%s\n", buffer1);

    // buffer1$ = buffer1$ + buffer1$    
    strcat(buffer1, buffer1);
    // Result: buffer1$="12345678123456781234567812345678"
    printf ("%s\n", buffer1);

    return EXIT_SUCCESS;
}

As you can see, each strcat() copies all of the previous buffer to the end of the buffer, doubling the size each time.

The same thing happens if you do it step by step in BASIC:

A$="12345678"
REM RESULT: A$="12345678"

A$=A$+A$
REM RESULT: A$="1234567812345678"

A$=A$+A$
REM RESULT: A$="12345678123456781234567812345678"

But, saying “A$=A$+A$+A$+A$” is not the same as saying “A$=A$+A$ followed by A$=A$+A$”

For example, if you add two strings three times, you get a doubling of string size at each step:

5 CLEAR 500
10 A$="12345678"
20 A$=A$+A$
30 A$=A$+A$
40 A$=A$+A$
50 PRINT A$

Above creates a 64 character string (8 added to 8 to make 16, then 16 added to 16 to make 32, then 32 added to 32 to make 64).

BUT, if you had done the six adds on one line:

5 CLEAR 500
10 A$="12345678"
20 A$=A$+A$ + A$+A$ + A$+A$
50 PRINT A$

…you would get a 48 character string (8 characters, added 6 times).

In C, using strcat(buffer, buffer) with the same buffer has a doubling effect each time, just like A$=A$+A$ does in BASIC each time.

And, adding a bunch of strings together like…

A$=A$+A$+A$+A$+A$+A$+A$+A$ '64 characters

…could also be done in three doubling steps like this:

A$=A$+A$:A$=A$+A$:A$=A$+A$ ' 64 characters

Two different ways to concatenate strings together to make a longer string. Which one should we use?

Must … Benchmark …

In my code, I just added my 8 character A$ up 8 times to make a 64 character string. Then I started thinking about it. And here we go…

Using my standard benchmark program, we will declare an 8 character string then add it together 8 times to make a 64 character string. Over and over and over, and time the results.

0 REM strcat1.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 A$="12345678"
40 A$=A$+A$+A$+A$+A$+A$+A$+A$
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

This produces an average of 1235.

Now we switch to the doubling three times approach:

0 REM strcat2.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 A$="12345678"
40 A$=A$+A$:A$=A$+A$:A$=A$+A$
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

This drops the time down to 888!

Adding separately three times versus add together eight times is a significant speed improvement.

Based on what I learned when exploring string theory (and being shocked when I realized how MID$, LEFT$ and RIGHT$ worked), I believe every time you do a string add, there is a new string created:

“A$=A$+A$+A$+A$+A$+A$+A$+A$” creates eight strings along the way.

“A$=A$+A$:A$=A$+A$:A$=A$+A$” creates six.

No wonder it is faster.

Looks like I need to go rewrite my experiment.

Until next time…

Checksums and zeros and XMODEM and randomness.

A year or two ago, I ran across some C code at my day that finally got me to do an experiment…

When I was first using a modem to dial in to BBSes, it was strictly a text-only interface. No pictures. No downloads. Just messages. (Heck a physical bulletin board at least would let you put pictures on it! Maybe whoever came up with the term BBS was just forward thinking?)

The first program I ever had that sent a program over the modem was DFT (direct file transfer). It was magic.

Later, I got one that used a protocol known as XMODEM. It seems like warp speed compared to DFT!

XMODEM would send a series of bytes, followed by a checksum of those bytes, then the other end would calculate a checksum over the received bytes and compare. If they matched, it went on to the next series of bytes… If it did not, it would resend those bytes.

Very simple. And, believe it or not, checksums are still being used by modern programmers today, even though newer methods have been created (such as CRC).

Checking the sum…

A checksum is simple the value you get when you add up all the bytes of some data. Checksum values are normally not floating point, so they will be limited to a fixed range. For example, an 8-bit checksum (using one byte) can hold a value of 0 to 255. A 16-bit checksum (2 bytes) can hold a value of 0-65535. Since checksums can be much higher values, especially if using an 8-bit checksum, the value just rolls over.

For example, if the current checksum calculated value is 250 for an 8-bit checksum, and the next byte being counted is a 10, the checksum would be 250+10, but that exceeds what a byte can hold. The value just rolls over, like this:

250 + 10: 251, 252, 253, 254, 255, 0, 1, 2, 3, 4

Thus, the checksum after adding that 10 is now 4.

Here is a simple 8-bit checksum routine for strings in Color BASIC:

0 REM CHKSUM8.BAS
10 INPUT "STRING";A$
20 GOSUB 100
30 PRINT "CHECKSUM IS";CK
40 GOTO 10

100 REM 8-BIT CHECKSUM ON A$
110 CK=0
120 FOR A=1 TO LEN(A$)
130 CK=CK+ASC(MID$(A$,A,1))
140 IF CK>255 THEN CK=CK-255
150 NEXT
160 RETURN

Line 140 is what handles the rollover. If we had a checksum of 250 and the next byte was a 10, it would be 260. That line would detect it, and subtract 255, making it 4. (The value starts at 0.)

The goal of a checksum is to verify data and make sure it hasn’t been corrupted. You send the data and checksum. The received passes the data through a checksum routine, then compares what it calculated with the checksum that was sent with the message. If they do not match, the data has something wrong with it. If they do match, the data is less likely to have something wrong with it.

Double checking the sum.

One of the problems with just adding (summing) up the data bytes is that two swapped bytes would still create the same checksum. For example “HELLO” would have the same checksum as “HLLEO”. Same bytes. Same values added. Same checksum.

A good 8-bit checksum.

However, if one byte got changed, the checksum would catch that.

A bad 8-bit checksum.

It would be quite a coincidence if two data bytes got swapped during transfer, but I still wouldn’t use a checksum on anything where lives were at stake if it processed a bad message because the checksum didn’t catch it ;-)

Another problem is that if the value rolls over, that means a long message or a short message could cause the same checksum. In the case of an 8-bit checksum, and data bytes that range from 0-255, you could have a 255 byte followed by a 1 byte and that would roll over to 0. A checksum of no data would also be 0. Not good.

Checking the sum: Extreme edition

A 16-bit or 32-bit checksum would just be a larger number, reducing how often it could roll over.

For a 16-bit value, ranging from 0-65535, you could hold up to 257 bytes of value 255 before it would roll over:

255 * 257 = 65535

But if the data were 258 bytes of value 255, it would roll over:

255 * 258 = 65790 -> rollover to 255.

Thus, a 258-byte message of all 255s would have the same checksum as a 1-byte message of a 255.

To update the Color BASIC program for 16-bit checksum, change line 140 to be:

140 IF CK>65535 THEN CK=CK-65535

Conclusion

Obviously, an 8-bit checksum is rather useless, but if a checksum is all you can do, at least use a 16-bit checksum. If you were using the checksum for data packets larger than 257 bytes, maybe a 48-bit checksum would be better.

Or just use a CRC. They are much better and catch things like bytes being out of order.

But I have no idea how I’d write one in BASIC.

One more thing…

I almost forgot what prompted me to write this. I found some code that would flag an error if the checksum value was 0. When I first saw that, I thought “but 0 can be a valid checksum!”

For example, if there was enough data bytes that caused the value to roll over from 65535 to 0, that would be a valid checksum. To avoid any large data causing value to add up to 0 and be flagged bad, I added a small check for the 16-bit checksum validation code:

if ((checksum == 0) && (datasize < 258)) // Don't bother doing this.
{
    // checksum appears invalid.
}
else if (checksum != dataChecksum)
{
    // checksum did not match.
}
else
{
    // guess it must be okay, then! Maybe...
}

But, what about a buffer full of 00s? The checksum would also be zero, which would be valid.

Conclusion: Don’t error check for a 0 checksum.

Better yet, use something better than a checksum…

Until next time…