Category Archives: Retro Computing

Inverse/reverse text on the CoCo

The later model CoCo 2s used an upgraded MC6847 video display generator (VDG) chip, the 6847T1. This chip provided an updated font with true lowercase, as well as the ability to change the border color and invert the video.

The original 6847 had a hardware pin that could invert the video, and I recall taking the VDG out of my CoCo 1 to pull up a pin, then re-inserting it. That gave me the nice inverted display I often use in the XRoar emulator:

In the nearly forty years since I did that modification, I had forgotten that it could only be done in hardware. I thought there was probably a POKE or something, since many programs I had used offered an inverted video mode.

I couldn’t figure it out, and the bits that made the updated 6847T1 show inverse video did not work on the original 6847.

So I asked on the CoCo mailing list…

I wrote a program to go through the 6847 T1 VDG text modes, which includes lowercase, changing the border, and inverting the screen.

I’ve seen programs on the CoCo (like GRABBER) that invert/reverse the screen in software, but the bits to do that on the T1 don’t seem to do anything on the 6847.

Does anyone know of a reference that would show me how to do this? Thanks, much!
– Me, 2/15/2023

L. Curtis Boyle replied:

On the regular VDG, there is no reverse video bit. So you have to use the inverted characters in the character set (ie POKE 1024+loc,0-63).
– L. Curtis Boyle, 2/15/2023

It was Tim Lindner that reminded me of the hardware pin:

There is an inverse video pin on the chip. But it is not hooked up. I think.
– tim lindner, 2/15/2023

With that in mind, I decided to do some quick tests. This test PRINTs every possible visible character on the screen:

10 CLS:PRINT
20 FOR C=32 TO 255:PRINT CHR$(C);:NEXT
30 GOTO 30

I skipped the first line and I started printing with CHR$(32) (a space) because the values below that are non-visible characters.

BUT, if you POKE to the screen, you get more characters. Here is an update that will POKE the values 0-255 to the bottom half of the screen:

10 CLS:PRINT
20 FOR C=32 TO 255:PRINT CHR$(C);:NEXT
30 FOR C=0 TO 255:POKE 1280+C,C:NEXT
40 GOTO 40

Comparing the PRINT versus POKE characters, you can see PRINT has 64 ASCII-printable characters from 32-95. After that are 32 inverted characters (96-127). There are 32 more inverse characters that you cannot PRINT.

When you POKE, values 0-63 represent the full ASCII character set inverted, and 64-127 are the normal (non-inverted) uppercase set.

 PRINT   POKE
------- -------
32-95   64-127  Space to Left Arrow
96-127  0-31    Inverse "@" to Inverse Left Arrow
(none)  32-63   Inverse Space to Inverse "?"

From that table, you can see it’s impossible to PRINT thirty two of the available inverse characters in the 6847, but you can POKE them.
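The table can be modeled off the CoCo to double-check it. Here is a small Python sketch (Python only because it runs anywhere; it is not CoCo code). It assumes PRINT translates ASCII to screen values the way the table describes: 32-63 land at 96-127, 64-95 are stored unchanged, and 96-127 land at 0-31. The function name `print_to_screen_code` is made up for this example, and the mapping is my reading of the table, not a ROM disassembly.

```python
def print_to_screen_code(c):
    """Model of the byte that PRINT CHR$(c) leaves in screen memory.

    Assumption: this follows the PRINT/POKE table above, not a
    disassembly of the Color BASIC ROM.
    """
    if not 32 <= c <= 255:
        raise ValueError("only visible characters (32-255) are modeled")
    if c >= 128:          # semigraphics blocks are stored unchanged
        return c
    if c < 64:            # space, digits, punctuation -> 96-127
        return c ^ 0x40
    if c < 96:            # "@", A-Z, arrows -> stored as 64-95
        return c
    return c - 96         # 96-127 -> inverse "@" to inverse left arrow (0-31)


reachable = {print_to_screen_code(c) for c in range(32, 128)}
unreachable = sorted(set(range(128)) - reachable)
print(unreachable[0], unreachable[-1], len(unreachable))   # 32 63 32
```

The only screen values PRINT never produces in this model are 32-63, which matches the thirty-two inverse characters that can only be POKEd.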

I thought it might be fun to write a routine that would PEEK through the screen memory and invert all the characters. Since the 64 non-inverted characters are screen values 64-127, and their inverted versions are 0-63, it seems all I needed to do is PEEK each location and, if it was from 64-127, subtract 64 and POKE it back.

We start with this slow program, complete with some benchmarking timing code:

10 ' FILL SCREEN WITH STUFF
20 CLS:FOR C=32 TO 255:PRINT CHR$(C);:NEXT
30 TIMER=0
40 FOR L=1024 TO 1536
50 V=PEEK(L):IF V>63 AND V<128 THEN POKE L,V-64
60 NEXT
70 PRINT TIMER

This prints 631 for me. One of the first speedups we can do is change the decimal numbers in line 50 to hex.

10 ' FILL SCREEN WITH STUFF
20 CLS:FOR C=32 TO 255:PRINT CHR$(C);:NEXT
30 TIMER=0
40 FOR L=1024 TO 1536
50 V=PEEK(L):IF V>&H3F AND V<&H80 THEN POKE L,V-&H40
60 NEXT
70 PRINT TIMER

That simple change takes the time down to 490. And, since variable lookups (when there aren’t a ton of variables) is faster than parsing a hex number in source code, we can change those values to variables and make it even faster:

10 ' FILL SCREEN WITH STUFF
20 CLS:FOR C=32 TO 255:PRINT CHR$(C);:NEXT
25 X=&H3F:Y=&H80:Z=&H40
30 TIMER=0
40 FOR L=1024 TO 1536
50 V=PEEK(L):IF V>X AND V<Y THEN POKE L,V-Z
60 NEXT
70 PRINT TIMER

This drops to 475.

And, any time you use “IF this AND that”, it can be sped up by doing “IF this THEN IF that”. Let’s try that…

10 ' FILL SCREEN WITH STUFF
20 CLS:FOR C=32 TO 255:PRINT CHR$(C);:NEXT
25 X=&H3F:Y=&H80:Z=&H40
30 TIMER=0
40 FOR L=1024 TO 1536
50 V=PEEK(L):IF V>X THEN IF V<Y THEN POKE L,V-Z
60 NEXT
70 PRINT TIMER

In this case, it really did not change anything — I see 474. Okay, that was a fail. I guess it doesn’t always help.

Since the math we are doing is a subtract 64, we should see if we can do that by using AND to remove the bit that represents 64. We don’t want to do that if it is a graphics character (128-255) so we’ll need to retain one IF:

10 ' FILL SCREEN WITH STUFF
20 CLS:FOR C=32 TO 255:PRINT CHR$(C);:NEXT
25 X=&H3F:Y=&H80:Z=&H40
30 TIMER=0
40 FOR L=1024 TO 1536
50 V=PEEK(L):IF V<Y THEN POKE L,V AND &H3F
60 NEXT
70 PRINT TIMER

That shows 367 for me. Better!
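Why the AND is safe: every normal character (64-127) has the “64” bit set and the “128” bit clear, so masking with &H3F gives the same result as subtracting 64, and values that are already inverse (0-63) pass through unchanged. A quick Python check of that claim (a sketch for verification only, not CoCo code; the function names are made up):

```python
def invert_subtract(v):
    # Line 50 of the earlier versions: only touch 64-127.
    return v - 64 if 63 < v < 128 else v


def invert_and(v):
    # Line 50 of the AND version: skip semigraphics (128-255), mask the rest.
    return v & 0x3F if v < 128 else v


# The two versions agree for every possible screen byte.
assert all(invert_subtract(v) == invert_and(v) for v in range(256))
```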

I did test to see if checking for the 8th bit (V AND 128) was faster than just comparing a value (V > 127), and the “>” check was faster, so we’ll stick with a comparison.

At this point, the routine to invert the screen while leaving graphics characters alone is getting close to twice as fast as the original version.

Can we do better? At least a bit, by removing spaces and combining lines (and even though it only parses the FOR values once, I’ll switch them to hex):

10 ' FILL SCREEN WITH STUFF
20 CLS:FOR C=32 TO 255:PRINT CHR$(C);:NEXT
25 X=&H3F:Y=&H80
30 TIMER=0
40 FORL=&H400 TO&H5FF:V=PEEK(L):IFV<Y THENPOKEL,V AND&H3F
50 NEXT
70 PRINT TIMER

That prints 356. (There were a few spaces that could not be removed due to needing them so the BASIC parser knows where a variable ends and a keyword begins.)

Now we could make a subroutine that would invert the screen, slowly.

1000 ' INVERT SCREEN
1010 X=&H3F:Y=&H80:FORL=&H400 TO&H5FF:V=PEEK(L):IFV<Y THENPOKEL,V AND&H3F
1020 NEXT:RETURN

While it might be a neat “effect” for a title screen to watch it painting the screen, it’s not fast enough to use in a program that you want the output to always be inverted.

Some thoughts…

If no semigraphics characters were being used, the check for them could be removed:

10 ' FILL SCREEN WITH STUFF
20 CLS:FOR C=32 TO 127:PRINT CHR$(C);:NEXT
25 X=&H3F:Y=&H80
30 TIMER=0
40 FORL=&H400 TO&H5FF:V=PEEK(L):POKEL,V AND&H3F
50 NEXT
70 PRINT TIMER

That prints 304 — twice as fast as the original, though it will corrupt semigraphics blocks by changing them.

Another approach might be to only do the portion of the screen that has been PRINTed to by changing the range that the L loop scans. For example, you could call a routine to invert just the last line of the screen every time something is PRINTed there.

Or, perhaps a custom “print reverse” routine might make more sense. We’d also want a special CLS routine that cleared the screen to inverse spaces.

10 CLS
20 GOSUB 1100
30 P=0:A$="This is in REVERSE video!":GOSUB 1000
40 P=32:A$="And this one is, too.":GOSUB 1000
50 P=480:A$="Hello, bottom line!":GOSUB 1000

999 GOTO 999

1000 ' PRINT@ REVERSE
1010 L=&H3FF+P:FOR I=1 TO LEN(A$)
1020 V=ASC(MID$(A$,I,1)):IF V>95 THEN V=V-96 ELSE IF V>63 THEN V=V-64
1030 POKE L+I,V:NEXT:RETURN

1100 ' CLS REVERSE
1110 P=0:FOR L=&H400 TO &H5FF:POKE L,&H20:NEXT:RETURN

To use the routine, set P to the PRINT@ location, assign the string to A$, then GOSUB 1000. Not much to it.

Handling scrolling the screen would be more work (and slower), but could also be done. Perhaps the routine could track the PRINT@ position internally, so every time you print it increases P by 32 (next line) and if it gets to the end of the screen, it could scroll everything up… slowly…

10 CLS
20 GOSUB 1200
30 A$="This is in REVERSE video!":GOSUB 1000
40 A$="And this one is, too.":GOSUB 1000
50 A$="PRINT;":GOSUB 1100
60 A$="NEXT TEXT":GOSUB 1100
70 P=480:A$="BOTTOM LINE":GOSUB 1000
80 A$="AND THIS":GOSUB 1000

999 GOTO 999

1000 ' PRINT@ REVERSE
1010 GOSUB 1100:P=P+32-I+1:IF P<512 THEN 1050
1020 ' SCROLL
1030 FOR L=&H400 TO &H5E0:POKE L,PEEK(L+&H20):NEXT
1040 FOR L=&H5E0 TO &H5FF:POKE L,&H20:NEXT:P=P-32
1050 RETURN

1100 ' PRINT@; REVERSE
1110 L=&H3FF+P:FOR I=1 TO LEN(A$)
1120 V=ASC(MID$(A$,I,1)):IF V>95 THEN V=V-96 ELSE IF V>63 THEN V=V-64
1130 POKE L+I,V:NEXT:P=P+I-1
1140 RETURN

1200 ' CLS REVERSE
1210 P=0:FOR L=&H400 TO &H5FF:POKE L,&H20:NEXT:RETURN

To clear the screen, GOSUB 1200.

To do a normal PRINT, set the string in A$ and GOSUB 1000.

To do a PRINT with a semicolon at the end, set the string in A$ and GOSUB 1100.

To simulate a PRINT@, set P to the screen position then GOSUB 1000 or 1100.

This program does NOT handle printing past position 511 at the bottom right of the screen.

IT CAN POKE PAST SCREEN MEMORY AND CAUSE A CRASH!

A simple check could be done before line 1130 to see if L+I is going to be past the end of screen memory (1535) and adjust accordingly.
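One way to think about that check, sketched in Python rather than BASIC: clip the string so nothing is written at address 1536 or beyond. This only models the idea. `SCREEN_START`, `SCREEN_END` and `clip_to_screen` are names invented for the example, and what to do with the overflow (truncate, wrap, or scroll) is a design choice the program above leaves open.

```python
SCREEN_START = 1024   # &H400, first byte of the 32x16 text screen
SCREEN_END = 1536     # &H600, first byte AFTER the text screen


def clip_to_screen(p, text):
    """Return the part of text that fits when written at PRINT@ position p."""
    if not 0 <= p <= 511:
        raise ValueError("PRINT@ position must be 0-511")
    room = SCREEN_END - (SCREEN_START + p)   # cells left from p to end of screen
    return text[:room]


print(clip_to_screen(480, "BOTTOM LINE"))                    # fits, kept whole
print(len(clip_to_screen(500, "THIS STRING IS TOO LONG")))   # 12 cells left
```

Truncating is the simplest choice; a scrolling version would instead split the string at `room` and continue after moving the screen up.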

If there is any interest (leave a comment), we can look at how we might do that in a future article.

Is this useful?

If we really wanted to use inverse video in BASIC, I guess it would be useful. Can you make it faster in BASIC? What would you suggest? Leave a comment.

But, doing inverse text would be a simple thing for some assembly language routines that could be called easily from BASIC.

OR, perhaps there would be a simple way to just hook in to the output vector used by Color BASIC and do it there.

To be continued… Maybe.

Optimizing BASIC for speed versus size


Normally, I would say that “optimizing” a program could mean two things:

Making the program as fast as possible (even if it was larger)
Making the program as small as possible (even if it was slower)

There is always a compromise.

Loop Unrolling

In BASIC, this will print 100 times:

10 FOR I=1 TO 100
20 PRINT I
30 NEXT

But this might be a faster way to print 100 times:

10 PRINT 1
20 PRINT 2
30 PRINT 3
...
1000 PRINT 100

This is a poor example because PRINTing a variable is generally faster than printing a number due to the BASIC interpreter needing to parse all the TEXT digits of the number and convert it to a string to PRINT. To my surprise, when I ran this test, I found they were basically the same speed. The overhead of processing a line and converting a string of digits seemed to offset the time saved of the loop that was printing a variable that doesn’t need parsing. (FOR/NEXT loop fast, PRINTing variable fast versus a bunch of lines that might be faster but have slower parsing for the text digits.)

However, if the series of individual PRINT lines were printing a variable (“PRINT X”) it would be more than twice as fast.

10 PRINT X
20 PRINT X
...
1000 PRINT X

Apples to apples (difficult to do in this example), blasting out a ton of lines is faster than doing them in a loop. I’ve heard assembly language programmers refer to this as “loop unrolling” and it can work in BASIC as well.

A better example might be how you draw a background screen. Consider a 32 column text screen that is cleared to black, and a border is printed around it:

Here is some code that would do this:

10 CLS 0
20 PRINT CHR$(128);STRING$(30,207);CHR$(128);
30 FOR I=1 TO 14
40 PRINT CHR$(207);STRING$(30," ");CHR$(207);
50 NEXT
60 PRINT CHR$(128);STRING$(30,207);
70 GOTO 70

Or, you could do it using loop unrolling and change that loop of 14 steps to be 14 PRINTS:

10 CLS 0
20 PRINT CHR$(128);STRING$(30,207);CHR$(128);
30 PRINT CHR$(207);STRING$(30," ");CHR$(207);
40 PRINT CHR$(207);STRING$(30," ");CHR$(207);
50 PRINT CHR$(207);STRING$(30," ");CHR$(207);
60 PRINT CHR$(207);STRING$(30," ");CHR$(207);
70 PRINT CHR$(207);STRING$(30," ");CHR$(207);
80 PRINT CHR$(207);STRING$(30," ");CHR$(207);
90 PRINT CHR$(207);STRING$(30," ");CHR$(207);
100 PRINT CHR$(207);STRING$(30," ");CHR$(207);
110 PRINT CHR$(207);STRING$(30," ");CHR$(207);
120 PRINT CHR$(207);STRING$(30," ");CHR$(207);
130 PRINT CHR$(207);STRING$(30," ");CHR$(207);
140 PRINT CHR$(207);STRING$(30," ");CHR$(207);
150 PRINT CHR$(207);STRING$(30," ");CHR$(207);
160 PRINT CHR$(207);STRING$(30," ");CHR$(207);
170 PRINT CHR$(128);STRING$(30,207);
180 GOTO 180

When I time those two, there is a teeny speed improvement in the second version (around 1/60th of a second). Is that enough to justify the overhead of the extra memory needed? Probably not in this example, but it might be if it was in code that was redrawing a screen for a game or something.

Side Note: Of course, we could speed this up by pre-generating the strings and then PRINTing them. That would avoid all the parsing of CHR$() and the numbers and building temporary strings over and over and over.

10 CLS 0
20 PRINT CHR$(128);STRING$(30,207);CHR$(128);
25 L$=CHR$(207)+STRING$(30," ")+CHR$(207)
30 FOR I=1 TO 14
40 PRINT L$;
50 NEXT
60 PRINT CHR$(128);STRING$(30,207);
70 GOTO 70

That is about 40% faster than the original, and it could be made even faster, but that’s not the point currently.

Inlining

Another way to speed things up is to eliminate as many GOTOs and GOSUBs as you can. Every GO has to search either forward, line by line, looking for the target line (if going to a line after the current one), or start at the very first line and search forward (if going to an earlier line).

In a 10,000 line program, a GOTO 9999 on line 10000 will be quite slow. Likewise, line 1 saying “GOTO 10000” will be quite slow.
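To put rough numbers on that, here is a Python model of the search (a sketch, not the interpreter’s actual code). It assumes exactly what is described above: a jump to a higher line number searches forward from the line after the current one, and any other jump searches from the first line of the program. `lines_scanned` is a made-up name.

```python
import bisect


def lines_scanned(program, current, target):
    """Count how many lines are examined to find `target`.

    program: sorted list of line numbers.
    Assumption (from the description above): forward jumps start at the
    line after `current`; all other jumps start at the first line.
    """
    end = bisect.bisect_left(program, target)
    if end == len(program) or program[end] != target:
        raise ValueError("undefined line")
    start = bisect.bisect_right(program, current) if target > current else 0
    return end - start + 1


program = list(range(1, 10001))              # lines 1..10000
print(lines_scanned(program, 10000, 9999))   # 9999 (backward jump)
print(lines_scanned(program, 1, 10000))      # 9999 (forward across everything)
print(lines_scanned(program, 9998, 9999))    # 1 (short forward jump)
```

In this model the two slow cases cost the same, and a short forward jump is nearly free, which is why placement of frequently called code matters.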

If there is enough memory, inlining subroutine code will speed things up every time the code is used. Consider this subroutine from my old *ALL RAM* BBS:

15 BR$="*==============*==============*"
...
25 A$="Welcome To *ALL RAM* BBS!":GOSUB1055
...
50 'New User
55 A$="Password Application Form":GOSUB1055
...
100 'Main Menu
105 A$="*ALL RAM* BBS Master Menu":GOSUB1055
...
150 'Call Sysop
155 A$="Calling the Sysop":GOSUB1055
...
200 'Goodbye
205 A$="Thank you for calling":GOSUB1055
...
250 'Userlog
255 A$="List of Users":GOSUB1055
...
1050 'Function Border
1055 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$:RETURN

The routine at line 1050 takes a string in A$. It will clear the screen, print a border string, print the A$ centered, then print the border string again and return.

The time it takes to run this code should be the same no matter where it appears in the program, but the time it takes to get to this code will vary depending on where it is called from. If you were calling it from line 1040, it would be quite fast. If you were calling it from line 20, it has to scan forward through every line from 20-1050 to find it, which would be slower. At least, because it is a GOSUB routine, it will return quickly. (That’s a big advantage of using GOSUB over GOTO.)

This routine could be inlined in to the code, and each use would be slightly faster.

15 BR$="*==============*==============*"
...
25 A$="Welcome To *ALL RAM* BBS!"
26 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
50 'New User
55 A$="Password Application Form"
56 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
100 'Main Menu
105 A$="*ALL RAM* BBS Master Menu"
106 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
150 'Call Sysop
155 A$="Calling the Sysop"
156 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
200 'Goodbye
205 A$="Thank you for calling"
206 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
250 'Userlog
255 A$="List of Users"
256 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...

Again, this is a pretty poor example, but if this routine was something that required speed (a game, animation, etc.) every little bit would help.

Other time-saving techniques

I have covered many other optimizations to code, such as using “&H” hex numbers rather than decimal and using “.” instead of 0, and all of these things can combine to make a dramatically faster program, at the expense of code size. (A=9 takes three bytes, while A=&H9 is faster but takes five bytes.)

But if we were back in 1980 and trying to program on a 4K CoCo, perhaps we would have more things to worry about than speed.

Size matters

On that 1980 4K CoCo, a “PRINT MEM” on startup shows 2343 bytes free for a program. If you didn’t plan to use any strings or string functions, a CLEAR 0 would give back 200 bytes for a total of 2543 bytes free. You probably won’t be doing any loop unrolling or inlining in this environment.

Instead, focusing on the smallest way to do something makes more sense.

Subroutine everything!

If any bit of code is used more than once, it may make sense to make it a subroutine as long as the overhead of having to add “GOSUB xxx” and “RETURN” does not take more than the duplicate code would.

10 X=0:Y=0:P=(Y*32)+X:PRINT@P,"HELLO"

Above, some X and Y coordinates (representing the CoCo’s 32×16 text screen) are converted to a PRINT@ screen position (0-511). This is the type of code that might appear many places where something is printed on the screen. This makes it a good candidate for being a subroutine:

10 X=0:Y=0:GOSUB 1000:PRINT@P,"HELLO"
...
1000 P=(Y*32)+X:RETURN

Each time “P=(Y*32)+X” is used, that takes 10 bytes. The overhead for “GOSUB 1000” is 8 bytes, then the subroutine itself adds 5 bytes for the line overhead and 2 bytes for “:RETURN”. That’s 15 bytes extra we just added, but if it were used more than a few times, it starts saving memory. (See note below about line numbers and how you can make this save even more.)
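Using only the byte counts quoted above, the break-even point can be worked out. This Python sketch just does that arithmetic; the one thing it adds is that the subroutine line itself still has to contain one copy of the 10-byte expression. The constant and function names are made up for the example.

```python
EXPR_BYTES = 10        # "P=(Y*32)+X", per the count above
GOSUB_BYTES = 8        # "GOSUB 1000", per the count above
LINE_OVERHEAD = 5      # overhead of the new subroutine line
RETURN_BYTES = 2       # ":RETURN"


def inline_cost(uses):
    return EXPR_BYTES * uses


def subroutine_cost(uses):
    # One copy of the expression lives in the subroutine line itself.
    return GOSUB_BYTES * uses + LINE_OVERHEAD + EXPR_BYTES + RETURN_BYTES


break_even = next(n for n in range(1, 100) if subroutine_cost(n) < inline_cost(n))
print(break_even)   # 9
```

With these figures each call only saves two bytes, so it takes nine uses before the subroutine version is smaller. Shorter GOSUB targets shift that point considerably.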

And, if the routine was only converting X/Y to a P position in order to print something, you might as well put the PRINT in the subroutine as well and eliminate the P variable completely:

10 X=0:Y=0:A$="HELLO":GOSUB 1000
...
1000 PRINT@(Y*32)+X,A$:RETURN

With just a bit of reworking code, many bytes can be saved.

But that’s not the best example of saving bytes, because “GOSUB 1000” takes up a lot of memory. Let’s look at how to make it take up less.

Small line numbers save bytes

When a program is tokenized (keywords like PRINT replaced with one and two byte tokens representing the command), the line number is changed in to a two-byte representation of the line. It doesn’t matter if you are at line 5 or line 55000, the line number will take up two bytes of the tokenized line.

But, when you have a line number in the line itself, such as the target of a GOTO, GOSUB or IF/THEN, those digits will be stored in full just as you typed them:

10 GOTO 10
20 GOTO 100
30 GOTO 1000
40 GOTO 10000

Above, line 20 will take up one more byte than line 10, because “100” takes one more byte than “10”. Because of this, using shorter line numbers will save space. Instead of this:

10 PRINT "ENTER YOUR NAME";:GOSUB 10000
20 PRINT "PHONE NUMBER";:GOSUB 10000
30 PRINT "FAVORITE COLOR";:GOSUB 10000
...
10000 INPUT A$:RETURN

…you could move the subroutine to the top and use single digit line numbers like this:

0 GOTO 10
1 INPUT A$:RETURN
...
10 PRINT "ENTER YOUR NAME";:GOSUB 1
20 PRINT "PHONE NUMBER";:GOSUB 1
30 PRINT "FAVORITE COLOR";:GOSUB 1
...

In this example, you have immediately added extra overhead by having line 0, but as long as you save enough to offset that, the program will be smaller. In this example, the three “GOSUB 10000” were turned in to “GOSUB 1” saving four bytes every time they were used.
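The saving is easy to tally, since a GOTO/GOSUB target is stored as its typed digits. A tiny Python sketch of that count (the function names are invented; it deliberately ignores the cost of the new line 0, which has its five bytes of line overhead plus the GOTO statement and has to be subtracted from the total):

```python
def target_bytes(line_number):
    # A line number used as a GOTO/GOSUB target is stored as ASCII digits.
    return len(str(line_number))


def bytes_saved(old_target, new_target, uses):
    return (target_bytes(old_target) - target_bytes(new_target)) * uses


print(bytes_saved(10000, 1, 3))   # 12
```

Three calls save 12 bytes before the line 0 overhead is taken off, so the trick pays off more with every additional GOSUB.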

As a bonus, having the subroutines at the start of the program will make them faster, since every GOSUB will be to a lower line number, and BASIC will just start at the first line and scan forward, finding the subroutine much quicker. Smaller and faster!

And, of course, don’t forget to renumber your program by 1s. If you are using Extended Color BASIC, the RENUM command can do this:

RENUM newline, startline, increment
Renumbers a program.
newline is the first new renumbered line. If you omit newline, BASIC uses 10.
startline is where the renumbering starts. If you omit startline, BASIC renumbers the entire program.
increment is the increment between each renumbered line. If you omit increment, BASIC uses 10.
Note: RENUM does not rearrange the order of lines.
– Getting Started With Extended Color BASIC (1984 edition), page 58

RENUM 0,0,1

That will renumber the program to start at line 0, beginning at line 0, and incrementing by 1s. If you started with:

100 PRINT
200 PRINT
300 PRINT
400 PRINT

…a RENUM 0,0,1 will give you:

0 PRINT
1 PRINT
2 PRINT
3 PRINT

You may save a few bytes every time a GOTO/GOSUB or IF/THEN is used.
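For anyone who wants to see the numbering rule spelled out, here is a simplified Python model of it. It only produces the old-to-new line number mapping with the defaults quoted from the manual; the real RENUM also rewrites the GOTO/GOSUB targets inside the lines, which this sketch does not attempt. `renum` is just a name chosen for the example.

```python
def renum(line_numbers, newline=10, startline=0, increment=10):
    """Return {old: new} for every line number >= startline.

    Simplified model of the numbering only; jump targets inside the
    program text are not rewritten here.
    """
    mapping = {}
    new = newline
    for old in sorted(line_numbers):
        if old >= startline:
            mapping[old] = new
            new += increment
    return mapping


print(renum([100, 200, 300, 400], 0, 0, 1))
# {100: 0, 200: 1, 300: 2, 400: 3}
```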

Side Note: I believe it was Alex “Basic Utils” Evans who recently pointed out to me that GOTO and GOSUB each used two bytes. They are actually three token words — GO, TO and SUB. You can even write them separated with a space like “GO TO 100” or “GO SUB 100“. I hadn’t seen that since I was first learning BASIC back in 1982 or so, in some books a schoolmate was loaning me.

Avoid gratuitous use of lines

Every line in Color BASIC has five bytes of overhead. If you typed in a line that was just:

10 A

…you would see memory go down by six bytes. Five bytes for the line overhead, and one byte for the “A”.

Side Note: Those five bytes are a 2-byte address of the next line, a 2-byte line number, and a trailing 0 terminator at the end of the line. In the above example, the bytes would look like “[xx][xx][yy][yy][A][0]” for that line.

Because of this, writing things out on separate lines consumes more memory, and is also slower since BASIC has more line number data to parse (including when doing a GOTO/GOSUB).

10 PRINT
20 PRINT
30 PRINT
40 PRINT
50 PRINT

…will be 16 bytes larger than doing:

10 PRINT:PRINT:PRINT:PRINT:PRINT

The first example would be tokenized as:

[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes (30 bytes total)

…versus this for the second example (only four colons are needed between five statements):

[xx][xx][yy][yy][print_token][:][print_token][:][print_token][:][print_token][:][print_token][0] <- 14 bytes total
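The same count can be done for any mix of statements. This Python sketch adds up the five bytes of line overhead, the statement bytes, and one colon between each pair of statements (so five statements need four colons). `line_bytes` is a made-up helper, and the one-byte size for PRINT comes from the token diagram above.

```python
LINE_OVERHEAD = 5   # 2-byte next-line address, 2-byte line number, 0 terminator


def line_bytes(statement_sizes):
    """Bytes used by one program line holding the given statements."""
    colons = len(statement_sizes) - 1      # one ":" between each pair
    return LINE_OVERHEAD + sum(statement_sizes) + colons


PRINT = 1  # PRINT tokenizes to a single byte
separate = sum(line_bytes([PRINT]) for _ in range(5))
combined = line_bytes([PRINT] * 5)
print(separate, combined, separate - combined)   # 30 14 16
```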

In Color BASIC, the input buffer will allow you to type up to 249 characters for one line, so you can really pack a lot of commands together and save space (and gain speed).

Side Note: In Extended BASIC, the EDIT command can be used to get a few extra characters on a line. If you type a long line that features BASIC keywords up to the max of 249 characters, then press ENTER, the line gets tokenized and words that took up five characters like “PRINT” get turned in to one or two byte tokens. This can free up enough space to allow doing an EDIT on the line, then an Xtend to go to the end of the line, and type a few more characters. “When every last byte matters…”

Other space-saving techniques

REM – If space matters, leave out any REMarks in the code as they both take up extra space and slow the program down. I would sometimes comment my programs pretty heavily, then save the commented version out, and then delete the remarks and save out the smaller version.
Spaces – Removing any unnecessary spaces (“GO TO 1000” becomes “GOTO1000”) helps.
Semicolons – A semicolon is only required at the end of a PRINT line if you want to avoid having the carriage return printed, or when separating variables that BASIC cannot tell apart from keywords or from each other. For example, the semicolons in “PRINT A;B;C” are obviously needed because “PRINT ABC” would be a different variable. But, the semicolon is assumed between items you can print such as “PRINT VAL(A$);B$;CHR$(128)” which works just fine as “PRINT VAL(A$)B$CHR$(128)”. Heck, even when printing string variables like “PRINT A$B$C$D$” the semicolons are not needed. BUT, you still need it if BASIC can’t tell. If you wanted to print A$, B (number) and C$, you would need “PRINT A$B;C$” since BASIC needs to know where the B non-string variable ends.
Trailing Quotes – If a line ends in a quote, the quote can be left off. This goes for things like PRINT"SOMETHING" or even A$="SOMETHING". BASIC will figure it out when it gets to the end of the line.
Parentheses – There are times when parentheses are required to make math work (order of operations), but sometimes they are just added to make the code easier to read (like spaces and splitting up one instruction per line). In my X/Y example, I used “(Y*32)+X” but the parens don’t matter. The multiplication will happen first, then the addition, so “Y*32+X” will save two bytes. TEST FIRST! Some things look like they should work, but will require parens. (“1*256+1 – 1*256+1” feels like it should be zero, but it will be 2. For this you do need “(1*256+1)-(1*256+1)”.)
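That last parentheses example is easy to verify anywhere, because multiplication binds tighter than addition and subtraction in pretty much every language. Here it is in Python (used only as a calculator here; the variable names are arbitrary):

```python
# Evaluated left to right after the multiplications: 256 + 1 - 256 + 1
without_parens = 1 * 256 + 1 - 1 * 256 + 1

# Each side is grouped first: 257 - 257
with_parens = (1 * 256 + 1) - (1 * 256 + 1)

print(without_parens, with_parens)   # 2 0
```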

And, of course, figuring out how to reduce code and combine things is always part of the process.

See also “Carl England’s CRUNCH program” which will do those steps for you.

There are many other tips and tricks for optimizing BASIC for speed or size, but hopefully these examples get you started in experimenting.

Until next time…

CoCo Disk BASIC disk structure – part 2


See also: part 1, part 2 and part 3.

In the first installment, we began exploring the anatomy of a Disk BASIC disk and how it is made up of tracks and sectors. We also took a peek at the file allocation table (FAT) and a simple program was shared that would count the number of empty granules on the disk. It did this by reading the file allocation table sector (track 17, sector 2) and looking at the first 68 bytes (which represent the status of the 68 available granules). If a byte was 255, the granule is free.
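As a reminder of how little there is to that free-granule count, here is the same idea in a few lines of Python. This is a sketch that works on the raw 256 bytes of the FAT sector however you obtained them (from a disk image, for example); the sample FAT is hypothetical and `free_granules` is a made-up name.

```python
def free_granules(fat_sector):
    """Count free granules given the raw bytes of track 17, sector 2."""
    return sum(1 for b in fat_sector[:68] if b == 0xFF)


# Hypothetical FAT: granules 0-2 hold one file, the other 65 are free.
fat = bytes([1, 2, 0xC6]) + bytes([0xFF] * 65) + bytes(188)
print(free_granules(fat))   # 65
```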

But what if it isn’t free?

File allocation table (FAT) revisited

In that case, the number will represent one of two things:

If the granule is used, but the file is larger and continues on to another granule, the value will be the granule number for the next granule of the file. It’s a linked list!
If the granule is used, but the file does not continue to another granule, the top two bits will be set (11xxxxxx, hex value &HC0 or decimal 192) and the remaining five bits will indicate how many sectors of that granule are part of the file.

If each granule is 2304 bytes (and it is), and a file is 6000 bytes, it is going to need three granules (2304 * 3 = 6912) to store that file. That would be two full granules (4608 bytes) and then the remaining 1392 bytes (6000-4608=1392) in the third granule. 1392 bytes needs 6 sectors (6*256 = 1536) to fit the remaining data.
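The same arithmetic works for any file size. Here is a Python sketch of it (not Disk BASIC code; `layout` is a name invented for this example, and it only handles non-empty files):

```python
GRANULE_BYTES = 2304   # 9 sectors of 256 bytes
SECTOR_BYTES = 256


def layout(file_size):
    """Return (granules, sectors used in last granule, bytes used in last sector)."""
    if file_size <= 0:
        raise ValueError("sketch only covers non-empty files")
    granules = -(-file_size // GRANULE_BYTES)            # ceiling division
    tail = file_size - (granules - 1) * GRANULE_BYTES    # bytes in last granule
    sectors = -(-tail // SECTOR_BYTES)
    last_sector_bytes = tail - (sectors - 1) * SECTOR_BYTES
    return granules, sectors, last_sector_bytes


print(layout(6000))   # (3, 6, 112)
```

For the 6000-byte file that is three granules, six sectors in the last granule, and 112 bytes used in the final sector (1392 minus five full sectors of 256).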

But, since files are not always exactly multiples of 256-bytes, there is one more bit of information that tells how many bytes in the final sector are used by the file. That value is part of the directory entry in bytes 14-15:

Since the full file size is not part of the directory entry, we will need to scan the File Allocation Table and do some calculations. Here are what the 68 bytes of the FAT can be:

To calculate the size of a file, we need to do these steps:

Get the file’s directory entry (32 bytes), specifically byte 13 (the number of the first granule in the file) and bytes 14-15 (the number of bytes used in the last sector of the file). These offsets count from 0; BASIC’s MID$ counts from 1, so the same fields are at positions 14 and 15-16 there.
Read the FAT byte that corresponds to the start granule of the file, then…
- If the value is 0-67, that is the number of the next granule used by the file. Add the size of the granule (2304 bytes) and get the next granule value.
- If the value has the high two bits set (11xxxxxx), the remaining bits will be how many sectors of that granule are used by the file. Since this is the last granule, add the number of sectors minus 1 multiplied by the size of a sector (256), then add the “number of bytes used in the last sector” from the directory entry (bytes 14-15).
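Those steps can be sketched compactly in Python. This is only a model of the calculation, working on a list of 68 FAT values that has already been read from the disk; `file_size` is a made-up name, the example FAT is hypothetical, and the loop guard against a corrupt chain is an addition of mine.

```python
GRANULE_BYTES = 2304
SECTOR_BYTES = 256


def file_size(fat, first_granule, bytes_in_last_sector):
    """Follow the FAT chain from first_granule and return the size in bytes."""
    size = 0
    granule = first_granule
    seen = set()
    while True:
        if granule in seen or not 0 <= granule <= 67:
            raise ValueError("corrupt FAT chain")
        seen.add(granule)
        value = fat[granule]
        if value >= 0xC0:                     # last granule of the file
            sectors = value & 0x1F            # sectors used in this granule
            if sectors > 0:
                sectors -= 1                  # the last sector is counted in bytes
            return size + sectors * SECTOR_BYTES + bytes_in_last_sector
        size += GRANULE_BYTES                 # full granule, keep following
        granule = value


# Hypothetical 6000-byte file stored in granules 0 -> 1 -> 2.
fat = [0xFF] * 68
fat[0], fat[1], fat[2] = 1, 2, 0xC6
print(file_size(fat, 0, 112))   # 6000
```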

“And it’s just that easy!”

So let’s try it… Here is a more “complete” DIR program, though this time instead of doing less, it does more by showing the size of the file in bytes, rather than how many granules it takes up on disk, and by showing file types in a more verbose/descriptive way.

To speed it up (even though it is still slow), it will load the FAT entries in to an array, along with the directory entries. This makes calculating the size easier since everything is now a variable in an array rather than having to read and parse bytes from a disk sector.
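For comparison with the BASIC listing, here is how the first 16 bytes of a directory entry could be pulled apart in Python using the field layout described in this article (name, extension, file type, ASCII flag, first granule, bytes used in last sector). It is a sketch: `parse_entry` and the sample entry are invented, and a real reader would first check the first byte for 0 (deleted) or 255 (unused) before decoding the name.

```python
import struct

# name(8) ext(3) type(1) ASCII flag(1) first granule(1) last-sector bytes(2, high byte first)
ENTRY = struct.Struct(">8s3sBBBH")


def parse_entry(raw):
    """Decode the first 16 bytes of a 32-byte directory entry."""
    name, ext, ftype, ascii_flag, first_gran, last_bytes = ENTRY.unpack(raw[:16])
    return {
        "name": name.decode("ascii").rstrip(),
        "ext": ext.decode("ascii").rstrip(),
        "type": ftype,
        "ascii_flag": ascii_flag,
        "first_granule": first_gran,
        "bytes_in_last_sector": last_bytes,
    }


# Hypothetical entry for HELLO.BAS starting at granule 0.
raw = b"HELLO   BAS" + bytes([0, 0, 0, 0, 112]) + bytes(16)
print(parse_entry(raw))
```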

10 ' FILEINFO.BAS
20 '
30 ' 0.0 2023-01-25 ALLENH
40 ' 0.1 2023-01-26 ADD DR
50 ' 0.2 2023-01-27 MORE COMMENTS
60 '
70 ' E$(0-1) - SECTOR HALVES
80 ' FT$ - FILE TYPE STRINGS
90 '
100 CLEAR 1500:DIM E$(1),FT$(3)
110 FT$(0)="BPRG":FT$(1)="BDAT":FT$(2)="M/L ":FT$(3)="TEXT "
120 '
130 ' DIR HOLDS UP TO 72 ENTRIES
140 '
150 ' NM$ - NAME
160 ' EX$ - EXTENSION
170 ' FT - FILE TYPE (0-3)
180 ' AF - ASCII FLAG (0/255)
190 ' FG - FIRST GRANULE #
200 ' BU - BYTES USED IN LAST SECTOR
210 ' SZ - FILE SIZE
220 '
230 DIM NM$(71),EX$(71),FT(71),AF(71),FG(71),BU(71),SZ(71)
240 '
250 INPUT "DRIVE";DR
260 '
270 ' FILE ALLOCATION TABLE
280 ' 68 GRANULE ENTRIES
290 '
300 DIM FA(67)
310 DSKI$ DR,17,2,G$,Z$:Z$=""
320 FOR G=0 TO 67
330 FA(G)=ASC(MID$(G$,G+1,1))
340 NEXT
350 '
360 ' READ DIRECTORY
370 '
380 DE=0
390 FOR S=3 TO 11
400 DSKI$ DR,17,S,E$(0),E$(1)
410 '
420 ' PART OF SECTOR
430 '
440 FOR P=0 TO 1
450 '
460 ' ENTRY WITHIN SECTOR PART
470 '
480 FOR E=0 TO 3
490 '
500 ' DIR ENTRY IS 32 BYTES
510 '
520 E$=MID$(E$(P),E*32+1,32)
530 '
540 ' NAME IS FIRST 8 BYTES
550 '
560 NM$(DE)=LEFT$(E$,8)
570 '
580 ' EXTENSION IS BYTES 9-11
590 '
600 EX$(DE)=MID$(E$,9,3)
610 '
620 ' FILE TYPE IS BYTE 12
630 '
640 FT(DE)=ASC(MID$(E$,12,1))
650 '
660 ' ASCII FLAG IS BYTE 13
670 '
680 AF(DE)=ASC(MID$(E$,13,1))
690 '
700 ' FIRST GRANULE IS BYTE 14
710 '
720 FG(DE)=ASC(MID$(E$,14,1))
730 '
740 ' BYTES USED IN LAST SECTOR
750 ' ARE IN BYTES 15-16
760 '
770 BU(DE)=ASC(MID$(E$,15,1))*256+ASC(MID$(E$,16,1))
780 '
790 ' IF FIRST BYTE IS 255, END
800 ' OF USED DIR ENTRIES
810 '
820 IF LEFT$(NM$(DE),1)=CHR$(255) THEN 1390
830 '
840 ' IF FIRST BYTE IS 0, FILE
850 ' WAS DELETED
860 '
870 IF LEFT$(NM$(DE),1)=CHR$(0) THEN 1370
880 '
890 ' SHOW DIRECTORY ENTRY
900 '
910 PRINT NM$(DE);TAB(9);EX$(DE);"  ";FT$(FT(DE));" ";
920 IF AF(DE)=0 THEN PRINT"BIN"; ELSE PRINT"ASC";
930 '
940 ' CALCULATE FILE SIZE
950 ' SZ - TEMP SIZE
960 ' GN - TEMP GRANULE NUM
970 ' SG - SECTORS IN LAST GRAN
980 '
990 SZ=0:GN=FG(DE):SG=0
1000 '
1010 ' GET GRANULE VALUE
1020 ' GV - GRAN VALUE
1030 '
1040 GV=FA(GN)
1050 '
1060 ' IF TOP TWO BITS SET (C0
1070 ' OR GREATER), IT IS THE
1080 ' LAST GRANULE OF THE FILE
1090 ' SG - SECTORS IN GRANULE
1100 '
1110 IF GV>=&HC0 THEN SG=(GV AND &H1F):GOTO 1280
1120 '
1130 ' ELSE, MORE GRANS
1140 ' ADD GRANULE SIZE
1150 '
1160 SZ=SZ+2304
1170 '
1180 ' MOVE ON TO NEXT GRANULE
1190 '
1200 GN=GV
1210 GOTO 1040
1220 '
1230 ' DONE WITH GRANS
1240 ' CALCULATE SIZE
1250 '
1260 ' FOR EMPTY FILES
1270 '
1280 IF SG>0 THEN SG=SG-1
1290 '
1300 ' FILE SIZE IS SZ PLUS
1310 ' 256 BYTES PER SECTOR
1320 ' IN LAST GRAN PLUS
1330 ' NUM BYTES IN LAST SECT
1340 '
1350 SZ(DE)=SZ+(SG*256)+BU(DE)
1360 PRINT " ";SZ(DE)
1370 DE=DE+1
1380 NEXT:NEXT:NEXT
1390 END
1400 ' SUBETHASOFTWARE.COM

It looks like this:

And that, I suppose, is about as much disk talk as I can handle for the moment. Let me know in the comments if I missed anything else important.

Until then…

CoCo Disk BASIC disk structure – part 1


See also: part 1, part 2 and part 3.

I know I must have learned at least some basic stuff about the layout of an RS-DOS disk, because I had a directory searching routine in my first commercial CoCo program – Huffman K1 Librarian sold by Rulaford Research.

That product was a MIDI program that would load or save sound patches (synthesizer voices) to and from a Kawai K1 synthesizer. For functions where you were going to send a patch, it would show a directory of all the files of that type. The extension was used to determine if it was a single patch or a block. Though today, I cannot remember the details on what a “block” was for the K1.

Looking at that program now, it would have been nice if I had allowed the user to just cursor around the files and select one, rather than having the user type the name in. Maybe I’ll fix that in a version 1.3 someday … though I sold my K1 long ago, as well as all my CoCo MIDI gear, so I wouldn’t have any way to actually test the update. So maybe I won’t.

Anatomy of an RS-DOS disk

Back in those days, we’d refer to Disk Extended Color BASIC (DECB) as “RS-DOS”. I’m not sure why “Radio Shack DOS” was used for a name, since I don’t recall it saying this anywhere on the screen or in the manuals, but someone must have come up with it and it caught on. (Much like the nickname “CoCo”.)

RS-DOS had a simple file system that was described in detail in the back of the Disk BASIC manual. Back then, most of this was probably beyond me, since looking at it today it still is. It’s interesting that it described the technical details of the disk beyond just the tracks and sector data that could actually be used from BASIC — at least in the 1981 TRS-80 Color Computer Disk System Owners Manual & Programming Guide.

I also found it interesting that by the 1986 version of the manual, which was the version available after the CoCo 3 was released, this technical information had been removed.

Above, out of those 338 bytes, the only section we got to use was the 256 data bytes. The rest was used by the FD1773 floppy drive controller chip.

Looking back at the 1981 manual, the format of an RS-DOS disk was pretty clearly defined. The disk drive was a single-sided 35 track device, and each of those tracks contained 18 sectors. Each sector was 256 bytes. Each track of 18 256-byte sectors could hold 4608 bytes. This meant that a disk could hold 161,280 bytes of data! (35 * 18 * 256) Wow, 157K of storage!

Side Note: Although Radio Shack never updated the Disk BASIC ROM to take advantage of it, the floppy controller was capable of supporting up to three double-sided 80 track (720K) floppy drives. Others came up with patches (or replacement DOS ROMs) that let BASIC handle this. This was also supported in disk operating systems such as OS-9 (which was sold by Radio Shack) and FLEX. But, we’re sticking with generic Disk Extended Color BASIC for this article series, so 35 tracks it is…

While 157K sounds pretty amazing, we didn’t actually get to use all of that from BASIC. Track 17 was used to store the directory and file allocation table (FAT). Yep, even back then, Microsoft (who wrote this Disk BASIC) already had a FAT file system… Just not the FAT we became familiar with via PC-DOS/MS-DOS a few years later.

Since track 17 could not be used for data storage, that left us with 34 tracks we could use — 156,672 bytes. Oh well, 153K of high speed disk access sure beats cassette storage. Or, as we learned, “68 granules” of high speed disk access.

Side Note: Tracks are numbered from 0 to 34, so track 17 is actually the 18th track. Sectors, however, are numbered 1 to 18. Go figure. Drives were numbered 0 to 3, so sectors are really the odd one here.

Here is a representation of the 35 tracks on a disk (numbered 0 to 34):

+----------+
| Track 00 | - 4608 bytes
+----------+
| Track 01 | - 4608 bytes
+----------+
|    ...   |
|    ...   |
+----------+
| Track 17 | - File Allocation Table and Directory
+----------+
|    ...   |
|    ...   |
+----------+
| Track 33 | - 4608 bytes
+----------+
| Track 34 | - 4608 bytes
+----------+

And here is a representation of the 256-byte sectors inside a track (numbered 1 to 18):

Track X
+----+----+----+-----+----+----+----+
| 01 | 02 | 03 | ... | 16 | 17 | 18 | - 256 bytes each
+----+----+----+-----+----+----+----+

Granule

gran·ule /ˈɡranˌyo͞ol/ noun
a small compact particle of a substance.
– https://www.merriam-webster.com/dictionary/granule

Each of the available 34 tracks was split in half (each half being 9 256-byte sectors) and called granules. Each granule was, therefore, 9 * 256 bytes in size — 2304 bytes.
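The arithmetic so far can be double-checked with a few lines of Python on a modern machine. This is just a sketch that restates the numbers from the manual; the constant names are my own.

```python
# Disk geometry as described in the 1981 manual.
TRACKS = 35
SECTORS_PER_TRACK = 18
BYTES_PER_SECTOR = 256
DIRECTORY_TRACKS = 1      # track 17 holds the FAT and directory
GRANULES_PER_TRACK = 2    # each usable track is split in half

track_bytes = SECTORS_PER_TRACK * BYTES_PER_SECTOR
raw_bytes = TRACKS * track_bytes
usable_tracks = TRACKS - DIRECTORY_TRACKS
usable_bytes = usable_tracks * track_bytes
granules = usable_tracks * GRANULES_PER_TRACK
granule_bytes = track_bytes // GRANULES_PER_TRACK

print(track_bytes)    # 4608
print(raw_bytes)      # 161280
print(usable_bytes)   # 156672
print(granules)       # 68
print(granule_bytes)  # 2304
```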

To see how much space was free on disk, we could type:

PRINT FREE(0)

…and that would print a number, such as 68 for a completely empty disk, or 0 for a full one. Granules never made much sense to me back then, and I don’t suppose they really do today except that it was “half a track.” I don’t know if that would have meant much to me before I got in to OS-9 years later and learned way more about floppy disks.

Track 17, where the directory and FAT were stored, was two more granules of storage we didn’t get to use.

Here is a representation of the granules of a disk, numbered 0 to 67:

Track 00 +------------+
         | Granule 00 | - Sectors 1-9   (2304 bytes)
         | Granule 01 | - Sectors 10-18 (2304 bytes)
Track 01 +------------+
         | Granule 02 | - Sectors 1-9   (2304 bytes)
         | Granule 03 | - Sectors 10-18 (2304 bytes)
Track xx +------------+
         |     ...    |
Track 16 +------------+
         | Granule 32 | - Sectors 1-9   (2304 bytes)
         | Granule 33 | - Sectors 10-18 (2304 bytes)
Track 17 +------------+
         | FAT &      | - 4608 bytes
         | Directory  |
Track 18 +------------+
         | Granule 34 | - Sectors 1-9   (2304 bytes)
         | Granule 35 | - Sectors 10-18 (2304 bytes)
Track xx +------------+
         |     ...    |
Track 34 +------------+
         | Granule 66 | - Sectors 1-9   (2304 bytes)
         | Granule 67 | - Sectors 10-18 (2304 bytes)
         +------------+

Here is a simple program that displays this information on the CoCo 32-column screen. Use UP/DOWN arrows to go track by track, or SHIFT-UP/SHIFT-DOWN to go a page at a time:

10 'DISKMAP.BAS
20 '
30 ' TRACKS & SECTORS
40 '
50 CLS
60 PRINT"TRACK         SECTORS"
70 PRINT"          (1-9)    (10-18)"
80 PRINT STRING$(27,"-");
90 ST=0:PG=12
100 FOR A=0 TO PG
110 TR=ST+A
120 PRINT@96+32*A,TR;
130 IF TR=17 THEN PRINT TAB(10);"FAT & DIRECTORY ";:GOTO 160
140 IF TR<17 THEN GR=TR*2 ELSE GR=(TR-1)*2
150 PRINT TAB(9);"GRAN";GR;TAB(19);"GRAN";GR+1;
160 NEXT
170 A$=INKEY$:IF A$="" THEN 170
180 IF A$=CHR$(94) THEN ST=ST-1
190 IF A$=CHR$(95) THEN ST=ST-PG-1
200 IF A$=CHR$(10) THEN ST=ST+1
210 IF A$=CHR$(91) THEN ST=ST+PG+1
220 IF ST<0 THEN ST=0
230 IF ST>34-PG THEN ST=34-PG
240 GOTO 100
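The same track and granule relationship can be written as two small Python functions. This is only a sketch, and it assumes granules are numbered 0 to 67 the way the FAT numbers them, with track 17 skipped. The function names are made up for this example.

```python
def granule_location(granule):
    """Return (track, first_sector, last_sector) for a granule numbered 0-67."""
    if not 0 <= granule <= 67:
        raise ValueError("granule must be 0-67")
    track = granule // 2
    if track >= 17:
        track += 1  # skip track 17 (FAT and directory)
    first_sector = 1 if granule % 2 == 0 else 10
    return track, first_sector, first_sector + 8


def track_granules(track):
    """Return the two granule numbers stored on a track, or None for track 17."""
    if not 0 <= track <= 34:
        raise ValueError("track must be 0-34")
    if track == 17:
        return None
    base = track * 2 if track < 17 else (track - 1) * 2
    return base, base + 1


print(granule_location(33))  # (16, 10, 18)
print(granule_location(34))  # (18, 1, 9)
print(track_granules(34))    # (66, 67)
```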

If you wanted to use all of an RS-DOS disk, a CoCo program could use disk access commands to read/write sectors to any spot on the disk, including track 17, and manually use all 70 granules for storage. But, if it did that, typing “DIR” would not produce expected results (there would be no directory) and trying to SAVE something on this disk would overwrite data (if it worked at all; it would have needed valid directory information to even do this).

But I digress…

Track 17

Of the 18 sectors contained in track 17, sectors 1 and 12-18 were “for future use.” Radio Shack never used them, as far as I know, but third party patches to Disk BASIC did use them for other features, such as supporting 40 track drives.

Sector 1 – Unused (“for future use”)
Sector 2 – File Allocation Table (FAT)
Sectors 3-11 – Directory Entries
Sectors 12-18 – Unused (“for future use”)

FAT (File Allocation Table)

The first 68 bytes of Sector 2 contained the file allocation table. Each byte represented the status of one of the available granules on the disk. If the granule was not used by any file, the byte representing it would be set to 255 (&HFF). I expect that the FREE() command simply read Track 17, Sector 2, and quickly scanned the first 68 bytes, counting how many were 255.

DSKI$ / DSKO$

Let’s do a quick tangent here. Disk BASIC provided two commands for reading and writing sectors on the disk. DSKI$ (disk input) and DSKO$ (disk output) needed a drive number (0-3), track number (0-34), and a sector number (1-18) to read or write from/to. Since a Color BASIC string variable could not be longer than 255 (the maximum size a byte could represent for length), a string could not hold an entire sector. Because of this, DSKI$ and DSKO$ split the sector up in to two 128-byte strings like this:

CLEAR 256
DSKI$ 0,17,2,A$,B$

Above, the CLEAR 256 is needed to increase string space from the default 200 bytes to enough to store the full sector in A$ and B$ as two 128-byte strings. Keep in mind, more memory will be needed when you do any manipulation on either of those strings. As you will see below, CLEAR 384 is really needed at the very least, since if you do a MID$() or LEFT$() on A$ or B$, enough string memory has to be available to hold a copy of that string (256+128 is 384). See my string abuse article for a deep dive in to why that is the case.

For DSKI$, the first parameter is the drive (0-3), the second is the track (0-34) and the third is the sector (1-18). After that are two string variables that will hold the two halves of the 256-byte sector. In this example, A$ holds the first 128 bytes, and B$ holds the second 128 bytes.

I only wanted to mention this so I could show a BASIC program that calculates how much space is free on a disk by reading the FAT bytes. It might look something like this:

0 'DISKFREE.BAS
10 CLEAR 384
20 INPUT "DRIVE";DR
30 DSKI$ DR,17,2,A$,B$
40 FOR I=1 TO 68
50 IF MID$(A$,I,1)=CHR$(255) THEN FG=FG+1
60 NEXT
70 PRINT "FREE GRANULES:";FG

This could, of course, be made smaller and faster. And, if you wanted to show the free space in bytes, you could just multiply the free granules (FG) variable by 2304, the size of a granule:

70 PRINT "FREE SPACE:";FG*2304;"BYTES"
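If you only have a disk image on a modern machine, the same count can be sketched in Python. A big assumption here: the image is a plain, headerless dump of all 35 tracks with the sectors stored in track and sector order (161,280 bytes). The helper names are made up for this example, and the demo builds a blank image in memory instead of opening a real file.

```python
TRACKS = 35
SECTORS_PER_TRACK = 18
SECTOR_BYTES = 256
FAT_TRACK, FAT_SECTOR = 17, 2
GRANULES = 68


def read_sector(image, track, sector):
    """Return one 256-byte sector. Tracks count from 0, sectors from 1."""
    index = track * SECTORS_PER_TRACK + (sector - 1)
    return image[index * SECTOR_BYTES:(index + 1) * SECTOR_BYTES]


def free_granules(image):
    """Count FAT bytes equal to 255 (&HFF), like lines 40-60 of DISKFREE.BAS."""
    fat = read_sector(image, FAT_TRACK, FAT_SECTOR)[:GRANULES]
    return bytes(fat).count(0xFF)


# Build a blank image in memory so the sketch runs without a real disk file.
blank = bytearray(b"\xff" * (TRACKS * SECTORS_PER_TRACK * SECTOR_BYTES))
print(free_granules(blank))         # 68

# Any FAT value other than 255 means the granule is in use.
fat_offset = (FAT_TRACK * SECTORS_PER_TRACK + (FAT_SECTOR - 1)) * SECTOR_BYTES
blank[fat_offset] = 0xC1
print(free_granules(blank))         # 67
print(free_granules(blank) * 2304)  # 154368
```

With a real image, you would read the file in binary mode and pass the bytes to free_granules() in place of the blank image.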

Of course, the FREE(0) command could also have been used for this, even getting the value in a variable:

10 PRINT "FREE GRANULES:";FREE(0)

10 PRINT "FREE SPACE:";FREE(0)*2304;"BYTES"

10 FG=FREE(0):PRINT "FREE GRANULES:";FG

10 FS=FREE(0)*2304:PRINT "FREE SPACE:";FS;"BYTES"

But I digress.

But what if the granule is being used by a file? If you wanted to see the values in the non-free granules used on the disk, you could modify the program as follows:

0 'DISKFREE.BAS
10 CLEAR 384
20 INPUT "DRIVE";DR
30 DSKI$ DR,17,2,A$,B$
40 FOR I=1 TO 68
50 GN=ASC(MID$(A$,I,1))
55 IF GN=255 THEN FG=FG+1 ELSE PRINT GN;
60 NEXT:PRINT
70 PRINT "FREE GRANULES:";FG

If you run that, you will probably see values that are outside of the range of a granule number (0-67). This will be explained later when we discuss the FAT in more detail.
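As a preview, the file size listing from a later part of this series treats any FAT value of &HC0 or greater as the last granule of a file, with the low bits (AND &H1F) giving the number of sectors used in that granule. Any other non-255 value is the number of the next granule in the file. Here is a Python sketch of that logic. The function names are my own, and the FAT is passed in as a plain list of 68 values.

```python
GRANULE_BYTES = 2304


def describe_fat_entry(value):
    """Explain one FAT byte using the rules from the file size listing."""
    if value == 0xFF:
        return "free"
    if value >= 0xC0:
        return f"last granule of a file, {value & 0x1F} sector(s) used"
    return f"in use, next granule is {value}"


def file_size(fat, first_granule, bytes_in_last_sector):
    """Walk a file's granule chain and return its size in bytes."""
    size = 0
    granule = first_granule
    for _ in range(len(fat)):          # guard against a looping (corrupt) chain
        value = fat[granule]
        if value >= 0xC0:              # last granule of the file
            sectors = value & 0x1F
            if sectors > 0:
                sectors -= 1           # the final sector is only partly used
            return size + sectors * 256 + bytes_in_last_sector
        size += GRANULE_BYTES          # a full granule, move to the next one
        granule = value
    raise ValueError("granule chain never ends")


fat = [0xFF] * 68
fat[0] = 1       # granule 0 continues in granule 1
fat[1] = 0xC2    # granule 1 is the last one, 2 sectors used
print(describe_fat_entry(fat[0]))
print(describe_fat_entry(fat[1]))
print(file_size(fat, 0, 100))   # 2304 + 256 + 100 = 2660
```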

Bonus: File Exists?

Here’s some useless code. This routine will determine if a file exists. Rather than parse each 32 byte portion of the sectors, I decided to use INSTR() to see if the target filename string exists anywhere in the sector strings. To make sure “EMP” didn’t show up as a match for a file named “EMPTY”, I pad the target string with spaces just like they are stored in the directory.

5 CLEAR 384
10 INPUT "FILE NAME";F$
20 ' PAD NAME WITH SPACES
30 F$=F$+STRING$(8-LEN(F$)," ")
35 ' DIRECTORY IS IN SECTORS 3-11
40 FOR S=3 TO 11
50 DSKI$ 0,17,S,A$,B$
60 IF INSTR(A$,F$) OR INSTR(B$,F$) THEN PRINT "FOUND":END
70 NEXT

This could be improved by having the process stop as soon as it finds an entry starting with 255 (no more directory entries after that point). To keep the code small, a simple “55 IF ASC(LEFT$(A$,1))=255 THEN END” would be enough. It might still read a sector more than it needs to, since it’s not checking every entry in the string, but that would be a way to do it with minimal code.
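For comparison, here is a Python sketch that does parse each 32-byte entry and stops at the first entry starting with 255. It assumes a plain, headerless disk image with sectors stored in track and sector order, and that the filename sits in the first 8 bytes of each entry, padded with spaces. The function names are hypothetical, and the demo image is built in memory.

```python
SECTORS_PER_TRACK = 18
SECTOR_BYTES = 256
ENTRY_BYTES = 32
DIR_TRACK = 17
DIR_SECTORS = range(3, 12)   # sectors 3-11


def read_sector(image, track, sector):
    """Return one 256-byte sector. Tracks count from 0, sectors from 1."""
    index = track * SECTORS_PER_TRACK + (sector - 1)
    return image[index * SECTOR_BYTES:(index + 1) * SECTOR_BYTES]


def file_exists(image, name):
    """Look for a space-padded 8-character name in the directory entries."""
    target = name.ljust(8).encode("ascii")
    for sector in DIR_SECTORS:
        data = read_sector(image, DIR_TRACK, sector)
        for start in range(0, SECTOR_BYTES, ENTRY_BYTES):
            entry = data[start:start + ENTRY_BYTES]
            if entry[0] == 0xFF:     # no more entries after this point
                return False
            if entry[:8] == target:
                return True
    return False


# Demo: a blank image with one directory entry named "EMPTY".
image = bytearray(b"\xff" * (35 * SECTORS_PER_TRACK * SECTOR_BYTES))
first_entry = (DIR_TRACK * SECTORS_PER_TRACK + 2) * SECTOR_BYTES
image[first_entry:first_entry + 8] = b"EMPTY   "
print(file_exists(image, "EMPTY"))   # True
print(file_exists(image, "EMP"))     # False
```

Because it compares only the name field of each entry, it cannot get a false match from bytes elsewhere in the sector, which the INSTR() shortcut could in theory.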

We’ll do something less useless in part 2.

Until then…

Write BASIC easier with Alex Evans Python scripts


I’ve mentioned the BASIC Utilities from Alex Evans quite a bit lately. These Python scripts can be run on any system that has Python. I have used them under Windows 11 and macOS. In addition to having scripts that will tokenize and de-tokenize BASIC programs, it has some other interesting capabilities. One is the ability to have a BASIC source file that is written in a more user-friendly way — using labels instead of line numbers, and then only having them where needed.

Consider this example of a “guess the number” game:

NUMBER = RND(100)

PRINT "I AM THINKING OF A NUMBER FROM"
PRINT "1 TO 100. CAN YOU GUESS IT?"

LOOP: INPUT "WHAT IS YOUR GUESS";GUESS

IF GUESS > NUMBER THEN PRINT "THAT IS TOO HIGH!"
IF GUESS < NUMBER THEN PRINT "THAT IS TOO LOW!"
IF GUESS = NUMBER THEN PRINT "YOU GOT IT!":END

GOTO LOOP

This code can be processed through “coco_pack.py” and it will become a tokenized program (guess.bas.pack) that runs on a CoCo. The PACK routine will do things like rename longer variables (such as “GUESS”) to valid short ones, as well as change labels (LABEL:) in to line numbers. Obviously, it also adds line numbers to everything in between, giving the programmer one less thing to worry about.

I processed this file using “coco_pack.py -t guess.bas” to have it generate a text file (-t) rather than a tokenized file. It produced “guess.bas.pack” as a text file which looked like this:

0 A=RND(100):PRINT"I AM THINKING OF A NUMBER FROM":PRINT"1 TO 100. CAN YOU GUESS IT?"
1 INPUT"WHAT IS YOUR GUESS";B:IFB>A THENPRINT"THAT IS TOO HIGH!"
2 IFB<A THENPRINT"THAT IS TOO LOW!"
3 IFB=A THENPRINT"YOU GOT IT!":END
4 GOTO1

You can see how it combined as much as possible on each line, taking three separate lines (RND, and two PRINTs) in to one. The INPUT needed to be on its own line since it will be the target of a GOTO loop. The first comparison (is guess too high) could be moved to the end of that line, but the next ones needed to be separate lines.

The result is something that looks a bit more friendly being turned in to something much more compact. Using “coco_pack guess.bas” without “-t” would have produced a tokenized file ready to load on the CoCo.

A warning about variable and label names

In BASIC, there are certain things you cannot use for variable names. You cannot have a variable named “FOR” since that is a keyword, but you can use “FO“. (Besides, if you use a variable name longer than two characters, the rest get ignored. “FOR” should be honored as “FO” and be allowed, but the BASIC parser has already found “FOR” and turned it in to a token … I think.)

Some variables work fine on Color BASIC, but cannot be used on Extended BASIC. “FN” is allowed on Color BASIC, but you cannot use “FN” as a variable under Extended/Disk BASIC because FN is a token (see DEF FN). “AS” works fine as a variable for Color or Extended BASIC, but cannot be used under Disk BASIC because “AS” is now a keyword (see the “FIELD” command).

The Alex Evans tools need to follow similar rules, since they are parsing the line looking for BASIC keywords to tokenize. If you tried to make a label called “ASK:”, as I initially did, you will find that a “GOTO ASK” does not work. It turns in to “GOTO ASx” where x is a variable it made, since it treated “AS” as a token, and then K as a variable. This is something that Alex may be able to resolve in a future update, but for right now, keep labels and long variables to things that are not conflicting with BASIC keywords.
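A quick way to catch names like this before running the scripts is a small check of your own. The sketch below is not part of Basic Utils, and I don't know exactly how the scripts scan for keywords, so it is deliberately strict: it flags a name if a keyword appears anywhere in it. The keyword list is only a sample made of the ones mentioned in this article; a real check would need the full list for your BASIC dialect.

```python
# Sample only: the keywords mentioned in this article, not a complete list.
SAMPLE_KEYWORDS = ("FOR", "GOTO", "FN", "AS")


def keyword_conflicts(name):
    """Return the sample keywords that appear anywhere inside a name."""
    upper = name.upper()
    return [keyword for keyword in SAMPLE_KEYWORDS if keyword in upper]


for name in ("ASK", "GOTO", "LOOP", "GUESS", "NUMBER"):
    conflicts = keyword_conflicts(name)
    if conflicts:
        print(name, "conflicts with", ", ".join(conflicts))
    else:
        print(name, "looks safe")
```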

Here’s an example of bad stuff. Notice that you can use leading tabs/spaces to help “prettify” the code if you want, but since it is still BASIC lines, everything that needs to be on the same line in BASIC will have to be on the same line of this code:

FOR=1

GOTO: PRINT "GOTO"
    GOTO GOTO

LOOP: PRINT "LOOP"
    GOTO LOOP

And here is what it tries to create:

0 FOR=1:PRINT"GOTO":GOTOGOTO
1 PRINT"LOOP":GOTO1

You can see it tried to make “FOR=1”, which will error out on the CoCo, and it did not treat “GOTO” as a label, since GOTO is a keyword.

Using LOOP: works just fine, since there is no LOOP keyword.

Just keep that in mind. This is a script that creates files, and not a full BASIC interpreter that can give you errors as you type things in. :)

A warning about line format

Since these are lines of BASIC, they have to be on the same line just as BASIC expects. You can NOT write something like:

REM * THIS WON'T WORK:
IF G > N
    PRINT "THAT IS TOO HIGH!"
ELSE IF G < N
    PRINT "THAT IS TOO LOW!"

…because that would be like doing this in BASIC:

10 REM * THIS WON'T WORK:
20 IF G > N
30 PRINT "THAT IS TOO HIGH!"
40 ELSE IF G < N
50 PRINT "THAT IS TOO LOW!"

The PACK process would then combine lines and produce:

0 IFA>B
1 PRINT"THAT IS TOO HIGH!":ELSEIFA<B
2 PRINT"THAT IS TOO LOW!"

“It’s still BASIC, after all!”

Carl England’s CRUNCH versus Alex Evans’ PACK


After learning about some of the things the PACK script from Alex Evans’ Basic Utils could do, I wondered how it compared to the Carl England CRUNCH program that ran directly on a CoCo.

The Test Program

First, I wanted to create a simple test program that would allow for lines to be packed together easily. To do this, I made a BASIC program that created a BASIC program. Each line would have a PRINT command followed by a quoted string of 32 characters (the width of the CoCo screen). My program creator looks like this:

0 'MTOOBIG.BAS
10 OPEN "O",#1,"TOOBIG.BAS"
20 FOR I=1 TO 26
30 PRINT #1,I*10;"PRINT ";CHR$(34);STRING$(32,64+I);CHR$(34);";"
40 NEXT
50 CLOSE #1

This program is designed to create a disk file, but it could have been modified to work on cassette by changing the device #1 to device #-1.

The resulting file “TOOBIG.BAS” can be loaded. It looks like this:

10 PRINT "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
20 PRINT "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB";
30 PRINT "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC";
40 PRINT "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD";
50 PRINT "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE";
60 PRINT "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF";
70 PRINT "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG";
80 PRINT "HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH";
90 PRINT "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII";
100 PRINT "JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ";
110 PRINT "KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK";
120 PRINT "LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL";
130 PRINT "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM";
140 PRINT "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN";
150 PRINT "OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO";
160 PRINT "PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP";
170 PRINT "QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ";
180 PRINT "RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR";
190 PRINT "SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS";
200 PRINT "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT";
210 PRINT "UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU";
220 PRINT "VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV";
230 PRINT "WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW";
240 PRINT "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX";
250 PRINT "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY";
260 PRINT "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ";
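If you want the same test file on a modern machine, to feed to a script instead of loading it on a CoCo, a few lines of Python can produce it. This is just a sketch of an equivalent generator; the function name is my own, and it only builds the lines in memory.

```python
import string


def toobig_lines():
    """Build the 26 PRINT lines (10 to 260), one per letter A-Z."""
    return [
        f'{(index + 1) * 10} PRINT "{letter * 32}";'
        for index, letter in enumerate(string.ascii_uppercase)
    ]


lines = toobig_lines()
print(len(lines))   # 26
print(lines[0])
print(lines[-1])
```

Writing the list out with one line per row gives a text file with the same contents as the listing above.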

Carl England’s CRUNCH

The CRUNCH utility by Carl England can be found on his The Defeater disk. You can download it from the Color Computer Archive site:

https://colorcomputerarchive.com/search?q=Carl+England

With my working disk in Drive 0, I mounted Carl’s disk image in Drive 1 and copied the CRUNCH program to my disk so I could get to it easier.

With my “TOOBIG.BAS” program loaded, I can then LOADM"CRUNCH":EXEC to run it. Here is an example of what CRUNCH looks like (but ignore the program size shown at the bottom of the screen; this screen shot is from an earlier article where I was using it on a different program):

For the correct size, here is the output of CRUNCH when processing on the “TOOBIG.BAS” test program:

You can see it managed to shrink the program by 102 bytes! There were no trailing quotes, REM statements, or THEN GOTOs to remove, but it could take out the space after each PRINT and pack the lines. Somewhat.

Inspecting the resulting program was not what I expected. It did some work, but not as much as it could — for example, while it did combine lines 10-60 in to one longer line, it left line 70 alone. The same thing happened with lines 140 and 210. This may be a bug.

UPDATE: Since I wrote this, I bribed William “Lost Wizard” Astle to disassemble the CRUNCH program for me, and provided the source to the original author Carl England (it was easier than hoping Carl could somehow find his source code from 23 years ago). Carl identified a one byte change to fix this issue, and I recompiled and it fixes the following behavior to match that of the Alex Evans tools. More on this in a future article…

10 PRINT"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";:PRINT"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB";:PRINT"CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC";:PRINT"DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD";:PRINT"EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE";:PRINT"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF";
70 PRINT"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG";
80 PRINT"HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH";:PRINT"IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII";:PRINT"JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ";:PRINT"KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK";:PRINT"LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL";:PRINT"MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM";
140 PRINT"NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN";
150 PRINT"OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO";:PRINT"PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP";:PRINT"QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ";:PRINT"RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR";:PRINT"SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS";:PRINT"TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT";
210 PRINT"UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU";
220 PRINT"VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV";:PRINT"WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW";:PRINT"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX";:PRINT"YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY";:PRINT"ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ";

Using my “BASLINES.BAS” test program, I can get some stats on the line lengths before and after:

ADDR NADDR LINE# SIZ
 9729  9771    10  42
 9771  9813    20  42
 9813  9855    30  42
 9855  9897    40  42
 9897  9939    50  42
 9939  9981    60  42
 9981 10023    70  42
10023 10065    80  42
10065 10107    90  42
10107 10149   100  42
10149 10191   110  42
10191 10233   120  42
10233 10275   130  42
10275 10317   140  42
10317 10359   150  42
10359 10401   160  42
10401 10443   170  42
10443 10485   180  42
10485 10527   190  42
10527 10569   200  42
10569 10611   210  42
10611 10653   220  42
10653 10695   230  42
10695 10737   240  42
10737 10779   250  42
10779 10821   260  42
10821 10874 63000  53
10874 10997 63010 123
10997     0

The program lines from 10 to 260 are all 42 bytes in length. The two lines after that are the BASLINES routine.

The CRUNCHed version looks like this:

 ADDR NADDR LINE# SIZ
 9729  9955    10 226
 9955  9996    70  41
 9996 10222    80 226
10222 10263   140  41
10263 10489   150 226
10489 10530   210  41
10530 10719   220 189
10719 10772 63000  53
10772 10895 63010 123
10895     0

It took 26 lines down to just seven, but the way it packed lines together is a bit odd, with a long line (226 bytes) then a short line (41 bytes), repeating. The 41 byte line is our original 42 byte line with the space after PRINT removed. The math checks out.
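To spell that math out: each tokenized line in memory carries 5 bytes of overhead (a 2-byte address of the next line, a 2-byte line number, and a zero byte at the end). When lines are joined, all but one copy of that overhead goes away, and each join adds a one-byte colon. Here is a small Python sketch of that calculation, with names of my own choosing.

```python
LINE_OVERHEAD = 5   # 2-byte next-line address + 2-byte line number + 1 zero byte


def merged_size(line_sizes):
    """Size in bytes of one line made by joining lines of the given sizes."""
    if not line_sizes:
        raise ValueError("need at least one line")
    body = sum(size - LINE_OVERHEAD for size in line_sizes)
    colons = len(line_sizes) - 1
    return body + colons + LINE_OVERHEAD


print(merged_size([41] * 6))    # 226
print(merged_size([41] * 5))    # 189
print(merged_size([41] * 26))   # 966
```

Six of the 41-byte lines give the 226-byte lines, the last five give the 189-byte line, and joining all 26 would give a single 966-byte line.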

Alex Evans’ PACK

Alex Evans’ PACK did a better job, combining all 26 lines of code in to one impossibly long line:

0 PRINT"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";:PRINT"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB";:PRINT"CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC";:PRINT"DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD";:PRINT"EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE";:PRINT"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF";:PRINT"GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG";:PRINT"HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH";:PRINT"IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII";:PRINT"JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ";:PRINT"KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK";:PRINT"LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL";:PRINT"MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM";:PRINT"NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN";:PRINT"OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO";:PRINT"PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP";:PRINT"QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ";:PRINT"RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR";:PRINT"SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS";:PRINT"TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT";:PRINT"UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU";:PRINT"VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV";:PRINT"WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW";:PRINT"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX";:PRINT"YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY";:PRINT"ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ";

Using my “BASLINES” program, here is what it shows:

 ADDR NADDR LINE# SIZ
 9729 10695     0 966
10695 10748 63000  53
10748 10871 63010 123
10871     0

Yes, that’s right! Alex Evans’ PACK turned the entire 26 line program in to ONE line 0 that was 966 bytes long. Wowza.

I plan to do more tests to see if there are any tricks that CRUNCH does that PACK does not, so you have that to look forward to…

Until next time…

Alex Evans’ BASIC UTILS change everything – part 3


Or…

An easier way

See also: part 1, part 2 and part 3.

So far, we’ve explored how BASIC programs are stored in memory, experimented with moving a program somewhere else in memory and making it run, then finally creating a stupid one line program that features over 1700 bytes of PRINT statements — an impossible (to type in) program!

But there are certainly easier ways. And one such easier way was created by Alex Evans, though I’m not sure if his intent was to create stupid crazy long line programs or not.

Over on Alex’s Github page is a series of Python scripts that manipulate CoCo BASIC programs:

https://github.com/varmfskii/basic_utils

Alex has created routines to convert an ASCII program to a tokenized one, as well as convert tokenized BASIC program to ASCII. Very useful!

But, he also has routines to RENUMBER a program, and even PACK and UNPACK a program. It’s the PACK and UNPACK that caught my interest, once I realized what they could do.

Another interesting one is the “RE ID” script, which allows you to create a BASIC program as a text file and use long, descriptive variable names. The REID script will convert them all to short variables that actually work in BASIC. That could be fun.

Let’s look at how these are used… To make them work, you just need Python installed. Mac and Linux come with it, but for Windows you’ll need to download it. You can then run these scripts as if they were programs, though specific details for Windows are something I can’t currently add because I haven’t tried it yet.

BASIC UTILS

Converting between ASCII text file and tokenized BASIC:

coco_tokenize.py – Convert BASIC program in text form to tokenized form.

Usage: coco_tokenize.py [<opts>] [<iname>] [<oname>]
	-b	--basic=<dialect>	basic dialect
	-c	--cassette		tokenized cassette file
	-d	--disk			tokenized disk file (default)
	-h	--help			this help
	-i<n>	--input=<file>		input file
	-o<n>	--output=<file>		output file

coco_detokenize.py – Convert a tokenized BASIC program into text.

Usage: coco_detokenize.py [<opts>] [<iname>] [<oname>]
	-b	--basic=<dialect>	basic dialect
	-c	--cassette		tokenized cassette file
	-d	--disk			tokenized disk file (default)
	-h	--help			this help
	-i<n>	--input=<file>		input file
	-o<n>	--output=<file>		output file

Packing BASIC program lines together, or splitting them up:

coco_pack.py – Make a BASIC program take as little space as possible.

Usage: coco_pack.py [<opts>] [<iname>] [<oname>]
	-b	--basic=<dialect>	basic dialect
	-c	--cassette		tokenized cassette file
	-d	--disk			tokenized disk file (default)
	-h	--help			this help
	-i<n>	--input=<file>		input file
	-k	--token-len		line length is for tokenized form
	-m	--maxline=<num>		maximum line length
	-o<n>	--output=<file>		output file
	-t	--text			output as text file
	-x	--text-len		line length is for untokenized form

coco_unpack.py – Split a BASIC program up to one command per line.

Usage: coco_unpack.py [<opts>] [<iname>] [<oname>]
	-b	--basic=<dialect>	basic dialect
	-c	--cassette		tokenized cassette file
	-d	--disk			tokenized disk file (default)
	-h	--help			this help
	-i<n>	--input=<file>		input file
	-n	--no-whitespace		do not add extra whitespace
	-o<n>	--output=<file>		output file

Renumbering the program:

coco_renumber.py – Adjust line numbers.

Usage: coco_renumber.py [<opts>] [<iname>] [<oname>]
	-b	--basic=<dialect>	basic dialect
	-c	--cassette		tokenized cassette file
	-d	--disk			tokenized disk file (default)
	-h	--help			this help
	-i<n>	--input=<file>		input file
	-o<n>	--output=<file>		output file
	-s<n>	--start=<num>		starting line number
	-t	--text			output as text file
	-v<n>	--interval=<num>	interval between line numbers

Renaming long variables:

coco_reid.py – Transform variable names. This allows source code with long meaningful variable names to fit the restrictions of BASIC. This will also obfuscate existing variable names since they are replaced by A, B, C, etc.

Usage: coco_reid.py [<opts>] [<iname>] [<oname>]
	-b	--basic=<dialect>	basic dialect
	-c	--cassette		tokenized cassette file
	-d	--disk			tokenized disk file (default)
	-h	--help			this help
	-i<n>	--input=<file>		input file
	-o<n>	--output=<file>		output file
	-t	--text			output as text file

Tokenizing / Detokenizing

To test these programs, I had a copy of my ALLRAM BBS from 1983. The file I had was an ASCII text file, so I was able to convert it to a tokenized BASIC program:

% coco_tokenize.py allram.bas
% ls -l allram.bas*          
-rw-r--r--@ 1 allenh  staff  6230 Apr  3  2013 allram.bas
-rw-r--r--@ 1 allenh  staff  5125 Feb 13 21:23 allram.bas.tok

Above, you can see the original test file (allram.bas) and the tokenized version (allram.bas.tok). If I had started with a tokenized file, I could convert it to ASCII like this:

% coco_detokenize.py allram.bas.tok 
% ls -l allram.bas*                
-rw-r--r--@ 1 allenh  staff  6230 Apr  3  2013 allram.bas
-rw-r--r--@ 1 allenh  staff  5125 Feb 13 21:23 allram.bas.tok
-rw-r--r--@ 1 allenh  staff  6229 Feb 13 21:23 allram.bas.tok.txt

The original “allram.bas” had a blank line at the start of the file, but other than that, it and the tokenized-then-detokenized “allram.bas.tok.txt” should be the same.

Packing

I already tried to make my *ALLRAM* BBS somewhat compact, but I’ve learned much more since 1983. Today I’d go through it line by line and combine them as much as I could. OR, I could just use “coco_pack.py” like this:

% ./coco_pack.py allram.bas    
% ls -l allram.bas*                
-rw-r--r--@ 1 allenh  staff  6230 Apr  3  2013 allram.bas
-rw-r--r--@ 1 allenh  staff  4441 Feb 13 21:26 allram.bas.pack
-rw-r--r--@ 1 allenh  staff  5125 Feb 13 21:23 allram.bas.tok
-rw-r--r--@ 1 allenh  staff  6229 Feb 13 21:25 allram.bas.tok.txt

The “coco_pack” script will work on an ASCII file and produce a tokenized file that is more compact than if you’d just used “coco_tokenize” on the original file. In this case, my BASIC program (tokenized) was 5125 bytes. The packed version was 4441 bytes!

What did it do, I wondered? I then de-tokenized this packed version so I could look at it:

% ./coco_detokenize.py allram.bas.pack
% ls -l allram.bas*                 
-rw-r--r--@ 1 allenh  staff  6230 Apr  3  2013 allram.bas
-rw-r--r--@ 1 allenh  staff  4441 Feb 13 21:26 allram.bas.pack
-rw-r--r--@ 1 allenh  staff  5497 Feb 13 21:28 allram.bas.pack.txt
-rw-r--r--@ 1 allenh  staff  5125 Feb 13 21:23 allram.bas.tok
-rw-r--r--@ 1 allenh  staff  6229 Feb 13 21:25 allram.bas.tok.txt

And you can see there is now an “allram.bas.pack.txt” de-tokenized file. This allowed me to inspect and see what the routine was doing.

The pack routine removed all comments, as well as any unnecessary spaces. It also combined lines when possible, and renumbered by 1s (starting at line 0).

Original (first few lines):

0 REM *ALL RAM* BBS System 1.0
1 REM   Shareware / (C) 1983
2 REM     By Allen Huffman
3 REM  110 Champions Dr, #811
4 REM     Lufkin, TX 75901
5 CLS:FORA=0TO8:READA$:POKE1024+A,VAL("&H"+A$):NEXTA:EXEC1024:DATAC6,1,96,BC,1F,2,7E,96,A3
10 CLEAR21000:DIMNM$(200),MS$(19,10),A$,F$,S$,T$,BR$,CL$,NM$,PS$,PW$,A,B,C,CL,LN,LV,MS,NM,KY,UC
15 CL$=CHR$(12)+CHR$(14):BR$="*==============*==============*":GOSUB555

Packed:

0 CLS:FORJ=0TO8:READI$:POKE1024+J,VAL("&H"+I$):NEXTJ:EXEC1024:DATAC6,1,96,BC,1F,2,7E,96,A3:CLEAR21000:DIMA$(200),B$(19,10),I$,H$,F$,E$,B$,C$,A$,D$,G$,J,H,A,E,I,K,F,C,G,D:C$=CHR$(12)+CHR$(14):B$="*==============*==============*":GOSUB62

You can see it removed the REMs in lines 0-4, then combined lines 5, 10 and 15 in to one long line. You can also see it changed the variable names, reducing any 2-character names to a single character.

To explain the line combining, take a look at this simple program:

10 PRINT "A"
20 PRINT "B"
30 PRINT "C"

It would pack in to this:

0 PRINT"A":PRINT"B":PRINT"C"

Not all lines can be combined. If a line ends with a GOTO, anything placed after it would never be executed. The same is true for IF/THEN that go to a line.

For example:

10 IF A=1 THEN 100
20 IF A=2 THEN 200
30 IF A=3 THEN 300
40 END
100 REM 100
105 PRINT "100"
110 END
200 REM 200
205 PRINT "200"
210 END
300 REM 300
305 PRINT "300"
310 END

Above, the “IF A=n” lines can’t have anything after them (without modifying the program to use ELSE), so packing produces this:

0 IFA=1THEN4
1 IFA=2THEN5
2 IFA=3THEN6
3 END
4 PRINT"100":END
5 PRINT"200":END
6 PRINT"300":END

Above, the first three IF statements were left alone, since you couldn’t put anything after the “THEN”. Pack updated any GOTO that targeted a REM line to start after the REM so the REMs could be removed. (That is something it didn’t do a week or two ago when I first started writing this. Alex has made some substantial updates to the scripts lately.)
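To make the combining rules concrete, here is a rough Python sketch of the idea. It is not Alex's code and makes no claim about how his script works internally. The function names (`pack`, `ends_flow`) are made up for illustration. It assumes every source line starts with a line number, only handles a single numeric target after GOTO/GOSUB/THEN/ELSE (no ON...GOTO lists), and strips every space outside of quotes, which a real packer cannot always do safely.

```python
import re

JUMP = re.compile(r"\b(GOTO|GOSUB|THEN|ELSE)\s*(\d+)")
STRING = re.compile(r'("[^"]*")')


def code_only(text):
    """Blank out string literals so keywords inside quotes are ignored."""
    return STRING.sub('""', text)


def ends_flow(text):
    """True if nothing may be appended: the line has an IF or ends in GOTO n."""
    code = code_only(text)
    return bool(re.search(r"\bIF\b", code) or re.search(r"\bGOTO\s*\d+\s*$", code))


def pack(source):
    # Parse "number text" lines.
    lines = []
    for raw in source.strip().splitlines():
        number, text = raw.strip().split(" ", 1)
        lines.append((int(number), text.strip()))

    # Drop comment lines; a jump to a comment is redirected to the next kept line.
    kept, redirect, pending = [], {}, []
    for number, text in lines:
        if text.startswith(("REM", "'")):
            pending.append(number)
            continue
        for old in pending:
            redirect[old] = number
        pending = []
        kept.append((number, text))

    # A line that is jumped to has to stay at the start of a line.
    targets = set()
    for _, text in kept:
        for match in JUMP.finditer(code_only(text)):
            old = int(match.group(2))
            targets.add(redirect.get(old, old))

    # Merge each line in to the previous one when that is allowed.
    groups = []
    for number, text in kept:
        if groups and number not in targets and not ends_flow(groups[-1][1][-1]):
            groups[-1][1].append(text)
        else:
            groups.append((number, [text]))

    # Renumber by 1s starting at 0 and fix up the jump targets.
    new_number = {old: new for new, (old, _) in enumerate(groups)}

    def retarget(match):
        old = int(match.group(2))
        old = redirect.get(old, old)
        return match.group(1) + str(new_number.get(old, old))

    def squeeze(text):
        parts = STRING.split(text)  # even indexes are code, odd are strings
        parts[0::2] = [JUMP.sub(retarget, p).replace(" ", "") for p in parts[0::2]]
        return "".join(parts)

    return [f"{new} " + ":".join(squeeze(t) for t in texts)
            for new, (_, texts) in enumerate(groups)]


print("\n".join(pack('10 PRINT "A"\n20 PRINT "B"\n30 PRINT "C"')))
```

Feeding it the IF/THEN listing above gives the same seven lines shown in the packed output, but that only shows the rules are consistent with the example, not that the real script uses them.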

Side Note: For this example you could do something like “IF A=1 THEN 4 ELSE IF A=2 THEN 5 ELSE IF A=3 THEN 6” on one line. However, there are other examples where, if the first condition was not met, you would still want the following IFs to execute. There’s really no way for a script to know the intent, so it cannot “just assume” and change code like that.

The pack script is also capable of packing lines well beyond the 249 characters you can type. Consider this program made up of 26 PRINT commands:

10 PRINT "AAAAAAAAAAAAAAAAAAAAAAAAAA"
20 PRINT "BBBBBBBBBBBBBBBBBBBBBBBBBB"
30 PRINT "CCCCCCCCCCCCCCCCCCCCCCCCCC"
40 PRINT "DDDDDDDDDDDDDDDDDDDDDDDDDD"
50 PRINT "EEEEEEEEEEEEEEEEEEEEEEEEEE"
60 PRINT "FFFFFFFFFFFFFFFFFFFFFFFFFF"
70 PRINT "GGGGGGGGGGGGGGGGGGGGGGGGGG"
80 PRINT "HHHHHHHHHHHHHHHHHHHHHHHHHH"
90 PRINT "IIIIIIIIIIIIIIIIIIIIIIIIII"
100 PRINT "JJJJJJJJJJJJJJJJJJJJJJJJJJ"
110 PRINT "KKKKKKKKKKKKKKKKKKKKKKKKKK"
120 PRINT "LLLLLLLLLLLLLLLLLLLLLLLLLL"
130 PRINT "MMMMMMMMMMMMMMMMMMMMMMMMMM"
140 PRINT "NNNNNNNNNNNNNNNNNNNNNNNNNN"
150 PRINT "OOOOOOOOOOOOOOOOOOOOOOOOOO"
160 PRINT "PPPPPPPPPPPPPPPPPPPPPPPPPP"
170 PRINT "QQQQQQQQQQQQQQQQQQQQQQQQQQ"
180 PRINT "RRRRRRRRRRRRRRRRRRRRRRRRRR"
190 PRINT "SSSSSSSSSSSSSSSSSSSSSSSSSS"
200 PRINT "TTTTTTTTTTTTTTTTTTTTTTTTTT"
210 PRINT "UUUUUUUUUUUUUUUUUUUUUUUUUU"
220 PRINT "VVVVVVVVVVVVVVVVVVVVVVVVVV"
230 PRINT "WWWWWWWWWWWWWWWWWWWWWWWWWW"
240 PRINT "XXXXXXXXXXXXXXXXXXXXXXXXXX"
250 PRINT "YYYYYYYYYYYYYYYYYYYYYYYYYY"
260 PRINT "ZZZZZZZZZZZZZZZZZZZZZZZZZZ"

…and PACK would be able to combine them all in to ONE really long line:

0 PRINT"AAAAAAAAAAAAAAAAAAAAAAAAAA":PRINT"BBBBBBBBBBBBBBBBBBBBBBBBBB":PRINT"CCCCCCCCCCCCCCCCCCCCCCCCCC":PRINT"DDDDDDDDDDDDDDDDDDDDDDDDDD":PRINT"EEEEEEEEEEEEEEEEEEEEEEEEEE":PRINT"FFFFFFFFFFFFFFFFFFFFFFFFFF":PRINT"GGGGGGGGGGGGGGGGGGGGGGGGGG":PRINT"HHHHHHHHHHHHHHHHHHHHHHHHHH":PRINT"IIIIIIIIIIIIIIIIIIIIIIIIII":PRINT"JJJJJJJJJJJJJJJJJJJJJJJJJJ":PRINT"KKKKKKKKKKKKKKKKKKKKKKKKKK":PRINT"LLLLLLLLLLLLLLLLLLLLLLLLLL":PRINT"MMMMMMMMMMMMMMMMMMMMMMMMMM":PRINT"NNNNNNNNNNNNNNNNNNNNNNNNNN":PRINT"OOOOOOOOOOOOOOOOOOOOOOOOOO":PRINT"PPPPPPPPPPPPPPPPPPPPPPPPPP":PRINT"QQQQQQQQQQQQQQQQQQQQQQQQQQ":PRINT"RRRRRRRRRRRRRRRRRRRRRRRRRR":PRINT"SSSSSSSSSSSSSSSSSSSSSSSSSS":PRINT"TTTTTTTTTTTTTTTTTTTTTTTTTT":PRINT"UUUUUUUUUUUUUUUUUUUUUUUUUU":PRINT"VVVVVVVVVVVVVVVVVVVVVVVVVV":PRINT"WWWWWWWWWWWWWWWWWWWWWWWWWW":PRINT"XXXXXXXXXXXXXXXXXXXXXXXXXX":PRINT"YYYYYYYYYYYYYYYYYYYYYYYYYY":PRINT"ZZZZZZZZZZZZZZZZZZZZZZZZZZ"

That’s 886 characters of code on one line.

Pack is awesome!

But that’s not all…

Unpacking

An interesting feature of Alex’s tools is the ability to unpack a program down to something that looks pretty … but won’t run on a real CoCo. The unpack script will break everything down to one statement per line, but doesn’t add additional line numbers. This makes code prettier and easy to edit in a text editor:

1000 'USER INPUT
1005 LINE INPUT A$
 A$=LEFT$(A$,64)
 IF UC=0 OR A$=""THEN RETURN
1010 FOR C=1 TO LEN(A$)
 CH=ASC(MID$(A$,C,1))
 IF CH>96 THEN MID$(A$,C,1)=CHR$(CH-32)
1015 IF CH=92 THEN MID$(A$,C,1)="/"
1020 NEXT C
 UC=0
 RETURN
1050 'FUNCTION BORDER
1055 CLS
 PRINT CL$BR$
 PRINT TAB((32-LEN(A$))/2)A$
 PRINT BR$
 RETURN

Then, after editing, you can use pack to turn it back in to a compact program that could be tokenized to run on the CoCo.
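For illustration only, the core of that unpacking step can be sketched in a few lines of Python. This is my own guess at the idea, not the actual script: `split_statements` and `unpack_line` are hypothetical names, and the sketch ignores cases where a colon is not a statement separator (REM comments, DATA items).

```python
def split_statements(code):
    """Split a BASIC line on colons, ignoring colons inside quoted strings."""
    parts, current, in_string = [], "", False
    for ch in code:
        if ch == '"':
            in_string = not in_string
        if ch == ":" and not in_string:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def unpack_line(line):
    """Return one statement per row; only the first row keeps the line number."""
    number, code = line.split(" ", 1)
    first, *rest = split_statements(code)
    return [f"{number} {first}"] + [f" {statement}" for statement in rest]


for row in unpack_line('0 PRINT"A":PRINT"B":PRINT"C"'):
    print(row)
```

The leading space on the continuation rows mirrors the listing above, where statements without a line number belong to the numbered line before them.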

But that’s still not all…

Re-ID-ing

It can be frustrating making a bunch of variables make sense when you are limited to one or two character variable names. Alex has a script that takes care of that, allowing you to create “unpacked” source code with long variables like this:

10 NUMBER = RND(100)
20 PRINT "I AM THINKING OF A NUMBER FROM"
30 PRINT "1 TO 100. CAN YOU GUESS IT?"
40 INPUT "WHAT IS YOUR GUESS";GUESS
50 IF GUESS > NUMBER THEN PRINT "THAT IS TOO HIGH!" ELSE IF GUESS < NUMBER THEN PRINT "THAT IS TOO LOW!" ELSE PRINT "YOU GOT IT!":END
60 GOTO 40

Running “coco_pack.py program.bas” will pack the lines together and replace the long variable names in the source code with short ones:

0 A=RND(100):PRINT"I AM THINKING OF A NUMBER FROM":PRINT"1 TO 100. CAN YOU GUESS IT?"
1 INPUT"WHAT IS YOUR GUESS";B:IFB>A THENPRINT"THAT IS TOO HIGH!"ELSEIFB<A THENPRINT"THAT IS TOO LOW!"ELSEPRINT"YOU GOT IT!":END
2 GOTO1
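The renaming part can be sketched like this in Python. Treat it as an assumption-heavy toy rather than a description of Alex's script: `shorten` is a made-up name, the keyword set only covers the guessing game above, short names are handed out A to Z in order of first appearance (so it runs out after 26 variables), and things like &H hex constants are not handled.

```python
import re
import string

# Tiny keyword list, just enough for the guessing game (an assumption).
KEYWORDS = {"RND", "PRINT", "INPUT", "IF", "THEN", "ELSE", "END", "GOTO"}
WORD = re.compile(r"[A-Z][A-Z0-9]*")
STRING = re.compile(r'("[^"]*")')


def shorten(lines):
    """Replace long variable names with single letters, leaving strings alone."""
    short_names = iter(string.ascii_uppercase)
    names = {}

    def swap(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        if word not in names:
            names[word] = next(short_names)
        return names[word]

    result = []
    for line in lines:
        parts = STRING.split(line)  # even indexes are code, odd are strings
        parts[0::2] = [WORD.sub(swap, part) for part in parts[0::2]]
        result.append("".join(parts))
    return result, names


packed, names = shorten([
    "10 NUMBER = RND(100)",
    '40 INPUT "WHAT IS YOUR GUESS";GUESS',
])
print(packed, names)
```

A trailing `$` is left in place, so a string variable and a numeric variable with the same long name end up sharing a letter (A$ and A), which BASIC treats as two different variables anyway.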

But that’s still not all…

You don’t even have to include line numbers for anything except targets of a GOTO/GOSUB. The original program could also be written like this:

NUMBER = RND(100)

PRINT "I AM THINKING OF A NUMBER FROM"
PRINT "1 TO 100. CAN YOU GUESS IT?"

40 INPUT "WHAT IS YOUR GUESS";GUESS

IF GUESS > NUMBER THEN PRINT "THAT IS TOO HIGH!" ELSE IF GUESS < NUMBER THEN PRINT "THAT IS TOO LOW!" ELSE PRINT "YOU GOT IT!":END

GOTO 40

Pack would turn it in to the same functional program:

0 B=RND(100):PRINT"I AM THINKING OF A NUMBER FROM":PRINT"1 TO 100. CAN YOU GUESS IT?"
1 INPUT"WHAT IS YOUR GUESS";A:IFA>B THENPRINT"THAT IS TOO HIGH!"ELSEIFA<B THENPRINT"THAT IS TOO LOW!"ELSEPRINT"YOU GOT IT!":END
2 GOTO1

But that’s still not all…

You don’t even need to use line numbers. A recent update to the script allows the use of a label (word followed by a colon):

NUMBER = RND(100)

PRINT "I AM THINKING OF A NUMBER FROM"
PRINT "1 TO 100. CAN YOU GUESS IT?"

LOOP: INPUT "WHAT IS YOUR GUESS";GUESS

IF GUESS > NUMBER THEN PRINT "THAT IS TOO HIGH!" ELSE IF GUESS < NUMBER THEN PRINT "THAT IS TOO LOW!" ELSE PRINT "YOU GOT IT!":END

GOTO LOOP

And that too turns in to the same packed program. This opens up some great possibilities, such as writing subroutines without having them tied to a particular line number:

SHOWBASICSTART: PRINT PEEK(25)*256+PEEK(26)
    RETURN

SHOWBASICEND: PRINT PEEK(27)*256+PEEK(28)-1
    RETURN

Then you could paste those routines in the code, and do a “GOSUB SHOWBASICSTART” to use them!
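As a sketch of how label resolution could work (again, my own illustration in Python, not the script's actual logic): give every source line a number, remember which number each label landed on, then swap the label names used after GOTO/GOSUB for those numbers. `resolve_labels` and the tiny `KEYWORDS` set are hypothetical, every source line gets its own number here for simplicity, and jump keywords inside quoted strings are not protected.

```python
import re

LABEL = re.compile(r"^([A-Z][A-Z0-9]*):\s*(.*)$")
JUMP = re.compile(r"\b(GOTO|GOSUB)\s+([A-Z][A-Z0-9]*)\b")
# Words that may legitimately be followed by a colon (illustrative subset only).
KEYWORDS = {"CLS", "END", "PRINT", "RETURN"}


def resolve_labels(source, start=10, step=10):
    """Number every line and replace label references with line numbers."""
    labels, numbered = {}, []
    number = start
    for raw in source.splitlines():
        text = raw.strip()
        if not text:
            continue
        match = LABEL.match(text)
        if match and match.group(1) not in KEYWORDS:
            labels[match.group(1)] = number
            text = match.group(2)
        numbered.append((number, text))
        number += step

    def swap(match):
        name = match.group(2)
        if name in labels:
            return f"{match.group(1)} {labels[name]}"
        return match.group(0)

    return [f"{n} {JUMP.sub(swap, text)}" for n, text in numbered]


example = """GOSUB SHOWBASICSTART
END
SHOWBASICSTART: PRINT PEEK(25)*256+PEEK(26)
    RETURN"""
print("\n".join(resolve_labels(example)))
```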

Some caveats

When writing code like this, line numbers or labels only need to be included if there is a GOTO/GOSUB/etc. that goes to them. The source still needs to follow the flow of BASIC, meaning if you are writing an IF/THEN/ELSE, that should be all on one line:

IF A=1 THEN PRINT "A=1" ELSE PRINT "A<>1"

Trying to split them up like a modern language will not work:

REM *THIS WILL NOT WORK*

INPUT VALUE

10 IF VALUE=1 THEN
    PRINT "IT IS 1"
ELSE
    PRINT "IT IS NOT 1"

The script treats each line like a BASIC line, so the above test would produce:

0 INPUTVALA:IFVALA=1THEN
1 PRINT"IT IS 1":ELSE:PRINT"IT IS NOT 1"

Close, but no cigar! You can see it treats each line like a separate set of statements, and combines them with a colon between each. (Thus, the “:ELSE:”.)

So while it’s not a way to write modern-looking code, it does allow you to use long variable names and skip line numbers except where necessary.

Side Note: Alex has been adding code to allow concatenating long lines by using a backslash at the end. This was not working for me at the time of this writing, but I expect it will be.

Conclusion

These scripts open up some interesting possibilities for writing BASIC without having to deal with some of the limitations of BASIC (short variable names, line numbers, etc.). Just being able to write a program without having to renumber all the time when adding new lines in between existing lines is worth the price alone. (And the price is free, so there’s that.)

There have also been some new options added recently that I will discuss in future articles.

Until then…

Alex Evans’ BASIC UTILS change everything – part 2

See also: part 1, part 2 and part 3.

In the first part of this series, I didn’t explain who Alex Evans is or what his BASIC UTILS are. And I may not in this part, either. Rest assured, Alex Evans fans, I will be getting to this ~~soon~~ eventually.

The focus right now is looking at ways to make BASIC program lines that are longer than the 249 characters we can type in to the 251-byte input buffer. One way to do that is manually.

Previously I showed a three-line BASIC program that did three PRINT commands. By PEEKing through the memory of that program, we could see how it was stored:

Now that I understand how those bytes are stored, it seems it would be easy to copy them somewhere else in memory, and adjust the BASIC pointers for “start of program” and “start of variables” to reference the new location and see if it works.

And it almost does… Here’s a program that does it:

0 ' BASCLONE.BAS
10 ' START OF PROGRAM
20 ST=PEEK(25)*256+PEEK(26)
30 ' START OF VARIABLES
40 EN=PEEK(27)*256+PEEK(28)
50 ' SIZE OF PROGRAM
60 SZ=EN-ST
70 PRINT "THIS PROGRAM IS AT:"
80 PRINT "START: &H";HEX$(ST),"END: &H";HEX$(EN)
85 'PRINT "ARYTAB: &H";HEX$(PEEK(29)*256+PEEK(30)),;
86 'PRINT "ARYEND: &H";HEX$(PEEK(31)*256+PEEK(32))
90 ' NEW START
100 NS=ST+&H1000
110 PRINT "COPYING TO &H";HEX$(NS)
120 ' CLONE TO HIGHER MEMORY
130 FOR I=0 TO SZ
140 POKE NS+I,PEEK(ST+I)
150 NEXT
160 ' SHOW NEW LOCATION
170 PRINT "PROGRAM COPIED TO:"
180 PRINT "START: &H";HEX$(NS),"END: &H";HEX$(NS+SZ)
190 END
200 PRINT "START: &H";HEX$(PEEK(25)*256+PEEK(26)),"END: &H";HEX$(PEEK(27)*256+PEEK(28))

This program will start by showing the start and end of the BASIC program, and then it will copy that memory to 4K higher in memory (current start plus &H1000). After that, it prints the new start/end locations and ENDs. After running, we’d manually do the POKEs to set locations 25/26 and 27/28 to those values, and then we could RUN 200 to see if it works.

I am using HEX so it’s easy to figure out the MSB (most significant byte) and LSB (least significant byte) of the addresses for the later POKEs. You can see that I POKE 25 and 26 to the first two and last two digits of the new “START” address, and the same with 27 and 28 and the new “END” address.
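For anyone who would rather not split the address by eye, the MSB/LSB arithmetic is just a divide by 256 with remainder. Here is a small Python sketch. `poke_commands` is a made-up helper, and the addresses in the example are assumed values (a copy made &H1000 above a program that started at &H2601), not numbers from my screen.

```python
def poke_commands(start, end):
    """Build the POKEs that point BASIC's start/end pointers at a new location."""
    start_msb, start_lsb = divmod(start, 256)  # high byte, low byte
    end_msb, end_lsb = divmod(end, 256)
    return (f"POKE 25,{start_msb}:POKE 26,{start_lsb}:"
            f"POKE 27,{end_msb}:POKE 28,{end_lsb}")


# Hypothetical addresses, for illustration only.
print(poke_commands(0x3601, 0x3900))
```

With hex, the same split is simply the first two and last two digits: &H3601 is MSB &H36 (54) and LSB &H01 (1).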

Then I RUN 200 and it prints where it thinks the program is. It works!

Sorta. If I attempt to EDIT this program or add a new line, I end up with a corrupt program. There’s clearly more that needs to be done for this to really work, but it’s a good proof-of-concept.

Rather than figure out what all I need to do to make this work, I tried using the PCLEAR command. I know it will relocate the current BASIC program and adjust variables as needed. By repeating the same steps as before, I can see the “new” program is higher in memory, then by doing a PCLEAR 4 (which is what it was already set to), it relocates the BASIC back to where it should have been. I can then add a line 81 END and RUN it to see it print the location — matching the original.

Okay, that’s cool. Probably not the correct way for it to be cool, but it does appear to work. For me. Sorta.

Creating BASIC where there was none

The real goal here is to create a BASIC program that has a line that goes well beyond the 249 character typeable line length.

Going back to my earlier example that had PRINT"A", we can see that the PRINT token was a value of 135 (&H87), and a quote is 34, then the character(s) to print, followed by another quote (34), then the zero at the end of the line. As a simple test, I will try to create a program that PRINTs the hex values from 0 (&H0) to 255 (&HFF). I’ll add a semicolon at the end of each PRINT so all the output stays on the same line. The program would look like this:

PRINT"0";:PRINT"1";: ... :PRINT"A";:PRINT"B";: ... PRINT"FF";

As a reminder, each line of a BASIC program has the format of…

2 bytes – Address of next line
2 bytes – Line number
n bytes – Tokenized line
1 byte – 0 end of line marker.
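That layout is easy to model outside the CoCo. The Python sketch below builds the bytes for one stored line, and for a whole program, using the format just listed. The helper names are mine, the two-byte values are written high byte first (as the PEEK dump in part 1 shows), and the token value 135 (&H87) for PRINT is the one seen in that dump.

```python
import struct

PRINT_TOKEN = 0x87  # 135, the PRINT token seen when PEEKing the program


def encode_line(address, number, body):
    """Return the stored bytes of one line that begins at `address`."""
    # 2 link bytes + 2 line number bytes + body + 1 terminator
    next_address = address + 4 + len(body) + 1
    return struct.pack(">HH", next_address, number) + body + b"\x00"


def encode_program(address, lines):
    """Chain lines together and finish with the two zero link bytes."""
    image = b""
    for number, body in lines:
        image += encode_line(address + len(image), number, body)
    return image + b"\x00\x00"


line10 = encode_line(9729, 10, bytes([PRINT_TOKEN]) + b'"A"')
print(list(line10))
```

Nothing in the format records a length. The link address and the zero terminator are all there is.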

To have BASIC create these bytes, I’ll first have it ask for the location in memory to begin creating the BASIC line(s).

Then I’ll use a variable that tracks the address where each line starts, and remember it so the link can be filled in later when the address of the next line is known. I’ll then POKE out the two bytes representing the line number, and then POKE out the bytes for the tokenized line, which will be the PRINT token, a quote, the ASCII characters for the HEX$() of the number, another quote, then a semicolon and a colon. This will repeat for all 256 hex values. Then the 0 will be stored, and after that, whatever address that is will be the start of the next line. That address will be stored back at the first two bytes of the line entry, then the program will end with two zeros marking the end of the program (no address for the next line).

I came up with this program:

0 'IMPOSSIBLE.BAS
10 CLEAR 300
20 PRINT "START: &H";HEX$(PEEK(25)*256+PEEK(26)),"END: &H";HEX$(PEEK(27)*256+PEEK(28))
30 INPUT "START ADDRESS";ST
40 ' CREATE THE IMPOSSIBLE LINE
50 ' LINE START ADDRESS
60 LS=ST
70 ' STORE LINE NUMBER (10)
80 POKE LS+2,0:POKE LS+3,10
90 ' ADDRESS TO STORE DATA
100 AD=LS+4
110 ' STORE REPEATING TOKENS:
120 ' PRINT"x";:
130 ' WHEN "x" IS A HEX NUMBER
140 FOR I=0 TO 255
150 ' BUILD DATA STRING
160 TK$=CHR$(&H87)+CHR$(34)+HEX$(I)+CHR$(34)+";:"
170 'PRINT TK$
180 FOR J=1 TO LEN(TK$)
190 POKE AD,ASC(MID$(TK$,J,1)):AD=AD+1
200 NEXT
210 NEXT
220 ' STORE FINAL TWO BYTES
230 POKE AD,0:POKE AD+1,0
240 ' STORE NEXT LINE ADDRESS
250 MS=INT(AD/256):LB=AD-(MS*256)
260 POKE LS,MS:POKE LS+1,LB
270 ' SHOW RESULTS
280 PRINT "START: &H";HEX$(ST),"END: &H";HEX$(AD+2)

Similar to the previous example, I RUN this program, then wait as it generates 256 PRINT statements on one huge line 10, and then do the POKEs for 25/26 and 27/28 shown on the screen, then a PCLEAR 4 to move this new program back to where it should be:

Once that is done, I can RUN to see it fill the screen with hex digits “0” to “FF” (more than fits on a screen), and attempting to LIST the program shows only one line, but only shows the first 249 or so bytes of it:

Checking the size of the program by doing:

PRINT (PEEK(27)*256+PEEK(28))-(PEEK(25)*256+PEEK(26))

…shows that this ONE LINE program is 1782 bytes! That’s quite the long program considering it’s only one line!
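That number can be double-checked with a little arithmetic. The Python sketch below rebuilds the same token sequence the generator POKEs (PRINT token, quote, hex text, quote, semicolon, colon) and adds the 4 header bytes plus the 2 bytes the generator counts after the tokens when it prints its END address. The names are mine. Only the byte layout comes from the listing above.

```python
PRINT_TOKEN = 0x87  # 135


def statement(value):
    """Bytes for one PRINT"xx";: statement, with the value in uppercase hex."""
    return bytes([PRINT_TOKEN]) + f'"{value:X}";:'.encode("ascii")


body = b"".join(statement(value) for value in range(256))

header = 4   # address of next line + line number
trailer = 2  # the two bytes stored after the tokens
size = header + len(body) + trailer
print(len(body), size)
```

Sixteen of the values take 6 bytes each (one hex digit) and the other 240 take 7 bytes, which is where the 1776 bytes of tokens come from.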

Due to the limit of the 251 byte input buffer, we cannot EDIT this line without losing everything after the 249 bytes we can type. If you EDIT and press ENTER, then LIST, you’ll see the program has been truncated.

But, it proves BASIC does indeed not care about how long a line is.

Prove it!

By typing CSAVE"IMPOSS",A (for tape) or SAVE"IMPOSS",A (for disk), you can save the program out in ASCII (text). You could then transfer that file (using the toolshed decb utility or other similar program) to a PC/Mac and look at it in a text editor. This is what I see:

10 PRINT"0";:PRINT"1";:PRINT"2";:PRINT"3";:PRINT"4";:PRINT"5";:PRINT"6";:PRINT"7";:PRINT"8";:PRINT"9";:PRINT"A";:PRINT"B";:PRINT"C";:PRINT"D";:PRINT"E";:PRINT"F";:PRINT"10";:PRINT"11";:PRINT"12";:PRINT"13";:PRINT"14";:PRINT"15";:PRINT"16";:PRINT"17";:PRINT"18";:PRINT"19";:PRINT"1A";:PRINT"1B";:PRINT"1C";:PRINT"1D";:PRINT"1E";:PRINT"1F";:PRINT"20";:PRINT"21";:PRINT"22";:PRINT"23";:PRINT"24";:PRINT"25";:PRINT"26";:PRINT"27";:PRINT"28";:PRINT"29";:PRINT"2A";:PRINT"2B";:PRINT"2C";:PRINT"2D";:PRINT"2E";:PRINT"2F";:PRINT"30";:PRINT"31";:PRINT"32";:PRINT"33";:PRINT"34";:PRINT"35";:PRINT"36";:PRINT"37";:PRINT"38";:PRINT"39";:PRINT"3A";:PRINT"3B";:PRINT"3C";:PRINT"3D";:PRINT"3E";:PRINT"3F";:PRINT"40";:PRINT"41";:PRINT"42";:PRINT"43";:PRINT"44";:PRINT"45";:PRINT"46";:PRINT"47";:PRINT"48";:PRINT"49";:PRINT"4A";:PRINT"4B";:PRINT"4C";:PRINT"4D";:PRINT"4E";:PRINT"4F";:PRINT"50";:PRINT"51";:PRINT"52";:PRINT"53";:PRINT"54";:PRINT"55";:PRINT"56";:PRINT"57";:PRINT"58";:PRINT"59";:PRINT"5A";:PRINT"5B";:PRINT"5C";:PRINT"5D";:PRINT"5E";:PRINT"5F";:PRINT"60";:PRINT"61";:PRINT"62";:PRINT"63";:PRINT"64";:PRINT"65";:PRINT"66";:PRINT"67";:PRINT"68";:PRINT"69";:PRINT"6A";:PRINT"6B";:PRINT"6C";:PRINT"6D";:PRINT"6E";:PRINT"6F";:PRINT"70";:PRINT"71";:PRINT"72";:PRINT"73";:PRINT"74";:PRINT"75";:PRINT"76";:PRINT"77";:PRINT"78";:PRINT"79";:PRINT"7A";:PRINT"7B";:PRINT"7C";:PRINT"7D";:PRINT"7E";:PRINT"7F";:PRINT"80";:PRINT"81";:PRINT"82";:PRINT"83";:PRINT"84";:PRINT"85";:PRINT"86";:PRINT"87";:PRINT"88";:PRINT"89";:PRINT"8A";:PRINT"8B";:PRINT"8C";:PRINT"8D";:PRINT"8E";:PRINT"8F";:PRINT"90";:PRINT"91";:PRINT"92";:PRINT"93";:PRINT"94";:PRINT"95";:PRINT"96";:PRINT"97";:PRINT"98";:PRINT"99";:PRINT"9A";:PRINT"9B";:PRINT"9C";:PRINT"9D";:PRINT"9E";:PRINT"9F";:PRINT"A0";:PRINT"A1";:PRINT"A2";:PRINT"A3";:PRINT"A4";:PRINT"A5";:PRINT"A6";:PRINT"A7";:PRINT"A8";:PRINT"A9";:PRINT"AA";:PRINT"AB";:PRINT"AC";:PRINT"AD";:PRINT"AE";:PRINT"AF";:PRINT"B0";:PRINT"B1";:PRINT"B2";:PRINT"B3";:PRINT"B4";:PRINT"B5";:PRINT"B6";:
PRINT"B7";:PRINT"B8";:PRINT"B9";:PRINT"BA";:PRINT"BB";:PRINT"BC";:PRINT"BD";:PRINT"BE";:PRINT"BF";:PRINT"C0";:PRINT"C1";:PRINT"C2";:PRINT"C3";:PRINT"C4";:PRINT"C5";:PRINT"C6";:PRINT"C7";:PRINT"C8";:PRINT"C9";:PRINT"CA";:PRINT"CB";:PRINT"CC";:PRINT"CD";:PRINT"CE";:PRINT"CF";:PRINT"D0";:PRINT"D1";:PRINT"D2";:PRINT"D3";:PRINT"D4";:PRINT"D5";:PRINT"D6";:PRINT"D7";:PRINT"D8";:PRINT"D9";:PRINT"DA";:PRINT"DB";:PRINT"DC";:PRINT"DD";:PRINT"DE";:PRINT"DF";:PRINT"E0";:PRINT"E1";:PRINT"E2";:PRINT"E3";:PRINT"E4";:PRINT"E5";:PRINT"E6";:PRINT"E7";:PRINT"E8";:PRINT"E9";:PRINT"EA";:PRINT"EB";:PRINT"EC";:PRINT"ED";:PRINT"EE";:PRINT"EF";:PRINT"F0";:PRINT"F1";:PRINT"F2";:PRINT"F3";:PRINT"F4";:PRINT"F5";:PRINT"F6";:PRINT"F7";:PRINT"F8";:PRINT"F9";:PRINT"FA";:PRINT"FB";:PRINT"FC";:PRINT"FD";:PRINT"FE";:PRINT"FF";:

Pretty cool! A one line program that is 1700+ bytes long. Sweet!

There’s got to be an easier way…

And we’ll do that in the next part.

Until then…

Alex Evans’ BASIC UTILS change everything – part 1

See also: part 1, part 2 and part 3.

Or at least a lot of things.

Or perhaps just one thing, but it’s a pretty darned spiffy thing.

I’ll get to that ~~shortly~~ eventually. But first…

Typing in BASIC in BASIC

When it came to typing in BASIC, my Commodore VIC-20 had a full screen editor that let you cursor around the screen and type, pretty much, anywhere you wanted. You could cursor up to a command you just typed, change it, then hit enter and run it again. You could LIST a program, cursor up to a line, make a change, hit ENTER and it would be modified.

This was one thing I missed when I switched from my VIC to a 64K Extended Color BASIC CoCo. Fortunately, the EDIT command in Extended BASIC ended up being much faster for me than cursoring around the screen and inserting/deleting things. It sure would have been nice to have both.

Side note: The irony that the computer with full screen editing did not have arrow keys, and the computer that had no full screen editing did have arrow keys, was not lost on me, even as a kid.

The original Color BASIC did not have the EDIT command. If you had an error or typo in a line, your only option was to retype the whole line. Since a 1980 4K CoCo had very little space for BASIC programs, and since each new line took an extra 5 bytes of overhead, I suppose many programmers had to pack lines as much as possible just to make the program fit… For those writing smaller programs, or with upgraded memory (you could get a 16K upgrade!), maybe they stuck to writing shorter lines…especially if they were used to having errors in the program.

Side note: In addition to saving program memory, packing multiple instructions on a line also sped up the program since it no longer had to scan over line numbers moving from instruction to instruction.

Line length limit

When you begin typing a line on the CoCo, everything you type is going in to a buffer in memory. The Color BASIC Unravelled disassembly book labels this buffer as LINBUF, and describes it as follows:

After line header comes LINBUF which is a 251-byte buffer to store BASIC input line as it is being typed in. This 251-byte area is also used for several different functions but primarily it is used as a line input buffer.
Color BASIC Unravelled, page F3

In the disassembly, I see that this buffer is located just a bit before the 512 bytes used by the 32-column screen.

The buffer is at &H2DC (732), followed by a 41 byte “STRING BUFFER” (whatever that is for) at &H3D7 (983) and then the &H200 (512) bytes for video at &H400 (1024).

The disassembly reserves “LBUFMX+1” bytes for this buffer, but even without looking that up, we could figure out how big the buffer is by subtracting the start of LINBUF (&H2DC) from the start of the STRBUF after it (&H3D7). That gives us 251 bytes. And, looking up what LBUFMX is, we find it is indeed 250:

…so “LBUFMX+1” would give us the 251.

I like it when the math checks out.
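Here is the same math as a quick Python check, using only the three addresses quoted above (the constant names just follow the labels in the disassembly):

```python
LINBUF = 0x2DC  # 732, start of the line input buffer
STRBUF = 0x3D7  # 983, start of the string buffer that follows it
SCREEN = 0x400  # 1024, start of the 32-column text screen

linbuf_size = STRBUF - LINBUF  # bytes available to LINBUF
strbuf_size = SCREEN - STRBUF  # bytes available to STRBUF

print(linbuf_size, strbuf_size)
```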

This means when you go to type in a BASIC program line, you shouldn’t be able to type any more than 251 characters. And, actually, it stops you after typing in the 249th character:

Above, you can see I was able to type seven (7) full 32-character lines (224 characters) and then twenty-five (25) more characters on the final line before BASIC stopped me. 224 + 25 is 249, with the cursor sitting at the 250th position. I’d have to look at the code to see why it stops there, since 249 isn’t the 251 I expected.

Something interesting happens when you press ENTER. That line will get tokenized, and BASIC keywords will be changed from the full word (such as “PRINT”) in to a one or two byte token that represents them. In this case, the PRINT keyword will become a one byte token, so the five bytes I typed for PRINT will become one byte. And then if I try to EDIT the line again, I should be able to “X”tend the line and add four more characters:

You can see after I typed “EDIT 10” and then typed “X” to extend to the end of the line, I could type four more characters.

BUT, if I then LIST the program, you won’t see all four of them — only three:

This is a bug in LIST. The four dots actually are still there, and you can see them PRINT when I run this program:

I suppose the point is, no matter what you do, you can’t enter more than 249 characters on a BASIC line.

Or can you?

Defining the limits

Why was the limit set to 251? Why not 256 or 200 or something else? It seems to me that the LINBUF length limit may have been arbitrary based on how much memory was available. I suppose back in 1980 on a 4K machine, you didn’t want to take up half your memory for an input buffer that was unused any time you weren’t actually typing stuff in.

But, the actual BASIC interpreter doesn’t seem to care about line length. Looking at the Unravelled disassembly, here is the description of how a BASIC program is stored in memory:

Let’s ignore #1 for the moment. We’ll use this simple program as an example:

10 PRINT"A"
20 PRINT"B"
30 PRINT"C"

If I type that in, somewhere in memory it will be stored. The keyword PRINT will be turned in to a one byte token, and the rest (the quotes and letters) will be stored as-is. The somewhere we can figure out by checking some memory locations:

Above, TXTTAB represents two bytes in memory at &H19 (25) and &H1A (26) that contain the address where the BASIC program starts in memory. Since variables are stored directly after the BASIC program, we can use VARTAB (the start of variables) to figure out where BASIC ends.

PRINT PEEK(25)*256+PEEK(26)

PRINT PEEK(27)*256+PEEK(28)

This shows that my three line program is in memory from 9729 to 9758. Well, actually, 9757 would be the last byte of the BASIC program, since 9758 is the first byte of variable storage. But close enough!

If I were to PEEK the bytes in that range, I could see what the tokenized program looks like.

FOR I=PEEK(25)*256+PEEK(26) TO PEEK(27)*256+PEEK(28):PRINT PEEK(I);:NEXT

Or, print the two sets of PEEKs first and just use those numbers in the FOR loop:

Above we see the series of bytes that make up the BASIC program. In the earlier list…

…number 4 said the program ends with “two zero link bytes”, and we see a 73. Why? Because that 73 is the first byte of the variables after the program. #TheMoreYouKnow

Looking at those bytes, here is what they represent:

38 10 - address of next line
00 10 - line number 10
135   - PRINT keyword token
34    - quote
65    - A
34    - quote
0     - end of line
38 19 - address of next line
00 20 - line number 20
135   - PRINT keyword token
34    - quote
66    - B
34    - quote
0     - end of line
38 28 - address of next line
00 30 - line number 30
135   - PRINT keyword token
34    - quote
67    - C
34    - quote
0     - end of line
00 00 - address of next line (0 0 means end of program)

The “line number” ones are pretty simple. That’s the line number represented as two bytes.

Side note: Two bytes should allow for lines 0 to 65535, but BASIC only allows lines 0 to 63999. If you try to make a line 64000 or higher, you will get a ?SN ERROR. I guess they didn’t have room for a special “Line Number Too Large” error.

The “address of next line” one corresponds to the location in memory where the next line’s “address of next line” bytes will be. Thus, if you had a BASIC program starting in memory at 10000 (to make the numbers look nice), it might look like this:

 Mem.         +--------- 9 bytes per line ---------+
 Addr         |                                    |
10000: [10009][10][PRINT_TOKEN]["][A]["][00]
10009: [10018][20][PRINT_TOKEN]["][B]["][00]
10018: [10027][30][PRINT_TOKEN]["][C]["][00]
10027: [0000]

At least, I think that’s pretty close.

You will notice that BASIC knows the address for the start of the next line, and uses a zero to represent end-of-line. There is no “line length” in there, which means BASIC is kind of like the honey badger… it doesn’t care about line length!
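To show what “following the addresses” looks like, here is a small Python sketch that walks a copy of those bytes the way the layout suggests: read the link, read the line number, scan to the zero terminator, then hop to the linked address. It is only a model of the format (`walk` is a made-up name), fed with the byte values from the dump above, with the letter A as ASCII 65 and the program assumed to start at 9729.

```python
def walk(image, base):
    """Yield (line number, tokenized body) for each line in a program image."""
    lines = []
    offset = 0
    while True:
        link = image[offset] * 256 + image[offset + 1]
        if link == 0:  # two zero link bytes mean end of program
            break
        number = image[offset + 2] * 256 + image[offset + 3]
        end = image.index(0, offset + 4)  # zero byte ends the line
        lines.append((number, bytes(image[offset + 4:end])))
        offset = link - base  # follow the link to the next line
    return lines


image = bytes([
    38, 10, 0, 10, 135, 34, 65, 34, 0,
    38, 19, 0, 20, 135, 34, 66, 34, 0,
    38, 28, 0, 30, 135, 34, 67, 34, 0,
    0, 0,
])

for number, body in walk(image, 9729):
    print(number, len(body), list(body))
```

The length of each line only falls out at the end, from where the zero byte happens to be. The walker never needs it up front.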

This makes me think that the limit is primarily the LINBUF buffer size. If we had a way to type longer lines, BASIC seems like it would handle them just fine. And this gives me a few ideas:

Patch BASIC to use a larger input buffer so longer lines can be typed. This might also require patching other routines I haven’t looked at. For example, I think there’s some limit to what LIST does. This sounds like work and something that requires more knowledge than I have.
Manually manipulate the BASIC program to create larger lines that can’t be typed. Programs such as Carl England’s CRUNCH will pack lines together to make them longer than you could actually type. (But how long? I dig in to this in a later article.)
Something else…

In the next installment, we will explore some of these options…

Until then…

What is a CoCo? ChatGPT has some thoughts…

Meanwhile, over at chat.openai.com… I told it:

The CoCo is the nickname for the Radio Shack TRS-80 Color Computer. Can you come up with something that “COCO” stands for, that would be related?
– Me, to ChatGPT

It responded with some interesting suggestions…

Interesting question! The CoCo is indeed a nickname for the Radio Shack TRS-80 Color Computer, a line of home computers developed and sold by Tandy Corporation from 1980 to 1991¹. It was based on the Motorola 6809E processor and used the MC6847 Video Display Generator (VDG) chip¹³. It was also compatible with Microsoft BASIC and OS-9 operating system¹.

Some possible things that COCO could stand for, related to this computer, are:

Color Operating Computer Organization: A name for a group of enthusiasts or hobbyists who use or collect CoCo computers.

Colorful Objects with Computational Outputs: A description of the graphical capabilities of the CoCo, which could display up to 16 colors on screen⁴.

Commonly Observed Computerized Oracle: A humorous way of referring to the CoCo as a source of information or entertainment for its users.

Cooperative Online Colorful Operations: A term for multiplayer games or applications that could run on the CoCo using its serial port or modem⁴.

Source: Conversation with Bing, 3/16/2023
(1) TRS-80 Color Computer – Wikipedia. https://en.wikipedia.org/wiki/TRS-80_Color_Computer Accessed 3/16/2023.
(2) TRS-80 Color Computer – CoCopedia – The Tandy/Radio Shack Color …. https://www.cocopedia.com/wiki/index.php/TRS-80_Color_Computer Accessed 3/16/2023.
(3) Radio Shack TRS-80 Color Computer. https://oldcomputers.net/coco.html Accessed 3/16/2023.
(4) trs 80 computer products for sale | eBay. https://www.ebay.com/b/trs-80-computer/bn_7024741177 Accessed 3/16/2023.
– chat.openai.com

And, there was more…

Colorful Output with Creative Options: A way of highlighting the CoCo’s versatility and potential for different types of programs and projects.

Computerized Optical Colorful Oscilloscope: A reference to the CoCo’s ability to display graphics and sound using its VDG chip and 6-bit DAC.

Compact Original Colorful Object: A simple description of the CoCo’s physical appearance and design.

– chat.openai.com

There are only so many words beginning with C or O that make sense for this topic, but it might be fun to keep asking it until we find the ultimate backronym.

Until then…