These days, I feel like I am regularly saying “I’ve learned more this week about X than I learned in Y years of using it back in the 1980s!”.
This is another one of those.
Each line of a Color BASIC program is tokenized (changing keywords like PRINT to a one or two byte token representing them) and then stored as follows:
- 2-Bytes – Address in memory where next line starts
- 2-Bytes – Line number (0-63999)
- n-Bytes – Tokenized program line.
- 1-Byte – Zero (0), indicating the end of the line
The four byte header and the 1 byte zero terminator mean that each line has an overhead of 5-bytes. You can see this by printing free memory and then adding a line that has a one byte token, such as “REM” or “PRINT”:
Above, you see the amount of memory decreases by 6 bytes after adding a line. That’s five bytes for the overhead, and one byte for the “REM” token.
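If you want to check that arithmetic away from the CoCo, here is a minimal Python sketch of the record layout listed above. The helper name `encode_line`, the start address, and the token value used for REM are my own assumptions for illustration; only the byte counts matter.

```python
import struct

def encode_line(next_addr: int, line_no: int, tokens: bytes) -> bytes:
    """Pack one stored program line: 4-byte header, tokenized body, zero terminator."""
    # ">HH" packs two unsigned 16-bit values, high byte first, which matches
    # the PEEK(addr)*256+PEEK(addr+1) convention used throughout this article.
    header = struct.pack(">HH", next_addr, line_no)
    return header + tokens + b"\x00"

REM_TOKEN = 0x82   # assumed token value for REM; only its 1-byte length matters here
start = 9729       # hypothetical address where this line is stored

body = bytes([REM_TOKEN])                  # "10 REM" tokenizes to a single byte
record = encode_line(start + 4 + len(body) + 1, 10, body)

overhead = len(record) - len(body)
print(len(record), overhead)               # 6 5
```

The six bytes are the same six that disappear from free memory when a one-token line is added.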
The BASIC program starts in memory at a location stored in memory locations 25 and 26. You can see this by typing:
PRINT PEEK(25)*256+PEEK(26)
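The same two-byte read can be sketched in Python. The `peek16` helper and the byte values poked into the fake memory are assumptions for illustration; they are chosen so the result is 9729, the start address that shows up in the output later in this article.

```python
def peek16(memory, addr):
    """Return the 16-bit value stored high byte first at addr and addr+1."""
    return memory[addr] * 256 + memory[addr + 1]

memory = bytearray(65536)   # stand-in for the CoCo's 64K address space
memory[25] = 38             # high byte: 38 * 256 = 9728
memory[26] = 1              # low byte

print(peek16(memory, 25))   # 9729
```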
There are other such addresses that point to where variables start (directly after the program), and where string memory is. Here is an example program from an earlier article I wrote that shows them all. (The comments explain what each location is.)
0 ' BASINFO3.BAS
10 ' START OF BASIC PROGRAM
20 ' PEEK(25)*256+PEEK(26)
30 ' START OF VARIABLES
40 ' PEEK(27)*256+PEEK(28)
50 ' START OF ARRAYS
60 ' PEEK(29)*256+PEEK(30)
70 ' END OF ARRAYS (+1)
80 ' PEEK(31)*256+PEEK(32)
90 ' START OF STRING STORAGE
100 ' PEEK(33)*256+PEEK(34)
110 ' START OF STRING VARIABLES
120 ' PEEK(35)*256+PEEK(36)
130 ' TOP OF STRING SPACE/MEMSIZ
140 ' PEEK(39)*256+PEEK(40)
150 ' USING NO VARIABLES
160 PRINT "PROG SIZE";(PEEK(27)*256+PEEK(28))-(PEEK(25)*256+PEEK(26)),;
170 PRINT "STR SPACE";(PEEK(39)*256+PEEK(40))-(PEEK(33)*256+PEEK(34))
180 PRINT "ARRAY SIZE";(PEEK(31)*256+PEEK(32))-(PEEK(29)*256+PEEK(30)),;
190 PRINT " STR USED";(PEEK(39)*256+PEEK(40))-(PEEK(35)*256+PEEK(36))
200 PRINT " VARS SIZE";(PEEK(29)*256+PEEK(30))-(PEEK(27)*256+PEEK(28)),;
210 PRINT " FREE MEM";(PEEK(33)*256+PEEK(34))-(PEEK(31)*256+PEEK(32))
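Each number the program prints is just the difference between two of those pointers. Here is a rough Python sketch of the same subtractions. The dictionary keys are the memory locations from the comments above, and the pointer values are made up to keep the arithmetic easy to follow; they are not from a real machine.

```python
def region_sizes(p):
    """p maps a pointer's memory location (25, 27, ...) to the 16-bit value stored there."""
    return {
        "PROG SIZE": p[27] - p[25],    # start of variables - start of program
        "VARS SIZE": p[29] - p[27],    # start of arrays - start of variables
        "ARRAY SIZE": p[31] - p[29],   # end of arrays - start of arrays
        "FREE MEM": p[33] - p[31],     # start of string storage - end of arrays
        "STR SPACE": p[39] - p[33],    # top of string space - start of string storage
        "STR USED": p[39] - p[35],     # top of string space - start of string variables
    }

# Hypothetical pointer values, for illustration only.
pointers = {25: 1000, 27: 1500, 29: 1500, 31: 1500, 33: 3000, 35: 3200, 39: 3200}

for name, size in region_sizes(pointers).items():
    print(name, size)
```

With no variables in use, the variable and array regions collapse to zero bytes, which is why the listing notes that it uses no variables.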
I thought it might be interesting to write a BASIC program that displays information on each line of the BASIC program. That information would include:
- Start address of the line
- Address of the next line
- Line number of the line
Here is what I came up with. It can use generic PRINT in lines 40 and 70 (for Color BASIC) or a nicer formatted PRINT USING (for Extended Color BASIC) in lines 50 and 80.
0 'BASINFO.BAS
1 REM BASINFO.BAS
2 REMBASINFO.BAS
10 PRINT " ADDR NADDR LINE# SIZ"
20 L=PEEK(25)*256+PEEK(26)
30 NL=PEEK(L)*256+PEEK(L+1)
40 'PRINT L;NL;
50 PRINT USING"##### #####";L;NL;
60 IF NL=0 THEN END
70 'PRINT PEEK(L+2)*256+PEEK(L+3);NL-L
80 PRINT USING" ##### ###";PEEK(L+2)*256+PEEK(L+3);NL-L
90 L=NL:GOTO 30
For this program, as shown, running on a virtual 32K Extended Color BASIC CoCo in the XRoar emulator, I see:
The first column (ADDR) is the address of the BASIC line in memory. After that is the address of where the next line begins (NADDR), and it will match the address shown at the start of the following line. The third column is the line number (LINE#), and last is the size of the line (SIZ). Since SIZ is calculated as NL-L, it covers everything stored for the line: the four byte header, the tokenized line, AND the terminating zero byte at the end of it.
The final line has a “next address” of zero, indicating the end of the file.
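The walk itself is a linked-list traversal, and it can be sketched in Python against a fake memory image. Everything here is an assumption for illustration: the helper names `build_program` and `walk`, the start address, and the placeholder line bodies, which are filler bytes sized to match the three comment lines at the top of the program. The toy image ends after those three lines, so the zero link shows up sooner than it would for the full listing.

```python
import struct

def build_program(start, lines):
    """Lay out (line_no, body) pairs as linked records beginning at address `start`."""
    image = bytearray()
    addr = start
    for line_no, body in lines:
        next_addr = addr + 4 + len(body) + 1      # header + body + zero terminator
        image += struct.pack(">HH", next_addr, line_no) + body + b"\x00"
        addr = next_addr
    image += b"\x00\x00"                          # a next-line address of zero ends the program
    return image

def walk(image, start):
    """Follow the next-line pointers, yielding (addr, next_addr, line_no, size) rows."""
    rows = []
    addr = start
    while True:
        off = addr - start                        # the image holds only the program area
        next_addr = int.from_bytes(image[off:off + 2], "big")
        if next_addr == 0:
            rows.append((addr, 0, None, None))    # final row: no line number or size
            return rows
        line_no = int.from_bytes(image[off + 2:off + 4], "big")
        rows.append((addr, next_addr, line_no, next_addr - addr))
        addr = next_addr

# Placeholder bodies of 13, 13 and 12 bytes (filler, not real tokens).
program = build_program(9729, [(0, b"?" * 13), (1, b"?" * 13), (2, b"?" * 12)])
for row in walk(program, 9729):
    print(row)
```

Note that the size is never stored anywhere. It falls out of subtracting one address from the next, exactly as NL-L does in the BASIC version.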
At the start of the program I included three comments:
0 'BASINFO.BAS
1 REM BASINFO.BAS
2 REMBASINFO.BAS
In the output of the program, you see them described as:
ADDR NADDR LINE# SIZ
9729 9747 0 18 <- [0 'BASINFO.BAS]
9747 9765 1 18 <- [1 REM BASINFO.BAS]
9765 9782 2 17 <- [2 REMBASINFO.BAS]
You can see that lines 0 and 1 are both 18 bytes long, even though one looks like it should be shorter. In this case, the apostrophe (') abbreviation for REM takes as much space as “REM ” (with a space after it). This is because the apostrophe is encoded as a “:REM” (colon then REM), so it occupies two bytes. Alex Evans recently reminded me of this. This behavior would allow you to use it at the end of a line like this:
10 LINE INPUT A$'ASK FOR USERNAME
…instead of having to do:
10 LINE INPUT A$:REM ASK FOR USERNAME
But don’t do either! REMs at the end of the line can be the worst place to have REMs, since BASIC will have to scan past them to get to the next line, even if they are after a GOTO. This makes them slower. (Reminder to self: do an article on this since I’ve learned more since I originally covered the topic in one of my Benchmarking BASIC articles…)
But I digress…
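Back to those three comment lines: their sizes can be reproduced with a toy tokenizer in Python. This is only a sketch that handles comments and nothing else, and the two token values are assumptions on my part. What matters is the count: a colon plus a one-byte token for the apostrophe, versus a single one-byte token for REM.

```python
REM_TOKEN = 0x82          # assumed token value for REM
APOSTROPHE_TOKEN = 0x83   # assumed token value stored after the colon for '

def tokenize_comment(text):
    """Toy tokenizer for comment-only lines (not a full Color BASIC tokenizer)."""
    if text.startswith("'"):
        # The apostrophe becomes a colon followed by a one-byte token.
        return b":" + bytes([APOSTROPHE_TOKEN]) + text[1:].encode("ascii")
    if text.startswith("REM"):
        # The keyword collapses to one byte; any space after it is kept as-is.
        return bytes([REM_TOKEN]) + text[3:].encode("ascii")
    raise ValueError("this sketch only handles comment lines")

def stored_size(text):
    """Four byte header + tokenized body + one byte zero terminator."""
    return 4 + len(tokenize_comment(text)) + 1

for text in ("'BASINFO.BAS", "REM BASINFO.BAS", "REMBASINFO.BAS"):
    print(stored_size(text), text)
```

The first two lines each come out at 18 bytes and the third at 17, matching the SIZ column.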
If you wanted to run this on your own program, you could do so by making this routine load at a high line of BASIC (higher than any lines you might be using). Then you could save it as ASCII (SAVE"BASINFO",A) and use MERGE"BASINFO" (from disk) to bring those lines into your program.
63000 PRINT " ADDR NADDR LINE# SIZ":L=PEEK(25)*256+PEEK(26)
63001 NL=PEEK(L)*256+PEEK(L+1):PRINT USING"##### #####";L;NL;:IF NL=0 THEN END ELSE PRINT USING" ##### ###";PEEK(L+2)*256+PEEK(L+3);NL-L:L=NL:GOTO 63001
Now you could do RUN 63000 to see what your program looks like. (The highest line number Color BASIC allows is 63999 so you could change that to 63998 and 63999 if you wanted absolutely the most line numbers available for your program ;-)
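You could also add a check on the line number, such as “IF PEEK(L+2)*256+PEEK(L+3)=63000 THEN END”, somewhere before the second PRINT USING and have it stop when it hits that routine. (L holds the address of the line, not its line number, so the test has to read the line number stored at L+2.)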
What use is this?
For an upcoming article, I expect to use a version of this code to “prove” something as it relates to BASIC and the length of lines.
But, it might also be fun to generate some statistics — longest line, shortest line, a graph of the different line lengths, etc.
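As a rough idea of what those statistics might look like, here is a small Python sketch. The `line_stats` helper is hypothetical, and the only data fed to it is the three (line number, size) pairs from the output shown earlier.

```python
from collections import Counter

def line_stats(rows):
    """rows is a list of (line_no, size) pairs; ties go to the earliest line."""
    longest = max(rows, key=lambda row: row[1])
    shortest = min(rows, key=lambda row: row[1])
    histogram = Counter(size for _, size in rows)
    return longest, shortest, histogram

rows = [(0, 18), (1, 18), (2, 17)]   # the three comment lines from the output above
longest, shortest, histogram = line_stats(rows)

print("LONGEST ", longest)
print("SHORTEST", shortest)
for size in sorted(histogram):
    # One star per line of that size makes a crude text graph.
    print(f"{size:3d} {'*' * histogram[size]}")
```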
Until next time…
I find this fascinating, too. I even wrote a line-aware hexdump program for CoCo basic files. It ends up looking like this:
10 0004 9e
20 000A 89 20 22 4e 41 4d 45 22 3b 41 24
https://www.hoboes.com/Mimsy/hacks/coco/tokenization/
Hey, that’s cool. Yours can fill in all the bytes I leave out with my line size number!
This is a very cool utility.