See also: part 1, part 2 and part 3.
Or at least a lot of things.
Or perhaps just one thing, but it’s a pretty darned spiffy thing.
I’ll get to that eventually. But first…
Typing in BASIC
When it came to typing in BASIC, my Commodore VIC-20 had a full screen editor that let you cursor around the screen and type, pretty much, anywhere you wanted. You could cursor up to a command you just typed, change it, then hit enter and run it again. You could LIST a program, cursor up to a line, make a change, hit ENTER and it would be modified.
This was one thing I missed when I switched from my VIC to a 64K Extended Color BASIC CoCo. Fortunately, the EDIT command in Extended BASIC ended up being much faster for me than cursoring around the screen and inserting/deleting things. It sure would have been nice to have both.
Side note: The irony that the computer with full screen editing did not have arrow keys, and the computer that had no full screen editing did have arrow keys, was not lost on me, even as a kid.
The original Color BASIC did not have the EDIT command. If you had an error or typo in a line, your only option was to retype the whole line. Since a 1980 4K CoCo had very little space for BASIC programs, and since each new line took an extra 5 bytes of overhead, I suppose many programmers had to pack lines as much as possible just to make the program fit… For those writing smaller programs, or with upgraded memory (you could get a 16K upgrade!), maybe they stuck to writing shorter lines…especially if they were used to having errors in the program.
Side note: In addition to saving program memory, packing multiple instructions on a line also sped up the program since it no longer had to scan over line numbers moving from instruction to instruction.
Line length limit
When you begin typing a line on the CoCo, everything you type is going in to a buffer in memory. The Color BASIC Unravelled disassembly book labels this buffer as LINBUF, and describes it as follows:
After line header comes LINBUF which is a 251-byte buffer to store BASIC input line as it is being typed in. This 251-byte area is also used for several different functions but primarily it is used as a line input buffer.
Color BASIC Unravelled, page F3
In the disassembly, I see that this buffer is located just a bit before the 512 bytes used by the 32-column screen.
The buffer is at &H2DC (732), followed by a 41 byte “STRING BUFFER” (whatever that is for) at &H3D7 (983) and then the &H200 (512) bytes for video at &H400 (1024).
The disassembly reserves “LBUFMX+1” bytes for this buffer, but even without looking that up, we could figure out how big the buffer is by subtracting the start of LINBUF (&H2DC) from the start of the STRBUF that follows it (&H3D7). That gives us 251 bytes. And, looking up what LBUFMX is, we find it is indeed 250:
…so “LBUFMX+1” would give us the 251.
I like it when the math checks out.
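For anyone who wants to double-check that arithmetic away from the CoCo, here is a minimal Python sketch. The three addresses are the ones quoted above; the variable names are just labels I picked for this example, not anything from the disassembly beyond LINBUF and STRBUF.

```python
# Addresses as quoted in the text (from the Color BASIC Unravelled listing).
LINBUF = 0x2DC   # start of the line input buffer
STRBUF = 0x3D7   # start of the 41-byte string buffer that follows it
VIDRAM = 0x400   # start of the 32-column text screen (name is my own label)

linbuf_size = STRBUF - LINBUF   # bytes between LINBUF and STRBUF
strbuf_size = VIDRAM - STRBUF   # bytes between STRBUF and the screen

print(LINBUF, STRBUF, VIDRAM)   # the same addresses in decimal
print(linbuf_size, strbuf_size)
```

The two sizes come out to 251 and 41, which agrees with the "LBUFMX+1" reservation and the 41 byte string buffer.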
This means when you go to type in a BASIC program line, you shouldn’t be able to type any more than 251 characters. And, actually, it stops you after typing in the 249th character:
Above, you can see I was able to type seven (7) full 32-character lines (224 characters) and then twenty-five (25) more characters on the final line before BASIC stopped me. 224 + 25 is 249, with the cursor sitting at the 250th position. I’d have to look at the code to see why it stops there, since 249 isn’t the 251 I expected.
Something interesting happens when you press ENTER. That line will get tokenized, and BASIC keywords will be changed from the full word (such as “PRINT”) in to a one or two byte token that represents them. In this case, the PRINT keyword will become a one byte token, so the five bytes I typed for PRINT will become one byte. And then if I try to EDIT the line again, I should be able to “X”tend the line and add four more characters:
You can see after I typed “EDIT 10” and then typed “X” to extend to the end of the line, I could type four more characters.
BUT, if I then LIST the program, you won’t see all four of them — only three:
This is a bug in LIST. The four dots actually are still there, and you can see them PRINT when I run this program:
I suppose the point is, no matter what you do, you can’t enter more than 249 characters on a BASIC line.
Or can you?
Defining the limits
Why was the limit set to 251? Why not 256 or 200 or something else? It seems to me that the LINBUF length limit may have been arbitrary based on how much memory was available. I suppose back in 1980 on a 4K machine, you didn’t want to take up half your memory for an input buffer that was unused any time you weren’t actually typing stuff in.
But, the actual BASIC interpreter doesn’t seem to care about line length. Looking at the Unravelled disassembly, here is the description of how a BASIC program is stored in memory:
Let’s ignore #1 for the moment. We’ll use this simple program as an example:
10 PRINT"A"
20 PRINT"B"
30 PRINT"C"
If I type that in, somewhere in memory it will be stored. The keyword PRINT will be turned in to a one byte token, and the rest (the quotes and letters) will be stored as-is. The somewhere we can figure out by checking some memory locations:
Above, TXTTAB represents two bytes in memory at &H19 (25) and &H1A (26) that contain the address where the BASIC program is in memory. Since variables are stored directly after the BASIC program, we can use VARTAB (the start of variables) to figure out where BASIC ends.
PRINT PEEK(25)*256+PEEK(26)
PRINT PEEK(27)*256+PEEK(28)
This shows that my three line program is in memory from 9729 to 9758. Well, actually, 9757 would be the last byte of the BASIC program, since 9758 is the first byte of variable storage. But close enough!
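The PEEK(a)*256+PEEK(a+1) pattern is just reading a 16-bit address stored high byte first. Here is a small Python sketch of the same calculation. The memory dictionary is a made-up snapshot: I chose its four byte values so they reproduce the 9729 and 9758 reported above, and peek16 is a helper name I invented for the example.

```python
def peek16(memory, addr):
    """Combine two consecutive bytes, high byte first, like PEEK(a)*256+PEEK(a+1)."""
    return memory[addr] * 256 + memory[addr + 1]

# Hypothetical snapshot of the four pointer bytes, chosen to match the
# values reported in the text (9729 and 9758).
memory = {25: 38, 26: 1, 27: 38, 28: 30}

txttab = peek16(memory, 25)   # start of the BASIC program
vartab = peek16(memory, 27)   # start of variables (one past the program)

print(txttab, vartab, vartab - txttab)
```

The difference between the two pointers, 29 here, is the number of bytes the three line program takes up.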
If I were to PEEK the bytes in that range, I could see what the tokenized program looks like.
FOR I=PEEK(25)*256+PEEK(26) TO PEEK(27)*256+PEEK(28):PRINT PEEK(I);:NEXT
Or, print the two sets of PEEKs first and just use those numbers in the FOR loop:
Above we see the series of bytes that make up the BASIC program. In the earlier list, number 4 said the program ends with “two zero link bytes”, but the last byte printed is a 73. Why? Because the FOR loop runs through VARTAB itself, and that 73 is the first byte of the variables after the program. #TheMoreYouKnow
Looking at those bytes, here is what they represent:
38 10 - address of next line
00 10 - line number 10
135 - PRINT keyword token
34 - quote
65 - A
34 - quote
0 - end of line
38 19 - address of next line
00 20 - line number 20
135 - PRINT keyword token
34 - quote
66 - B
34 - quote
0 - end of line
38 28 - address of next line
00 30 - line number 30
135 - PRINT keyword token
34 - quote
67 - C
34 - quote
0 - end of line
00 00 - address of next line (0 0 means end of program)
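To sanity-check a breakdown like this without a CoCo handy, the bytes can be walked in a few lines of Python. This is only a sketch under some assumptions: the dump starts at 9729 (the TXTTAB value from earlier), the letter A is stored as 65 (its ASCII code), PRINT is the only token that appears, and walk_program is a name I made up for the example.

```python
PRINT_TOKEN = 135  # PRINT token value as listed in the breakdown

# The program bytes as they would sit in memory starting at 9729.
START = 9729
DUMP = bytes([
    38, 10, 0, 10, 135, 34, 65, 34, 0,   # line 10
    38, 19, 0, 20, 135, 34, 66, 34, 0,   # line 20
    38, 28, 0, 30, 135, 34, 67, 34, 0,   # line 30
    0, 0,                                # end of program
])

def walk_program(dump, start):
    """Follow the next-line links and return (address, line_number, text) tuples."""
    lines = []
    addr = start
    while True:
        off = addr - start
        link = dump[off] * 256 + dump[off + 1]
        if link == 0:                      # a 0 0 link marks the end of the program
            break
        number = dump[off + 2] * 256 + dump[off + 3]
        end = dump.index(0, off + 4)       # a zero byte terminates the line text
        body = dump[off + 4:end]
        # Only the PRINT token is expanded; every other byte is treated as ASCII.
        text = "".join("PRINT" if b == PRINT_TOKEN else chr(b) for b in body)
        lines.append((addr, number, text))
        addr = link                        # jump to wherever the link points
    return lines

for addr, number, text in walk_program(DUMP, START):
    print(addr, number, text)
```

Each link (38 10, 38 19, 38 28) decodes to 9738, 9747 and 9756, which is exactly where the next line begins, and the walk stops when it reaches the 0 0 link at 9756.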
The “line number” ones are pretty simple. That’s the line number represented as two bytes.
Side note: Two bytes should allow for lines 0 to 65535, but BASIC only allows lines 0 to 63999. If you try to make a line 64000 or higher, you will get a ?SN ERROR. I guess they didn’t have room for a special “Line Number Too Large” error.
The “address of next line” one corresponds to the location in memory where the next line’s “address of next line” bytes will be. Thus, if you had a BASIC program starting in memory at 10000 (to make the numbers look nice), it might look like this:
Mem.          +-- 7 bytes after the link --+
Addr          |                            |
10000: [10009][10][PRINT_TOKEN]["][A]["][00]
10009: [10018][20][PRINT_TOKEN]["][B]["][00]
10018: [10027][30][PRINT_TOKEN]["][C]["][00]
10027: [0000]
At least, I think that’s pretty close.
You will notice that BASIC knows the address for the start of the next line, and uses a zero to represent end-of-line. There is no “line length” in there, which means BASIC is kind of like the honey badger… it doesn’t care about line length!
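Here is a rough Python sketch of that storage format, to show that nothing in it depends on how long a line is. It assumes each stored line is two link bytes, two line number bytes, the tokenized text and one zero terminator. The layout function and the 300 dot line are my own inventions for illustration. This only models how the bytes would be arranged; it says nothing about whether the interpreter, LIST or EDIT would be happy with such a line.

```python
PRINT_TOKEN = 0x87  # 135, the PRINT token

def layout(lines, base):
    """Lay out (line_number, tokenized_bytes) pairs starting at address `base`.

    Returns (rows, end): rows holds (address, next_address, line_number,
    stored_length) per line, and end is where the final 00 00 link would go.
    """
    rows = []
    addr = base
    for number, body in lines:
        stored = 2 + 2 + len(body) + 1   # link + line number + text + zero byte
        rows.append((addr, addr + stored, number, stored))
        addr += stored
    return rows, addr

program = [
    (10, bytes([PRINT_TOKEN]) + b'"A"'),
    (20, bytes([PRINT_TOKEN]) + b'"B"'),
    # A made-up line far longer than the 251-byte input buffer.
    (30, bytes([PRINT_TOKEN]) + b'"' + b"." * 300 + b'"'),
]

rows, end = layout(program, 10000)
for row in rows:
    print(row)
print("end-of-program link at", end)
```

The link for the long line is computed the same way as for the short ones: current address plus stored length. There is no length byte anywhere that could overflow.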
This makes me think that the limit is primarily the LINBUF buffer size. If we had a way to type longer lines, BASIC seems like it would handle them just fine. And this gives me a few ideas:
- Patch BASIC to use a larger input buffer so longer lines can be typed. This might also require patching other routines I haven’t looked at. For example, I think there’s some limit to what LIST does. This sounds like work and something that requires more knowledge than I have.
- Manually manipulate the BASIC program to create larger lines that can’t be typed. Programs such as Carl England’s CRUNCH will pack lines together to make them longer than you could actually type. (But how long? I dig in to this in a later article.)
- Something else…
In the next installment, we will explore some of these options…
Until then…
The string buffer is actually the string stack. It’s used to keep track of descriptors for anonymous (not stored in variables) strings during expression evaluation. This is needed so garbage collection can know about any extant anonymous strings. If the string stack overflows, you get an ST error (string formula too complex).