NOTE: This article was started a year or two ago, so some references may no longer be pertinent.
NOTE 2: It was then updated in April 2020 before finally being published in November, so some references in the updates may no longer be relevant.
There is some kind of “10 line BASIC” contest, and one of the categories allows for assembly language as long as it can be embedded in a typeable BASIC program. I previously discussed embedded assembly in BASIC in my Interfacing BASIC with Assembly series.
One of the examples I gave was a Pac-Man maze demo that used some assembly code to scroll the screen up and down. The loader looked like this:
2000 REM 2001 REM LOAD ASSEMBLY ROUTINE 2002 2010 READ A,B 2020 IF A=-1 THEN 2070 2030 FOR C = A TO B 2040 READ D:POKE C,D 2050 NEXT C 2060 GOTO 2010 2070 RETURN 'END 2080 DATA 16128,16217,189,179,237,90,39,14,90,39,28,90,39,42,90,39,55,204,255,255,32,67,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,47,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,30,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,14 2090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,204,0,0,126,180,244,-1,-1
That particular bit of BASIC code was created by the LWASM assembler. It reads assembly language values from DATA statements and POKEs them into memory where they can be executed.
In another series, I discussed various ways to use DATA statements either for the fastest reading/loading or the smallest size.
Today, I’d like to revisit this subject and offer some more ways to compress data into DATA to store even more than before.
Or something like that.
Knowing our limitations
We know the BASIC input buffer is 249 characters long, so we should be able to type in a single digit line number, the keyword “DATA” and 244 more characters.
Above, I have a one digit line number (“0”), then four characters for the keyword (“DATA”) then seven full lines of 32 characters (32*7=224) plus 20 characters on the final line – so 1+4+224+20 is 249. Hey, it works!
Since loading a BASIC program saved in ASCII is the same as typing it in, that limit should also apply for loading a non-tokenized ASCII program. I will be using that method here to load test programs into the XRoar emulator, so I won’t be able to cheat and load a tokenized line that would have exceeded our typing limit.
If we encode our assembly code as 2-digit hex values, we should have room to type a single digit line number, the keyword “DATA”, and 122 2-digit hex values. We could enter all the hex digits from &H00 to &H79 on a line. This appears to work:
As soon as we press enter, BASIC tokenizes the “DATA” keyword. It no longer takes up four bytes. This means even though we typed the full 249 characters, we could EDIT the line and perhaps type a few more characters. Typing “EDIT 0″…
…then pressing “X” to extend (go to the end) of the line allows us to add two more digits:
I am guessing DATA is tokenized into a 2-byte token. However, if you add these two characters, then re-list the line:
…we see BASIC does not show the final character. However, if you use the READ command to read that line in to a string, and the PRINT it, you see it’s actually there:
The BASIC LIST command has a limit to how much it will display, it seems, and it is one character less than it should be. Bug?
BASIC line packing
It is possible to create BASIC lines that contain much more data than you can type in. They will run fine, but cannot be fully listed. There were several BASIC “crunch” programs available, including one by Carl England that I used often, that did this trick.
However, if we want to stick to how much someone could type in, we need to limit ourselves to that 249 buffer, and not rely on doing the EDIT trick to add more to it.
If that is the case, the most amount of HEX encoded assembly language bytes you can fit on a BASIC line is 122 per single-digit line. The code to read in those lines and POKE them in to memory can easily fit into one line as well, so we could easily fit a 1098 byte assembly language program into a ten line BASIC program.
And, we could even stick a few more bytes on the end of the loader line. Using a simple test program, I figured I could get the loader plus 1167 bytes of assembly code POKEd into memory. It looked like this:
0DATA000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778 1DATA797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1 2DATAF2F3F4F5F6F7F8F9FAFBFCFDFEFF000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A 3DATA6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3 4DATAE4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C 5DATA5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5 6DATAD6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E 7DATA4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7 8DATAC8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F40 9 L=16128:FORA=0TO9:READA$:FORB=1TOLEN(A$)/2:POKEL,VAL(MID$(A$,B*2,2)):L=L+1:NEXT:NEXT:DATA4142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E 100 BT=0:FORA=0TO9:READA$:FORB=1TOLEN(A$)/2:PRINTMID$(A$,B*2,2);:BT=BT+1:NEXT:NEXT:PRINTBT"BYTES":END
Notice LINE 100. It is not part of the program. I can load this and type “RUN 100” to get it to dump all the data and show me what it would be POKEing into memory, as well as the final byte count. For a contest entry, it would only include lines 0-9. (Using the EXIT/eXtend trick, you could probably get another 9 or 10 bytes total, but they might consider that cheating, and maybe someone could optimize the loader code to make even more room.)
We can do better…
Over on the Facebook group, someone (who was it?) suggested using some alternative encoding to get even more data in to the DATA statement. There are existing forms of doing this with 7-bit characters, using the printable range of characters.
The wikipedia has a page showing many different implementations of binary to text encodings.
Since we need to encode things that can be typed in, this puts some limits on what we can do. Characters 0-32 are control characters, so normal printable text characters start with ASCII 32 (space) and go to ASCII 127.
See https://en.wikipedia.org/wiki/ASCII for the list.
There are some other characters in this range that we cannot type on a CoCo keyboard. Here is the printable CoCo character set:
We can type the inverted (lower case) letters “a” through “z”, but we have no way to type the inverted @ symbol, or the inverted left bracket, backslash, right bracket, up arrow or left arrow.
But, just because they are typeable does not mean we could use them all in a DATA statement. Quote, for instance, is okay if it’s in the middle of a string that is read…
10 READ A$:PRINT A$ 20 DATA ABC"DEF
…but if the quote started up at the beginning of a string of characters, BASIC would skip it and join everything after it together until it sees another quote or an end of line:
10 READ A$:PRINT A$
20 DATA “ABCDEF
Also, comma is a problem. READ separates data using the comma UNLESS the data is quoted. This would only read the first ABC:
10 READ A$:PRINT A$
20 DATA ABC,DEF
…but this would read “ABC,DEF” as one string:
10 READ A$:PRINT A$
20 DATA “ABC,DEF”
So even though we can type about 91 characters, we can’t use all of them in a DATA statement.
Reduce the bass! (Er, base…)
It seems we could achieve a base-90 (?) format by having each line start with a quote (thus we could include a comma in the DATA), but the decoder would have to be much larger.
So instead of that, we’ll turn the base down a bit and do something a bit easier in the next installment.
Until next time…