See also: part 1, part 2, part 3, part 4, part 5, part 6 and part 7.
Just when I thought I was out… they pull me back in. (Michael Corleone, Godfather III)
Sometimes clever isn’t as good as brute force. In this installment, I’ll present a hybrid approach to the challenge of displaying the Logiker 2022 holiday image.
Instead of writing code to handle each section of the pattern, perhaps the simpler approach of just doing Run Length Encoding (sorta) might be smaller. I do not know where I first learned about RLE, but I implemented a simple version in Sub-Etha Software’s graphical “CoCoFEST Simulation” text adventure back in the early 1990s. The images in the graphical adventure were large, and I came up with a simple way to represent repeated data in them. (I do not know if someone gave me this idea, or if I worked it out on my own; in those pre-Internet days, knowledge was a lot more organic.)
Basically, RLE looks for repeating sequences and replaces them with a count value and the sequence. The Wikipedia article gives this example:
Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
With a run-length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as follows:
12W1B12W3B24W1B14W
Above, if there are twelve “W” characters in a row, you would encode the value 12 followed by the “W”.
I wrote a simple program that will input a string and then scan it looking for repeating characters. When it finds a new character, it prints the count of the previous characters as well as what character that was.
0 ' RLE.BAS
10 LINE INPUT "STRING:";A$
20 LC=ASC(LEFT$(A$,1)):L=1
30 FOR I=2 TO LEN(A$)
40 C=ASC(MID$(A$,I,1))
50 IF C<>LC THEN PRINT L;LC:L=0:LC=C
60 L=L+1:NEXT
70 PRINT L;LC
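For readers without a CoCo handy, here is a rough Python sketch of the same scan. It is not the BASIC program itself; the function names `run_lengths` and `encode` are made up for illustration, and the variable comments point back to the BASIC variables they loosely mirror.

```python
def run_lengths(text):
    """Return (count, character code) pairs, one per run of repeated characters."""
    if not text:
        return []
    runs = []
    last = text[0]   # plays the role of LC (last character)
    length = 1       # plays the role of L (length of the current run)
    for ch in text[1:]:
        if ch != last:
            # A new character ends the previous run, so record it and reset.
            runs.append((length, ord(last)))
            last = ch
            length = 0
        length += 1
    runs.append((length, ord(last)))  # the final run (line 70 in the BASIC)
    return runs


def encode(text):
    """Render the runs in the count-then-character style, e.g. 3A2B3C."""
    return "".join(f"{count}{chr(code)}" for count, code in run_lengths(text))


if __name__ == "__main__":
    print(run_lengths("AAABBCCC"))
    print(encode("AAABBCCC"))
```

Like the BASIC version, this only reports runs. It makes no attempt to decide whether the encoded form is actually smaller than the input.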
I pasted this code in to chat.openai.com and it correctly recognized it as an RLE routine! Here is how it described my program:
This looks like a BASIC program that encodes a string using run-length encoding (RLE). RLE is a simple form of data compression, where runs of repeated characters are replaced by a count and the character. For example, the string “AAABBCCC” would be encoded as “3A2B3C”.
The program starts by prompting the user to input a string, and then assigns the first character of the string to the variable LC (which stands for “last character”). The variable L is set to 1, and then a FOR loop is used to iterate through the remaining characters in the string.
For each character in the string, the program checks if it is different from the last character. If it is, it prints the length of the previous run and the character itself, and then resets the length and last character variables. If the character is the same as the last character, the length is incremented. This continues until the end of the string is reached, at which point the final run is printed.
For example, if the input string is “AAABBCCC”, the output would be:
3 65
2 66
3 67
This indicates that there were three instances of the character “A” (ASCII value 65), two instances of the character “B” (ASCII value 66), and three instances of the character “C” (ASCII value 67).
– chat.openai.com
Wow. Just wow.
Here is an example of the program running:
Since the Logiker Challenge image only contains spaces and asterisks, we don’t need a full RLE to encode it. I chose to use alternating numbers to indicate how many spaces, then how many asterisks.
Ultimately, I tried to save code space by breaking the image in to sections, and using two main routines to print that section within FOR/NEXT loops.
But … is that overkill? Instead of using multiple PRINT routines, what if I only needed one? By expanding the image data so each line covers the entire width of the CoCo’s 32-column screen, I could do away with the “end of line” markers in the data, and replace them with a larger series of spaces that goes from the end of the data on that line to the start of the data on the next line:
"           *       *            "
"           **     **            "
"           ***   ***            "
"           **** ****            "
"       *****************        "
"        ***************         "
"         *************          "
"          ***********           "
"           *********            "
"          ***********           "
"         *************          "
"        ***************         "
"       *****************        "
"           **** ****            "
"           ***   ***            "
"           **     **            "
"           *       *            "
Above, at the end of the first line’s asterisks, there are 12 spaces to the end of that line. For the next line, there are 11 spaces to get to the start of the next asterisks. That means after printing the last asterisks in line 1 we can just print 23 spaces and be at the start of the next line.
Assuming we start with a SPACE then an ASTERISK then a SPACE and so on, the data for the first two lines would look like this:
11 - print 11 spaces
1 - print 1 asterisk
7 - print 7 spaces
1 - print 1 asterisk
23 - print 23 spaces (to move to the start of data in the second line)
2 - print 2 asterisks
5 - print 5 spaces
2 - print 2 asterisks
...and so on...
I was going to convert all the PRINT lines of the original version I started with to DATA statements and write a program to count this for me, but that sounded complicated. I just counted, and came up with the following numbers:
11 1 7 1 23 2 5 2 23 3 3 3 23 4 1 4 19 17 16 15 18 13 20 11 22 9 22 11 20 13 18 15 16 17 19 4 1 4 23 3 3 3 23 2 5 2 23 1 7 1
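Counting by hand works, but the counting program I decided not to write is not too bad either. Here is a hedged Python sketch of the idea (not CoCo BASIC). The row layout built by `build_rows` is an assumption reconstructed from the description above: a 32-column screen with 11 leading spaces on the first line. The function names are hypothetical.

```python
from itertools import groupby

WIDTH = 32  # width of the CoCo text screen


def build_rows():
    """Rebuild the 17 image rows, each padded to the full screen width.

    Assumption: the four "arm" rows start at column 11 and the body rows
    shrink from 17 asterisks down to 9 and back up again.
    """
    arms = [" " * 11 + "*" * n + " " * (9 - 2 * n) + "*" * n for n in range(1, 5)]
    body = [" " * (7 + k) + "*" * (17 - 2 * k) for k in range(5)]
    rows = arms + body + body[-2::-1] + arms[::-1]
    return [row.ljust(WIDTH) for row in rows]


def alternating_runs(rows):
    """Return run lengths that alternate space, asterisk, space, ..."""
    # Join the rows into one stream; spaces after the last asterisk are not needed.
    stream = "".join(rows).rstrip()
    runs = [len(list(group)) for _, group in groupby(stream)]
    if stream.startswith("*"):
        runs.insert(0, 0)  # the data always begins with a (possibly empty) space run
    return runs


if __name__ == "__main__":
    print(*alternating_runs(build_rows()))
```

Because the rows are joined end to end, the trailing spaces of one row and the leading spaces of the next merge into a single run, which is exactly why the end-of-line markers can go away.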
I could store those in a DATA statement:
But, that takes up a lot of room. There is a comma between each number, so for 50 numbers we’d be adding 49 commas, basically doubling the size of the data. Also, two digit numbers like 10 take up two bytes. I thought about using HEX numbers (0-15 turns in to 0-F) but the data has some values that are larger than 15 (the highest value that fits in a single character of a HEX value).
HEX is BASE-16 (0-F to represent 0-15) and what I really need is at least BASE-23 (23 being the largest number I need). Since there are 26 letters in the alphabet, I could use all of them and get BASE-26 leaving me room to spare!
If A=1, B=2 and so on, the above series of numbers could be turned in to:
K A G A W B E B W C C C W D A D S Q P O R M T K V I V K T M R O P Q S D A D W C C C W B E B W A G A
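The letter mapping is simple enough to sketch in a few lines of Python (again, not BASIC, and the helper names `to_letters` and `to_numbers` are invented for this example). It assumes every value is between 1 and 26, so a run of zero, or anything over 26, would need a different scheme.

```python
def to_letters(numbers):
    """Map 1..26 to A..Z (A=1, B=2, and so on)."""
    for n in numbers:
        if not 1 <= n <= 26:
            raise ValueError(f"{n} does not fit in a single letter")
    # "A" is ASCII 65, so adding 64 turns 1 into "A".
    return "".join(chr(64 + n) for n in numbers)


def to_numbers(letters):
    """Reverse the mapping: A..Z back to 1..26."""
    return [ord(ch) - 64 for ch in letters]


if __name__ == "__main__":
    first_rows = [11, 1, 7, 1, 23, 2, 5, 2]
    packed = to_letters(first_rows)
    print(packed)
    print(to_numbers(packed))
```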
I could then turn those in to DATA:
…and read them as a string (READ A$) and then convert that string to a number by subtracting 64 (ASCII for A is 65, so if I read an A and get 65, subtracting 64 turns that in to 1):
READ A$
V=ASC(A$)-64
While this saves a byte for every number that was two digits, the extra code to convert from ASCII to a number may be larger than what we saved.
Since we have 49 commas, we could get rid of those and add code to parse a long string. As long as that code is smaller than 49 bytes, we come out ahead.
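To put rough numbers on that trade-off, here is a small Python sketch (not BASIC) that compares the two ways of writing the data. It only counts the characters of the data itself. It ignores the DATA keyword, line numbers, and the extra decoding code, so treat it as a ballpark and not a byte-exact measurement of the tokenized program.

```python
PACKED = "KAGAWBEBWCCCWDADSQPORMTKVIVKTMROPQSDADWCCCWBEBWAGA"

# Recover the run lengths from the letters (A=1, B=2, ...).
numbers = [ord(ch) - 64 for ch in PACKED]

# The same data written as comma-separated decimal numbers.
comma_form = ",".join(str(n) for n in numbers)


def data_sizes():
    """Return (characters as decimal list, characters as packed letters)."""
    return len(comma_form), len(PACKED)


if __name__ == "__main__":
    decimal_size, packed_size = data_sizes()
    print("decimal with commas:", decimal_size)
    print("packed letters:", packed_size)
    print("characters available for decoding code:", decimal_size - packed_size)
```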
Now I could read that as a string and parse it in to numbers:
0 'STRTONUM.BAS
10 READ A$
20 FOR I=1 TO LEN(A$)
30 PRINT ASC(MID$(A$,I,1))-64;
40 NEXT
50 DATA KAGAWBEBWCCCWDADSQPORMTKVIVKTMROPQSDADWCCCWBEBWAGA
And, if I want to use that series of numbers in a loop that prints alternating strings of spaces and asterisks, I don’t even need to bother with it being in a DATA statement. I could just embed it directly in the MID$() command and hard code the length of the string, like this:
0 'STRTONUM2.BAS
20 FOR I=1 TO 50
30 PRINT ASC(MID$("KAGAWBEBWCCCWDADSQPORMTKVIVKTMROPQSDADWCCCWBEBWAGA",I,1))-64;
40 NEXT
And if I can do that, the only thing left is to figure out when to print a space and when to print an asterisk.
An easy way to do that is looking at the I variable in the FOR/NEXT loop. As it counts from 1 to 2 to 3 to 4, I can use AND to check bit 1. For odd numbers, that bit is set. For even numbers, it is not.
0 = 0000000
1 = 0000001
2 = 0000010
3 = 0000011
4 = 0000100
5 = 0000101
...and so on...
This means a simple check for “I AND 1” in an IF statement can help me decide which to print. Something like:
IF (I AND 1) THEN PRINT space ELSE PRINT asterisk
That gets me to something like this:
0 ' LOGIKER-ALPHA2.BAS
10 FORI=1TO50
20 L=ASC(MID$("KAGAWBEBWCCCWDADSQPORMTKVIVKTMROPQSDADWCCCWBEBWAGA",I))-64
30 IF I AND 1 THEN PRINT STRING$(L,32); ELSE PRINT STRING$(L,42);
40 NEXT
Perhaps I can get rid of one of those PRINT STRING$ commands… Since I know a space is ASCII 32 and an asterisk is ASCII 42, I could start with the 32 and add 10 if it’s the asterisk case. To do that, I need to see the result that comes back from AND:
PRINT 1 AND 1
1
PRINT 2 AND 1
0
So if the condition is TRUE (bit 1 is set, meaning the value is odd), I get a 1. If the condition is FALSE (bit 1 is clear, meaning the value is even), I get a 0.
Since I want to print spaces on the odd values, I need to use the 1 (odd) to mean 32, and the 0 (even) to mean 42. I’ll reverse my logic a bit and always start with 42 (asterisks) and subtract 10 times the result of (I AND 1). Something like this should work:
0 ' LOGIKER-ALPHA3.BAS
10 FOR I=1 TO 50
20 L=ASC(MID$("KAGAWBEBWCCCWDADSQPORMTKVIVKTMROPQSDADWCCCWBEBWAGA",I))-64
30 PRINT STRING$(L,42-(I AND 1)*10);
40 NEXT
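If you want to check the logic away from a CoCo, the same decode can be sketched in Python. This is an illustration and not a port: `render` and `rows` are hypothetical names, and the 32-character slicing stands in for the CoCo screen wrapping text at 32 columns.

```python
PACKED = "KAGAWBEBWCCCWDADSQPORMTKVIVKTMROPQSDADWCCCWBEBWAGA"
WIDTH = 32  # assumed screen width, so runs wrap the way they do on the CoCo


def render(packed=PACKED):
    """Expand the packed letters into one long string of spaces and asterisks."""
    pieces = []
    for i, letter in enumerate(packed, start=1):  # I counts from 1, as in the FOR loop
        length = ord(letter) - 64                 # A=1, B=2, ...
        code = 42 - (i & 1) * 10                  # odd I gives 32 (space), even gives 42 (*)
        pieces.append(chr(code) * length)
    return "".join(pieces)


def rows(text, width=WIDTH):
    """Cut the stream into screen-width rows to mimic line wrapping."""
    return [text[k:k + width] for k in range(0, len(text), width)]


if __name__ == "__main__":
    for row in rows(render()):
        print(row)
```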
And that gives me the pattern I want, with far less code. I can remove unneeded spaces and combine everything in to one line and see how big it is.
A quick thing about unneeded spaces. There are spaces that BASIC itself doesn’t need, but the tokenizer that turns what you type in to the program DOES need. For example:
FOR I=100 TO 5000
None of those spaces are needed, because BASIC knows where a keyword ends (FOR) and can tell the variable will be whatever is there before the “=”. The same is true for the numbers, since it can tell where a number ends and know to look for “TO”.
BUT, if you were using variables in that loop…
FOR I=B TO E
…and you took the spaces out:
FORI=BTOE
…how does BASIC know what your variable is? Is it “B”? Or “BT”? Or maybe “BTOE”? You will get an “?SN ERROR” if you try that because BASIC sees a non-number after the “=” and switches to parsing it as if it were a variable. To get around this, we have to put a space after it like this:
FORI=B TOE
That allows the tokenizer to work fine.
However… If you were manually creating the BASIC program by packing bytes together in a file, you could omit that space and it will run just fine. Utilities such as Carl England’s CRUNCH do this trick to save a byte. BUT, if you were to CRUNCH the program then try to EDIT that line, you’d no longer have code that would run because updating the line requires it to be re-tokenized. #TheMoreYouKnow
Why is that important?
I mention this because in my above program, I wanted to remove spaces from this line:
30 PRINT STRING$(L,42-(I AND 1)*10);
I can remove all but one, since I need a space between “I” and “AND” for the same reason I just mentioned:
30 PRINTSTRING$(L,42-(I AND1)*10);
But… instead of “I AND 1” I could change it to “1 AND I” and get the same result, but no longer need the space because BASIC can tell where a number stops:
30 PRINTSTRING$(L,42-(1ANDI)*10);
And that, my friends, is how you save one more byte.
Would it be possible to also get rid of those parentheses? Right now, I need to take my asterisk value (42) and subtract either 0 or 10. I need the results of “1 AND I” multiplied by 10, and if I removed the parens…
42-1 AND I*10
…BASIC would do the math first (42-1 and I*10) and if “I” was 3 at the time, I would get this:
42-1 AND 3*10
41 AND 30
…and that’s not at all what we want.
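The same trap can be shown in Python, where the bitwise `&` operator also binds more loosely than subtraction and multiplication. This is only an analogy, assuming Python's `&` is a fair stand-in for BASIC's AND here, and the function names are made up for the example.

```python
def code_without_parens(i):
    """Parsed as (42 - 1) & (i * 10), because arithmetic happens before &."""
    return 42 - 1 & i * 10


def code_with_parens(i):
    """The intended math: 42 minus 10 when i is odd, 42 minus 0 when i is even."""
    return 42 - (1 & i) * 10


if __name__ == "__main__":
    for i in (2, 3):
        print(i, code_without_parens(i), code_with_parens(i))
```

With the parentheses, odd values give 32 (space) and even values give 42 (asterisk). Without them, the result is neither character code.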
Can it be done? I moved things around but it really looks like the result of “1 AND I” has to be in parens. Can you figure a way to save those two bytes?
With that said, I present this version:
10 FOR I=1 TO 50
20 L=ASC(MID$("KAGAWBEBWCCCWDADSQPORMTKVIVKTMROPQSDADWCCCWBEBWAGA",I))-64
30 PRINT STRING$(L,42-(1ANDI)*10);
40 NEXT
…which can be packed in to this version:
And that shows up as 114 bytes!
Oh, one thing I should also mention: during last year’s challenge, a comment was made about how ASC() works. If you give it a string, it returns the ASCII value of the first character. So ASC(“A”) returns 65, just like ASC(“ALLEN”) does. They said instead of using MID$(A$,I,1) to get one character, you can leave off that third parameter and MID$ returns the rest of the string:
A$="HELLO"
PRINT MID$(A$,2,1)
E
PRINT MID$(A$,2)
ELLO
If we were trying to print or use just one letter, we need that third parameter. But since I am passing it in to ASC, I could still give it the longer string and it would work fine:
PRINT ASC("E")
69
PRINT ASC("ELLO")
69
Thus, I can leave off that third parameter and save the two bytes that “,1” took up.
Are we done? Can we save any more?
Until next time…
This is really cool! That’s some pretty amazing compression. I don’t think I’ve ever realized that you could just pass any ol’ string into ASC without bothering to parse out one character. Out of curiosity, I changed “PRINTSTRING$…” to “?STRING$…” and noticed that it increased to 119 bytes. I wonder why that change takes 5 additional bytes, but I tried the same thing in the one I was working on and it doesn’t seem to change the byte size whether I use “PRINT” or “?”
On a side note, I noticed a while back that there is a new version of the mTCP telnet program for DOS that supports RLE images. The author says RLE was a pre-GIF Compuserve image format, but he digs pretty deep into the BASIC encoding and decoding. I thought it was an interesting read: http://www.brutman.com/RLE/RLE_Graphics.html If I can stay awake, I may play with making a coco display one of these samples, but I’ll probably be snoozing instead.
I’ll check out that link.
I thought that “?” just turns in to the same token as PRINT. I’ll have to try that. I have no idea why.
When sizing your versions out, I’d get 115 then I’d get 88. I think it was the variables taking memory. I need to be aware of that moving forward to make sure I am getting proper sizes.