See also: part 1, part 2, part 3 and part 4.
A Faster Base-64
I had planned to end this series with this third part, giving a simple way to turn 8-bit value DATA statements into Base-64 DATA statements. But smarter folks than I have looked at my previous work, so now my plans have changed. We will need an extra part or two… or three.
Today, let’s highlight some comments made to previous installments.
The always thought-provoking MiaM wrote:
I would had written out the 2 exponent values directly as 4, 16 and 64 rather than using INT(2^2).
The decdode-4-“chars”-to-3-bytes parts could use a modified base 64 thingy where you have the first 6 bits of three bytes in a row, and the fourth “char” contains the upper two bits for the previous three “chars”/bytes. Or the other way around, start with the “char” that contains the upper two bits for the following three chars and put that in a numerical variable. Then have a loop that runs three times. Each time first shift the “upper two bits” variable two steps left, i.e. multiply by 4, and then read a “char” and to that char OR the result of the “upper two bit” variable ANDed with 192 (=%110000).
That format would be incompatible with the standardized representation used by BASE 64, but it would indeed be a format with the same density and using the same characters.
Btw you could use the “BASE 90” as a way to slightly compress some data, by just having the values >63 represent two instances of the actual value minus 64. That might not save much, but perhaps worth investigating.
MiaM
Very good point! Eliminating “power of two” calculations and changing them to hard coded values should offer a noticeable speed increase. Pre-calculating values (i.e. writing 4 instead of 2^2) is a good way to save some time, and possibly space too (since “2^2” takes up more memory than “4”).
Normally I would go on a Benchmarking BASIC tangent, but I will save that for later.
I was more intrigued by the concept of making an easier to parse Base-64 format. Since the original goal of this article was to cram as much type-able DATA numbers as possible in to a BASIC program, there is nothing that says it needs to follow the standard Base-64 encoding format. Any format that gets more bits of data in to type-able DATA values would suffice.
This opens up an opportunity to tweak the encode/decode method to be easier to do in BASIC. Mia suggests something like this:
Instead of the standard Base-64 encoding of three 8-bit values into four 6-bit values:
+- Byte 1 --+- Byte 2 --+- Byte 3 --+ | 000000|00 | 0000|0000 | 00|000000 | | \__A__/\___B___/ \___C___/ \_D__/
We could alter the encoding into a different version:
+- Byte 1 --+- Byte 2 --+- Byte 3 --+ | 00|000000 | 00|000000 | 00|000000 | | \| \_A__/ \| \_B__/ | |/ \_C__/ | | \___________D__________/
The benefit here is that decoding this in BASIC could be done much easier and faster. Rather than all the multiplication/division needed to shift bits and then combine them into bytes, it could be as simple as this a few ANDs and divides. Here’s a rough example of converting three 8-bit (0-255) input values (A, B and C) into four 6-bit (0-63) output values (O1, O2, O3 and O4).
0 REM 6BIT.BAS
1 REM As proposed by MiaM
10 READ A,B,C
15 REM --XXXXXX of A
20 O1=(A AND &H3F)
25 REM --XXXXXX of B
30 O2=(B AND &H3F)
35 REM --XXXXXX of C
40 O3=(C AND &H3F)
45 REM XX------ of A
50 O4=(A AND &HC0)/4
55 REM XX------ of B
60 O4=O4+(B AND &HC0)/16
65 REM XX------ of C
70 O4=O4+(C AND &HC0)/64
75 PRINT "ENCODED:"
80 PRINT A;B;C,O1;O2;O3;O4
85 A=0:B=0:C=0
90 A=O1+INT(O4 AND &H30)*4
100 B=O2+INT(O4 AND &HC)*16
110 C=O3+INT(O4 AND &H3)*64
120 PRINT "DECODED:"
125 PRINT O1;O2;O3;O4,A;B;C
1000 DATA 111,222,123
Running this program displays:
The three 8-bit input values (111, 222 and 123) are converted into four 6-bit output values, then those four are turned back into three 8-bit values to verify it worked.
The conversion is very simple, since the output values O1, O2 and O3 are just the right 6-bits of the input values A, B, and C, which can be obtained by using AND to mask off the top two bits:
20 O1=(A AND &H3F)
30 O2=(B AND &H3F)
40 O3=(C AND &H3F)
Optimization Note: We could save a few bytes by omitting the parenthesis and the space before the &H3F. Due to how the Color BASIC’s parser works, we need the space between the variables (A, B and C) and the keyword “AND”. That space is what tells the tokenizer we want a variable followed by the keyword AND, versus a variable that starts with “AA”:
A=255 PRINT A AND 4 4 PRINT A AND4 4 PRINT AAND4 0
Above, Color BASIC thinks the third example is a variable called “AAND4” which is truncated to just be “AA” since Color BASIC only cares about the first two characters of a variable name:
A=255 AAND4=42 PRINT AAND4 42 PRINT A AND4 4
Oh the fun bugs that must have caused me back in the day!
But I digress…
The fourth byte is built by doing the opposite AND to get only the top two bits of A, then a divided that by 4 to shift them to the right 2 bits (AA—— to —AA—-), then do the same mask to B and divide by 16 to shift them 4 bits to the right (BB—— to ——BB–) and again for C divided by 64 to shift 6 bits to the right (CC—— to ——CC) and then add them together to make the result (–AABBCC).
50 O4=(A AND &HC0)/4 60 O4=O4+(B AND &HC0)/16 70 O4=O4+(C AND &HC0)/64
I think I did a poor job explaining that. But here it is visually:
INPUT (three 8-bit values): A: aaAAAAAA B: bbBBBBBB C: ccCCCCCC OUTPUT (four 6-bit values): O1: --AAAAAA O2: --BBBBBB O3: --CCCCCC O4: --aabbcc
Maybe that helps.
Doing it this was can be done faster and with less code, I think. Some benchmarking needs to be done to see if AND is faster than addition for combining the values, and the O4 line can just be made as one thing without the intermediate line numbers and steps:
50 O4=(A AND &HC0)/4+(B AND &HC0)/16+(C AND &HC0)/64
We will want to make the decoder as small as possible, since if we save 100 bytes doing BASE-64 over HEX and the decoder takes more than 100 bytes it defeats the purpose.
Maybe we can figure out this “base 90” concept in a future article, as well.
To be continued…