Earlier in December, @BASIC10Liners on Twitter retweeted a tweet from Tony Lyon. The original tweet contained a photograph of a ZX81 BASIC program listing found in ‘Computer & Video Games’ magazine. The game ran on a 1K machine, and was a Space Invaders-style game, featuring a single invader.
This tweet was picked up and shared in the Tandy CoCo group on Facebook, and led to a number of folks porting it over to the CoCo. Jim Gerrie also ported it to the MC-10:
Jim also did the research to find the issue that the full article appeared in:
For the past few weeks, folks have been sharing optimizations and game improvements on Facebook, and I wanted to highlight one in particular.
Erico Monteiro is known for doing some impressive graphics work using the 8-color 64×32 resolution “mode” of the CoCo. He took the basic framework of the game, then sped it up and added new colors, animations and features.
David Healey, if you are out there, thank you for inspiring so many with your 1981 ZX81 program in that magazine…
While writing up my article on Color BASIC memory, I ran across something I was unaware of. Special thanks to William “Lost Wizard” Astle for his responses on the CoCo mailing list that got me pointed in the right direction…
BASIC memory looks like this:
+-------------+ 0
| SYSTEM USE |
+-------------+ 1024
| TEXT SCREEN |
+-------------+ 1536
| DISK USE |
+-------------+
| HI-RES GFX |
+-------------+
| BASIC PROG |
+-------------+
| VARIABLES |
+-------------+
| ARRAYS |
+-------------+
| FREE MEMORY |
+-------------+
| STRINGS |
+-------------+ 32767
Without Disk BASIC, “DISK USE” is not there, and without Extended BASIC, neither is “HI-RES GFX”.
To determine free memory available to BASIC (for program, variables or arrays), you would subtract the end of “ARRAYS” from the start of “STRINGS.” The locations that store this are at &H1F-&H20 (31-32) and &H21-&H22 (33-34).
Color BASIC Unraveled, page A1.
Each of those two byte locations holds a 16-bit address to some area of RAM. You can get the values by taking the first byte, multiplying it by 256, and adding the second byte.
PRINT "MEMSIZE:";PEEK(&H27)*256+PEEK(&H28)
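The same high-byte/low-byte arithmetic can be checked away from the CoCo with a few lines of Python. This is only a sketch: the `peek16` helper and the sample memory contents are made up for illustration, and only the multiply-by-256 formula comes from the text above.

```python
def peek16(memory, address):
    """Combine two consecutive bytes into a 16-bit value, high byte first."""
    return memory[address] * 256 + memory[address + 1]


# Hypothetical memory image: 64K of zeros with one sample pointer filled in.
memory = bytearray(65536)
memory[0x27] = 0x7F  # high byte of the sample value
memory[0x28] = 0xFF  # low byte of the sample value

print("MEMSIZE:", peek16(memory, 0x27))
```

With those sample bytes, the result is the top of a 32K address range.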
Here is a program that prints out some pertinent information:
0 ' BASINFO2.BAS
10 ' START OF BASIC PROGRAM
20 BS=PEEK(25)*256+PEEK(26)
30 ' START OF VARIABLES
40 VS=PEEK(27)*256+PEEK(28)
50 ' START OF ARRAYS
60 SA=PEEK(29)*256+PEEK(30)
70 ' END OF ARRAYS (+1)
80 EA=PEEK(31)*256+PEEK(32)
90 ' START OF STRING STORAGE
100 SS=PEEK(33)*256+PEEK(34)
110 ' START OF STRING VARIABLES
120 SV=PEEK(35)*256+PEEK(36)
130 ' TOP OF STRING SPACE/MEMSIZ
140 ST=PEEK(39)*256+PEEK(40)
150 PRINT "PROG SIZE";(VS-BS),"STR SPACE";(ST-SS)
170 PRINT "ARRAY SIZE";(EA-SA)," STR USED";(ST-SV)
180 PRINT " VARS SIZE";(SA-VS)," FREE MEM";(SS-EA)
999 END
And here is that PRINT code, without using any variables. It could be made a subroutine that you could GOSUB to and see status of your memory usage:
There are some bytes missing somewhere. This made me wonder which was correct, so I consulted the PRINT MEM code in Color BASIC Unraveled. It had comments that shed some light on what is going on:
Color BASIC Unraveled, page B34.
“This is not a true indicator of free memory because BASIC requires a STKBUF size buffer for the stack for which MEM does not allow.”
– Color BASIC Unraveled, page B34.
On page A1, I see the definition for STKBUF:
STKBUF EQU 58 STACK BUFFER ROOM
I see code in the ROM that takes this in to consideration, adding the value of ARYEND plus the value of STKBUF.
Color BASIC Unraveled, page B20.
That routine is used to test if enough memory is available for something, and if there isn’t, it returns the ?OM ERROR.
But the difference I get is 12 bytes, not 58. What is BOTSTK? It looks like it is just before the pointer variables I have been working with. I just did not know what it was.
0017 BOTSTK RMB 2 BOTTOM OF STACK AT LAST CHECK
&H17 would be memory location 23. Let’s see where it is by doing…
PRINT PEEK(23)*256+PEEK(24)
That gives me a location about 30 bytes before FRETOP (start of string storage). Here is the program I am using:
0 ' BASPTRS.BAS
10 ' BOTTOM OF STACK
20 SP=PEEK(23)*256+PEEK(24)
30 ' START OF BASIC PROGRAM
40 BS=PEEK(25)*256+PEEK(26)
50 ' START OF VARIABLES
60 VS=PEEK(27)*256+PEEK(28)
70 ' START OF ARRAYS
80 SA=PEEK(29)*256+PEEK(30)
90 ' END OF ARRAYS (+1)
100 EA=PEEK(31)*256+PEEK(32)
110 ' START OF STRING STORAGE
120 SS=PEEK(33)*256+PEEK(34)
130 ' START OF STRING VARIABLES
140 SV=PEEK(35)*256+PEEK(36)
150 ' TOP OF STRING SPACE/MEMSIZ
160 ST=PEEK(39)*256+PEEK(40)
180 PRINT "START OF PROG",BS;(VS-BS)
190 PRINT "START OF VARS",VS;(SA-VS)
200 PRINT "START OF ARRAYS",SA;(EA-SA)
210 PRINT "END OF ARRAYS+1",EA
215 PRINT "BOTTOM OF STACK",SP
229 PRINT "START/STR STORE",SS;(ST-SS)
230 PRINT "START/STR VARS",SV;(ST-SV)
240 PRINT "TOP OF STRINGS",ST
250 PRINT "FREE MEMORY",(SP-EA)
999 END
William clarified a bit of this in a follow-up post.
Actually, currently available memory would be approximately the difference between BOTSTK and ARYEND less the STKBUF amount. MEM neglects to add the STKBUF adjustment in when it calculates the difference. Total usable memory under current PMODE/FILES/CLEAR settings would be the difference between BOTSTK and TXTTAB.
Note that the stack (and consequently the BOTSTK value) will grow downward from FOR loops, GOSUB, and expression evaluation. Expression evaluation can use a surprising amount of stack space depending on how many times it has to save state to evaluate a higher precedence operation, how many function calls are present, etc.
ARYEND is the top of all the memory used by the program itself, PMODE graphics pages, disk buffers and file descriptors, scalar variables, and arrays.
When calculating OM errors, it takes ARYEND, adds the amount of memory requested, adds the STKBUF amount, and then compares that with the current stack pointer. It does the comparison by storing S since you can’t directly compare two registers.
STKBUF is 58 for whatever reason. That’s sufficient to allow for a full interrupt frame (12 bytes) plus a buffer so routines can use the stack for saving registers, routine calls, etc., within reason, without having to continually do OM checks. It does this to prevent corrupting variables when memory is tight. Even so, there may be a few routines in Disk Basic that may still cause the stack to overflow into variable space when memory is very tight.
– William Astle, 7/8/2022
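To make the quoted explanation concrete, here is a rough Python model of the two memory figures and the OM test. The function names and the sample addresses are invented for this sketch, and whether the real ROM treats an exact tie as an error is an assumption; only STKBUF = 58 and the "ARYEND plus request plus STKBUF versus stack" idea come from the text.

```python
STKBUF = 58  # stack buffer size, from the STKBUF EQU on page A1


def mem_reported(botstk, aryend):
    """PRINT MEM as described in the quote: no STKBUF adjustment."""
    return botstk - aryend


def mem_usable(botstk, aryend):
    """Approximate usable memory: the same difference, less STKBUF."""
    return botstk - aryend - STKBUF


def out_of_memory(aryend, requested, stack_pointer):
    """OM test as described: ARYEND + request + STKBUF against the stack.

    Treating "equal" as a failure is an assumption of this sketch.
    """
    return aryend + requested + STKBUF >= stack_pointer


# Made-up addresses, only to show the 58-byte gap between the two figures.
aryend, botstk = 9800, 32000
print(mem_reported(botstk, aryend), mem_usable(botstk, aryend))
print(out_of_memory(aryend, 22000, botstk), out_of_memory(aryend, 22200, botstk))
```

The point of the model is that a request can fit inside what MEM reports and still fail once the 58-byte reserve is added.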
So how can we test? Using the XRoar emulator, I started with a standard 64K CoCo with Disk BASIC. PRINT MEM returns 22823 bytes free.
On startup, a disk-based CoCo has 22823 bytes available for BASIC.
For testing, I’ll create one line of BASIC that is just a REM statement:
10 REM
Now PRINT MEM shows 22817 bytes free. That is 6 bytes less, which is two bytes for where the next line will be (38 7), two bytes for the line number (00 10), the token for REM (130), and a NULL byte (0). Each new line number takes up 5 bytes plus whatever the content of the line is.
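The six bytes can be laid out explicitly. This Python sketch builds the stored form of the line from the pieces listed above; `encode_line` is a made-up helper, and the byte values (38 7, 00 10, 130, 0) are the ones quoted in the text.

```python
REM_TOKEN = 130  # token value for REM given in the text


def encode_line(next_line_address, line_number, body):
    """Lay out one stored line: next-line pointer, line number, body, NUL."""
    return (
        next_line_address.to_bytes(2, "big")
        + line_number.to_bytes(2, "big")
        + bytes(body)
        + b"\x00"
    )


line = encode_line(38 * 256 + 7, 10, [REM_TOKEN])
print(list(line), len(line))

# A line with an empty body shows the fixed 5-byte overhead.
print(len(encode_line(0, 10, [])))
```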
Now I want to reserve most of that memory for strings. I’ll do CLEAR 22800. After this, PRINT MEM shows 217… That can’t be right. Shouldn’t it really be 17?
In this case, the original PRINT MEM was already counting 200 bytes reserved for strings. When I did CLEAR 22800, it released those 200 bytes, then reserved 22800 instead (so, 22817+200-22800 = 217).
Let’s start over…
Okay, so let’s redo this. I restarted the CoCo back to where PRINT MEM shows 22823.
The next thing I did was type CLEAR 0 to reserve NO string space, then check PRINT MEM. It shows 23023 — the largest BASIC program I can have on this disk-based machine, unless I deallocate some graphics memory (6K is reserved for that, by default).
Now I enter in the “10 REM” line, and PRINT MEM again. It shows 23017 — 6 bytes less, as expected.
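The numbers so far fit a simple model: CLEAR gives back whatever string space was reserved before and then takes the new amount. Here is that bookkeeping as a small Python sketch; `mem_after_clear` is a hypothetical helper, and the 200-byte default and the MEM readings are the ones reported above.

```python
DEFAULT_STRING_SPACE = 200  # bytes BASIC reserves for strings at startup


def mem_after_clear(mem_now, reserved_now, new_reserve):
    """Give back the current string reservation, then take the new one."""
    return mem_now + reserved_now - new_reserve


startup = 22823
after_clear_0 = mem_after_clear(startup, DEFAULT_STRING_SPACE, 0)
after_rem_line = after_clear_0 - 6  # the six bytes used by "10 REM"
print(after_clear_0, after_rem_line)
```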
I try to type CLEAR 23017, which gives ?OM ERROR.
I back that off a bit to CLEAR 22960. PRINT MEM shows ?OM ERROR. And now I have hosed BASIC. Even CLEAR 0 now returns ?OM ERROR.
Is this a bug? I can still EDIT my line 10, and LIST it, but while CLEAR by itself works, it changes nothing. I’ve managed to consume so much memory, I can’t run the command I need to give it back.
Let’s start over again…
CoCo reset. CLEAR 0 entered. PRINT MEM shows 23023. 10 REM typed. PRINT MEM shows 23017.
This time I’ll try CLEAR 22950 to give BASIC a bit more room. PRINT MEM shows 67.
If this is accurate, I should be able to add 67 characters to the end of that REM statement.
I type “EDIT 10”, then “X” to extend to the end. I enter ten characters, then ENTER.
10 REM1234567890
PRINT MEM gives me an ?OM ERROR.
Now, the program still LISTS and would run (if it did anything), but clearly other things are broken.
I EDIT 10 and try to add another ten characters.
10 REM12345678901234567890
?OM ERROR.
And now, LINE 10 is completely gone! EDIT must try to “add” a line when it’s done, and if there’s no memory, the line is … erased?
PRINT MEM still shows 67, even though the “10 REM” seems to be gone. Where did that memory go? I try adding a total of 15 characters after the REM:
10 REM123456789012345
That works. PRINT MEM shows ?OM ERROR.
Let’s start over again again.
I reboot the CoCo, then CLEAR 22950. PRINT MEM shows 73 (there is no line number yet).
This “should” mean I could enter a line (5 bytes) that has 73-5=68 characters. But if I try to type anything longer than 16 bytes, I get ?OM ERROR (the REM token is one byte, then 15 characters).
This tells me that when PRINT MEM shows me 67, I really only had 5+16=21 bytes available. 67-21=46 bytes difference.
Looking back at my BASIC POINTERS program output:
…I see the calculated free memory was 22042, versus PRINT MEM showing 22038. That is only four bytes different.
Questions keep stacking up
It is clear you cannot use PRINT MEM to know how much room you will have for a BASIC program. It is more of a guideline, within 50 or so bytes.
The wildcard is probably this mysterious (to me) “stack” that BASIC needs for operation. Depending on what is going on, the stack in use may be larger or smaller, and BASIC keeps a 58-byte buffer (STKBUF) in reserve for it. And, BOTSTK notes that it is “bottom of stack at last check” so it may not be reflecting the actual stack at the particular moment when the code looked at it.
I really don’t have any conclusions from this. When I started writing, I expected I’d find a set of PEEKs I could subtract to get a more accurate PRINT MEM value.
2022-09-19 – John K pointed out a typo in the program listing, which has been corrected.
This one comes from Carl England. We met him at the 1990 Atlanta CoCoFest. That was the first time Sub-Etha Software appeared anywhere, and we couldn’t afford a full booth so we split one with Carl. He will always be the SuperBoot guy to me (my all time favorite utility), but most may know him as the programmer behind The Defeater, a disk copy tool that could make clones of copy protected disks. (Download some of his items here.)
Carl popped up in some Facebook comments recently with this odd bit of code.
10 PRINT"THIS LINE IS INVISIBLE":'%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
20 FORI=PEEK(25)*256+PEEK(26)TOPEEK(27)*256+PEEK(28)
30 IFPEEK(I)=37THENPOKEI,8
40 NEXT
Line 10 is a PRINT statement, followed by a comment and a bunch of percent symbols.
Line 20 is a loop that goes from the Start of BASIC to the End of BASIC.
Line 30 uses PEEK to look for a 37, which is the ASCII code for a percent sign. If it finds one, it POKEs it to 8 — a backspace.
Line 40 is just the NEXT for the loop.
If you run this program, it will modify line 10 and replace all those %’s with backspace characters. Then, if you LIST it, BASIC displays the line then immediately backspaces over it, too quick to see, making it look like there is no line there.
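The effect is easy to model off the CoCo. This Python sketch is a simplification: it works on the listed text of the line rather than the tokenized bytes in memory, and its `render` function is a made-up stand-in for a screen where a backspace erases the previous character.

```python
def hide_percent_signs(listing):
    """Replace every '%' (37) with a backspace (8), like lines 20-40 do."""
    return bytes(8 if b == 37 else b for b in listing)


def render(listing):
    """Crude screen model: a backspace erases the previous character."""
    screen = []
    for b in listing:
        if b == 8:
            if screen:
                screen.pop()
        else:
            screen.append(chr(b))
    return "".join(screen)


visible = '10 PRINT"THIS LINE IS INVISIBLE":\''
listing = (visible + "%" * len(visible)).encode("ascii")

print(repr(render(listing)))                      # before the POKEs
print(repr(render(hide_percent_signs(listing))))  # after the POKEs
```

The trick only hides the whole line if there are at least as many percent signs as there are characters in front of them.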
You could RUN this to modify the program, then DELete lines 20-40, and save it to tape or disk. You could then have a program you could load and run that would print a message, but LISTing it would appear as if nothing was there.
Carl said he used this to hide lines in some of his programs, but mentioned you could still see the code if you sent it to a printer using LLIST. Also, if you knew there was a line 10, you could EDIT 10 and SPACE through the line, revealing the code. I suppose you could SPACE until you got to the colon and then Hack the rest of the line, removing those backspaces.
Followers of my ramblings know that I enjoy benchmarking BASIC. It is interesting to see how minor changes can produce major speed differences. And while BASIC is supposedly “completely predictable,” there are so many items that must be considered — line numbers, line length, number of variables, amount of strings, etc. — you can’t really look at any bit of code and know how fast it will run unless it’s something self contained like this:
10 FOR A=1 TO 1000
20 NEXT
Beyond things like that, there’s a lot of trial-and-error needed. Code like this:
...
100 FOR A=1 TO 100
110 Z=Z+1
120 NEXT
...
…can have dramatically different speeds depending on how many other variables there are, and where Z was declared in the list of them.
6809 assembly language is far more predictable. Every machine language instruction has a known amount of CPU cycles it takes to operate — based on variations of that code. For example, loading register A with a value:
lda #42
…should be 100% predictable simply by looking up the cycle counts for “load A” in some 6809 reference guide. The Motorola data sheet for the 6809 tells me that “LDA” takes 2 cycles.
So, really, there’s no benchmarking needed. You just have to look up the instructions (and their specific type — direct, indirect, etc.) and add up some numbers.
But where’s the fun in that?
William “Lost Wizard” Astle‘s LWTOOLS provides us with the lwasm assembler for Mac, Windows, Linux, etc. One of its features is cycle counting. It can generate a list of all the machine language bytes that the assembly source code turns in to and, optionally, include the cycle count for each line of assembly code.
I just learned about this and had to experiment…
Here is a simple assembly loop that clears the 32-column screen. I’ll add some comments that explain what it does, as if it were BASIC…
clear
lda #96 * A=96 (green space)
clearwitha
ldx #1024 * X=1024 (top left of screen)
loop
sta ,x+ * POKE X,A:X=X+1
cmpx #1536 * Compare X to 1536
bne loop * If X<>1536, GOTO loop
rts * RETURN
To clear the screen to spaces (character 96), it is called with:
bsr clear
To clear the screen with a different value, such as 128 for black, it can be called like this:
lda #128
bsr clearwitha
LWASM is able to tell me how many CPU cycles each instruction will take. To generate this, you have to include a special pragma command in the source code, or pass it in on the command line. In source code, it is done by using the special “opt” keyword followed by the pragma. The ones we are interested in are listed in the manual:
opt c - enable cycle counts: [8]
opt cd - enable detailed cycle counts breaking down addressing modes: [5+3]
opt ct - show a running subtotal of cycles
opt cc - clear the running subtotal
Adding “ opt c” at the top of the source code will enable it, and then you would use the “-l” command line option to generate the list file, which will now contain cycle counts. (You can also send the list output to a file using -lfilename if you prefer.)
You can also pass in this pragma using a command line “--pragma=c”, like this:
lwasm clear.asm -fbasic -oclear.bas --pragma=c -l
Above, I am assembling the program in to a BASIC loader which I can load from BASIC, and then RUN to load the machine language program in to memory. Here is what that command displays for me:
That’s a bit too wide of a listing for my comfort, so from now on I’ll just include the right portion of it — starting with the (number) in parenthesis. That is the cycle count. If you look at the “lda #96” line you will see it confirms “lda” takes two cycles:
(2) lda #96 * A=96 (green space)
Another pragma of interest is one that will start counting the total number of cycles code takes.
opt ct - show a running subtotal of cycles
If you just turned it on, it would not be very useful since it would just be adding up all the instructions from top to bottom of the source, not taking into consideration branching or subroutines or loops. But, we can clear that counter and start it at any point by including the “opt” keyword in the code around routines we are interested in.
opt cc - clear the running subtotal
And we can turn them off by putting “no” in front:
opt noct - STOP showing a running subtotal of cycles
In the case of my clear.asm program, I would want to clear the counter and turn it on right at the start of loop, and turn it off at the end of the loop. This would show me a running count of how many cycles that loop takes:
The numbers to the right of the (cycle) count numbers are the sum of all instructions from the moment the counter was enabled.
The code from “loop” to “bne loop” takes 11 cycles. Since each loop sets one byte on the screen, and since there are 512 bytes on the screen, clearing the screen this way will take 11 * 512 = 5632 cycles (plus a few extra before the loop, setting up X and A).
Instead of clearing the screen 8-bits at a time, I learned that using a 16-bit register would be faster. I changed the code to use a 16-bit D register instead of the 8-bit A register, like this:
clear16
lda #96 * A=96
tfr a,b * B=A (D=A*256+B)
clearA16
ldx #1024 * X=1024 (top left of screen)
opt ct,cc * Clear counter, turn it on.
loop16
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
cmpx #1536 * Compare X to 1536.
bne loop16 * If X<>1536, GOTO loop16
opt noct * Turn off counter.
rts * RETURN
Since 16-bit register D is made up of 8-bit registers A and B, I simply transfer whatever is in A to B and that makes both bytes of D the same as A. Then in the loop, I store D at X, and increment it by 2 (to get to the next two bytes). Looking at cycles again…
The code from “loop16” to “bne loop16” takes 13 cycles, which is longer than the original. But, each loop does two bytes instead of one. Instead of needing 512 times through the loop, it only needs 256. 13 * 256 = 3328 cycles. Progress!
And, if we can do 16-bits at a time, why not 32? Currently, D is the value to store, and X is where to store it. We could just store D twice in a row…
clear32
lda #96 * A=96
tfr a,b * B=A (D=A*256+B)
clearA32
ldx #1024 * X=1024 (top left of screen)
opt ct,cc * Clear counter, turn it on.
loop32
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
cmpx #1536 * Compare X to 1536.
bne loop32 * If X<>1536, GOTO loop32
opt noct * Turn off counter.
rts * RETURN
Let’s see what that does…
clear32
(2) lda #96 * A=96
(4) tfr a,b * B=A (D=A*256+B)
clearA32
(3) ldx #1024 * X=1024 (top left of screen)
opt ct,cc * Clear counter, turn it on.
loop32
(7) 7 std ,x++ * POKE X,A:POKE X+1,B:X=X+2
(7) 14 std ,x++ * POKE X,A:POKE X+1,B:X=X+2
(3) 17 cmpx #1536 * Compare X to 1536.
(3) 20 bne loop32 * If X<>1536, GOTO loop32
opt noct * Turn off counter.
(4) rts * RETURN
Above, the “loop32” to “bne loop32” takes 20 cycles. Each loop does four bytes, so only 128 times through the loop to clear all 512 bytes of the screen. 20 * 128 = 2560 cycles. More than double the speed of the original one byte version.
We could do 48-bits at a time by storing three times, but that math doesn’t work out since 512 is not divisible by 6 (I get 85.33333333). Perhaps we could do the loop 85 times to clear the first 510 bytes (6 * 85 = 510), then manually do one last 16-bit store to complete it. Maybe like this:
clear48
lda #96 * A=96
tfr a,b * B=A (D=A*256+B)
clearA48
ldx #1024 * X=1024 (top left of screen)
opt ct,cc * Clear counter, turn it on.
loop48
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
cmpx #1534 * Compare X to 1534 (85 passes of 6 bytes).
bne loop48 * If X<>1534, GOTO loop48
opt noct * Turn off counter.
std ,x * POKE X,A:POKE X+1,B (final 2 bytes)
rts * RETURN
We have jumped to 27 cycles per loop. Each loop stores 6 bytes, and it takes 85 times to get 510 bytes, plus 5 extra after it is over for the last two bytes. 27 * 85 = 2295 cycles + 5 = 2300 cycles! We are still moving in the right direction.
Just for fun, what if we did four stores, 8 bytes at a time?
clear64
lda #96 * A=96
tfr a,b * B=A (D=A*256+B)
clearA64
ldx #1024 * X=1024 (top left of screen)
opt ct,cc * Clear counter, turn it on.
loop64
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
std ,x++ * POKE X,A:POKE X+1,B:X=X+2
cmpx #1536 * Compare X to 1536.
bne loop64 * If X<>1536, GOTO loop64
opt noct * Turn off counter.
rts * RETURN
34 cycles stores 8 bytes. 64 times through the loop to do all 512 screen bytes, so 64 * 34 = 2176 cycles.
By now, I think you can see where this is going. I believe this is called “loop unrolling”, since, if you wanted the fewest cycles, you could just code 256 “std ,x++” in a row (7 * 256) for 1792 cycles which would be fast but bulky code (each std ,x++ takes two bytes, so 512 bytes just for this copy routine).
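The totals for each variant can be tabulated in one place. This Python sketch just redoes the multiplication using the per-pass cycle counts reported above; the table layout and the `total_cycles` helper are mine.

```python
SCREEN_BYTES = 512

# (label, cycles per pass, bytes per pass, extra cycles after the loop)
variants = [
    ("8-bit", 11, 1, 0),
    ("16-bit", 13, 2, 0),
    ("32-bit", 20, 4, 0),
    ("48-bit", 27, 6, 5),  # 85 passes, then one extra 5-cycle store
    ("64-bit", 34, 8, 0),
]


def total_cycles(cycles_per_pass, bytes_per_pass, extra):
    """Whole passes needed for the screen, times cycles, plus any leftovers."""
    passes = SCREEN_BYTES // bytes_per_pass
    return passes * cycles_per_pass + extra


for label, cycles, size, extra in variants:
    print(label, total_cycles(cycles, size, extra))

# Fully unrolled: 256 stores of 7 cycles each, with no compare or branch.
print("unrolled", 256 * 7)
```

Each step of unrolling saves less than the one before it, since the compare-and-branch overhead being amortized is only 6 cycles per pass.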
There is always some balance between code size and speed. Larger programs took longer to load from tape or disk. But, if you didn’t mind load time, and you had extra memory available, tricks like this could really speed things up.
Blast it…
I have also read about “stack blasting” where you load values in to registers and then, instead of storing each register, you set a stack pointer to the destination and just push the registers on the stack. I’ve never done that before. Let’s see if we can figure it out.
There are two stacks in the 6809: one is the normal hardware stack used by the program (register S), and the other is the User Stack (register U). If we aren’t using U for a stack, we can use it as a 16-bit register, too.
The stack grows “up” toward lower addresses, so if the stack pointer is 5000, and you push an 8-bit register, the pointer will move to 4999 (pointing to the most recent register pushed). If you then push a 16-bit register, it will move to 4997. This means it will have to work in reverse from our previous examples. By pointing the stack register to the end of the screen, we should be able to push registers on to the stack causing it to grow “up” to the top of the screen.
At first glance, it doesn’t look promising, since pushing D on to the user stack (U) takes more cycles than storing D at U:
(5) std ,u
(6) pshu d
But, it seems we make that up when pushing multiple registers since the cycle count does not grow as much as multiple stores do:
I also see that STY is one cycle longer than STD or STX. This tells me to maybe avoid using Y like this…?
It looks good, though. 22 cycles compared to 10 seems quite the win. Let me see if I can do a clear routine using the User stack pointer and three 16-bit registers. We’ll compare this to the 48-bit clear shown earlier.
clear48s
lda #96 * A=96
clearA48s
tfr a,b * B=A (D=A*256+B)
tfr d,x * X=D
tfr d,y * Y=D
ldu #1536 * U=1536 (1 past end of screen)
opt ct,cc * Clear counter, turn it on.
loop48s
pshu d,x,y
cmpu #1026 * Compare U to 1026 (two bytes from start).
bgt loop48s * If U>1026, GOTO loop48s.
opt noct * Turn off counter.
pshu d * Final 2 bytes.
rts * RETURN
And the results are…
clear48s
(2) lda #96 * A=96
clearA48s
(4) tfr a,b * B=A (D=A*256+B)
(4) tfr d,x * X=D
(4) tfr d,y * Y=D
(3) ldu #1536 * U=1536 (1 past end of screen)
opt ct,cc * Clear counter, turn it on.
loop48s
(10) 10 pshu d,x,y
(4) 14 cmpu #1026 * Compare U to 1026 (two bytes from start).
(3) 17 bgt loop48s * If U>1026, GOTO loop48s
opt noct * Turn off counter.
(6) pshu d * Final 2 bytes.
(4) rts * RETURN
From “loop48s” to “bgt loop48s” we end up with 17 cycles compared to 27 using the std method. 85 * 17 = 1445 cycles + 6 final cycles = 1451 cycles. It looks like using stack push/pulls might be a real nice way to do this type of thing, provided the user stack is available, of course.
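Because the pointer runs backwards here, it is worth checking that the loop covers exactly the 512 screen bytes and nothing else. This Python sketch simulates the pointer movement only; `pshu` is a simplified model of the instruction that ignores the real register push order, which is harmless here because D, X and Y all hold the same value.

```python
SCREEN_START, SCREEN_END = 1024, 1536  # END is one past the last screen byte


def pshu(memory, u, values):
    """Push 16-bit values at descending addresses and return the new U."""
    for value in values:
        u -= 2
        memory[u] = value >> 8        # high byte at the lower address
        memory[u + 1] = value & 0xFF  # low byte just above it
    return u


def clear48s(memory, fill):
    """Model of the clear48s routine: 6 bytes per pass, then 2 more."""
    d = fill * 256 + fill  # D, X and Y all hold the same 16 bits
    u = SCREEN_END
    passes = 0
    while True:
        u = pshu(memory, u, [d, d, d])  # pshu d,x,y
        passes += 1
        if u <= SCREEN_START + 2:       # cmpu #1026 / bgt falls through
            break
    u = pshu(memory, u, [d])            # final two bytes
    return u, passes


memory = bytearray(2048)
u, passes = clear48s(memory, 96)
print(u, passes, set(memory[SCREEN_START:SCREEN_END]))
```

The bytes on either side of the screen should be untouched, which is the part that is easy to get wrong when counting down.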
When setting a register to zero, I have been told to use “CLR” instead of “LDx #0”. Let’s see what that is all about…
(2) lda #0
(1) clra
(3) ldd #0
(2) clrd
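Ah, now I know: by these counts, CLRA is twice as fast as LDA #0, and CLRD is one cycle faster than LDD #0. Nice. One caveat: CLRD is a Hitachi 6309 instruction rather than a 6809 one, and the counts shown here line up with 6309 native-mode timings. On a stock 6809, CLRA takes 2 cycles, the same as LDA #0, although it is still one byte shorter.
Other 16-bit registers such as X, Y, and U do not have a CLR op code, so LDx will have to be used there, I suppose.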
I then wondered if it made more sense to CLR a memory location, or clear a register then store that register there.
(6) clr 1024
(1) clra
(4) sta 1024
It appears in this case, it takes fewer cycles to clear a register and then store it in memory. Interesting. And using a 16-bit value:
(3) ldd #0
(5) std 1024
That is one cycle faster than doing a “clra / sta 1024 / sta 1025” it seems. It is also one byte less in size, so win win.
There is a lot to learn here, and from these experiments, I’m already seeing some things are not like I would have guessed.
I hope this inspires you to play with these LWASM options and see what your code is doing. During the writing of this article, I learned how to use that User Stack, and I expect that will come in handy if I decide to do any updates to my Invaders09 game some day…
Over on REDDIT, the subject of the gaping Insta360 WiFi security hole has come up again. User K1N6P1X3l linked to this recent article that summarizes the issue’s history:
Of course, the majority of users simply will not care. “It’s very unlikely to happen to me,” they say. “And if it does, so what, they get my photos.”
In other words, this is just like any other security issue out there. Some folks treat them seriously and take steps to avoid the problems, and others just don’t care. If it were not for the “don’t care” crowd, we wouldn’t have such great malware, viruses and ransomware :)
The exploit allows anyone within WiFi range the ability to connect to your camera and do “stuff.” According to the PetaPixel article, Insta360 has already plugged some of this:
Currently the list_directory has already been terminated and it is no longer possible to access the camera content through the browser.
– Insta360 response, per PetaPixel article
Unfortunately, this is not true. Using the current available firmware, v1.0.59_build1, on my ONE X2 I see it appear on WiFi as expected:
…and if I select this interface, and then open the URL in a browser, I find all my files are indeed able to be listed:
I can then click on one of the .insv video files and play it (or save off a copy):
“And it’s just that easy!”
With this camera, there is no privacy because the camera is broadcasting itself to any WiFi devices around it, and allowing any of them to connect without authentication and then browse and view/download anything on the microSD card.
What’s worse is you can also telnet in to the device. I tried that to see if it still worked:
You will notice it did not ask for a password here, either.
Both of these screenshots (web browser and telnet) were done on my iPad.
WHILE I was connected to my X2 via the iPad, I then connected a Windows PC. Using the default password of “88888888” I was now connected from two devices (which for some reason folks think isn’t possible). Both my Windows PC and my iPad were connected and able to access the files from a web browser.
At least two firmware updates have come out since this first appeared on REDDIT, and it does not appear anything has changed.
ScrewX2: The proof-of-concept that a script kiddie could have written
Shortly after this exploit was first mentioned, someone could have easily created a script that would look for WiFi hotspots following the name “ONE X2 xxxxx” and connect to them. The script could then issue http GET commands to retrieve files on the memory card, or telnet in to delete things, potentially bricking the camera.
Worse, the script could deliver a payload of malware, and if the user ever mounted that memory card in a computer, the malware could have been run accidentally. Hopefully most of us will never run some random executable or installer found on a camera memory card, but it would be tempting to try to find out what “360VIEWER.EXE” does, or what “Insta360MacConverter.dmg” is.
I will not link to any such “screwx2” script, and will delete any links to such posted in the comments here. The cat is firmly out of the bag, with the default WiFi password known, and NO password on the web interface or telnet, so all we can do is hope that Insta360 addresses this issue eventually.
A year or two ago, I ran across some C code at my day job that finally got me to do an experiment…
When I was first using a modem to dial in to BBSes, it was strictly a text-only interface. No pictures. No downloads. Just messages. (Heck a physical bulletin board at least would let you put pictures on it! Maybe whoever came up with the term BBS was just forward thinking?)
The first program I ever had that sent a program over the modem was DFT (direct file transfer). It was magic.
Later, I got one that used a protocol known as XMODEM. It seemed like warp speed compared to DFT!
XMODEM would send a series of bytes, followed by a checksum of those bytes, then the other end would calculate a checksum over the received bytes and compare. If they matched, it went on to the next series of bytes… If it did not, it would resend those bytes.
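That send, verify and resend cycle can be sketched in a few lines of Python. This is a simplified model and not real XMODEM framing, which also carries a header, block numbers and acknowledgement bytes; `transfer`, `flaky_channel` and the retry limit are all invented for the illustration.

```python
def checksum8(block):
    """Sum of the bytes, wrapped to fit in one byte."""
    return sum(block) % 256


def transfer(blocks, channel, max_retries=10):
    """Send each block with its checksum; resend until the receiver agrees."""
    received = []
    for block in blocks:
        for _ in range(max_retries):
            data, check = channel(block, checksum8(block))
            if checksum8(data) == check:  # receiver recomputes and compares
                received.append(data)
                break
        else:
            raise RuntimeError("too many retries")
    return received


def flaky_channel():
    """Hypothetical line that corrupts the first attempt at every block."""
    seen = set()

    def channel(block, check):
        if block not in seen:
            seen.add(block)
            return bytes([block[0] ^ 0x01]) + block[1:], check
        return block, check

    return channel


print(transfer([b"HELLO", b"WORLD"], flaky_channel()))
```

Every block goes through twice in this run: once corrupted and rejected, once clean and accepted.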
Very simple. And, believe it or not, checksums are still being used by modern programmers today, even though newer methods have been created (such as CRC).
Checking the sum…
A checksum is simply the value you get when you add up all the bytes of some data. Checksum values are normally not floating point, so they will be limited to a fixed range. For example, an 8-bit checksum (using one byte) can hold a value of 0 to 255. A 16-bit checksum (2 bytes) can hold a value of 0-65535. Since the running sum can easily exceed that range, especially with an 8-bit checksum, the value just rolls over.
For example, if the current checksum calculated value is 250 for an 8-bit checksum, and the next byte being counted is a 10, the checksum would be 250+10, but that exceeds what a byte can hold. The value just rolls over, like this:
250 + 10: 251, 252, 253, 254, 255, 0, 1, 2, 3, 4
Thus, the checksum after adding that 10 is now 4.
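Rolling over like this is the same as keeping the remainder after dividing by 256. A tiny Python sketch (the helper name is made up) shows the example above:

```python
def add_to_checksum8(checksum, byte):
    """Add one byte and wrap the result back into the range 0..255."""
    return (checksum + byte) % 256


print(add_to_checksum8(250, 10))
```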
Here is a simple 8-bit checksum routine for strings in Color BASIC:
0 REM CHKSUM8.BAS
10 INPUT "STRING";A$
20 GOSUB 100
30 PRINT "CHECKSUM IS";CK
40 GOTO 10
100 REM 8-BIT CHECKSUM ON A$
110 CK=0
120 FOR A=1 TO LEN(A$)
130 CK=CK+ASC(MID$(A$,A,1))
140 IF CK>255 THEN CK=CK-256
150 NEXT
160 RETURN
Line 140 is what handles the rollover. If we had a checksum of 250 and the next byte was a 10, it would be 260. That line would detect it, and subtract 256, making it 4. (There are 256 possible values, since the count starts at 0.)
The goal of a checksum is to verify data and make sure it hasn’t been corrupted. You send the data and checksum. The receiver passes the data through a checksum routine, then compares what it calculated with the checksum that was sent with the message. If they do not match, the data has something wrong with it. If they do match, the data is less likely to have something wrong with it.
Double checking the sum.
One of the problems with just adding (summing) up the data bytes is that two swapped bytes would still create the same checksum. For example “HELLO” would have the same checksum as “HLLEO”. Same bytes. Same values added. Same checksum.
A good 8-bit checksum.
However, if one byte got changed, the checksum would catch that.
A bad 8-bit checksum.
It would be quite a coincidence if two data bytes got swapped during transfer, but I still wouldn’t use a checksum on anything where lives were at stake if it processed a bad message because the checksum didn’t catch it ;-)
Another problem is that if the value rolls over, that means a long message or a short message could cause the same checksum. In the case of an 8-bit checksum, and data bytes that range from 0-255, you could have a 255 byte followed by a 1 byte and that would roll over to 0. A checksum of no data would also be 0. Not good.
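Both weaknesses are quick to demonstrate. This Python sketch uses a plain modulo-256 sum as the 8-bit checksum; the function name is mine, and the test strings are the ones from the text plus one made-up single-byte change.

```python
def checksum8(data):
    """Sum of the bytes, wrapped to fit in one byte."""
    return sum(data) % 256


# Swapped bytes: same values added, so the same checksum.
print(checksum8(b"HELLO"), checksum8(b"HLLEO"))

# One changed byte: the checksum changes.
print(checksum8(b"HELLO"), checksum8(b"HELLP"))

# Rollover: a 255 followed by a 1 looks the same as no data at all.
print(checksum8(bytes([255, 1])), checksum8(b""))
```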
Checking the sum: Extreme edition
A 16-bit or 32-bit checksum would just be a larger number, reducing how often it could roll over.
For a 16-bit value, ranging from 0-65535, you could hold up to 257 bytes of value 255 before it would roll over:
255 * 257 = 65535
But if the data were 258 bytes of value 255, it would roll over:
255 * 258 = 65790 -> rollover to 254 (65790 - 65536).
Thus, a 258-byte message of all 255s would have the same checksum as a 1-byte message of a 254.
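The 257 versus 258 byte boundary can be checked with a short C sketch. It assumes the sum wraps modulo 65536, which is what a uint16_t does; the name checksum16 is made up for the example:

```c
#include <stddef.h>
#include <stdint.h>

/* Sum every byte of data into a 16-bit accumulator.
   uint16_t arithmetic wraps modulo 65536. */
uint16_t checksum16(const void *data, size_t len)
{
    const uint8_t *bytes = (const uint8_t *)data;
    uint16_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + bytes[i]);
    }
    return sum;
}
```

Fed a buffer of 255s, this returns 65535 for 257 bytes and wraps to 254 (65790 - 65536) for 258 bytes.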
To update the Color BASIC program for 16-bit checksum, change line 140 to be:
140 IF CK>65535 THEN CK=CK-65536
Conclusion
Obviously, an 8-bit checksum is rather useless, but if a checksum is all you can do, at least use a 16-bit checksum. If you were using the checksum for data packets larger than 257 bytes, maybe a 48-bit checksum would be better.
Or just use a CRC. They are much better and catch things like bytes being out of order.
But I have no idea how I’d write one in BASIC.
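In C, at least, a bit-at-a-time CRC is short. The sketch below assumes the CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF, no reflection), which is only one of many CRC variants and not necessarily the one any given protocol uses. The function name crc16_ccitt is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 using the CCITT-FALSE parameters (an assumption
   made for this example): poly 0x1021, init 0xFFFF, MSB first. */
uint16_t crc16_ccitt(const void *data, size_t len)
{
    const uint8_t *bytes = (const uint8_t *)data;
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        /* Bring the next byte into the top of the register. */
        crc ^= (uint16_t)(bytes[i] << 8);

        /* Shift out 8 bits, applying the polynomial whenever
           the bit shifted out is a 1. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

Unlike the additive checksum, this gives different results for "HELLO" and "HLLEO", because each byte's contribution depends on its position in the message.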
One more thing…
I almost forgot what prompted me to write this. I found some code that would flag an error if the checksum value was 0. When I first saw that, I thought “but 0 can be a valid checksum!”
For example, if there were enough data bytes to make the value roll over from 65535 to 0, that would be a valid checksum. To keep large data that happens to add up to 0 from being flagged as bad, I added a small check to the 16-bit checksum validation code:
if ((checksum == 0) && (datasize < 258)) // Don't bother doing this.
{
// checksum appears invalid.
}
else if (checksum != dataChecksum)
{
// checksum did not match.
}
else
{
// guess it must be okay, then! Maybe...
}
But, what about a buffer full of 00s? The checksum would also be zero, which would be valid.
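One way to sidestep the question is to compare the two values and never special-case 0. This is a minimal sketch of that idea, assuming a modulo-65536 additive checksum; the names checksum16 and checksum16_matches are made up for the example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit additive checksum (wraps modulo 65536). */
static uint16_t checksum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* Returns true when the checksum computed over data equals the one
   that arrived with the message. A value of 0 is treated like any
   other value, so a buffer full of 00s still validates. */
bool checksum16_matches(const uint8_t *data, size_t len, uint16_t received)
{
    return checksum16(data, len) == received;
}
```

A buffer of eight 00 bytes with a received checksum of 0 passes this check, and the same buffer with any other received checksum fails it.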
On August 21, 1994 I began writing a Space Invaders game for the Radio Shack TRS-80 Color Computer. The game was written in 6809 assembly language, and ran under the Microware OS-9 operating system as opposed to the ROM-based Disk Extended Color BASIC.
It did not initially start out as an OS-9 game. It started out as a NitrOS-9 game. NitrOS-9 was (and still is) a greatly optimized and enhanced version of the stock OS-9 for the Color Computer. It was initially a set of patches that took advantage of the hidden features of the Hitachi 6309 chip. Many of us did CPU swaps in our CoCo 3s specifically to be able to run this faster version of OS-9. Years later, NitrOS-9 was backported to run on a stock 6809, and the project continues today with the Ease of Use edition, which comes ready to run and bundled with all kinds of utilities, applications, and games. (I think my game is even on there.)
I chose to write a game after learning about a new system call that NitrOS-9 added. It allowed mapping in graphics screen memory so a program could directly access it, just like from BASIC. With that in mind, I wrote a simple demo that had a peace-sign space ship that could move left and right and fire (multishots!), as well as a scrolling star background.
I believe my game demo source might have been published in The International OS-9 Underground magazine at some point.
As soon as I figure out how to make WordPress allow uploading a .asm source file, I’ll share it here.
But I digress. Again.
Invaders09 Secrets
Version 1.00 was completed on September 24, 1994. It was first sold at the 1994 Atlanta CoCoFest. I don’t remember how many copies the game sold over its lifetime, but I do know it was not enough to retire on. :)
On December 26, 1994, version 1.01 was released. This contained code by Robert Gault that allowed the game to work on machines with more than 512K memory. (Robert was also responsible for code that allowed the game to work on stock OS-9, as well.)
A big update happened on January 29, 1995, when the game was upgraded from a 4-color screen to glorious 16 colors.
1.03 was completed on February 4, 1995 and included bug fixes.
Almost twenty years later, to the day, I did a 1.04 update. The title screen text removed my old P.O. Box from Texas, and replaced it with an e-mail address. I also added the “secret” command line option to the help screen, so it would no longer be secret. There had also been a bug that caused the fonts to sometimes fail to load, which I found and fixed. There were also some bad bits in the graphics I had never noticed (but could see clearly on a modern monitor) which were corrected.
Something old. Something new?
I pulled up this source code today and was looking at it to see what all I’d have to do to convert it to run under Disk Extended Color BASIC. I’d have to learn about keyboard and joystick reading in assembly, as well as how to map in graphics screens. I’d also have to take care of the blips and boops, and create my own graphical text engine for displaying game and title screen messages.
I don’t know how to do any of that, yet.
But I did discover something I have no recollection of… The game contains its own font data, which it loads when the game first runs. (Note to self: Better check and make sure the game cleans that font up and deallocates it when the game exits.)
The font data is a series of fcb byte entries like these:
I went to https://petscii.krissz.hu/ and was able to recreate this “hidden stuff” in the font:
I had hidden a teeny tiny “42” in the font character set… Something no one would ever see, and that I had forgotten about.
Sub-Etha Software had other hidden 42s in other programs we distributed. I bet I’ve forgotten about some of them, as well…
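To see how eight bytes turn into a picture, each byte can be treated as one row of 8 pixels, most significant bit on the left. Here is a minimal C sketch of that idea. The function names render_row and print_glyph are hypothetical, and the byte values in the note after it are the “hidden stuff” DATA values from the BASIC listing in this post:

```c
#include <stdint.h>
#include <stdio.h>

/* Expand one font byte into 8 characters, most significant bit
   first: '#' for a set bit, '.' for a clear bit.
   out must have room for 9 chars (8 pixels plus the terminator). */
void render_row(uint8_t bits, char out[9])
{
    for (int col = 0; col < 8; col++) {
        out[col] = (bits & (0x80 >> col)) ? '#' : '.';
    }
    out[8] = '\0';
}

/* Print an 8x8 glyph as text, one line per font byte. */
void print_glyph(const uint8_t glyph[8])
{
    char row[9];

    for (int r = 0; r < 8; r++) {
        render_row(glyph[r], row);
        puts(row);
    }
}
```

Passing the bytes 0, 87, 81, 119, 20, 23, 0, 0 to print_glyph gives a blank row, then these five rows, then two more blank rows:

.#.#.###
.#.#...#
.###.###
...#.#..
...#.###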
But wait, there’s more … BASIC!
I took the code I wrote to display VIC-20 font data on a CoCo and updated it a bit, with this font data.
Invaders09 font data displayed on PMODE 0 under Extended Color BASIC.
You can adjust the WD variable in line 10 based on what PMODE you want to see it in. Change that to 32 and use PMODE 4 (set in line 20) and you get it in the size it would be on a CoCo 32-column screen. Use 16 and that will work with PMODE 0 or PMODE 2. (PMODE 1 and 3 are color modes and just look weird since they take the 8 bits and turn them into four 2-bit color pixels.)
Enjoy…
0 REM INVADERS09 CHARSET
10 WD=16 '16=PMODE 0/2, 32=4
20 PMODE 0,1:PCLS:SCREEN 1,1
30 L=1536+2048:C=0
40 FOR R=0 TO 7:READ D:IF D=-1 THEN 999
50 POKE L+(WD*R),D:NEXT
60 L=L+1:C=C+1:IF C>=WD THEN C=0:L=L+(WD*8)
70 GOTO 40
999 GOTO 999
1000 ' hidden stuff in the font! :)
1010 DATA 0,87,81,119,20,23,0,0
1020 DATA 0,87,81,119,20,23,0,0
1030 DATA 0,87,81,119,20,23,0,0
1040 DATA 0,87,81,119,20,23,0,0
1050 DATA 0,87,81,119,20,23,0,0
1060 DATA 0,87,81,119,20,23,0,0
1070 DATA 0,87,81,119,20,23,0,0
1080 DATA 0,87,81,119,20,23,0,0
1090 DATA 0,87,81,119,20,23,0,0
1100 DATA 0,87,81,119,20,23,0,0
1110 DATA 0,87,81,119,20,23,0,0
1120 DATA 0,87,81,119,20,23,0,0
1130 DATA 0,87,81,119,20,23,0,0
1140 DATA 0,87,81,119,20,23,0,0
1150 DATA 0,87,81,119,20,23,0,0
1160 DATA 0,87,81,119,20,23,0,0
1170 DATA 0,87,81,119,20,23,0,0
1180 DATA 0,87,81,119,20,23,0,0
1190 DATA 0,87,81,119,20,23,0,0
1200 DATA 0,87,81,119,20,23,0,0
1210 DATA 0,87,81,119,20,23,0,0
1220 DATA 0,87,81,119,20,23,0,0
1230 DATA 0,87,81,119,20,23,0,0
1240 DATA 0,87,81,119,20,23,0,0
1250 DATA 0,87,81,119,20,23,0,0
1260 DATA 0,87,81,119,20,23,0,0
1270 DATA 0,87,81,119,20,23,0,0
1280 DATA 0,87,81,119,20,23,0,0
1290 DATA 0,87,81,119,20,23,0,0
1300 DATA 0,87,81,119,20,23,0,0
1310 DATA 0,87,81,119,20,23,0,0
1320 DATA 0,87,81,119,20,23,0,0
1330 ' 32 (space)
1340 DATA 0, 0, 0, 0, 0, 0, 0, 0
1350 DATA 16, 16, 24, 24, 24, 0, 24, 0
1360 DATA 102, 102, 204, 0, 0, 0, 0, 0
1370 DATA 68, 68, 255, 68, 255, 102, 102, 0
1380 DATA 24, 126, 64, 126, 6, 126, 24, 0
1390 DATA 98, 68, 8, 16, 49, 99, 0, 0
1400 DATA 62, 32, 34, 127, 98, 98, 126, 0
1410 DATA 56, 56, 24, 48, 0, 0, 0, 0
1420 DATA 12, 24, 48, 48, 56, 28, 12, 0
1430 DATA 48, 56, 28, 12, 12, 24, 48, 0
1440 DATA 0, 24, 36, 90, 36, 24, 0, 0
1450 DATA 0, 24, 24, 124, 16, 16, 0, 0
1460 DATA 0, 0, 0, 0, 0, 48, 48, 96
1470 DATA 0, 0, 0, 126, 0, 0, 0, 0
1480 DATA 0, 0, 0, 0, 0, 48, 48, 0
1490 ' 47 /
1500 DATA 2, 2, 4, 24, 48, 96, 96, 0
1510 DATA 126, 66, 66, 70, 70, 70, 126, 0
1520 DATA 8, 8, 8, 24, 24, 24, 24, 0
1530 DATA 126, 66, 2, 126, 96, 98, 126, 0
1540 DATA 124, 68, 4, 62, 6, 70, 126, 0
1550 DATA 124, 68, 68, 68, 126, 12, 12, 0
1560 DATA 126, 64, 64, 126, 6, 70, 126, 0
1570 DATA 126, 66, 64, 126, 70, 70, 126, 0
1580 DATA 62, 2, 2, 6, 6, 6, 6, 0
1590 DATA 60, 36, 36, 126, 70, 70, 126, 0
1600 DATA 126, 66, 66, 126, 6, 6, 6, 0
1610 DATA 0, 24, 24, 0, 24, 24, 0, 0
1620 DATA 0, 24, 24, 0, 24, 24, 48, 0
1630 DATA 6, 12, 24, 48, 28, 14, 7, 0
1640 DATA 0, 0, 126, 0, 126, 0, 0, 0
1650 DATA 112, 56, 28, 6, 12, 24, 48, 0
1660 DATA 126, 6, 6, 126, 96, 0, 96, 0
1670 ' 64
1680 DATA 60, 66, 74, 78, 76, 64, 62, 0
1690 DATA 60, 36, 36, 126, 98, 98, 98, 0
1700 DATA 124, 68, 68, 126, 98, 98, 126, 0
1710 DATA 126, 66, 64, 96, 96, 98, 126, 0
1720 DATA 124, 66, 66, 98, 98, 98, 124, 0
1730 DATA 126, 64, 64, 124, 96, 96, 126, 0
1740 DATA 126, 64, 64, 124, 96, 96, 96, 0
1750 DATA 126, 66, 64, 102, 98, 98, 126, 0
1760 DATA 66, 66, 66, 126, 98, 98, 98, 0
1770 DATA 16, 16, 16, 24, 24, 24, 24, 0
1780 DATA 4, 4, 4, 6, 6, 70, 126, 0
1790 DATA 68, 68, 68, 126, 98, 98, 98, 0
1800 DATA 64, 64, 64, 96, 96, 96, 124, 0
1810 DATA 127, 73, 73, 109, 109, 109, 109, 0
1820 DATA 126, 66, 66, 98, 98, 98, 98, 0
1830 DATA 126, 66, 66, 98, 98, 98, 126, 0
1840 DATA 126, 66, 66, 126, 96, 96, 96, 0
1850 DATA 126, 66, 66, 66, 66, 78, 126, 0
1860 DATA 124, 68, 68, 126, 98, 98, 98, 0
1870 DATA 126, 66, 64, 126, 6, 70, 126, 0
1880 DATA 126, 16, 16, 24, 24, 24, 24, 0
1890 DATA 66, 66, 66, 98, 98, 98, 126, 0
1900 DATA 98, 98, 98, 102, 36, 36, 60, 0
1910 DATA 74, 74, 74, 106, 106, 106, 126, 0
1920 DATA 66, 66, 66, 60, 98, 98, 98, 0
1930 DATA 66, 66, 66, 126, 24, 24, 24, 0
1940 DATA 126, 66, 6, 24, 96, 98, 126, 0
1950 ' 91 [
1960 DATA 126, 64, 64, 96, 96, 96, 126, 0
1970 ' 92 \
1980 DATA 64,64,32,24,12,6,6,0
1990 ' 93 ]
2000 DATA 126, 2, 2, 6, 6, 6, 126, 0
2010 ' 94 up arrow
2020 DATA 24,52,98,0,0,0,0,0
2030 ' 95 _
2040 DATA 0, 0, 0, 0, 0, 0, 0, 255
2050 ' 96 `
2060 DATA 96, 48, 0, 0, 0, 0, 0, 0
2070 ' 97 a
2080 DATA 0, 0, 62, 2, 126, 98, 126, 0
2090 DATA 64, 64, 126, 70, 70, 70, 126, 0
2100 DATA 0, 0, 126, 66, 96, 98, 126, 0
2110 DATA 2, 2, 126, 66, 70, 70, 126, 0
2120 DATA 0, 0, 124, 68, 124, 98, 126, 0
2130 DATA 62, 34, 32, 120, 48, 48, 48, 0
2140 DATA 0, 0, 126, 66, 98, 126, 2, 62
2150 DATA 64, 64, 126, 66, 98, 98, 98, 0
2160 DATA 16, 0, 16, 16, 24, 24, 24, 0
2170 DATA 0, 2, 0, 2, 2, 2, 98, 126
2180 DATA 96, 96, 100, 68, 126, 70, 70, 0
2190 DATA 16, 16, 16, 16, 24, 24, 24, 0
2200 DATA 0, 0, 98, 126, 74, 106, 106, 0
2210 DATA 0, 0, 126, 66, 98, 98, 98, 0
2220 DATA 0, 0, 126, 66, 98, 98, 126, 0
2230 DATA 0, 0, 126, 66, 66, 126, 96, 96
2240 DATA 0, 0, 126, 66, 78, 126, 2, 2
2250 DATA 0, 0, 124, 96, 96, 96, 96, 0
2260 DATA 0, 0, 126, 64, 126, 6, 126, 0
2270 DATA 16, 16, 126, 16, 24, 24, 24, 0
2280 DATA 0, 0, 66, 66, 98, 98, 126, 0
2290 DATA 0, 0, 98, 98, 98, 36, 24, 0
2300 DATA 0, 0, 66, 74, 106, 126, 36, 0
2310 DATA 0, 0, 98, 126, 24, 126, 98, 0
2320 DATA 0, 0, 98, 98, 98, 36, 24, 112
2330 DATA 0, 0, 126, 108, 24, 50, 126, 0
2340 DATA 14, 24, 24, 112, 24, 24, 14, 0
2350 DATA 24, 24, 24, 0, 24, 24, 24, 0
2360 DATA 112, 24, 24, 14, 24, 24, 112, 0
2370 DATA 50, 126, 76, 0, 0, 0, 0, 0
2380 DATA 102, 51, 153, 204, 102, 51, 153, 204
2390 DATA 102, 51, 153, 204, 102, 51, 153, 204
2400 DATA -1
Occasionally I see a really “nice little touch” that a programmer took the time to add. For instance, some programs will restore the screen to what it looked like before the program ran. I decided I would do this for a project I was working on, and thought I’d share the super simple routine:
* Save/Restore screen.
* lwasm savescreen.asm -fbasic -osavescreen.bas --map
    org $3f00
* Test function.
start
* Save the screen.
    bsr savescreensub * GOSUB savescreensub
* Fill screen.
    ldx #SCREENSTART * X=Start of screen.
    lda #255 * A=255 (orange block).
loop
    sta ,x+ * Store A at X, X=X+1.
    cmpx #SCREENEND * Compare X to SCREENEND.
    ble loop * IF X<=SCREENEND, GOTO loop.
* Wait for keypress.
getkey
    jsr [$a000] * Call POLCAT ROM routine.
    beq getkey * If no key, GOTO getkey.
* Restore screen.
    bsr restorescreensub * GOSUB restorescreensub
    rts * RETURN
* Subroutine
SCREENSTART equ 1024 * Start of screen memory.
SCREENEND equ 1535 * Last byte of screen memory.
savescreensub
    pshs x,y,d * Save registers we will use.
    ldx #SCREENSTART * X=Start of screen.
    ldy #screenbuf * Y=Start of buffer.
saveloop
    ldd ,x++ * Load D with 2 bytes at X, X=X+2.
    std ,y++ * Store D at Y, Y=Y+2.
    cmpx #SCREENEND * Compare X to SCREENEND.
    blt saveloop * If X<SCREENEND, GOTO saveloop.
    puls x,y,d,pc * Restore used registers and return.
*rts
restorescreensub
    pshs x,y,d * Save registers we will use.
    ldx #screenbuf * X=Start of buffer.
    ldy #SCREENSTART * Y=Start of screen.
restoreloop
    ldd ,x++ * Load D with 2 bytes at X, X=X+2.
    std ,y++ * Store D at Y, Y=Y+2.
    cmpy #SCREENEND * Compare Y to SCREENEND.
    blt restoreloop * If Y<SCREENEND, GOTO restoreloop.
    puls x,y,d,pc * Restore used registers and return.
*rts
* This would go in your data area.
screenbuf rmb SCREENEND-SCREENSTART+1
    end
There are two routines – savescreensub and restorescreensub – named that way just so I would know they are subroutines designed to be called by bsr/lbsr/jsr.
They make use of a 512-byte buffer (in the case of the CoCo’s 32×16 screen).
savescreensub will copy all the bytes currently on the text screen over to the buffer. restorescreensub will copy all the saved bytes in the buffer back to the screen.
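For readers more at home in C than 6809 assembly, the same save and restore idea can be sketched against a simulated screen. This is not a translation of the routine above: the names save_screen, restore_screen and screen_backup are made up, and the screen is just a 512-byte array passed in by the caller, not real CoCo video memory.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_SIZE 512 /* 32 columns x 16 rows, one byte per cell */

/* Holds the saved copy, like screenbuf in the assembly version. */
static uint8_t screen_backup[SCREEN_SIZE];

/* Copy every byte of the (simulated) text screen into the backup. */
void save_screen(const uint8_t *screen)
{
    memcpy(screen_backup, screen, SCREEN_SIZE);
}

/* Copy the backup over the (simulated) text screen. */
void restore_screen(uint8_t *screen)
{
    memcpy(screen, screen_backup, SCREEN_SIZE);
}
```

A program would call save_screen once at startup, draw whatever it likes, then call restore_screen just before it exits.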