See also: part 1, part 2, part 3, part 4, part 5 and part 6.
Now that I’ve gotten my digressions with BASIC variable access speeds and input speeds and INKEY/INSTR and GOTO/GOSUB speeds and HEX versus DECimal speeds out of the way, I can finally get back to digressing on using assembly language to speed up BASIC.
Where were we?
Oh, right…
In part 4 of this article I presented an example of using BASIC to scroll a PAC-MAN style maze that was too tall to fit on the 16-line screen.
I also presented some assembly code that would scroll the screen much faster than BASIC could ever hope to.
Today, let’s combine these two items and try to create a fast-scrolling maze playfield.
Let’s get started!
START SHOUTING AT ME! (revisited)
But before we get started, let’s revisit the uppercase routine I presented in Part 3.
Simon Jonassen is well-known in the CoCo Community for doing some amazing things on the original CoCo 1 and 2 hardware (and, lately, the CoCo 3 as well). He is quite the master of optimization, and has created some stunning sound players that allow the original CoCo to have cool background music while doing other things (if only the game programmers of 1980 knew about this!). He also has a cool web-based CoCo semigraphics editor. He provided a few enhancements:
* UCASE.ASM v1.01
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* 1.01 a bit smaller per Simon Jonassen
*
* DEFUSRx() uppercase output function
*
* INPUT:   VARPTR of a string
* RETURNS: # chars processed
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A$="Print this in uppercase."
*   PRINT A$
*   A=USR0(VARPTR(A$))
*
ORGADDR EQU  $3f00

GIVABF  EQU  $B4F4     * 46324
INTCNV  EQU  $B3ED     * 46061
CHROUT  EQU  $A002

        opt  6809      * 6809 instructions only
        opt  cd        * cycle counting

        org  ORGADDR

start   jsr  INTCNV    * get passed in value in D
        tfr  d,x       * move value (varptr) to X
        ldy  2,x       * load string addr to Y
;       ldb  ,x        * load string len to B
* NOTE: with the LDB moved below, the flags tested here come from the LDY above
        beq  null      * exit if strlen is 0
        ldb  ,x        * load string len to B
        ldx  #0        * clear X (count of chars conv)

loop    lda  ,y+       * get next char, inc Y
;       lda  ,y        * load char in A
        cmpa #'a       * compare to lowercase A
        blt  nextch    * if less, no conv needed
        cmpa #'z       * compare to lowercase Z
        bgt  nextch    * if greater, no conv needed
lcase   suba #32       * subtract 32 to make uppercase
        leax 1,x       * inc count of chars converted
nextch  jsr  [CHROUT]  * call ROM output character routine
;       leay 1,y       * increment Y pointer
cont    decb           * decrement counter
        bne  loop      * not done yet
;       beq  exit      * if 0, go to exit
;       bra  loop      * go to loop
exit    tfr  x,d       * move chars conv count to D
;       bra  return
        jmp  GIVABF    * return to caller

null    ldd  #-1       * load -1 as error return
        jmp  GIVABF    * return to caller

* lwasm --decb -o ucase2.bin ucase2.asm -l
* lwasm --decb -f basic -o ucase2.bas ucase2.asm -l
* lwasm --decb -f ihex -o ucase2.hex ucase2.asm -l
* decb copy -2 -r ucase2.bin ../Xroar/dsk/DRIVE0.DSK,UCASE2.BIN
This code is 46 bytes long, compared to my original which was 49 bytes. The changes are:
- Move the initial LDB with string length to after the string length check, since it’s only needed if we get past that check and have a string.
- Change my LDA ,Y to LDA ,Y+ to increment Y there and not need the LEAY 1,Y later.
- Change my “characters left” check from BEQ EXIT and BRA LOOP to BNE LOOP since it can just fall through and continue otherwise.
- Change a BRA RETURN to JMP GIVABF, since the branch would just end up at a JMP, and doing a JMP is faster than branching to a JMP.
Minor changes, but every little bit helps.
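For readers who do not have an assembler handy, here is a rough Python model of what the uppercase routine is meant to do. It is only a sketch of the intended behavior described in the listing's comments (output every character, count the ones converted, return -1 for an empty string), not a cycle-accurate emulation. The function name ucase_model is made up for this illustration.

```python
def ucase_model(text: str) -> tuple[str, int]:
    """Model of the UCASE routine's intended behavior.

    Returns (characters sent to the screen, number of characters converted).
    An empty string gives the -1 error return, as in the 'null' path.
    """
    if len(text) == 0:
        return "", -1
    shown = []
    converted = 0
    for ch in text:
        if "a" <= ch <= "z":            # cmpa #'a ... cmpa #'z
            ch = chr(ord(ch) - 32)      # suba #32
            converted += 1              # leax 1,x
        shown.append(ch)                # jsr [CHROUT]
    return "".join(shown), converted


shown, count = ucase_model("Print this in uppercase.")
print(shown, count)
```

The value returned to BASIC is the count of converted characters, so the capital P, the spaces and the period in the example string are printed but not counted.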
Simon also pointed out an embarrassing oversight in my very first example shown in part 1:
ORGADDR EQU  $3f00

GIVABF  EQU  $B4F4     * 46324
INTCNV  EQU  $B3ED     * 46061

        org  ORGADDR

start   jsr  INTCNV    * get passed in value in D
        tfr  d,x       * transfer D to X so we can manipulate it
        leax 1,x       * add 1 to X
        tfr  x,d       * transfer X back to D
return  jmp  GIVABF    * return to caller
He reminded me about the “addd” instruction which can add to D. For some reason, I was thinking I needed to use LEA to add to a 16-bit register, and since “LEAD 1,D” wasn’t a thing, I did the whole transfer to X, add one to X, transfer back to D thing.
He said I should just do this:
* ADDONE.ASM v1.01
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* 1.01 made less stupid per Simon Jonassen
*
* DEFUSRx() add one routine
*
* INPUT:   integer to add one to
* RETURNS: value +1
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(42)
*   PRINT A
*
ORGADDR EQU  $3f00

INTCNV  EQU  $B3ED     * 46061
GIVABF  EQU  $B4F4     * 46324

        org  ORGADDR

start   jsr  INTCNV    * get passed in value in D
;       tfr  d,x       * transfer D to X so we can manipulate it
;       leax 1,x       * add 1 to X
;       tfr  x,d       * transfer X back to D
        addd #1        * add 1 to D
return  jmp  GIVABF    * return to caller

* lwasm --decb -9 -o addone2.bin addone2.asm
* lwasm --decb -f basic -o addone2.bas addone2.asm
* decb copy -2 -r addone2.bin ../Xroar/dsk/DRIVE0.DSK,ADDONE2.BIN
See what happens when people who actually know 6809 assembly language look at my code? Thanks, Simon!
Moving Day
My simple examples have been building up to slightly less-simple ones that do something more useful, like moving data that would take days to move in BASIC. Previously, I presented a PAC-MAN maze that could “scroll” up and down the screen by PRINTing the whole screen each time with just the lines of the maze that should be visible. I also presented some assembly code that could be used to move the screen up, down, left or right.
Today, the first thing I want to do is integrate that assembly routine into the PAC-MAN maze code. Instead of redrawing the entire screen each time, BASIC will only need to redraw the top or bottom line depending on which way the screen just scrolled. If my math is correct, printing one line instead of sixteen lines should be at least twice as fast.
First, let’s revisit the screen moving assembly code, which, thanks to comments from L. Curtis Boyle, now has a smarter routine for checking which direction the user passed in to scroll (though it could still be thrown off by values larger than 255):
* SCRNMOVE.ASM v1.01
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* DEFUSRx() screen moving function
*
* INPUT:   direction (1=up, 2=down, 3=left, 4=right)
* RETURNS: 0 on success
*          -1 if invalid direction
*
* 1.01 better param parsing per L. Curtis Boyle
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(1)
*
ORGADDR EQU  $3f00

INTCNV  EQU  $B3ED     * 46061
GIVABF  EQU  $B4F4     * 46324

UP      EQU  1
DOWN    EQU  2
LEFT    EQU  3
RIGHT   EQU  4

SCREEN  EQU  1024      * top left of screen
END     EQU  1535      * bottom right of screen

        org  ORGADDR

start   jsr  INTCNV    * get incoming param in D
;       cmpb #UP
        decb           * decrement B
        beq  up        * if one DEC got us to zero
;       cmpb #DOWN
        decb           * decrement B
        beq  down      * if two DECs...
;       cmpb #LEFT
        decb           * decrement B
        beq  left      * if three DECs...
;       cmpb #RIGHT
        decb           * decrement B
        beq  right     * if four DECs...

error   ldd  #-1       * load D with -1 for error code
        bra  exit

up      ldx  #SCREEN+32
loopup  lda  ,x
        sta  -32,x
        leax 1,x
        cmpx #END
        ble  loopup
        bra  return

down    ldx  #END-32
loopdown lda ,x
        sta  32,x
        leax -1,x
        cmpx #SCREEN
        bge  loopdown
        bra  return

left    ldx  #SCREEN+1
loopleft lda ,x
        sta  -1,x
        leax 1,x
        cmpx #END
        ble  loopleft
        bra  return

right   ldx  #END-1
loopright lda ,x
        sta  1,x
        leax -1,x
        cmpx #SCREEN
        bge  loopright

return  ldd  #0        * return code (0=success)
exit    jmp  GIVABF    * return to BASIC

* lwasm --decb -9 -o scrnmove2.bin scrnmove2.asm
* lwasm --decb -f basic -o scrnmove2.bas scrnmove2.asm
* decb copy -2 -r scrnmove2.bin ../Xroar/dsk/DRIVE0.DSK,SCRNMOVE2.BIN
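To make the direction check and its "larger than 255" caveat concrete, here is a small Python model of the DECB/BEQ chain. It is a sketch under one assumption taken from the listing: the routine only ever looks at B, the low byte of the 16-bit value that INTCNV leaves in D. The function name scroll_dispatch is made up for this illustration.

```python
def scroll_dispatch(value: int) -> str:
    """Model of the DECB/BEQ chain at the start of SCRNMOVE.ASM."""
    b = value & 0xFF                     # only the low byte (B) is examined
    for name in ("up", "down", "left", "right"):
        b = (b - 1) & 0xFF               # decb (wraps around like an 8-bit register)
        if b == 0:                       # beq up / down / left / right
            return name
    return "error"                       # falls through to the -1 return


for value in (1, 2, 3, 4, 0, 5, 257):
    print(value, scroll_dispatch(value))
```

Because 257 is $0101, its low byte is 1, so the model (like the routine) treats it as "up" instead of rejecting it.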
The generated BASIC program looks like:
10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 16128,16217,189,179,237,90,39,14,90,39,28,90,39,42,90,39,55,204,255,255,32,67,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,47,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,30,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,14
90 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,204,0,0,126,180,244,-1,-1
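The loader's DATA format is easy to miss: the first two numbers are a start and end address, the bytes for that range follow, and a start address of -1 ends the list (two -1 values are needed because the loader always READs a pair). The Python sketch below walks the same numbers the same way, using a dictionary in place of the CoCo's memory. The names DATA and run_loader are made up for this illustration.

```python
# The values from DATA lines 80 and 90 of the generated loader.
DATA = [
    16128, 16217,
    189, 179, 237, 90, 39, 14, 90, 39, 28, 90, 39, 42, 90, 39, 55,
    204, 255, 255, 32, 67, 142, 4, 32, 166, 132, 167, 136, 224, 48, 1,
    140, 5, 255, 47, 244, 32, 47, 142, 5, 223, 166, 132, 167, 136, 32,
    48, 31, 140, 4, 0, 44, 244, 32, 30, 142, 4, 1, 166, 132, 167, 31,
    48, 1, 140, 5, 255, 47, 245, 32, 14,
    142, 5, 254, 166, 132, 167, 1, 48, 31, 140, 4, 0, 44, 245,
    204, 0, 0, 126, 180, 244,
    -1, -1,
]


def run_loader(data):
    """Mimic lines 10-70 of the loader; returns {address: byte}."""
    memory = {}
    values = iter(data)
    while True:
        start, end = next(values), next(values)    # 10 READ A,B
        if start == -1:                            # 20 IF A=-1 THEN 70
            return memory
        for address in range(start, end + 1):      # 30 FOR C = A TO B
            memory[address] = next(values)         # 40 READ D:POKE C,D


memory = run_loader(DATA)
print(len(memory), hex(min(memory)), hex(max(memory)))
```

Running it shows 90 bytes landing at &H3F00 through &H3F59, which matches the 16128 to 16217 range at the start of the DATA.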
Let’s take the original maze program and modify it to use the assembly routines instead:
0 REM MAZETEST.BAS
10 DIM MZ$(31)
20 FOR A=0 TO 30:READ MZ$(A):NEXT
30 CLS
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 FOR LN=0 TO 15
70 PRINT @LN*32,MZ$(LN+ST);
80 NEXT:NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 FOR LN=0 TO 15
120 PRINT @LN*32,MZ$(LN+ST);
130 NEXT:NEXT
140 GOTO 40
999 GOTO 999
1000 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
1010 DATA "X            XX            X"
1020 DATA "X XXXX XXXXX XX XXXXX XXXX X"
1030 DATA "X XXXX XXXXX XX XXXXX XXXX X"
1040 DATA "X XXXX XXXXX XX XXXXX XXXX X"
1050 DATA "X                          X"
1060 DATA "X XXXX XX XXXXXXXX XX XXXX X"
1070 DATA "X XXXX XX XXXXXXXX XX XXXX X"
1080 DATA "X      XX    XX    XX      X"
1090 DATA "XXXXXX XXXXX XX XXXXX XXXXXX"
2100 DATA "     X XXXXX XX XXXXX X     "
2110 DATA "     X XX          XX X     "
2120 DATA "     X XX XXXXXXXX XX X     "
2130 DATA "XXXXXX XX X      X XX XXXXXX"
2140 DATA "          X      X          "
2150 DATA "XXXXXX XX X      X XX XXXXXX"
2160 DATA "     X XX XXXXXXXX XX X     "
2170 DATA "     X XX          XX X     "
2180 DATA "     X XX XXXXXXXX XX X     "
2190 DATA "XXXXXX XX XXXXXXXX XX XXXXXX"
3200 DATA "X            XX            X"
3210 DATA "X XXXX XXXXX XX XXXXX XXXX X"
3220 DATA "X XXXX XXXXX XX XXXXX XXXX X"
3230 DATA "X   XX                XX   X"
3240 DATA "XXX XX XX XXXXXXXX XX XX XXX"
3250 DATA "XXX XX XX XXXXXXXX XX XX XXX"
3260 DATA "X      XX    XX    XX      X"
3270 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"
3280 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"
3290 DATA "X                          X"
4200 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
The maze is 31 lines tall. The fake scrolling is done by redrawing the entire screen line-by-line. The screen is 16 lines tall, so initially we draw maze lines 0-15. Then we redraw maze lines 1-16, giving the appearance that the screen is scrolling up and a line has scrolled off the top of the screen. This repeats for lines 2-17, 3-18 and so on until we’ve drawn the last 16 lines of 15-30.
After “scrolling” all the way to the bottom of the maze, a second block of FOR/NEXT loops reverses the process, starting with maze lines 15-30, then 14-29 and so on until it is back to displaying the top lines 0-15.
The redrawing is done by the inner FOR/NEXT loops using the LN variable in lines 60-80 and 110-130, while the outer ST loops in lines 50 and 100 choose which maze line appears at the top of the screen.
Rather than redrawing all sixteen lines each time, we could use the assembly routine to move the screen, and then we’d just draw one line, top or bottom, depending on which way the screen scrolled.
In effect, we’d replace this:
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 FOR LN=0 TO 15
70 PRINT @LN*32,MZ$(LN+ST);
80 NEXT:NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 FOR LN=0 TO 15
120 PRINT @LN*32,MZ$(LN+ST);
130 NEXT:NEXT
…with this:
35 FOR LN=0 TO 15:PRINT @LN*32,MZ$(LN+ST);:NEXT
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 Z=USR0(1)
70 PRINT @480,MZ$(ST+15);
80 NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 Z=USR0(2)
120 PRINT @0,MZ$(ST);
130 NEXT
Line 35 was added to initially draw the screen. After that, the assembly routine can move it up or down, and let BASIC redraw just the one line that needs to be drawn.
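If you want to check the bookkeeping without a CoCo, the scroll-then-redraw-one-line idea can be modeled in a few lines of Python. This is only a sketch: the screen is a list of 16 rows, the two helper functions stand in for USR0(1) and USR0(2), and the maze is replaced with labeled placeholder strings so it is easy to see which maze line ended up where. All of the names here are made up for this illustration.

```python
def scroll_up(screen):
    """Like USR0(1): rows 1..15 move to rows 0..14; the bottom row is left as-is."""
    screen[:-1] = screen[1:]


def scroll_down(screen):
    """Like USR0(2): rows 0..14 move to rows 1..15; the top row is left as-is."""
    screen[1:] = screen[:-1]


def run_demo(maze):
    """Follow lines 35-130 and return a copy of the screen after each step."""
    screen = list(maze[0:16])               # 35: draw maze lines 0-15 once
    frames = []
    for st in range(0, 16):                 # 50 FOR ST=0 TO 15
        scroll_up(screen)                   # 60 Z=USR0(1)
        screen[15] = maze[st + 15]          # 70 PRINT @480,MZ$(ST+15);
        frames.append(list(screen))
    for st in range(15, -1, -1):            # 100 FOR ST=15 TO 0 STEP-1
        scroll_down(screen)                 # 110 Z=USR0(2)
        screen[0] = maze[st]                # 120 PRINT @0,MZ$(ST);
        frames.append(list(screen))
    return frames


maze = [f"maze line {n}" for n in range(31)]
frames = run_demo(maze)
print(frames[15][0], "...", frames[15][15])
print(frames[31][0], "...", frames[31][15])
```

In this model the screen ends the first loop showing maze lines 15-30 and ends the second loop back at 0-15, as intended. It also suggests that the first pass of each loop redraws the line already sitting at that edge (maze line 15), so one row appears twice for a moment; starting the loops at ST=1 and ST=14 would appear to avoid that, though I have only checked this in the model.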
This, of course, requires the assembly routine to be loaded. We can take the BASIC loader for it and renumber it so we can call it from our test program. Here is the loader generated from scrnmove2.asm earlier in this article, renumbered and changed into a subroutine:
5000 REM ASSEMBLY ROUTINE
5010 READ A,B
5020 IF A=-1 THEN 5070
5030 FOR C = A TO B
5040 READ D:POKE C,D
5050 NEXT C
5060 GOTO 5010
5070 RETURN
5080 DATA 16128,16217,189,179,237,90,39,14,90,39,28,90,39,42,90,39,55,204,255,255,32,67,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,47,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,30,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,14
5090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,204,0,0,126,180,244,-1,-1
Now, I can add this to the end of the mazetest.bas program and set it up so the USR0() calls will work:
10 CLEAR 200,&H3F00:DIM MZ$(31)
25 GOSUB5000:DEFUSR0=&H3F00
Now the program will use the CLEAR command to protect memory starting at &H3F00 (where the assembly will load), then after it reads all the maze strings in to memory (those DATA statements appear first), it will GOSUB 5000 and that READs the assembly code statements and POKEs them in to memory starting at &H3F00. The DEFUSR call is then done to make USR0(x) work.
With just a few lines changed, and getting our assembly routine in memory, now the maze scrolling is very fast! And, if we optimized the BASIC code around it, it could be even faster since most of the time is spent processing the BASIC program.
Here is the full listing:
0 REM MAZETST2.BAS - W/ASM!
10 CLEAR 200,&H3F00:DIM MZ$(31)
20 FOR A=0 TO 30:READ MZ$(A):NEXT
25 GOSUB5000:DEFUSR0=&H3F00
30 CLS
35 FOR LN=0 TO 15:PRINT @LN*32,MZ$(LN+ST);:NEXT
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 Z=USR0(1)
70 PRINT @480,MZ$(ST+15);
80 NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 Z=USR0(2)
120 PRINT @0,MZ$(ST);
130 NEXT
140 GOTO 40
999 GOTO 999
1000 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
1010 DATA "X            XX            X"
1020 DATA "X XXXX XXXXX XX XXXXX XXXX X"
1030 DATA "X XXXX XXXXX XX XXXXX XXXX X"
1040 DATA "X XXXX XXXXX XX XXXXX XXXX X"
1050 DATA "X                          X"
1060 DATA "X XXXX XX XXXXXXXX XX XXXX X"
1070 DATA "X XXXX XX XXXXXXXX XX XXXX X"
1080 DATA "X      XX    XX    XX      X"
1090 DATA "XXXXXX XXXXX XX XXXXX XXXXXX"
2100 DATA "     X XXXXX XX XXXXX X     "
2110 DATA "     X XX          XX X     "
2120 DATA "     X XX XXXXXXXX XX X     "
2130 DATA "XXXXXX XX X      X XX XXXXXX"
2140 DATA "          X      X          "
2150 DATA "XXXXXX XX X      X XX XXXXXX"
2160 DATA "     X XX XXXXXXXX XX X     "
2170 DATA "     X XX          XX X     "
2180 DATA "     X XX XXXXXXXX XX X     "
2190 DATA "XXXXXX XX XXXXXXXX XX XXXXXX"
3200 DATA "X            XX            X"
3210 DATA "X XXXX XXXXX XX XXXXX XXXX X"
3220 DATA "X XXXX XXXXX XX XXXXX XXXX X"
3230 DATA "X   XX                XX   X"
3240 DATA "XXX XX XX XXXXXXXX XX XX XXX"
3250 DATA "XXX XX XX XXXXXXXX XX XX XXX"
3260 DATA "X      XX    XX    XX      X"
3270 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"
3280 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"
3290 DATA "X                          X"
4200 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
5000 REM ASSEMBLY ROUTINE
5010 READ A,B
5020 IF A=-1 THEN 5070
5030 FOR C = A TO B
5040 READ D:POKE C,D
5050 NEXT C
5060 GOTO 5010
5070 RETURN
5080 DATA 16128,16217,189,179,237,90,39,14,90,39,28,90,39,42,90,39,55,204,255,255,32,67,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,47,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,30,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,14
5090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,204,0,0,126,180,244,-1,-1
Try the original BASIC-only version and then this new assembly-enhanced version and see what you think.
Next time, I will share a version of this scrolling maze that has a character you can control and move through the maze.
Until then…