Category Archives: BASIC Programming

Interfacing assembly with BASIC via DEFUSR, part 5

See also: Part 1, Part 2Part 3, and Part 4.

Now that I’ve gotten my digressions with BASIC variable access speeds and input speeds and INKEY/INSTR and GOTO/GOSUB speeds and HEX versus DECimal speeds out of the way, I can finally get back to digressing on using assembly language to speed up BASIC.

Where were we?

Oh, right…

In part 4 of this article I presented an example of using BASIC to scroll a PAC-MAN style maze that was too tall to fit on 16 the line screen.

I also presented some assembly code that would scroll the screen much faster than BASIC could ever hope to.

Today, let’s combine these two items and try to create a fast-scrolling maze playfield.

Let’s get started!

START SHOUTING AT ME! (revisited)

But before we get started, let’s revisit the uppercase routine I presented in Part 3.

Simon Jonassen is well-known in the CoCo Community for doing some amazing things on the original CoCo 1 and 2 hardware (and, lately, the CoCo 3 as well). He is quite the master of optimization, and has created some stunning sound players that allow the original CoCo to have cool background music while doing other things (if only the game programmers of 1980 knew about this!). He also has a cool web-based CoCo semigraphics editor. He provided a few enhancements:

* UCASE.ASM v1.01
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* 1.01 a bit smaller per Simon Jonassen
*
* DEFUSRx() uppercase output function
*
* INPUT:   VARPTR of a string
* RETURNS: # chars processed
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A$="Print this in uppercase."
*   PRINT A$
*   A=USR0(VARPTR(A$))
*
ORGADDR     EQU     $3f00

GIVABF      EQU     $B4F4   * 46324
INTCNV      EQU     $B3ED   * 46061
CHROUT      EQU     $A002

            opt	    6809    * 6809 instructions only
            opt	    cd      * cycle counting

            org     ORGADDR

start       jsr     INTCNV  * get passed in value in D
            tfr     d,x     * move value (varptr) to X
            ldy     2,x     * load string addr to Y
;           ldb     ,x      * load string len to B
            beq     null    * exit if strlen is 0
            ldb     ,x      * load string len to B
            ldx     #0      * clear X (count of chars conv)

loop        lda     ,y+	    * get next char, inc Y
;           lda     ,y      * load char in A
            cmpa    #'a     * compare to lowercase A
            blt     nextch  * if less, no conv needed
            cmpa    #'z     * compare to lowercase Z
            bgt     nextch  * if greater, no conv needed
lcase       suba    #32     * subtract 32 to make uppercase
            leax    1,x     * inc count of chars converted
nextch      jsr     [CHROUT] * call ROM output character routine
;           leay    1,y     * increment Y pointer
cont        decb            * decrement counter
            bne	    loop    * not done yet
;           beq     exit    * if 0, go to exit
;           bra     loop    * go to loop

exit        tfr     x,d     * move chars conv count to D
;           bra     return
            jmp     GIVABF  * return to caller

null        ldd     #-1     * load -2 as error
return      jmp     GIVABF  * return to caller

* lwasm --decb -o ucase2.bin ucase2.asm -l
* lwasm --decb -f basic -o ucase2.bas ucase2.asm -l
* lwasm --decb -f ihex -o ucase2.hex ucase2.asm -l
* decb copy -2 -r ucase2.bin ../Xroar/dsk/DRIVE0.DSK,UCASE2.BIN

This code is 46 bytes long, compared to my original which was 49 bytes. The changes are:

  1. Move the initial LDB with string length to after the string length check, since it’s only needed if we get past that check and have a string.
  2. Change my LDA ,Y to LDA ,Y+ to increment Y there and not need the LEAY 1,Y later.
  3. Changed my “characters left” check from BEQ EXIT and BRA LOOP to BNE LOOP since it can just fall through and continue otherwise.
  4. Change a BRA RETURN to JMP GIVABF, since the branch would just end up at a JMP, and doing a JMP is faster than branching to a JMP.

Minor changes, but every little bit helps.

Simon also pointed out an embarrassing oversight in my very first example shown in part 1:

ORGADDR EQU $3f00

GIVABF EQU $B4F4   * 46324
INTCNV EQU $B3ED   * 46061

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
       tfr  d,x    * transfer D to X so we can manipulate it
       leax 1,x    * add 1 to X
       tfr  x,d    * transfer X back to D
return jmp  GIVABF * return to caller

He reminded me about the “addd” instruction which can add to D. For some reason, I was thinking I needed to use LEA to add to a 16-bit register, and since “LEAD 1,D” wasn’t a thing, I did the whole transfer to X, add one to X, transfer back to D thing.

He said I should just do this:

* ADDONE.ASM v1.01
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* 1.01 made less stupid per Simon Jonassen
*
* DEFUSRx() add one routine
*
* INPUT:   integer to add one to
* RETURNS: value +1
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(42)
*   PRINT A
*
ORGADDR EQU     $3f00

INTCNV  EQU     $B3ED   * 46061
GIVABF  EQU     $B4F4   * 46324

        org     ORGADDR

start   jsr     INTCNV  * get passed in value in D
;       tfr     d,x     * transfer D to X so we can manipulate it
;       leax    1,x     * add 1 to X
;       tfr     x,d     * transfer X back to D
        addd    #1      * add 1 to D
return  jmp     GIVABF  * return to caller

* lwasm --decb -o -9 addone2.bin addone2.asm
* lwasm --decb -f basic -o addone2.bas
* decb copy -2 -r addone2.bin ../Xroar/dsk/DRIVE0.DSK,ADDONE2.BIN

See what happens when people who actually know 6809 assembly language look at my code? Thanks, Simon!

Moving Day

My simple examples have been building up to slight less-simple ones that do something more useful, like moving data that would take days to move in BASIC. Previously, I presented a PAC-MAN maze that could “scroll” up and down the screen by PRINTing the whole screen each time with just the lines of the maze that should be visible. I also presented some assembly code that could be used to move the screen up, down, left or right.

Today, the first thing I want to do is integrate that assembly routine in to the PAC-MAN maze code. Instead of redrawing the entire screen each time, BASIC will only need to redraw the top or bottom line depending on which was the screen just scrolled. If my math is correct, printing one line instead of sixteen lines should be at least twice faster.

First, let’s revisit the screen moving assembly code, which, thanks to comments from L. Curtis Boyle, now has a smarter routine for checking which direction the user passed in to scroll (though it could still be thrown off by values larger than 255):

* SCRNMOVE.ASM v1.01
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* DEFUSRx() screen moving function
*
* INPUT:   direction (1=up, 2=down, 3=left, 4=right)
* RETURNS: 0 on success
*         -1 if invalid direction
*
* 1.01 better param parsing per L. Curtis Boyle
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(1)
*
ORGADDR EQU     $3f00

INTCNV  EQU     $B3ED   * 46061
GIVABF  EQU     $B4F4   * 46324

UP      EQU     1
DOWN    EQU     2
LEFT    EQU     3
RIGHT   EQU     4
SCREEN  EQU     1024    * top left of screen
END     EQU     1535    * bottom right of screen

        org     ORGADDR

start   jsr     INTCNV  * get incoming param in D
;       cmpb    #UP
        decb            * decrement B
        beq     up      * if one DEC got us to zero
;       cmpb    #DOWN
        decb            * decrement B
        beq     down    * if two DECs...
;       cmpb    #LEFT
        decb            * decrement B
        beq     left    * if three DECs...
;       cmpb    #RIGHT
        decb            * decrement B
        beq     right   * if four DECs...
error   ldd     #-1     * load D with -1 for error code
        bra     exit

up      ldx     #SCREEN+32
loopup  lda     ,x
        sta     -32,x
        leax    1,x
        cmpx    #END
        ble     loopup
        bra     return

down    ldx     #END-32
loopdown lda    ,x
        sta     32,x
        leax    -1,x
        cmpx    #SCREEN
        bge     loopdown
        bra     return

left    ldx     #SCREEN+1
loopleft lda    ,x
        sta     -1,x
        leax    1,x
        cmpx    #END
        ble     loopleft
        bra     return

right   ldx     #END-1
loopright lda   ,x
        sta     1,x
        leax    -1,x
        cmpx    #SCREEN
        bge     loopright
    
return  ldd     #0      * return code (0=success)
exit    jmp     GIVABF  * return to BASIC

* lwasm --decb -9 -o scrnmove2.bin scrnmove2.asm
* lwasm --decb -f basic -o scrnmove2.bas scrnmove2.asm
* decb copy -2 -r scrnmove2.bin ../Xroar/dsk/DRIVE0.DSK,SCRNMOVE2.BIN

The generated BASIC program looks like:

10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 16128,16217,189,179,237,90,39,14,90,39,28,90,39,42,90,39,55,204,255,255,32,67,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,47,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,30,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,14
90 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,204,0,0,126,180,244,-1,-1

Let’s take the original maze program and modify it to use the assembly routines instead:

0 REM MAZETEST.BAS
10 DIM MZ$(31)
20 FOR A=0 TO 30:READ MZ$(A):NEXT
30 CLS
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 FOR LN=0 TO 15
70 PRINT @LN*32,MZ$(LN+ST);
80 NEXT:NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 FOR LN=0 TO 15
120 PRINT @LN*32,MZ$(LN+ST);
130 NEXT:NEXT
140 GOTO 40
999 GOTO 999
1000 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"    
1010 DATA "X            XX            X"    
1020 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1030 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1040 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1050 DATA "X                          X"
1060 DATA "X XXXX XX XXXXXXXX XX XXXX X"   
1070 DATA "X XXXX XX XXXXXXXX XX XXXX X"    
1080 DATA "X      XX    XX    XX      X"   
1090 DATA "XXXXXX XXXXX XX XXXXX XXXXXX"    
2100 DATA "     X XXXXX XX XXXXX X     "    
2110 DATA "     X XX          XX X     "    
2120 DATA "     X XX XXXXXXXX XX X     "   
2130 DATA "XXXXXX XX X      X XX XXXXXX"   
2140 DATA "          X      X          "   
2150 DATA "XXXXXX XX X      X XX XXXXXX"   
2160 DATA "     X XX XXXXXXXX XX X     "   
2170 DATA "     X XX          XX X     "   
2180 DATA "     X XX XXXXXXXX XX X     "   
2190 DATA "XXXXXX XX XXXXXXXX XX XXXXXX"   
3200 DATA "X            XX            X"   
3210 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3220 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3230 DATA "X   XX                XX   X"   
3240 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3250 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3260 DATA "X      XX    XX    XX      X"   
3270 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3280 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3290 DATA "X                          X"   
4200 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"

The maze is 31 lines tall. The fake scrolling is done by redrawing the entire screen line-by-line. The screen is 16 lines tall, so initially we draw maze lines 0-15. Then we redraw maze lines 1-16, giving the appearance that the screen is scrolling up and a line has scrolled off the top of the screen. This repeats for lines 2-17, 3-18 and so on until we’ve drawn the last 16 lines of 15-30.

After “scrolling” all the way to the bottom of the maze, a second block of FOR/NEXT loops reverses the process, starting with maze lines 15-30, then 15-30 and so on until it is back to displaying the top lines 0-15.

The scrolling is done by the FOR/NEXT loops using the LN variables in lines 60-80 and 110-130.

Rather than redrawing all sixteen lines each time, we could use the assembly routine to move the screen, and then we’d just draw one line – top or bottom, depending on which was the screen scrolled.

In effect, we’d replace this:

40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 FOR LN=0 TO 15
70 PRINT @LN*32,MZ$(LN+ST);
80 NEXT:NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 FOR LN=0 TO 15
120 PRINT @LN*32,MZ$(LN+ST);
130 NEXT:NEXT

…with this:

35 FOR LN=0 TO 15:PRINT @LN*32,MZ$(LN+ST);:NEXT
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 Z=USR0(1)
70 PRINT @480,MZ$(ST+15);
80 NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 Z=USR0(2)
120 PRINT @0,MZ$(ST);
130 NEXT

Line 35 was added to initially draw the screen. After that, the assembly routine can move it up or down, and let BASIC redraw just the one line that needs to be drawn.

This, of course, requires the assembly routine to be loaded. We can take the BASIC loader of that and renumber it so we can call it from our test program. Here is the scrnmove2.asm updated code from the top of this article, renumbered and changed in to a subroutine:

5000 REM ASSEMBLY ROUTINE
5010 READ A,B
5020 IF A=-1 THEN 5070
5030 FOR C = A TO B
5040 READ D:POKE C,D
5050 NEXT C
5060 GOTO 5010
5070 RETURN
5080 DATA 16128,16217,189,179,237,90,39,14,90,39,28,90,39,42,90,39,55,204,255,255,32,67,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,47,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,30,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,14
5090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,204,0,0,126,180,244,-1,-1

Now, I can add this to the end of the mazetest.bas program and set it up so the USR0() calls will work:

10 CLEAR 200,&H3F00:DIM MZ$(31)
25 GOSUB5000:DEFUSR0=&H3F00

Now the program will use the CLEAR command to protect memory starting at &H3F00 (where the assembly will load), then after it reads all the maze strings in to memory (those DATA statements appear first), it will GOSUB 5000 and that READs the assembly code statements and POKEs them in to memory starting at &H3F00. The DEFUSR call is then done to make USR0(x) work.

With just a few lines changed, and getting our assembly routine in memory, now the maze scrolling is very fast! And, if we optimized the BASIC code around it, it could be even faster since most of the time is spent processing the BASIC program.

Here is the full listing:

0 REM MAZETST2.BAS - W/ASM!
10 CLEAR 200,&H3F00:DIM MZ$(31)
20 FOR A=0 TO 30:READ MZ$(A):NEXT
25 GOSUB5000:DEFUSR0=&H3F00
30 CLS

40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 FOR LN=0 TO 15
70 PRINT @LN*32,MZ$(LN+ST);
80 NEXT:NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 FOR LN=0 TO 15
120 PRINT @LN*32,MZ$(LN+ST);
130 NEXT:NEXT

35 FOR LN=0 TO 15:PRINT @LN*32,MZ$(LN+ST);:NEXT
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 Z=USR0(1)
70 PRINT @480,MZ$(ST+15);
80 NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 Z=USR0(2)
120 PRINT @0,MZ$(ST);
130 NEXT
140 GOTO 40
999 GOTO 999
1000 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"    
1010 DATA "X            XX            X"    
1020 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1030 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1040 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1050 DATA "X                          X"
1060 DATA "X XXXX XX XXXXXXXX XX XXXX X"   
1070 DATA "X XXXX XX XXXXXXXX XX XXXX X"    
1080 DATA "X      XX    XX    XX      X"   
1090 DATA "XXXXXX XXXXX XX XXXXX XXXXXX"    
2100 DATA "     X XXXXX XX XXXXX X     "    
2110 DATA "     X XX          XX X     "    
2120 DATA "     X XX XXXXXXXX XX X     "   
2130 DATA "XXXXXX XX X      X XX XXXXXX"   
2140 DATA "          X      X          "   
2150 DATA "XXXXXX XX X      X XX XXXXXX"   
2160 DATA "     X XX XXXXXXXX XX X     "   
2170 DATA "     X XX          XX X     "   
2180 DATA "     X XX XXXXXXXX XX X     "   
2190 DATA "XXXXXX XX XXXXXXXX XX XXXXXX"   
3200 DATA "X            XX            X"   
3210 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3220 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3230 DATA "X   XX                XX   X"   
3240 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3250 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3260 DATA "X      XX    XX    XX      X"   
3270 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3280 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3290 DATA "X                          X"   
4200 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"

5000 REM ASSEMBLY ROUTINE
5010 READ A,B
5020 IF A=-1 THEN 5070
5030 FOR C = A TO B
5040 READ D:POKE C,D
5050 NEXT C
5060 GOTO 5010
5070 RETURN
5080 DATA 16128,16217,189,179,237,90,39,14,90,39,28,90,39,42,90,39,55,204,255,255,32,67,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,47,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,30,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,14
5090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,204,0,0,126,180,244,-1,-1

Try the original BASIC-only version and then this new assembly-enhanced version and see what you think.

Next time, I will share a version of this scrolling maze that has a character you can control and move through the maze.

Until then…

Optimizing Color BASIC, part 5

See also: Part 1, Part 2, Part 3 and Part 4.

Updates:

  • 2/14/2017 – Fixed numeric typo (thanks, Geroge P!).

HEX versus DECimal Numbers

As Barbie once said*…

Math is hard! – Barbie

While Mattel’s Math-Is-Hard Barbie never quite made the splash the marketing team had hoped for, her sentiment lives on.

Side Note: *This is in reference to a the Teen Talk Barbie doll released in 1992, and out of the 270 phrases the doll could say, that was not one of them. The real quote was “Math class is tough!”

Earlier in this series, I touched on the fact that dealing with numbers is time consuming for BASIC. Something as simple as B=65535 takes time to process as the interpreter translates that base-10 decimal number in to an internal floating point value. The more digits, the more work. For instance:

0 REM NUMBERS.BAS
10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 B=1
40 NEXT
50 PRINT TIMER-TM

That prints a value of 183. If you change line 3 to read “B=12345” the number jumps to 485. You can see the increase:

  • B=1 – 183
  • B=12 – 262
  • B=123 – 337
  • B=1234 – 408
  • B=12345 = 485

Obviously, the more numbers to parse and convert, the more time it will take. It also seems to matter if the value has a decimal point in it:

  • B=1.0 – 403
  • B=1.1 – 476

Even though that is only three characters to process, it takes longer than B=123. Clearly, more work is being done on floating point values. Even though all Color BASIC numbers are represented internally as floating point, it still makes sense to avoid using them unless you really need them.

You can also represent a base-16 number in hexadecimal. For the value of 1, it feels like parsing “&H1” should take longer than parsing “1”. Let’s try:

  • B=&H1 – 180
  • B=&H12 – 175
  • B=&H123 – 200
  • B=&H1234 – 203

It seems that parsing a hexadecimal value is much faster than dealing with base-10 values. Using this, you could speed up a program just by switching to hex, provided that your numbers are between 0 and 65535 (the values that can be represented in hex). I was surprised to see that negative values also work:

  • B=&HFFFF – 201
  • B=-&HFFFF – 230

It seems dealing with the negative takes a bit of more time, though, so it makes sense to avoid using them unless you really need them. ;-)

With this in mind, let’s test a FOR/NEXT loop:

10 TIMER=0:TM=TIMER
20 FOR A=&H1 TO &H3E8
30 B=&H1
40 NEXT
50 PRINT TIMER-TM

This prints 182, which is basically the same speed as the original that used 0 TO 1000. I guess hexadecimals don’t really help out FOR/NEXT.

Why? Because the FOR/NEXT statement is only parsed once, then the loop counters are set up and done. It is probably a tad faster to use hex, but that savings only happens once in the “do it 1000 times” test.

But, as you see, USING the variables gets faster. Any place we use a number, it seems using a hex version of that number may speed it up:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 IF A>&HFF THEN REM
40 NEXT
50 PRINT TIMER-TM

This prints 278. Doing it with A>255 prints 427! Imagine if you could speed up every time you used a number in your code:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 PRINT@&H20,"HELLO"
40 NEXT
50 PRINT TIMER-TM

That prints 391, but changing it to PRINT@32 prints 469! If you use a bunch of PRINT@s in your code, you can speed them up just by switching to hex!

Math could be accelerated, too, simply due to the number conversion being faster. The more digits, the better advantage hex has:

  • B=A+&H270F – 285
  • B=A+9999 – 483

And the more numbers, the more time you can save by using hex. A common PRINT thing is to use the length of a string to figure out how to center is on the screen:

0 REM NUMBERS.BAS
10 TIMER=0:TM=TIMER
15 CLS:A$="HELLO, WORLD!":LN=LEN(A$)
20 FOR A=1 TO 1000
25 PRINT@32*8+16-LN/2,A$
40 NEXT
50 PRINT TIMER-TM

That prints 1284. Converting line 24 to HEX:

25 PRINT@&H20*&H8+&H1-LN/&H2,A$

And now it prints 1097.

In a game where you might be PRINTing things on the screen constantly, those savings could really add up.

Pity that math is hard, else we could just use hex in our programs and get a free speed boost.

Until next time…

 

An effect of PAL versus NTSC…

Just a quick note about my Optimizing Color BASIC series…

In later installments, I started doing some benchmarking using BASIC’s TIMER command. TIMER returns an incrementing count value that is based on the 60hz TV sync. At 60hz, the timer counts to 60 approximately every second.

I have been doing my tests in the XRoar emulator, which was created to emulate the Dragon, a CoCo clone manufactured in England. Over there, their TV systems are 50hz, so the Dragon and UK versions of the CoCo operate using a different TV sync rate.

Thus, in Dragon/UK mode, TIMER counts up to 50 every second.

And by default, XRoar starts up in PAL mode.

So it’s likely that some of the TIMER values I have given have been off, because I may have been running in PAL/50hz mode sometime.

So if you catch any, let me know and I will go back and retest and correct. I already did that in a recent part, but I may have missed others.

Mea culpa.

Optmizing Color BASIC, part 4

See also: Part 1, Part 2 and Part 3.

Updates:

  • 2/14/2017 – added section header.

INSTR and GOTO/GOSUB

Here’s a quickie that discusses making INSTR faster, and GOTO versus GOSUB.

Side Note: In the code examples, I am using spaces for readability. Spaces slow things down, so instead of “FOR A=1 TO 1000” you would write “FORA=1TO1000”. If you remove the unnecessary spaces from these examples, they get faster.

In the previous installment, I discussed ways to speed up doing things based on INPUT by using the INSTR command. INSTR will return the position of where a string is inside another string:

PRINT INSTR("ABC", "B")
2

Above, INSTR returns 2, indicating the string “B” was found starting at position 2 in the search string “ABC”. If the string is not found, it returns zero:

PRINT INSTR("ABC", "X")
0

You could use this in weird ways. For instance, if you wanted to match only certain animals, you could do something like this:

10 INPUT "ENTER AN ANIMAL";A$
20 A=INSTR("CATDOGCOWCHICKEN", A$)
30 IF A>0 THEN PRINT "I KNOW THAT ANIMAL!" ELSE PRINT "WHAT'S THAT?"
40 GOTO 10

INSTR can help you identify animals!

…but why would you want to do that? And, it just matches strings, so any combination that appears in the search string will be matched:

…or not.

Above, searching for “A” was a match, since there is an “A” in that weird animal string, as well as a “C”. There was no “B”, so…

Okay, nevermind. Forget I mentioned it.

I am sure there are many good uses for INSTR, but I mostly use it to match single letter commands (as mentioned previously) like this:

10 A$=INKEY$:IF A$="" THEN 10
20 LN=INSTR("ABCD", A$):IF LN=0 THEN 10
30 ON LN GOTO 100,200,300,400

Since INSTR returns 0 when there is no match, it’s an easy way to validate that the character entered is valid.

According to the documentation in the CoCo 3 BASIC manual, the full syntax is this:

INSTR(start-position, search-string, target-string)

You can use the optional start-position to begin scanning later in the string. For instance:

PRINT INSTR(3, "ABCDEF", "A")

That would print 0 since we are searching for “A” in the string “ABCDEF” starting at the third character (so, searching “CDEF”).

The manual also notes conditions where a 0 can be returned:

  • The start-position is greater than the number of characters in the search-string: INSTR(4, “ABC”, “A”)
  • The search-string is null: INSTR(“”, “A”)
  • It cannot find the target: INSTR(“ABC”, “Z”)

I was surprised today to (re)discover that INSTR considers a null (empty) string to be a match, sorta:

PRINT INSTR("ABC", "")
1

If the search-string is empty, it returns with the current search-position (which starts at 1, for the first character). This seems like a bug to me, but indeed, this behavior is the same in later, more advanced Microsoft BASICs.

I bring this up now because I was almost going to show you something really clever. Normally, I use INSTR with a string I get back from INKEY$. But, you can also use INKEY$ directly. And, since ON GOTO/GOSUB won’t go anywhere if the value is 0, I thought it might be clever to use it like this:

10 ON INSTR("ABCD",INKEY$) GOTO 100,200,300,400

…and this is smaller and much faster and works great … if there is a key waiting! If no key is waiting, INKEY$ returns a null (“”) and … INSTR returns a 1, and then ON GOTO goes to 100 even though that was not the intent.

Darnit. I thought I had a great way to speed things up. Consider the speed of this version:

INSTR example … workaround for not using a variable.

I thought by replacing line 30 with…

ON INSTR("ABC",INKEY$)GOSUB70,80,90

…I would be set. But, since an empty INKEY$ is returning “”, it’s always GOSUBing to line 70.

I tried a hacky workaround, by adding a bogus character tho the start of the string, and making that GOSUB to a RETURN located real close to that code (so it didn’t have to search as far to find it):

INSTR example.

…but, the overhead of that extra GOSUB/RETURN that happens EVERY TIME there is no key waiting was enough to make it slightly slower. If it wasn’t for that, we could do this maybe 30% faster and use less variables :)

So, unfortunately, I guess I have no optimization to show you… Just a failed attempt at one.

But wait, there’s more!

I posted about this on the CoCo mailing list and in the CoCo Facebook group to figure out if this behavior was a bug. There were several responses confirming this behavior in other versions of BASIC and languages.

On the list, Robert Hermanek responded with a great solution:

Issue is just getting a return of 1 when searching for empty string? Then why not:

10 ON INSTR(” ABC”,INKEY$) GOTO 10,100,200,300

…notice the space before A.
-RobertH

His brilliant suggestion works by adding a bogus character to the search-string, and making any match of that string (or “”) GOTO the same line. Thus, problem solved!

This won’t work with ON GOSUB since every GOSUB expects a RETURN. Each time you use GOSUB, it takes up seven bytes (?) of memory to remember where to RETURN to in the program. If you did something like this, you’ll see the issue:

10 PRINT MEM:GOSUB 10

I make a quick change to my program to use GOTO instead:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 ON INSTR(" ABC",INKEY$) GOTO 40,70,80,90
40 NEXT
50 PRINT TIMER-TM:END
70 PRINT"A PRESSED":GOTO 40
80 PRINT"B PRESSED":GOTO 40
90 PRINT"C PRESSED":GOTO 40

For my timing test inside the FOR/NEXT loop, I made the first GOTO point to the NEXT in line 40, but if I wanted to wait “forever” until a valid key was pressed, I would make that 30.

This version shows a time of 412, so let’s compare that to doing it with A$:

10 TIMER=0:TM=TIMER
20 FORA=1 TO 1000
30 A$=INKEY$:IFA$="" THEN 40 ELSE ON INSTR("ABC",A$) GOTO 70,80,90
40 NEXT
50 PRINT TIMER-TM:END
70 PRINT"A PRESSED":GOTO 40
80 PRINT"B PRESSED":GOTO 40
90 PRINT"C PRESSED":GOTO 40

This produces 486. We now have a way to avoid using A$ and speed up code just a bit.

This made me wonder … what is faster? Using GOTO, or doing a GOSUB/RETURN? Let’s try to predict…

Both GOTO and GOSUB will have to take time to scan through the program to find the destination line, but GOSUB will also have to take time to store the “where are we” return location so RETURN can get back there. This makes me think GOSUB will be slower.

BUT, we need a GOTO to return from a GOTO, and a RETURN to return from a GOSUB. GOTO always has to scan through the program line by line to find the destination, while RETURN just jumps back to a location that was saved by GOSUB. So, if we have to scan through many lines, the return GOTO is probably slower than a RETURN.

Let’s try.

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 GOSUB 1000
40 NEXT
50 PRINT TIMER-TM:END
1000 RETURN

That prints 140.

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 GOTO 1000
40 NEXT
50 PRINT TIMER-TM:END
1000 GOTO 40

That prints 150.

I expect it is because line 1000 says “GOTO 40” and since 40 is lower than 1000, BASIC has to start at the top and go line by line looking for 40. If you GOTO to a higher number, it starts from the current line and moves forward. The speed of GOTO (and GOSUB) varies based on where the lines are:

GOTO should be quick when going to a line right after it:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 GOTO 31
31 GOTO 32
32 GOTO 33
40 NEXT
50 PRINT TIMER-TM:END

That prints 170.

Line 30 has to scan one line down to find 31, then 31 scans one line down to find 32, and 32 scans one line down to find 40.

But if you change the order of the GOTOs:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 GOTO 32
31 GOTO 40
32 GOTO 31
40 NEXT
50 PRINT TIMER-TM:END

…that prints 175.

It is the same number of GOTOs, but line 30 scans two lines ahead to find 32, then 32 has to start at the top and scan four lines in to find line 31, then line 31 has to scan two lines ahead to find 40.

If we add more lines (even REMs), more things have to be scanned:

0 REM
1 REM
2 REM
3 REM
4 REM
5 REM
6 REM
7 REM
8 REM
9 REM
10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 GOTO 32
31 GOTO 40
32 GOTO 31
40 NEXT
50 PRINT TIMER-TM:END

We are now up to 192 and the one with the lines in order is still 170.

The more lines GOTO (or GOSUB) has to search, the slower it gets. So while MAYBE there might be a case where a GOTO could be quicker than a RETURN, it seems that even with a tiny program, GOSUB wins.

So … the question is, is the time we save doing the INKEY like this:

30 ON INSTR(" ABC",INKEY$) GOTO 40,70,80,90

…going to offset the time we lose because those functions all have to GOTO back, rather than using a RETURN?

If this was a normal “wait for a keypress” then it probably wouldn’t matter much. We are just waiting, so there is no time to save by making that faster.

If we were reading keys for an action game, the actual “is there a keypress?” code would be faster, giving more time for the actual program. But, every time a key was pressed, the time taken to get in and out of that code would be slower. I guess it depends on how often the key is pressed.

A game like Frogger, where a key would be pressed every time the frog jumps to the next spot, might be worse than a game like Pac-Man where you press a direction and then don’t press anything again until the character is at the next intersection to turn down.

I am not sure how I would benchmark that, yet, but let’s try this… We’ll modify the code so “” actually is honored as an action:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 ON INSTR(" UDLR",INKEY$) GOTO 100,200,300,400,500
40 NEXT
50 PRINT TIMER-TM:END
100 REM IDLE LOOP
110 GOTO 40
200 REM MOVE UP
210 GOTO 40
300 REM MOVE DOWN
310 GOTO 40
400 REM MOVE LEFT
410 GOTO 40
500 REM MOVE RIGHT
510 GOTO 40

Now if no key is waiting (“”), INSTR will return a 1 causing the code to GOTO 100 where the background (keep objects moving, animate stars, etc.) action would happen. Any other value would go to the handler for up, down, left or right.

This prints 457. Doing the same thing with GOSUB:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 ON INSTR(" UDLR",INKEY$) GOSUB 100,200,300,400,500
40 NEXT
50 PRINT TIMER-TM:END
100 REM IDLE LOOP
110 RETURN
200 REM MOVE UP
210 RETURN
300 REM MOVE DOWN
310 RETURN
400 REM MOVE LEFT
410 RETURN
500 REM MOVE RIGHT
510 RETURN

…prints 487.  It appears GOSUB/RETURN is slower for us than GOTO here. But why? GOSUB seemed faster in the first example. Is ON GOSUB slower than ON GOTO?

Quick side test:

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 ON 1 GOSUB 1000
40 NEXT
50 PRINT TIMER-TM:END
1000 RETURN

That prints 249. GOSUB by itself was 140, so ON GOSUB is much slower.

10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 ON 1 GOTO 1000
40 NEXT
50 PRINT TIMER-TM:END
1000 GOTO 40

…also prints 249, and GOTO by itself was 150. I am a bit surprised by this. I will have to look in to this further for an explanation.

But I digress…

We can still slow down GOTO. If we had a bunch of extra lines for GOTO to have to scan through:

0 REM
1 REM
2 REM
3 REM
4 REM
5 REM
6 REM
7 REM
8 REM
9 REM

…that will slow down every GOTO to an earlier number. With those REMs added, we have:

GOTO/GOTO version with REMs: 473 (up from 457 without REMs)

GOSUB/RETURN version with REMs: 487 (it never passes through the REMs)

It appears that, while GOSUB/RETURN may be faster on it’s own, when I put it in this test program, GOTO/GOTO is slightly faster, but that can change depending on how big the program is. More research is needed…

So I guess, for now, I’m going to avoid using a variable for INKEY$ and use GOTO/GOTO for my BASIC game…

Until next time…

Optimizing Color BASIC, part 3

See also: Part 1 and Part 2.

Updates:

  • 2/14/2017 – Added some section headers, bolded benchmark values.

Since I am right in the middle of a multi-part article on interfacing assembly with BASIC, now is a great time to discuss something completely different.

INPUT, INKEY, INSTR, and POKE

The reason I do this now is because it is going to tie it in with the next part of the assembly article. Since I have been discussing using assembly to speed things up, it is a good time to address a few more things that can be done to speed up BASIC before resorting to 6809 code. Since BASIC will be the weakest link, we should try to make it as strong weak link.

In 1980, Color BASIC offered a simple way to input a string or number:

10 INPUT "WHAT IS YOUR NAME";A$
20 PRINT "HELLO, ";A$
30 INPUT "HOW OLD ARE YOU";A
40 PRINT A;"IS PRETTY OLD."

My First Input

The original INPUT command was a very simple way to get data in to a program, but it was quite limited. It didn’t allow for string input containing commas, for instance, unless you typed it in quotes:

INPUT hates commas.

INPUT likes quotes.

INPUT also prints the question mark prompt.

LINE INPUT

When Extended Color BASIC was introduced, it brought many new features including the LINE INPUT command. This command did not force the question mark prompt, and would accept commas without quoting it:

LINE INPUT likes everything.

If you were trying to write a text based program (or a text adventure game), INPUT or LINE INPUT would be fine.

INKEY$

For times when you just wanted to get one character, without requiring the characters to be echoed as the user types them, and without requiring the user to press ENTER, there was INKEY$.

INKEY$ returns whatever key is being pressed, or nothing (“”) if no key is ready.

10 PRINT "PRESS ANY KEY TO CONTINUE..."
20 IF INKEY$="" THEN 20
30 PRINT "THANK YOU."

It can also be used with a variable:

10 PRINT "ARE YOU READY? ";
20 A$=INKEY$:IF A$="" THEN 20
30 IF A$="Y" THEN PRINT "GOOD!" ELSE PRINT "BAD."

This is the method we might use for a keyboard-controlled BASIC video game. For instance, if we want to read the arrow keys (up, down, left and right), each one of those keys generates an ASCII character when pressed:

UP    - CHR$(94) - ^ character
DOWN  - CHR$(10) - line feed
LEFT  - CHR$(8)  - backspace
RIGHT - CHR$(9)  - tab

Knowing this, we can detect arrow keys using INKEY$:

10 CLS:P=256+16
20 PRINT@P,"*";
30 A$=INKEY$:IF A$="" THEN 30
40 IF A$=CHR$(94) AND P>31 THEN P=P-32
50 IF A$=CHR$(10) AND P<479 THEN P=P+32
60 IF A$=CHR$(8) AND P>0 THEN P=P-1
70 IF A$=CHR$(9) AND P<510 THEN P=P+1
80 GOTO 20

The above program uses PRINT@ to print an asterisk (“*”) in the middle of the screen. Then, it waits until a key is pressed (line 30). Once a key is pressed, it looks at which key it was (up, down, left or right) and then will move the position of the asterisk (assuming it’s not going off the end of the screen).

Side Note: The CoCo’s text screen is 32×16 (512 characters). PRINT@ can print at 0-511, but if you print to 511 (the bottom right location), the screen will scroll up. I have adjusted this code to disallow moving the asterisk to that location.

You now have a really crappy text drawing program. To make it less crappy, you could check for other keys to change the character that is being drawn, or make it use simple color graphics:

10 CLS0:X=32:Y=16:C=0
20 SET(X,Y,C)
30 A$=INKEY$:IF A$="" THEN 30
40 IF A$=CHR$(94) AND Y>0 THEN Y=Y-1
50 IF A$=CHR$(10) AND Y<31 THEN Y=Y+1
60 IF A$=CHR$(8) AND X>0 THEN X=X-1
70 IF A$=CHR$(9) AND X<63 THEN X=X+1
80 IF A$="C" THEN C=C+1:IF C>8 THEN C=0
90 GOTO 20

That program uses the primitive SET command to draw in beautiful 64×32 resolution with eight colors. The arrow keys move the pixel, and pressing C toggles through the colors. Spiffy!

Color graphics from 1980!

Instead of using “C” to just cycle through the colors, you could check the character returned and see if it was between “0” and “8” and use that value to set the color (0=RESET pixel, 1-8=SET pixel to color).

10 CLS0:X=32:Y=16:C=1
20 IF C=0 THEN RESET(X,Y) ELSE SET(X,Y,C)
30 A$=INKEY$:IF A$="" THEN 30
40 IF A$=CHR$(94) AND Y>0 THEN Y=Y-1
50 IF A$=CHR$(10) AND Y<31 THEN Y=Y+1
60 IF A$=CHR$(8) AND X>0 THEN X=X-1
70 IF A$=CHR$(9) AND X<63 THEN X=X+1
80 IF A$=>"0" AND A$<="8" THEN C=ASC(A$)-48
90 GOTO 20

Now that we have refreshed our 1980 BASIC programming, let’s look at lines 40-70 which are used to determine which key has been pressed.

If we are going to be reading the keyboard over and over for an action game, doing so with a bunch of IF/THEN statements is not very efficient. Lets do some tests to find out how not very efficient it is.

For our example, we would be using INKEY$ to read a keypress, then GOSUBing to four different subroutines to handle up, down, left and right actions. To see how fast this is, we will once again use the TIMER command and do our test 1000 times. We’ll skip doing the actual INKEY$ for now, and hard code a keypress. Since we will be checking for keys in the order of up, down, left then right, we will simulate pressing last key check, right, to get the worst possible condition.

Here is version 1 that does a brute-force check using IF/THEN/ELSE.

0 REM KEYBD1.BAS
10 TM=TIMER:FORA=1TO1000
15 A$=CHR$(9)
20 REM A$=INKEY$:IFA$=""THEN20
30 IFA$=CHR$(94)THENGOSUB100ELSEIFA$=CHR$(10)THENGOSUB200ELSEIFA$=CHR$(8)THENGOSUB300ELSEIFA$=CHR$(9)THENGOSUB400
50 NEXT:PRINT TIMER-TM
60 END
100 RETURN
200 RETURN
300 RETURN
400 RETURN

When I run this in the XRoar emulator, I get back 1821. That is now our benchmark to beat.

INSTR

Rather than doing a bunch of IF/THENs, if we are using Extended Color BASIC, there is the INSTR command. It will take a string and a pattern, and return the position of that pattern in the string. For example:

PRINT INSTR("CAT DOG RAT", "DOG")

If you run this line, it will print 5. The string “DOG” appears in “CAT DOG RAT” starting at position 5. You can use INSTR to parse single characters, too:

PRINT INSTR("ABCDEFGHIJ", "F")

This will print 6, because “F” is found in the search string starting at position 6.

If the search string is not found, it returns 0. Using this, you can parse a string containing all the possible keypress options, and turn them in to a number. You could then use that number in an ON GOTO/GOSUB statement, like this:

0 REM KEYBD2.BAS
10 A$=INKEY$:IF A$="" THEN 10
20 A=INSTR("ABCD",A$)
30 IF A=0 THEN 10
40 ON A GOSUB 100,200,300,400
50 GOTO 10
100 PRINT "A WAS PRESSED":RETURN
200 PRINT "B WAS PRESSED":RETURN
300 PRINT "C WAS PRESSED":RETURN
400 PRINT "D WAS PRESSED":RETURN

A long line of four IF/THEN/ELSE statements is now replaced by INSTR and ON GOTO/GOSUB.

Let’s rewrite our test program slightly, this time using INSTR:

10 TM=TIMER:FORA=1TO1000
15 A$=CHR$(9)
20 REM A$=INKEY$:IFA$=""THEN20
30 LN=INSTR(CHR$(94)+CHR$(10)+CHR$(8)+CHR$(9),A$)
35 IF LN=0 THEN 20
40 ONLN GOSUB100,200,300,400
50 NEXT:PRINT TIMER-TM
60 END
100 RETURN
200 RETURN
300 RETURN
400 RETURN

Running this gives me 1724. We are now slightly faster.

We can do better.

One of the reasons this version is so slow is line 30. Every time that line is processed, BASIC has to dynamically build a string containing the four target characters — CHR$(94), CHR$(10), CHR$(8) and CHR$(9). String manipulation in BASIC is slow, and we really don’t need to do it every time. Instead, let’s try a version 3 where we create a string containing those characters at the start, and just use the string later:

0 REM KEYBD3.BAS
5 KB$=CHR$(94)+CHR$(10)+CHR$(8)+CHR$(9)
10 TM=TIMER:FORA=1TO1000
15 A$=CHR$(9)
20 REM A$=INKEY$:IFA$=""THEN20
30 LN=INSTR(KB$,A$)
35 IF LN=0 THEN 20
40 ONLN GOSUB100,200,300,400
50 NEXT:PRINT TIMER-TM
60 END
100 RETURN
200 RETURN
300 RETURN
400 RETURN

Running this gives me 902! It appears to be twice as fast as the original IF/THEN version!

Speed comparisons…

Now we have a much faster way to handle the arrow keys. Let’s go back to the original program and update it:

0 REM INKEY3.BAS
5 KB$=CHR$(94)+CHR$(10)+CHR$(8)+CHR$(9)
10 CLS:P=256+16
20 PRINT@P,"*";
30 A$=INKEY$:IF A$="" THEN 30
40 LN=INSTR(KB$,A$)
50 ONLN GOSUB100,200,300,400
60 GOTO 20
100 IF P>31 THEN P=P-32
110 RETURN
200 IF P<479 THEN P=P+32:RETURN
210 RETURN
300 IF P>0 THEN P=P-1
310 RETURN
400 IF P<510 THEN P=P+1
410 RETURN

Side Note: Using GOSUB/RETURN may be slower than using GOTO, but that will be the subject of another installment.

Now that we have a faster keyboard input routine, let’s do one more thing to try to speed it up.

POKE

We are currently using PRINT@ to print a character on the screen in positions 0-510 (remember, we can’t print to the bottom right position because that will make the screen scroll). Instead of using PRINT, we can also use POKE to put a byte directly in to screen memory:

POKE location,value

Location is an address in the up-to-64K memory space (0-65535) and value is an 8-bit value (0-255).

Let’s see if it’s faster.

First, PRINT@ wants positions 0-511, and POKE wants an actual memory address. The 32 column screen is located from 1024-1535 in memory, so PRINT@0 is like POKE 1024. PRINT@511 is like POKE 1535. Let’s make some changes:

0 REM INKEY4.BAS
5 KB$=CHR$(94)+CHR$(10)+CHR$(8)+CHR$(9)
10 CLS:P=1024+256+16
20 POKE P,106
30 A$=INKEY$:IF A$="" THEN 30
40 LN=INSTR(KB$,A$)
50 ONLN GOSUB100,200,300,400
60 GOTO 20
100 IF P>1024+31 THEN P=P-32
110 RETURN
200 IF P<1024+479 THEN P=P+32:RETURN
210 RETURN
300 IF P>1024+0 THEN P=P-1
310 RETURN
400 IF P<1024+511 THEN P=P+1
410 RETURN

WARNING: While PRINT@ is safe (a bad value just generates an error), POKE is dangerous! If you POKE the wrong value, you could crash the computer. Instead of POKEing a character on the screen, you could accidentally POKE to memory that could crash the system.

This program will behave identically to the original, BUT since we are using POKE, we can now go all the way to the bottom right of the screen :) That is just one of the reasons we might use POKE over PRINT@.

But is it faster, or slower? Let’s find out…

10 TM=TIMER:FORA=1TO1000
20 PRINT@0,"*";
30 NEXT:PRINT TIMER-TM

…versus…

10 TM=TIMER:FORA=1TO1000
20 POKE1024,106
30 NEXT:PRINT TIMER-TM

The PRINT@ version shows 259, and the POKE version shows 655. POKE appears to be significantly slower. Some reasons could be:

  1. POKE has to translate four digits (1024) instead of just one (0) so that’s longer to parse.
  2. POKE has to also translate the value (106) where PRINT can probably just jump to the string that is in the quotes.

Let’s try to test this… By giving PRINT@ a three digit number, 510, it slows down from 259 to 424. Parsing that number is definitely part of the problem. Let’s eliminate the number parsing completely by using variables:

5 P=0
10 TM=TIMER:FORA=1TO1000
20 PRINT@P,"*";
30 NEXT:PRINT TIMER-TM

This gives us 229, so it’s a bit faster than the original. Now let’s try the POKE version:

5 P=1024:
10 TM=TIMER:FORA=1TO1000
20 POKEP,106
30 NEXT:PRINT TIMER-TM

This gives us 400, so it’s faster than the original 655, but still nearly twice as slow as using PRINT@. But wait, there’s still that 106 value. Let’s replace that with a variable, too.

5 P=1024:V=106
10 TM=TIMER:FORA=1TO1000
20 POKEP,V
30 NEXT:PRINT TIMER-TM

This slows it down from 229 to 234!  We are now almost as fast as PRINT@! But now the POKE version has to look up two variables, while the PRINT@ version only looks up one, so that might give PRINT@ an advantage. Let’s test this by making the PRINT@ version also use a variable for the character:

5 P=0:V$="*"
10 TM=TIMER:FORA=1TO1000
20 PRINT@P,V$;
30 NEXT:PRINT TIMER-TM

That slows it down to 231. This seems to indicate the speed difference is really not between PRINT@ and POKE, but between how much number conversion of variable lookup each needs to do. You can use PRINT@ without having to look up the string to print (“*”), but POKE always has to either convert a numeric value (106) or do a variable lookup (L).

So why bother with POKE if the only advantage, so far, is that you can POKE to the bottom right character on the screen?

Because PEEK.

PEEK lets us see what byte is at a specified memory location. If I were writing a game and wanted to tell if the player’s character ran in to an enemy, I’d have to compare the player position (X/Y address, or PRINT@ location) with the locations of all the other objects. The more objects you have, the more compares you have to do and the slower your program becomes.

For example, here’s a simple game where the player (“*”) has to avoid four different enemies (“X”):

0 REM GAME.BAS
5 KB$=CHR$(94)+CHR$(10)+CHR$(8)+CHR$(9)
10 CLS:P=256+16
15 E1=32:E2=63:E3=448:E4=479
20 PRINT@P,"*";:PRINT@E1,"X";:PRINT@E2,"X";:PRINT@E3,"X";:PRINT@E4,"X";
30 A$=INKEY$:IF A$="" THEN 30
40 LN=INSTR(KB$,A$):IF LN=0 THEN 30
45 PRINT@P," ";
50 ONLN GOSUB100,200,300,400
60 IF P=E1 OR P=E2 OR P=E3 OR P=E4 THEN 90
80 GOTO 20
90 PRINT@267,"GAME OVER!":END
100 IF P>31 THEN P=P-32
110 RETURN
200 IF P<479 THEN P=P+32:RETURN
210 RETURN
300 IF P>0 THEN P=P-1
310 RETURN
400 IF P<510 THEN P=P+1
410 RETURN

In this example, the four enemies (“X”) remain static in the corners, but if you move your player (“*”) in to one, the game will end.

It’s not much of a game, but with a few more lines you could make the enemies move around randomly or chase the player.

Take a look at line 60. Every move we have to compare the position of the player with four different enemies. This is a rather brute-force check. We could also use an array for the enemies. Not only would this simplify our code, but it would make the number of enemies dynamic.

Here is a version that lets you have as many enemies as you want. Just set the value of EN in line number 1 to the number of enemies-1.

Side Note: Arrays are base-0, so if you DIM A(10) you get 11 elements — A(0) through A(10). Thus, if you want ten elements in an array, you would do DIM A(9), and cycle through them using base-0 like FOR I=0 TO 9:PRINT A(I):NEXT I.

0 REM GAME2.BAS
1 EN=10-1 'ENEMIES
2 DIM E(EN)
5 KB$=CHR$(94)+CHR$(10)+CHR$(8)+CHR$(9)
10 CLS:P=256+16
15 FOR A=0 TO EN:E(A)=RND(510):NEXT
20 PRINT@P,"*";
25 FOR A=0 TO EN:PRINT@E(A),"X";:NEXT
30 A$=INKEY$:IF A$="" THEN 30
40 LN=INSTR(KB$,A$):IF LN=0 THEN 30
45 PRINT@P," ";
50 ONLN GOSUB100,200,300,400
60 FOR A=0 TO EN:IF P=E(A) THEN 90 ELSE NEXT
80 GOTO 20
90 PRINT@267,"GAME OVER!":END
100 IF P>31 THEN P=P-32
110 RETURN
200 IF P<479 THEN P=P+32:RETURN
210 RETURN
300 IF P>0 THEN P=P-1
310 RETURN
400 IF P<510 THEN P=P+1
410 RETURN

In 1980, this was a game.

Now the program is more flexible, but it has gotten slower. After every move, the code must now compare locations of every enemy. This limits BASIC from being able to do a fast game with a ton of objects.

Which brings me back to PEEK… Instead of comparing the player against every enemy, all we really need to know is if the location of the player is where an enemy is. If we are using POKE to put the player on the screen, we know the location the player is, and can just PEEK that location to see if anything is there.

Let’s change the program to use POKE and PEEK:

0 REM GAME2.BAS
1 EN=10-1 'ENEMIES
2 DIM E(EN)
5 KB$=CHR$(94)+CHR$(10)+CHR$(8)+CHR$(9)
10 CLS:P=1024+256+16:V=106:VS=96:VE=88
15 FOR A=0 TO EN:E(A)=1024+RND(511):NEXT
20 POKEP,V
25 FOR A=0 TO EN:POKEE(A),VE:NEXT
30 A$=INKEY$:IF A$="" THEN 30
40 LN=INSTR(KB$,A$):IF LN=0 THEN 30
45 POKEP,VS
50 ONLN GOSUB100,200,300,400
60 IF PEEK(P)=VE THEN 90
80 GOTO 20
90 PRINT@267,"GAME OVER!":END
100 IF P>1024+31 THEN P=P-32
110 RETURN
200 IF P<1024+479 THEN P=P+32:RETURN
210 RETURN
300 IF P>1024+0 THEN P=P-1
310 RETURN
400 IF P<1024+510 THEN P=P+1
410 RETURN

Now, instead of looping through an array containing the locations of all the enemies, we simply PEEK to our new player location and if the byte there is our enemy value (VE, character 88), game over.

It should be a bit faster now.

We should also change the “1024+XX” things t0 the actual values to avoid doing math each time, but I was being lazy.

Now we know a way to improve the speed of reading key presses, and ways to use POKE/PEEK to avoid having to do manual comparisons of object locations. Maybe this will come in handy someday when you need to write a game where an asterisk is being chased by a bunch of Xs.

Until next time…

Optimizing Color BASIC, part 2

See also: Part 1

Variable Placement

Last year, I posted an article dealing with Optimizing Color BASIC. In it, I covered a variety of techniques that could be done to speed up a BASIC program. While this was specifically written about the Microsoft Color BASIC on the Radio Shack Color Computers, I expect it may also apply to similar BASICs on other systems.

James J. left an interesting comment:

Oh yeah. If anyone cares to experiment with modifying the BASIC interpreter, it might be fun to make the symbol table “adaptive”. When you find a variable in the symbol table, if it’s not the first one, swap it with its predecessor. The idea is that the more frequently looked up symbols migrate towards the front and thus are more quickly found. The question is whether the migration improves things enough to make up for the swapping. – James J.

This got me curious as to how much of a difference this would make, so I did a little experiment.

In this Microsoft BASIC, variables get created when you first use them. Early on, I learned a tip that you could define all your variables at the start of your program and get that out of the way before your actual code begins. You can do this with the DIM statement:

DIM A,B,A$,B$

Originally, I thought DIM was only used to define an array, such as DIM A$(10).

I decided to use this to test how much of a difference variable placement makes. Variables defined first would be found quicker when you access them. Variables defined much later would take more time to find since the interpreter has to walk through all of them looking for a match.

Using the Xroar CoCo/Dragon emulator, I wrote a simple test program that timed two FOR/NEXT loops using two different variables. It looks like this:

In BASIC, variables defined earlier are faster.

As you can see, with just two variables, A and Z, there wasn’t much difference between the time it takes to use them in a small FOR/NEXT loop. I expect if the loop time was much later, you’d see more and more difference.

But what if there were more variables? I changed line 10 to define 26 different variables (A through Z) then ran the same test:

In BASIC, variables defined last take more time to find, so they are slower.

Now we see quite a bit of difference between using A and using Z. If I knew Z was something I would be using the most, I might define it at the start of the DIM. I did another test, where I defined Z first, and A last:

In BASIC, define the most-used variables first to speed things up.

As expected, now the Z variable is faster than A.

Every time BASIC has to access a variable, it makes a linear (I assume*) search through all the variables looking for a match.

Side Note: * There is an excellent Super/Disk/Extended/Color Basic Unraveled book set which contains fully commented disassemblies of the ROMs. I could easily stop assuming and actually know if I was willing to take a few minutes to consult these books.

However, when I first posted these results to the Facebook CoCo group, James responded there:

Didn’t realize it made that much difference–doesn’t the interpreter’s FOR loop stack remember the symbol table entry for the control variable? – James J.

Indeed, this does seem to be a bad test. FOR/NEXT does not need the variable after the NEXT. If you omit the variable (just using NEXT by itself), it does not need to do this lookup and both get faster:

NEXT without a variable is faster.

I guess I need a better test.

How about using the variable directly, such as simple addition?

Variable addition is slower for later variables.

Z, being defined at the end, is slower. And if we reverse that (see line 10, defining Z first), Z becomes faster:

Variable addition is faster for earlier variables.

You can speed up programs by defining often-used variables earlier.

James’ suggestion about modifying the interpreter to do this automatically is a very interesting idea. If it continually did it, the program would adapt based on current usage. If it entered a subroutine that did a bunch of work, those variables would become faster, then when it exited and went back to other code, those variables would become faster.

I do not know if the BASIC language lasted long enough to ever evolve to this level, but it sure would be fun to apply these techniques to the old 8-bit machines and see how much better (er, faster) BASIC could become.

Thanks for the comment, James!

Interfacing assembly with BASIC via DEFUSR, part 4

See also: Part 1, Part 2 and Part 3.

Before we get started, a few comments from the previous installment.

JohnStrong (StrongWare) chimed in on Facebook with another improvement to the screen clearing assembly code. He suggested using a 16-bit register to blast bytes to the screen instead of doing it 8-bits at a time. It looks like this:

* CLEARX.ASM v1.02
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* 1.01 use TSTA instead of CMPD per L. Curtis Boyle
* 1.02 use STDD for 16-bit copy per John Strong
*
* DEFUSRx() clear screen to character routine
*
* INPUT:   ASCII character to clear screen to
* RETURNS: 0 is successful
*         -1 if error
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(42)
*   PRINT A
*
ORGADDR EQU $3f00

INTCNV EQU $B3ED   * 46061
GIVABF EQU $B4F4   * 46324

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
                   * D is made up of A and B, so if
                   * A has anything in it, it must be
		   * greater than 255.
       tsta        * test for zero
       bne  error  * branch if it is not zero
       ldx  #$400  * load X with start of 
loop   stb  ,x+    * store B register at X and increment X
       tfr  b,a    * transfer B to A
loop   std  ,x++   * store D (A and B) then increment X twice
       cmpx #$600  * compare X to end of screen
       bne  loop   * if not there, keep looping
       bra  return * done
error  ldd  #-1    * load D with -1 for error code
return jmp  GIVABF * return to caller

* lwasm --decb -o clearx3.bin clearx3.asm
* lwasm --decb -f basic -o clearx3.bas clearx3.asm
* decb copy -2 -r clearx3.bin ../Xroar/dsk/DRIVE0.DSK,CLEARX3.BIN

This change takes the byte to clear the screen to (in register B) and duplicates it (to register A), which affects the 16-bit register D since it is made up of A and B combined. Thus, if the desired byte is $2A (42), register D ends up being $2A2A.

Then, instead of copying one byte at a time, the loop copies two bytes at a time. The end result should be a faster screen clear. This ends up being two bytes larger than the second version, but still one byte smaller than my original version:

clearx.hex:
:103F0000BDB3ED108300FF2E0C8E0400E7808C06FD
:0B3F10000026F92003CCFFFF7EB4F474

clearx2.hex:
:103F0000BDB3ED4D260C8E0400E7808C060026F92B
:083F10002003CCFFFF7EB4F496

clearx3.hex
:103F0000BDB3ED4D260E8E04001F98ED818C06008A
:0A3F100026F92003CCFFFF7EB4F475

Cool! Thanks, John.

Meanwhile, back at the article…

So far, we have looked at interfacing assembly language with BASIC to do some useless things (add one to a number), questionably useful things (clear screen to any given character), and actually useful things (high speed uppercasing of text).

In this installment, we will try to do something else actually useful: move the screen around.

But first, let me digress a bit.

The cross compiler I use, lwtools by Lost Wizard Enterprises, is able to compile code to run under COLOR BASIC or OS-9/NitrOS-9. It also has some other options I just learned about (thanks, William!) that I wanted to mention.

Previously, I shared a small bit of assembly that would clear the 32-column screen to any specified character:

ORGADDR EQU $3f00

GIVABF EQU  $B4F4  * 46324
INTCNV EQU  $B3ED  * 46061

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
                   * D is made up of A and B, so if
                   * A has anything in it, it must be
                   * greater than 255.
       tsta        * test for zero
       bne  error  * branch if it is not zero
       ldx  #$400  * load X with start of screen
loop   stb  ,x+    * store B register at X and increment X
       cmpx #$600  * compare X to end of screen
       bne  loop   * if not there, keep looping
       bra  return * done
error  ldd  #-1    * load D with -1 for error code
return jmp  GIVABF * return to caller

NOTE: This article is using version 2, from the previous article, and does not include John Strong’s updates.

I have been compiling these to .BIN files, copying them over to a disk image, and then loading them in the XRoar emulator. It turns out, the lwasm also has another output option: BASIC. It will actually generate a short BASIC program that will POKE that assembly code in to memory! You use the format (-f) option like this:

lwasm --decb -f basic -o clearx2.bas clearx2.asm

This would assemble clearx2.asm and output it as a BASIC program! It looks like this:

10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 16128,16151,189,179,237,77,38,12,142,4,0,231,128,140,6,0,38,249,32,3,204
,255,255,126,180,244,-1,-1

The assembly is turned in to data statements, and it appears this is even capable of handling programs with multiple ORG statements. The DATA begins with the start memory location and the end memory location for a block of code, and then the actual code bytes. Clever.

This would be an easy way to add assembly code to your BASIC program without needing to LOADM/CLOADM a separate .BIN file. It will also give us a simple way to test this code in the XRoar emulator without copying files to a disk image (more on this in a moment).

But I digress.

Scroll With It, Baby

In all the examples I have shown so far, any parameter passed in was used to do something — a value to add to, a character to clear the screen to, or a string to print in uppercase.

The USR command allows for up to 10 functions to be defined (USR0 through USR9). This lets you easily have ten different assembly routines to call. However, you can also just use the parameter passed in to handle multiple functions.

Suppose you wanted to write a simple maze game using the 32 column text screen. You could limit your maze to be 32×16 (the size of the screen), or you could try to have a much larger maze and scroll it within the viewable screen area.

Scrolling UP is easy  … you just print something at the bottom of the screen, and BASIC moves the whole screen up. Try this:

10 PRINT TAB(RND(30));".":GOTO 10

That code will tab over a random number of spaces (0 to 30) and print a period. Over and over and over. If you run this, you see a cheesy scrolling star field (if stars were black and space was nuclear green).

Scrolling stars!

There was a famous Commodore BASIC program that did something similar using the PETASCII slash characters to generate a maze. There has even been an entire book written about this one liner:

*** COMMODORE 64 CODE ***
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

The CoCo does not have the Commodore character set, but we do have “/” and “\” so we could try this:

10 PRINT CHR$(47+(RND(2)-1)*45);:GOTO 10

This will print either CHR$(47) (a slash) or randomly add 45 to print CHR$(92) (a backslash). We get a similar endless maze that scrolls up, but doesn’t look nearly as nice as the one on the Commodore.

Scrolling maze… Sorta.

See? Easy.

I expect I wasn’t the only kid who wrote simple space games like this, with the ship at the top and objects traveling up the screen towards it.

I think I may be digressing again, so let me get back to the main point.

If we wanted to scroll in the other direction, we could try to do it in BASIC by copying every byte down one line. Here is an attempt to do that by using PEEK and POKE:

10 CLS
20 REM SCROLL UP
30 FOR A=1 TO 100
40 PRINT TAB(RND(30));"."
50 NEXT
60 REM SCROLL DOWN
70 FOR A=1 TO 100
80 PRINT@0,TAB(RND(30));"."
90 GOSUB 2000 'DOWN
100 NEXT
999 GOTO 999
2000 REM SCROLL DOWN
2010 FOR Z=1535-32 TO 1024 STEP-1
2020 POKE Z+32,PEEK(Z)
2030 NEXT
2040 RETURN

XROAR TIP: If you want to try this out in the XRoar emulator, save the above listing out as a text file with the extension of .asc (“scrolldown.asc”). If you do that, in XRoar you can do “File -> Load” and point it to this file. Then, that file will act like a cassette with an ASCII program on it! You can then type “CLOAD” and load the program, without needing to transfer it to a disk image.

This program will let the stars scroll up the screen (100 lines worth) using normal PRINT, then it will try to make them scroll down the screen (100 times) using a PEEK/POKE subroutine.

Scrolling down is painfully slow this way. You can see this would be no way to write a game.

Side Note: If I were trying to write a “space ship flying through space” game, I would just draw the individual stars and other objects, moving them each time, instead of redrawing the entire screen. But that’s not the point of this silly code.

And, if we wanted to also scroll the screen left and right, we’d need similar (and painfully slow) code. Here is a brute-force BASIC program that attempts to move the screen in each direction using POKE and PEEK:

10 CLS
20 FOR A=1 TO 14
30 PRINT @32*A+A,"SCROLLING IS HARD"
40 NEXT
50 GOSUB 1000 'UP
60 GOSUB 2000 'DOWN
70 GOSUB 3000 'LEFT
80 GOSUB 4000 'RIGHT
999 GOTO 999
1000 REM SCROLL UP
1010 FOR A=1024+32 TO 1535
1020 POKE A-32,PEEK(A)
1030 NEXT
1040 RETURN
2000 REM SCROLL DOWN
2010 FOR A=1535-32 TO 1024 STEP-1
2020 POKE A+32,PEEK(A)
2030 NEXT
2040 RETURN
3000 REM SCROLL LEFT
3010 FOR A=1024+1 TO 1535-1
3020 POKE A,PEEK(A+1)
3030 NEXT
3040 RETURN
4000 REM SCROLL RIGHT
4010 FOR A=1535-1 TO 1024 STEP-1
4020 POKE A+1,PEEK(A)
4030 NEXT
4040 RETURN

If you run this, you see it prints a message down the screen, then SLOWLY moves every byte up, then back down, then left, then right. It is very slow. It also leaves leftover characters on the edge of the screen, with the idea being you would be drawing new characters over there if you were making a maze or something scroll.

It’s not elegant, nor is it pretty. Or useful.

Obviously, doing this to scroll a screen is not practical. Clever programmers will try to make large strings and then just print them in the proper position. It’s mush faster letting the BASIC ROM do the work for you. Here’s an example that will scroll a PAC-MAN maze up and down the screen:

10 DIM MZ$(31)
20 FOR A=0 TO 30:READ MZ$(A):NEXT
30 CLS
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 FOR LN=0 TO 15
70 PRINT @LN*32,MZ$(LN+ST);
80 NEXT:NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 FOR LN=0 TO 15
120 PRINT @LN*32,MZ$(LN+ST);
130 NEXT:NEXT
140 GOTO 40
999 GOTO 999
1000 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"    
1010 DATA "X            XX            X"    
1020 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1030 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1040 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1050 DATA "X                          X"
1060 DATA "X XXXX XX XXXXXXXX XX XXXX X"   
1070 DATA "X XXXX XX XXXXXXXX XX XXXX X"    
1080 DATA "X      XX    XX    XX      X"   
1090 DATA "XXXXXX XXXXX XX XXXXX XXXXXX"    
2100 DATA "     X XXXXX XX XXXXX X     "    
2110 DATA "     X XX          XX X     "    
2120 DATA "     X XX XXXXXXXX XX X     "   
2130 DATA "XXXXXX XX X      X XX XXXXXX"   
2140 DATA "          X      X          "   
2150 DATA "XXXXXX XX X      X XX XXXXXX"   
2160 DATA "     X XX XXXXXXXX XX X     "   
2170 DATA "     X XX          XX X     "   
2180 DATA "     X XX XXXXXXXX XX X     "   
2190 DATA "XXXXXX XX XXXXXXXX XX XXXXXX"   
3200 DATA "X            XX            X"   
3210 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3220 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3230 DATA "X   XX                XX   X"   
3240 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3250 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3260 DATA "X      XX    XX    XX      X"   
3270 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3280 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3290 DATA "X                          X"   
4200 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"

If you run this, you will see an ASCII maze that is 31 lines tall get scrolled up and down the 16 line screen. Using PRINT to blast out a string of bytes is much faster than PEEK and POKE.

Fancy BASIC programmers would use this trick, storing all their characters in strings and printing them on the screen. If you want to add left and right scrolling, you could do that with longer strings and MID$ to just print the middle 32 characters of the string.

But I digress. Again.

While there are ways to do simulate screen scrolling somewhat fast in BASIC, assembly language will still be much faster. I present this simple code that has assembly versions of the BASIC code I presented earlier. Instead of having four different subroutines to GOSUB to, you can call it by using USR0(z) and giving it a direction code (1=up, 2=down, 3=left and 4=right).

It looks like this:

* SCRNMOVE.ASM v1.00
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* DEFUSRx() screen moving function
*
* INPUT:   direction (1=up, 2=down, 3=left, 4=right)
* RETURNS: 0 on success
*         -1 if invalid direction
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(1)
*
ORGADDR EQU $3f00

INTCNV EQU  $B3ED * 46061
GIVABF EQU  $B4F4 * 46324

UP     EQU  1
DOWN   EQU  2
LEFT   EQU  3
RIGHT  EQU  4
SCREEN EQU  1024 * top left of screen
END    EQU  1535 * bottom right of screen

     org   ORGADDR

start jsr  INTCNV * get incoming param in D
     cmpb  #UP
     beq   up
     cmpb  #DOWN
     beq   down
     cmpb  #LEFT
     beq   left
     cmpb  #RIGHT
     beq   right
     bra   error

up   ldx   #SCREEN+32
loopup lda  ,x
     sta    -32,x
     leax   1,x
     cmpx   #END
     ble    loopup
     bra    return

down ldx    #END-32
loopdown    lda ,x
     sta    32,x
     leax   -1,x
     cmpx   #SCREEN
     bge    loopdown
     bra    return

left ldx    #SCREEN+1
loopleft    lda ,x
     sta    -1,x
     leax   1,x
     cmpx   #END
     ble    loopleft
     bra    return

right ldx   #END-1
loopright   lda ,x
     sta    1,x
     leax   -1,x
     cmpx   #SCREEN
     bge    loopright
     bra    return

error ldd   #-1    * load D with -1 for error code
     bra    exit
    
return ldd  #0
exit jmp  GIVABF      

* lwasm --decb -9 -o scrnmove.bin scrnmove.asm
* lwasm --decb -f basic -o scrnmove.bas scrnmove.asm
* decb copy -2 -r scrnmove.bin ../Xroar/dsk/DRIVE0.DSK,SCRNMOVE.BIN

If I use the “-f basic” option, I can produce a BASIC loader with DATA statements that contain the assembly language routines. I then renumbered them and made them a subroutine so at the top of the example program I can GOSUB to it, then install and use the routine.

1 CLEAR 200,&H3F00
2 GOSUB 1000
3 DEFUSR0=&H3F00
10 CLS
20 FOR A=1 TO 14
30 PRINT @32*A+A,"SCROLLING IS HARD"
40 NEXT
50 Z=USR0(1) 'UP
60 Z=USR0(2) 'DOWN
70 Z=USR0(3) 'LEFT
80 Z=USR0(4) 'RIGHT
999 GOTO 999
1000 REM LOAD ASM ROUTINE
1010 READ A,B
1020 IF A=-1 THEN 1070
1030 FOR C = A TO B
1040 READ D:POKE C,D
1050 NEXT C
1060 GOTO 1000
1070 RETURN
1080 DATA 16128,16225,189,179,237,193,1,39,14,193,2,39,27,193,3,39,40,193,4,39,52,32,66,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,54,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,37,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,21
1090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,32,5,204,255,255,32,3,204,0,0,126,180,244,-1,-1

If you run this, you will see the screen jump and then it will look like the original example looked…it just happens almost instantly instead of taking minutes.

Now let’s try the star scrolling example again. Instead of GOSUBing to slow BASIC routines, we will use the assembly scroll up and down routines:

1 CLEAR 200,&H3F00
2 GOSUB 1000
3 DEFUSR0=&H3F00
5 SP$=STRING$(31," ")
10 CLS
20 REM SCROLL UP
30 FOR A=1 TO 100
35 PRINT @32*15,SP$;
40 PRINT @32*15,TAB(RND(30));".";
45 Z=USR0(1) 'UP
50 NEXT
60 REM SCROLL DOWN
70 FOR A=1 TO 100
80 PRINT@0,TAB(RND(30));"."
90 Z=USR0(2) 'DOWN
100 NEXT
110 GOTO 20
999 GOTO 999
1000 REM LOAD ASM ROUTINE
1010 READ A,B
1020 IF A=-1 THEN 1070
1030 FOR C = A TO B
1040 READ D:POKE C,D
1050 NEXT C
1060 GOTO 1000
1070 RETURN
1080 DATA 16128,16225,189,179,237,193,1,39,14,193,2,39,27,193,3,39,40,193,4,39,52,32,66,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,54,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,37,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,21
1090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,32,5,204,255,255,32,3,204,0,0,126,180,244,-1,-1

You will notice scrolling up and down now go at the same speed, but it is slightly slower than the normal BASIC PRINT scroll up. This is because of line 35 and 75 that use a PRINT statement to erase a line before the screen scrolls. This is because my simple assembly routines don’t bother to do this (neither did the BASIC version).

If the usage is known, the assembly can easily be made to clear out whichever roll of column is being moved. Doing it inside the routine will be much faster than using a PRINT command (and, PRINT doesn’t help us if the screen is scrolling left or right).

Can we do better? I think so.

Next time … let’s make another pass over this screen scrolling routine and see if we can make it do something more useful.

Interfacing assembly with BASIC via DEFUSR, part 3

See also: Part 1 and Part 2.

A quick update on some code listed in the previous installment. I mentioned that the user was passing in an integer that represented which character (a byte) the screen would be cleared to. It would be passed in as a 16-bit value (register D). Since the screen characters were one byte, a check was added in case the value passed in was larger than 255 (cmpd #255).

In the comments, Justin chimed in:

You could also just do a clra to force the issue and avoid the compare and branch. – Justin

Justin’s suggestion would make the code smaller and ignore the first byte of register D. Thus, if the user did pass in anything higher, it would just chop off the excess. In binary, if the user passed in a value from 0-255, only bits would be set in register B. When the value was larger than 255, it would start setting bits in register A:

     Reg A     |     Reg B
0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 = Reg D is 0
0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 1 = Reg D is 1
0 0 0 0 0 0 0 0|1 1 1 1 1 1 1 1 = Reg D is 255
0 0 0 0 0 0 0 1|0 0 0 0 0 0 0 0 = Reg D is 256

If we just “clra”, we ensure the routine will never get a value greater than 8-bits. However, the user will get unexpected results. If they tried to pass in 256 (see above), register A would be cleared, and the value the routine would use would be 0. “Garbage in, garbage out!”

However, if error checking is desired, we still need to do a compare. L. Curtis Boyle suggested:

You could use TSTA. instead of CMPA #$00 to save a byte. – L. Curtis B.

I looked up the TST instruction, and it seems to test a byte in memory location or the A or B register and set some condition code register (CC) bits. If the high bit 7 is set, the CC register’s N bit will be set (testing for a negative value). If any bits are set, the CC register’z Z (zero) bit will be set (not zero). Hopefully I have that correct. The key point here is you can use TST to check for zero, and TSTA is a smaller instruction than CMPD. Here is the code:

* CLEARX.ASM v1.01
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* 1.01 use TSTA instead of CMPD per L. Curtis Boyle
*
* DEFUSRx() clear screen to character routine
*
* INPUT: ASCII character to clear screen to
* RETURNS: 0 is successful
* -1 if error
*
* EXAMPLE:
* CLEAR 200,&H3F00
* DEFUSR0=&H3F00
* A=USR0(42)
* PRINT A
*
ORGADDR EQU $3f00

GIVABF EQU  $B4F4  * 46324
INTCNV EQU  $B3ED  * 46061

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
       cmpd #255   * compare passed in value to 255
       bgt  error  * if greater, error
                   * D is made up of A and B, so if
                   * A has anything in it, it must be
                   * greater than 255.
       tsta        * test for zero
       bne  error  * branch if it is not zero
       ldx  #$400  * load X with start of screen
loop   stb  ,x+    * store B register at X and increment X
       cmpx #$600  * compare X to end of screen
       bne  loop   * if not there, keep looping
       bra  return * done
error  ldd  #-1    * load D with -1 for error code
return jmp  GIVABF * return to caller

* lwasm --decb -o clearx2.bin clearx2.asm
* decb copy -2 -r clearx2.bin ../Xroar/dsk/DRIVE0.DSK,CLEARX2.BIN

When I build this in to a .BIN file, the original showed 37 bytes, and this version shows 34 bytes. Here are the hex bytes that were generated:

clearx.hex:
:103F0000BDB3ED108300FF2E0C8E0400E7808C06FD
:0B3F10000026F92003CCFFFF7EB4F474

clearx2.hex:
:103F0000BDB3ED4D260C8E0400E7808C060026F92B
:083F10002003CCFFFF7EB4F496

It appears to save three bytes. Curtis mentioned saving one byte which I think is the case between a “CMPA #0” and “TSTA”.

Best of all, with this change, it still works and rejects larger values:

CLEARX2: Electric Boogaloo

Thanks, Justin and Curtis, for those suggestions.

By the way, I know programmers often don’t bother with error checking. I mean, our code is perfect, right? And, clearing a screen is hardly anything that requires error checking. And while I agree, I noticed that even COLOR BASIC has error checking for it’s CLS command:

CLS with a value greater than 255 returns a Function Call error.

And, since the CoCo’s VDG chip supported nine colors, you only get colors for CLS 0 through CLS 8. If you try to clear to any value between 9 and 255, you get an easter egg:

CLS 9 through 255 present a Microsoft easter egg.

Bonus Question: There is also an additional CLS easter egg in the CoCo 3’s BASIC, Do you know what it is?

But I digress…

String Theory

You can really speed up a BASIC program by using assembly routines. For instance, while BASIC has great string manipulation routines, doing something simple like converting a string to uppercase can be painfully slow.

Suppose you were trying to write a text-based program and you wanted it to work on all Color Computer models. The original Color Computer 1 and early Color Computer 2 models could not display true lowercase – they displayed inverse characters instead. Later Tandy-branded CoCo 2s and the CoCo 3 could support lowercase.

To work on all systems, you might simply choose to put all your menu text in UPPERCASE. Or, you might store every string twice with an uppercase and mixed case version, and use a variable to know which one to print:

IF UC=1 PRINT "ENTER YOUR NAME:" ELSE PRINT "Enter your name:"

That would be one brute-force way to do it, but if your program used many strings, it would needlessly increase the size of your program. Instead, it might make sense to store all the strings in mixed case, and convert them to uppercase on output if needed.

Here is a very simple brute-force subroutine that does just this:

BASIC uppsercase subroutine.

And it works just fine…

Output of BASIC uppercase subroutine.

…but it’s slow. If no conversion is needed, the mixed case text instantly appears, but when conversion is needed, it crawls through the line character-by-character at speeds we haven’t seen text display at since the days of dial-up BBSes.

In an earlier series of articles, we discussed word-wrap routines in BASIC. Several folks contributed their versions, and we ranked them based on code size, RAM size and speed. There were many different approaches to the same thing, and the same applies to uppercasing a string, so please don’t take my brute-force example as the best way it can be done. It certainly isn’t, and can surely be improved.

But even the fastest BASIC routine won’t compare to doing the same thing in assembly.

Unfortunately, the USRx() command only allows you to pass in a numeric value, and not a string, so we can’t simply do something like:

A$=USR0("Convert this to all uppercase.") 'THIS WILL NOT WORK!

Pity. But, I was able to find the solution, and it involves another BASIC command known as VARPTR. This command gets the address of a variable in memory. You may recall that Darren Atkinson (creator of the CoCoSDC interface) used VARPTR in his version of the word-wrap routine:

http://subethasoftware.com/2015/01/05/more-basic-word-wrap-versions/

This is our solution to passing in a string to USRx(). We can pass in the address of the string, and then the assembly code can figure it out from there. Here is how it works:

A$="This is a string in memory"
X = VARPTR(A$)
PRINT "A$ IS LOCATED AT ";X

If you run that code, you will see the address of that string in memory. We just need to understand how a string is stored.

The address does not point to the actual string data. Instead, it points to a few bytes of information that describe the string and where it is.

The first byte where the string is stored will be the size of that string:

A$="THIS IS A STRING IN MEMORY"
X = VARPTR(A$)
PRINT "A$ IS LOCATED AT";X
PRINT "A$ IS";PEEK(X);"LONG"

I forget what the second byte is used for, but bytes three and four are the actual address of the string character data:

PRINT "STRING DATA IS AT";PEEK(X+2)*256+PEEK(X+3)

On my system, it looks like this:

VARPTR of a string.

Once you know the actual starting place for the string data, you can see what that is in memory. In my case, the string length was 26 bytes, and the data started at 32709. I could use a FOR/NEXT loop and display the contents of that memory:

VARPTR string data example.

You will notice that the string information (length of string, location of string characters) is nowhere near the actual string data is. This is because the string characters could actually be in your program code, rather than in string memory. For example:

10 A$="THIS STRING IS IN THE PROGRAM"

Somewhere in RAM will be a string identification block with the length of the string and an address that points inside the program space. This makes it sort of an “embedded string” that lives inside your program. However, if you manipulate this string, BASIC will then make a copy of it in other memory and make the pointer go there. Thus, if you have a 10 character string like this:

10 A$="1234567890"

…and inside your code you do something like this:

20 A$=A$+"!"

…at that point, BASIC will no longer be pointing A$ to inside your code. It will be copied (pluy the “!”) to a new memory location inside of string space:

10 A$="1234567890"
20 GOSUB 1000
30 A$=A$+1
40 GOSUB 1000
50 END
1000 X=VARPTR(A$)
1010 PRINT"VARP:";X
1020 PRINT"SIZE:";PEEK(X)
1030 PRINT"LOC :";PEEK(X+2)*256+PEEK(X+3)
1040 PRINT
1050 RETURN

Here you can see that the location of the string (initially inside the program code space) moves to higher string memory RAM:

VARPTR shows you where the string moves to.

You won’t see memory decrease when this happens, because print MEM is showing you available program space. Strings live in a special section at the end of program memory. You may have seen the CLEAR command uses to reserve space for strings like this:

CLEAR 200

I believe 200 is the default if you don’t specify. In this case, the string started out inside the program’s code space and was not using any of that 200 bytes, and then after altering the string, it was copied in to the 200 bytes of string space.

Thus, if you want to see the impact, try running with “CLEAR 0” so there is NO ROOM for strings!

5 CLEAR 0 ' NO STRING SPACE

Now when we run that program, we see that the initial string works, because it is stored inside the program space, but the moment we try to add one character to it, there is no string memory available to copy the string to and it fails with an ?OS ERROR (out of string space).

?OS ERROR showing strings move from code space to string space.

This is something to be aware of if you are ever writing large programs with many strings. Rather than do something like this:

10 A$="[DELTA BBS]"
20 B$="MAIN MENU"
30 C$=" >"
40 PR$=A$+B$+C$:PRINT PR$

…which would then allocate string space to hold the length of A$, B$ and C$, you could keep those strings in program code space by just printing them out each time:

40 PRINT A$;B$;C$

The trick is to avoid BASIC having to allocate string memory and copy things over. If you need to do this, you can re-use a temporary string:

40 TS$=A$+B$+C$:GOSUB 1000:TS$=""

I think something like that would create a temporary string (TS) and copy all those code space strings over, then you could use it, and then setting it back to “” at the end would release that memory. If string memory is limited, tricks like this can really help out.

But I digress.

Now that we know how strings are stored, we can create an assembly routine that will serve as a UPPERCASE PRINT command.

Our assembly routine will be passed the address of the string, and then use byte 1 to get the length, and bytes 3 and 4 to get the location of the actual string characters. We can then walk through that memory and use the CHROUT ROM routine to output each character one-by-one, the same way BASIC does for PRINT.

Here is the routine:

* UCASE.ASM v1.00
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* DEFUSRx() uppercase output function
*
* INPUT: VARPTR of a string
* RETURNS: # chars processed
*
* EXAMPLE:
* CLEAR 200,&H3F00
* DEFUSR0=&H3F00
* A$="Print this in uppercase."
* PRINT A$
* A=USR0(VARPTR(A$))
*
ORGADDR EQU $3f00
dir
GIVABF EQU   $B4F4    * 46324
INTCNV EQU   $B3ED    * 46061
CHROUT EQU   $A002

       org   ORGADDR
start  jsr   INTCNV   * get passed in value in D
       tfr   d,x      * move value (varptr) to X
foo    ldb   ,x       * load string len to B
       ldy   2,x      * load string addr to Y
       beq   null     * exit if strlen is 0
       ldx   #0       * clear X (count of chars conv)
loop   lda   ,y       * load char in A
       cmpa  #'a      * compare to lowercase A
       blt   nextch   * if less, no conv needed
       cmpa  #'z      * compare to lowercase Z
       bgt   nextch   * if greater, no conv needed
lcase  suba  #32      * subtract 32 to make uppercase
       leax  1,x      * inc count of chars converted
nextch jsr   [CHROUT] * call ROM output character routine
       leay  1,y      * increment Y pointer
cont   decb           * decrement counter
       beq   exit     * if 0, go to exit
       bra   loop     * go to loop
exit   tfr   x,d      * move chars conv count to D
       bra   return   * return D to caller
null   ldd   #-1      * load -2 as error
return jmp   GIVABF   * return to caller

* lwasm --decb -o ucase.bin ucase.asm
* decb copy -2 -r ucase.bin ../Xroar/dsk/DRIVE0.DSK,UCASE.BIN

W will call our assembly routine like this:

A$="Convert this to uppercase."
A=USR0(VARPTR(A$))

And it should work on upper and lowercase strings automatically:

Uppercase output routine in assembly.

Now our uppercasing output routine is lightning fast.

To be continued…

Interfacing assembly with BASIC via DEFUSR, part 2

Previously, we took a look at using the EXTENDED COLOR BASIC DEFUSR command to interface a bit of assembly language with a BASIC program. The example I gave simply added one to a value passed in:

Using DEFUSR to call assembly from BASIC.

That’s not very useful, so let’s do something a bit more visual.

One of my favorite bits of CoCo 6809 assembly code is this:

      org  $3f00
start ldx  #$400 * load X with start of 32-column screen
loop  inc  ,x+   * increment whatever is at X, then increment X
      cmpx #$600 * compare X with end of screen
      bne  loop  * if not end, go back to loop
      bra  start * go back to start

This endless loop will start incrementing every byte on the screen over and over making a fun display. I ran this code in the Mocha emulator (which has EDTASM available):

http://www.haplessgenius.com/mocha/

Then I compiled it (“A/IM/WE/AO” – assemble, in memory, wait for errors, absolute origin – how can I still remember this???), and ran it in the debugger (“Z” for debugger, then “G START” to start it):

Mocha emulator running silly screen code.

This inspired me to make a small assembly routine to do something similar from BASIC. The CLS command can take an optional value (0-8) to specify what color to clear the screen to. Let’s make an assembly routine that will allow specifying ANY character to clear the screen to:

ORGADDR EQU $3f00

GIVABF EQU  $B4F4  * 46324
INTCNV EQU  $B3ED  * 46061

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
       cmpd #255   * compare passed in value to 255
       bgt  error  * if greater, error
       ldx  #$400  * load X with start of screen
loop   stb  ,x+    * store B register at X and increment X
       cmpx #$600  * compare X to end of screen
       bne  loop   * if not there, keep looping
       bra  return * done
error  ldd  #-1    * load D with -1 for error code
return jmp  GIVABF * return to caller

First, I added a bit of error checking so if the user passed in anything greater than 255, it will return -1 as an error code. Otherwise, it returns back the value passed in (that the screen was cleared to.)

Side Note: Hmmm. Since I know register D is register A and B combined, all I really need to do is make sure A is 0. i.e, “D=00xx”. If anything is in A, it is greater than the one byte value in B. I suppose I could also have done “cmpa #0 / bne error”. Doing something like that might be smaller and/or faster than comparing a 16-bit register. Anyone want to provide me a better way?

Since the 16-bit register D is made up of the two 8-bit registers A and B, I can just use B as the value passed in (0-255).

Here is what it would do with a bad value:

Clear X routine, bad value error.

And here is it with a valid value of 42:

Clear X with a value of 42.

So far so good.

In the next part, we’ll look at how to pass in a string instead of an integer.

Interfacing assembly with BASIC via DEFUSR, part 1

This article series will demonstrate how to interface some 6809 assembly code with Microsoft BASIC on a Tandy/Radio Shack TRS-80 Color Computer.

BASIC on the Color Computer is easy, but not fast. 6809 assembly language is fast, but not easy. Fortunately, it’s easy (and fast?) to combine them, allowing you to write a BASIC program that makes use of some assembly language to speed things up.

Assembly code can be loaded (or POKEd) in to memory at a specific address and then invoked by the EXEC command. This is fine for a “go do this” type of routine. But, if you want the assembly code to interact with BASIC by returning values or modifying a variable or string, you can use a special BASIC command designed for this purpose.

The Color Computer’s original 1980 COLOR BASIC had a USR command which could be used to call an assembly language routine via a BASIC interface. From the Wikipedia entry:

USR(num) calls a machine language subroutine whose address is stored in memory locations 275 and 276. num is passed to the routine, and a return value is assigned when the routine is done

This allowed passing a numeric parameter in to the assembly routine, and getting back a status value.

When EXTENDED COLOR BASIC came out, USR was enhanced to allow defining multiple routines. It looks like this:

DEFUSR0=&H3F00
A=USR0(42)

That code would define USR0 to call an assembly routine starting at memory location &H3F00 and pass it the value of 42. That routine could then return a value back to the caller which would end up in the variable A.

There are two ROM routines that enable receiving a value from BASIC, and returning one back:

  1. INTCNV will convert the integer passed in the USRx() call and store it in register D.
  2. GIVABF will take whatever is in register D and return it to the USR0() call.

Here is a very simple assembly routine that would receive a value, add one to it, and return it.

ORGADDR EQU $3f00

GIVABF EQU $B4F4   * 46324
INTCNV EQU $B3ED   * 46061

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
       tfr  d,x    * transfer D to X so we can manipulate it
       leax 1,x    * add 1 to X
       tfr  x,d    * transfer X back to D
return jmp  GIVABF * return to caller

Using the lwtools 6809 cross compiler, I can compile it in to a .BIN file that is loadable in DISK BASIC:

lwasm --decb -o addone.bin addone.asm

I could then use the toolshed decb command to copy the binary to a .DSK image to run in an amulator such as Xroar. In my case, I have an image called DRIVE0.DSK I want to copy it to:

decb copy -2 -r addone.bin ../Xroar/dsk/DRIVE0.DSK,ADDONE.BIN

Now I can run the Xroar emulator and mount this disk image and test it:

Using DEFUSR to call assembly from BASIC.

It works! Of course, I could have just done…

A=A+1

…so maybe this isn’t the best use of assembly language. ;-)

Up next … a look at doing something actually useful.