Interfacing assembly with BASIC via DEFUSR, part 4

See also: Part 1, Part 2 and Part 3.

Before we get started, a few comments from the previous installment.

John Strong (StrongWare) chimed in on Facebook with another improvement to the screen-clearing assembly code. He suggested using a 16-bit register to blast bytes to the screen two at a time instead of 8 bits at a time. It looks like this:

* CLEARX.ASM v1.02
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* 1.01 use TSTA instead of CMPD per L. Curtis Boyle
* 1.02 use STD for 16-bit copy per John Strong
*
* DEFUSRx() clear screen to character routine
*
* INPUT:   ASCII character to clear screen to
* RETURNS: 0 if successful
*         -1 if error
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(42)
*   PRINT A
*
ORGADDR EQU $3f00

INTCNV EQU $B3ED   * 46061
GIVABF EQU $B4F4   * 46324

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
                   * D is made up of A and B, so if
                   * A has anything in it, it must be
                   * greater than 255.
       tsta        * test for zero
       bne  error  * branch if it is not zero
       ldx  #$400  * load X with start of screen
       tfr  b,a    * copy B to A so D holds the byte twice
loop   std  ,x++   * store D (A and B) then increment X twice
       cmpx #$600  * compare X to end of screen
       bne  loop   * if not there, keep looping
       bra  return * done
error  ldd  #-1    * load D with -1 for error code
return jmp  GIVABF * return to caller

* lwasm --decb -o clearx3.bin clearx3.asm
* lwasm --decb -f basic -o clearx3.bas clearx3.asm
* decb copy -2 -r clearx3.bin ../Xroar/dsk/DRIVE0.DSK,CLEARX3.BIN

This change takes the byte to clear the screen to (in register B) and duplicates it (to register A), which affects the 16-bit register D since it is made up of A and B combined. Thus, if the desired byte is $2A (42), register D ends up being $2A2A.
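To make the register shuffle concrete, here is a small Python model (a sketch for illustration, not code that runs on the CoCo) of what TFR B,A does to D and how many trips each loop makes through the 512-byte screen at $0400-$05FF. The function names are hypothetical.

```python
SCREEN_START = 0x0400  # top-left byte of the 32x16 text screen
SCREEN_END = 0x0600    # one past the last screen byte (the CMPX #$600 target)


def tfr_b_a(b):
    """Model TFR B,A: A becomes a copy of B, so D (A:B) holds the byte twice."""
    a = b
    return (a << 8) | b


def clear_screen(mem, byte, sixteen_bit):
    """Fill the screen like STB ,X+ or STD ,X++ and return how many loop passes it took."""
    x = SCREEN_START
    passes = 0
    if sixteen_bit:
        d = tfr_b_a(byte)
        while x != SCREEN_END:      # CMPX #$600 / BNE loop
            mem[x] = d >> 8         # STD writes A (high byte) at X...
            mem[x + 1] = d & 0xFF   # ...and B at X+1
            x += 2                  # ,X++
            passes += 1
    else:
        while x != SCREEN_END:
            mem[x] = byte           # STB ,X+
            x += 1
            passes += 1
    return passes


memory = bytearray(0x10000)
print(hex(tfr_b_a(0x2A)))                              # 0x2a2a
print(clear_screen(memory, 0x2A, sixteen_bit=False))   # 512
print(clear_screen(memory, 0x2A, sixteen_bit=True))    # 256
```

Because the loop only stops when X equals $600 exactly, the two-byte version relies on the screen being an even number of bytes long. It is (512), so X lands on $600 after 256 passes instead of 512.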

Then, instead of copying one byte at a time, the loop copies two bytes at a time. The end result should be a faster screen clear. This ends up being two bytes larger than the second version, but still one byte smaller than my original version:

clearx.hex:
:103F0000BDB3ED108300FF2E0C8E0400E7808C06FD
:0B3F10000026F92003CCFFFF7EB4F474

clearx2.hex:
:103F0000BDB3ED4D260C8E0400E7808C060026F92B
:083F10002003CCFFFF7EB4F496

clearx3.hex:
:103F0000BDB3ED4D260E8E04001F98ED818C06008A
:0A3F100026F92003CCFFFF7EB4F475
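The .hex listings are Intel HEX records: a byte count, a 16-bit load address, a record type, the data bytes and a checksum. A short Python sketch that validates each record and totals its data bytes confirms the sizes described above. The function name is hypothetical, and it only handles data records (type 00), which is all these listings contain.

```python
def hex_data_size(records):
    """Total the data bytes in Intel HEX data records, checking each record's checksum."""
    total = 0
    for rec in records:
        if not rec.startswith(":"):
            raise ValueError("records start with ':'")
        raw = bytes.fromhex(rec[1:])
        count, rec_type = raw[0], raw[3]
        if rec_type != 0x00 or len(raw) != count + 5:
            raise ValueError("expected a data record: count, address (2), type, data, checksum")
        # The checksum byte makes the sum of every byte in the record a multiple of 256.
        if sum(raw) % 256 != 0:
            raise ValueError("bad checksum in " + rec)
        total += count
    return total


clearx = [":103F0000BDB3ED108300FF2E0C8E0400E7808C06FD", ":0B3F10000026F92003CCFFFF7EB4F474"]
clearx2 = [":103F0000BDB3ED4D260C8E0400E7808C060026F92B", ":083F10002003CCFFFF7EB4F496"]
clearx3 = [":103F0000BDB3ED4D260E8E04001F98ED818C06008A", ":0A3F100026F92003CCFFFF7EB4F475"]

for name, recs in (("clearx", clearx), ("clearx2", clearx2), ("clearx3", clearx3)):
    print(name, hex_data_size(recs), "bytes")   # 27, 24 and 26 bytes
```

In the clearx3 bytes you can also spot 1F 98 (TFR B,A) just before ED 81 (STD ,X++), which is where the loop now starts.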

Cool! Thanks, John.

Meanwhile, back at the article…

So far, we have looked at interfacing assembly language with BASIC to do some useless things (add one to a number), questionably useful things (clear screen to any given character), and actually useful things (high speed uppercasing of text).

In this installment, we will try to do something else actually useful: move the screen around.

But first, let me digress a bit.

The cross assembler I use, lwasm (part of lwtools by Lost Wizard Enterprises), can assemble code to run under COLOR BASIC or OS-9/NitrOS-9. It also has some other options I just learned about (thanks, William!) that I wanted to mention.

Previously, I shared a small bit of assembly that would clear the 32-column screen to any specified character:

ORGADDR EQU $3f00

GIVABF EQU  $B4F4  * 46324
INTCNV EQU  $B3ED  * 46061

       org  ORGADDR
start  jsr  INTCNV * get passed in value in D
                   * D is made up of A and B, so if
                   * A has anything in it, it must be
                   * greater than 255.
       tsta        * test for zero
       bne  error  * branch if it is not zero
       ldx  #$400  * load X with start of screen
loop   stb  ,x+    * store B register at X and increment X
       cmpx #$600  * compare X to end of screen
       bne  loop   * if not there, keep looping
       bra  return * done
error  ldd  #-1    * load D with -1 for error code
return jmp  GIVABF * return to caller

NOTE: This article is using version 2, from the previous article, and does not include John Strong’s updates.

I have been assembling these to .BIN files, copying them over to a disk image, and then loading them in the XRoar emulator. It turns out lwasm also has another output option: BASIC. It will actually generate a short BASIC program that POKEs the assembled code into memory! You use the format (-f) option like this:

lwasm --decb -f basic -o clearx2.bas clearx2.asm

This would assemble clearx2.asm and output it as a BASIC program! It looks like this:

10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 16128,16151,189,179,237,77,38,12,142,4,0,231,128,140,6,0,38,249,32,3,204,255,255,126,180,244,-1,-1

The assembly is turned into DATA statements, and it appears this can even handle programs with multiple ORG statements. Each block of DATA begins with the start and end memory locations for that block of code, followed by the actual code bytes, and a final -1,-1 pair marks the end. Clever.
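Here is the same loader logic as a Python sketch (run_loader is a hypothetical name), fed the DATA from clearx2.bas. It shows how each start/end pair tells the loader how many bytes follow, and why the stream ends with two -1 values: the loader always READs a pair before testing the first one.

```python
def run_loader(data):
    """Mimic the generated loader: READ A,B, POKE bytes A..B, stop when A is -1."""
    memory = {}
    values = iter(data)
    while True:
        start, end = next(values), next(values)   # 10 READ A,B
        if start == -1:                           # 20 IF A=-1 THEN 70
            return memory
        for addr in range(start, end + 1):        # 30 FOR C = A TO B
            memory[addr] = next(values)           # 40 READ D:POKE C,D


CLEARX2 = [16128, 16151, 189, 179, 237, 77, 38, 12, 142, 4, 0, 231, 128, 140,
           6, 0, 38, 249, 32, 3, 204, 255, 255, 126, 180, 244, -1, -1]

mem = run_loader(CLEARX2)
print(len(mem), hex(min(mem)), hex(max(mem)))   # 24 0x3f00 0x3f17

# A second ORG block is just another start/end pair before the -1,-1 marker.
two_blocks = run_loader([100, 101, 1, 2, 200, 200, 9, -1, -1])
print(two_blocks)   # {100: 1, 101: 2, 200: 9}
```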

This would be an easy way to add assembly code to your BASIC program without needing to LOADM/CLOADM a separate .BIN file. It will also give us a simple way to test this code in the XRoar emulator without copying files to a disk image (more on this in a moment).

But I digress.

Scroll With It, Baby

In all the examples I have shown so far, any parameter passed in was used to do something — a value to add to, a character to clear the screen to, or a string to print in uppercase.

The USR command allows for up to 10 functions to be defined (USR0 through USR9). This lets you easily have ten different assembly routines to call. However, you can also just use the parameter passed in to handle multiple functions.

Suppose you wanted to write a simple maze game using the 32 column text screen. You could limit your maze to be 32×16 (the size of the screen), or you could try to have a much larger maze and scroll it within the viewable screen area.

Scrolling UP is easy... you just print something at the bottom of the screen, and BASIC moves the whole screen up. Try this:

10 PRINT TAB(RND(30));".":GOTO 10

That code will tab over a random number of spaces (1 to 30, since RND(30) returns a whole number from 1 to 30) and print a period. Over and over and over. If you run this, you will see a cheesy scrolling star field (if stars were black and space was nuclear green).

Scrolling stars!

There was a famous Commodore BASIC program that did something similar using the PETSCII slash characters to generate a maze. An entire book has even been written about this one-liner:

*** COMMODORE 64 CODE ***
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

The CoCo does not have the Commodore character set, but we do have “/” and “\” so we could try this:

10 PRINT CHR$(47+(RND(2)-1)*45);:GOTO 10

This will print either CHR$(47) (a slash) or randomly add 45 to print CHR$(92) (a backslash). We get a similar endless maze that scrolls up, but doesn’t look nearly as nice as the one on the Commodore.
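The arithmetic is easy to check: in Color BASIC, RND(2) returns 1 or 2, so (RND(2)-1)*45 is either 0 or 45. A tiny Python check (the helper name is made up for this sketch):

```python
def maze_char(rnd_result):
    """Evaluate CHR$(47+(RND(2)-1)*45) for an RND(2) result of 1 or 2."""
    return chr(47 + (rnd_result - 1) * 45)


print(maze_char(1), maze_char(2))   # / \
```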

Scrolling maze… Sorta.

See? Easy.

I expect I wasn’t the only kid who wrote simple space games like this, with the ship at the top and objects traveling up the screen towards it.

I think I may be digressing again, so let me get back to the main point.

If we wanted to scroll in the other direction, we could try to do it in BASIC by copying every byte down one line. Here is an attempt to do that by using PEEK and POKE:

10 CLS
20 REM SCROLL UP
30 FOR A=1 TO 100
40 PRINT TAB(RND(30));"."
50 NEXT
60 REM SCROLL DOWN
70 FOR A=1 TO 100
80 PRINT@0,TAB(RND(30));"."
90 GOSUB 2000 'DOWN
100 NEXT
999 GOTO 999
2000 REM SCROLL DOWN
2010 FOR Z=1535-32 TO 1024 STEP-1
2020 POKE Z+32,PEEK(Z)
2030 NEXT
2040 RETURN

XROAR TIP: If you want to try this out in the XRoar emulator, save the above listing out as a text file with the extension of .asc (“scrolldown.asc”). If you do that, in XRoar you can do “File -> Load” and point it to this file. Then, that file will act like a cassette with an ASCII program on it! You can then type “CLOAD” and load the program, without needing to transfer it to a disk image.

This program will let the stars scroll up the screen (100 lines worth) using normal PRINT, then it will try to make them scroll down the screen (100 times) using a PEEK/POKE subroutine.

Scrolling down is painfully slow this way. You can see this would be no way to write a game.
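Line 2010 counts down with STEP-1 for a reason. The source and destination overlap, so a loop that counted up would overwrite row 1 before reading it, then copy that copy into row 2, and so on, smearing the top row down the whole screen. A Python model of the screen memory (a stand-in for illustration; the names are not from the original program) shows both orders:

```python
ROW = 32      # bytes per text row
FIRST = 1024  # top-left of the screen
LAST = 1535   # bottom-right of the screen


def scroll_down(mem, backwards=True):
    """Copy every byte one row down, as lines 2010-2030 do with PEEK and POKE."""
    if backwards:
        order = range(LAST - ROW, FIRST - 1, -1)   # FOR Z=1535-32 TO 1024 STEP-1
    else:
        order = range(FIRST, LAST - ROW + 1)       # the same loop counting up instead
    for z in order:
        mem[z + ROW] = mem[z]                      # POKE Z+32,PEEK(Z)


def fresh_screen():
    """Fill row r with the letter A+r so rows are easy to tell apart."""
    mem = bytearray(2048)
    for r in range(16):
        mem[FIRST + r * ROW:FIRST + (r + 1) * ROW] = bytes([65 + r]) * ROW
    return mem


good = fresh_screen()
scroll_down(good)
print(bytes(good[FIRST:LAST + 1:ROW]))   # b'AABCDEFGHIJKLMNO'

bad = fresh_screen()
scroll_down(bad, backwards=False)
print(bytes(bad[FIRST:LAST + 1:ROW]))    # b'AAAAAAAAAAAAAAAA'
```

Row 0 keeps its old contents in the correct version too; that is the line you would redraw with new stars.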

Side Note: If I were trying to write a “space ship flying through space” game, I would just draw the individual stars and other objects, moving them each time, instead of redrawing the entire screen. But that’s not the point of this silly code.

And, if we wanted to also scroll the screen left and right, we’d need similar (and painfully slow) code. Here is a brute-force BASIC program that attempts to move the screen in each direction using POKE and PEEK:

10 CLS
20 FOR A=1 TO 14
30 PRINT @32*A+A,"SCROLLING IS HARD"
40 NEXT
50 GOSUB 1000 'UP
60 GOSUB 2000 'DOWN
70 GOSUB 3000 'LEFT
80 GOSUB 4000 'RIGHT
999 GOTO 999
1000 REM SCROLL UP
1010 FOR A=1024+32 TO 1535
1020 POKE A-32,PEEK(A)
1030 NEXT
1040 RETURN
2000 REM SCROLL DOWN
2010 FOR A=1535-32 TO 1024 STEP-1
2020 POKE A+32,PEEK(A)
2030 NEXT
2040 RETURN
3000 REM SCROLL LEFT
3010 FOR A=1024 TO 1535-1
3020 POKE A,PEEK(A+1)
3030 NEXT
3040 RETURN
4000 REM SCROLL RIGHT
4010 FOR A=1535-1 TO 1024 STEP-1
4020 POKE A+1,PEEK(A)
4030 NEXT
4040 RETURN

If you run this, you will see it print a message down the screen, then SLOWLY move every byte up, then back down, then left, then right. It is very slow. It also leaves leftover characters at the edges of the screen, the idea being that you would draw new characters there if you were scrolling a maze or something similar.
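The left and right routines have a subtler side effect. Because the screen is one contiguous 512-byte block, shifting the whole block one byte to the left does not just move each row: the first character of every row lands in the last column of the row above. Here is a Python sketch contrasting that with a per-row shift; the per-row version is a hypothetical alternative, not part of the program above.

```python
ROW, FIRST, SIZE = 32, 1024, 512


def scroll_left_block(mem):
    """POKE A,PEEK(A+1) for A=1024 to 1534: the whole screen shifts as one block."""
    for a in range(FIRST, FIRST + SIZE - 1):
        mem[a] = mem[a + 1]


def scroll_left_rows(mem):
    """Hypothetical alternative: shift columns 1-31 left within each row only."""
    for row_start in range(FIRST, FIRST + SIZE, ROW):
        for a in range(row_start, row_start + ROW - 1):
            mem[a] = mem[a + 1]


def lettered_screen():
    """Row r is filled with the letter A+r."""
    mem = bytearray(2048)
    for r in range(16):
        mem[FIRST + r * ROW:FIRST + (r + 1) * ROW] = bytes([65 + r]) * ROW
    return mem


block = lettered_screen()
scroll_left_block(block)
rows = lettered_screen()
scroll_left_rows(rows)
last_col = FIRST + ROW - 1                          # row 0, column 31
print(chr(block[last_col]), chr(rows[last_col]))    # B A
```

The per-row version leaves column 31 of each row holding its old character, ready for something new to be drawn there.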

It’s not elegant, nor is it pretty. Or useful.

Obviously, doing this to scroll a screen is not practical. Clever programmers will build large strings and then just print them in the proper position. It's much faster to let the BASIC ROM do the work for you. Here's an example that will scroll a PAC-MAN maze up and down the screen:

10 DIM MZ$(31)
20 FOR A=0 TO 30:READ MZ$(A):NEXT
30 CLS
40 REM SCROLL MAZE DOWN
50 FOR ST=0 TO 15
60 FOR LN=0 TO 15
70 PRINT @LN*32,MZ$(LN+ST);
80 NEXT:NEXT
90 REM SCROLL MAZE UP
100 FOR ST=15 TO 0 STEP-1
110 FOR LN=0 TO 15
120 PRINT @LN*32,MZ$(LN+ST);
130 NEXT:NEXT
140 GOTO 40
999 GOTO 999
1000 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"    
1010 DATA "X            XX            X"    
1020 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1030 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1040 DATA "X XXXX XXXXX XX XXXXX XXXX X"    
1050 DATA "X                          X"
1060 DATA "X XXXX XX XXXXXXXX XX XXXX X"   
1070 DATA "X XXXX XX XXXXXXXX XX XXXX X"    
1080 DATA "X      XX    XX    XX      X"   
1090 DATA "XXXXXX XXXXX XX XXXXX XXXXXX"    
2100 DATA "     X XXXXX XX XXXXX X     "    
2110 DATA "     X XX          XX X     "    
2120 DATA "     X XX XXXXXXXX XX X     "   
2130 DATA "XXXXXX XX X      X XX XXXXXX"   
2140 DATA "          X      X          "   
2150 DATA "XXXXXX XX X      X XX XXXXXX"   
2160 DATA "     X XX XXXXXXXX XX X     "   
2170 DATA "     X XX          XX X     "   
2180 DATA "     X XX XXXXXXXX XX X     "   
2190 DATA "XXXXXX XX XXXXXXXX XX XXXXXX"   
3200 DATA "X            XX            X"   
3210 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3220 DATA "X XXXX XXXXX XX XXXXX XXXX X"   
3230 DATA "X   XX                XX   X"   
3240 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3250 DATA "XXX XX XX XXXXXXXX XX XX XXX"   
3260 DATA "X      XX    XX    XX      X"   
3270 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3280 DATA "X XXXXXXXXXX XX XXXXXXXXXX X"   
3290 DATA "X                          X"   
4200 DATA "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"

If you run this, you will see an ASCII maze that is 31 lines tall get scrolled up and down the 16 line screen. Using PRINT to blast out a string of bytes is much faster than PEEK and POKE.

Fancy BASIC programmers would use this trick, storing all their characters in strings and printing them on the screen. If you want to add left and right scrolling, you could do that with longer strings and MID$ to just print the middle 32 characters of the string.
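A sketch of that idea in Python, with slicing standing in for MID$. MID$ counts from 1, so a window starting at column offset would be MID$(R$,offset+1,32) in BASIC. The row contents and names here are made up for illustration.

```python
WIDTH = 32  # columns on the text screen


def visible_row(wide_row, offset):
    """Return the 32 columns starting at offset, like MID$(R$, offset+1, 32)."""
    return wide_row[offset:offset + WIDTH]


row = "X" + "." * 46 + "X"      # a hypothetical 48-character maze row
for offset in (0, 8, 16):       # 48 - 32 = 16 is the furthest right the view can go
    print(offset, visible_row(row, offset))
```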

But I digress. Again.

While there are ways to simulate screen scrolling somewhat quickly in BASIC, assembly language will still be much faster. Here is some simple code with assembly versions of the BASIC routines I presented earlier. Instead of four different subroutines to GOSUB to, you call it with USR0(z), giving it a direction code (1=up, 2=down, 3=left and 4=right).

It looks like this:

* SCRNMOVE.ASM v1.00
* by Allen C. Huffman of Sub-Etha Software
* www.subethasoftware.com / alsplace@pobox.com
*
* DEFUSRx() screen moving function
*
* INPUT:   direction (1=up, 2=down, 3=left, 4=right)
* RETURNS: 0 on success
*         -1 if invalid direction
*
* EXAMPLE:
*   CLEAR 200,&H3F00
*   DEFUSR0=&H3F00
*   A=USR0(1)
*
ORGADDR EQU $3f00

INTCNV EQU  $B3ED * 46061
GIVABF EQU  $B4F4 * 46324

UP     EQU  1
DOWN   EQU  2
LEFT   EQU  3
RIGHT  EQU  4
SCREEN EQU  1024 * top left of screen
END    EQU  1535 * bottom right of screen

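          org   ORGADDR

start     jsr   INTCNV      * get incoming param in D
          cmpb  #UP         * only B (low byte of D) is compared
          beq   up
          cmpb  #DOWN
          beq   down
          cmpb  #LEFT
          beq   left
          cmpb  #RIGHT
          beq   right
          bra   error       * not 1-4

up        ldx   #SCREEN+32  * start at first byte of second row
loopup    lda   ,x          * get a byte
          sta   -32,x       * store it one row up
          leax  1,x         * move to next byte
          cmpx  #END        * loop while X <= bottom right
          ble   loopup
          bra   return

down      ldx   #END-32     * start at last byte of next-to-last row
loopdown  lda   ,x
          sta   32,x        * store it one row down
          leax  -1,x        * work backwards so nothing is overwritten early
          cmpx  #SCREEN
          bge   loopdown
          bra   return

left      ldx   #SCREEN+1   * start at second byte of screen
loopleft  lda   ,x
          sta   -1,x        * store it one byte to the left
          leax  1,x
          cmpx  #END
          ble   loopleft
          bra   return

right     ldx   #END-1      * start at next-to-last byte
loopright lda   ,x
          sta   1,x         * store it one byte to the right
          leax  -1,x        * work backwards
          cmpx  #SCREEN
          bge   loopright
          bra   return

error     ldd   #-1         * load D with -1 for error code
          bra   exit

return    ldd   #0          * 0 = success
exit      jmp   GIVABF      * return value in D to BASIC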

* lwasm --decb -9 -o scrnmove.bin scrnmove.asm
* lwasm --decb -f basic -o scrnmove.bas scrnmove.asm
* decb copy -2 -r scrnmove.bin ../Xroar/dsk/DRIVE0.DSK,SCRNMOVE.BIN
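Here is a Python model of the same entry logic and copy loops (an illustration of what the routine does, not a 6809 emulator; offsets 0-511 stand in for $0400-$05FF and the function names are hypothetical):

```python
UP, DOWN, LEFT, RIGHT = 1, 2, 3, 4
ROW, SIZE = 32, 512   # offset 0 stands for SCREEN ($0400), 511 for END ($05FF)


def copy_bytes(screen, sources, delta):
    """Copy screen[i] to screen[i + delta] for each i, in the order given."""
    for i in sources:
        screen[i + delta] = screen[i]


def usr0(param, screen):
    """Model SCRNMOVE: returns 0 on success or -1, like the value handed to GIVABF."""
    b = param & 0xFF                    # CMPB only looks at B, the low byte of D
    if b == UP:
        copy_bytes(screen, range(ROW, SIZE), -ROW)                 # SCREEN+32 up to END
    elif b == DOWN:
        copy_bytes(screen, range(SIZE - 1 - ROW, -1, -1), ROW)     # END-32 down to SCREEN
    elif b == LEFT:
        copy_bytes(screen, range(1, SIZE), -1)                     # SCREEN+1 up to END
    elif b == RIGHT:
        copy_bytes(screen, range(SIZE - 2, -1, -1), 1)             # END-1 down to SCREEN
    else:
        return -1
    return 0


rows = bytearray(i // ROW for i in range(SIZE))   # each byte holds its row number
print(usr0(UP, rows), rows[0], rows[SIZE - 1])    # 0 1 15
print(usr0(257, bytearray(SIZE)))                 # 0 (257 has a low byte of 1)
print(usr0(5, bytearray(SIZE)))                   # -1
```

One thing the model makes visible: because only B is compared, USR0(257) is treated as USR0(1). Adding a TSTA check like the one in CLEARX would reject values above 255.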

If I use the “-f basic” option, I can produce a BASIC loader with DATA statements that contain the assembly language routines. I then renumbered them and made them a subroutine so at the top of the example program I can GOSUB to it, then install and use the routine.

1 CLEAR 200,&H3F00
2 GOSUB 1000
3 DEFUSR0=&H3F00
10 CLS
20 FOR A=1 TO 14
30 PRINT @32*A+A,"SCROLLING IS HARD"
40 NEXT
50 Z=USR0(1) 'UP
60 Z=USR0(2) 'DOWN
70 Z=USR0(3) 'LEFT
80 Z=USR0(4) 'RIGHT
999 GOTO 999
1000 REM LOAD ASM ROUTINE
1010 READ A,B
1020 IF A=-1 THEN 1070
1030 FOR C = A TO B
1040 READ D:POKE C,D
1050 NEXT C
1060 GOTO 1000
1070 RETURN
1080 DATA 16128,16225,189,179,237,193,1,39,14,193,2,39,27,193,3,39,40,193,4,39,52,32,66,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,54,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,37,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,21
1090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,32,5,204,255,255,32,3,204,0,0,126,180,244,-1,-1

If you run this, you will see the screen jump, and then it will look like the original example looked... it just happens almost instantly instead of taking minutes.

Now let’s try the star scrolling example again. Instead of GOSUBing to slow BASIC routines, we will use the assembly scroll up and down routines:

1 CLEAR 200,&H3F00
2 GOSUB 1000
3 DEFUSR0=&H3F00
5 SP$=STRING$(31," ")
10 CLS
20 REM SCROLL UP
30 FOR A=1 TO 100
35 PRINT @32*15,SP$;
40 PRINT @32*15,TAB(RND(30));".";
45 Z=USR0(1) 'UP
50 NEXT
60 REM SCROLL DOWN
70 FOR A=1 TO 100
80 PRINT@0,TAB(RND(30));"."
90 Z=USR0(2) 'DOWN
100 NEXT
110 GOTO 20
999 GOTO 999
1000 REM LOAD ASM ROUTINE
1010 READ A,B
1020 IF A=-1 THEN 1070
1030 FOR C = A TO B
1040 READ D:POKE C,D
1050 NEXT C
1060 GOTO 1000
1070 RETURN
1080 DATA 16128,16225,189,179,237,193,1,39,14,193,2,39,27,193,3,39,40,193,4,39,52,32,66,142,4,32,166,132,167,136,224,48,1,140,5,255,47,244,32,54,142,5,223,166,132,167,136,32,48,31,140,4,0,44,244,32,37,142,4,1,166,132,167,31,48,1,140,5,255,47,245,32,21
1090 DATA 142,5,254,166,132,167,1,48,31,140,4,0,44,245,32,5,204,255,255,32,3,204,0,0,126,180,244,-1,-1

You will notice that scrolling up and down now run at the same speed, but both are slightly slower than the normal BASIC PRINT scroll up. That is because line 35 uses an extra PRINT statement to erase the bottom line before the screen scrolls; my simple assembly routines don't clear the vacated line (and neither did the BASIC version).

If the usage is known, the assembly can easily be made to clear out whichever row or column is vacated. Doing it inside the routine will be much faster than using a PRINT command (and PRINT doesn't help us if the screen is scrolling left or right).
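Here is what clearing as you go could look like, sketched in Python on a 512-byte model of the screen. The fill value $60 is what a blank space looks like in CoCo screen memory (it is what CLS writes); the function names are hypothetical. Python's slice assignment copies the source first, so overlap is not a concern here the way it is in the byte-by-byte loops.

```python
ROW, SIZE = 32, 512
BLANK = 0x60   # a space in CoCo text screen memory


def scroll_up_clear(screen):
    """Move rows 1-15 up one row, then blank the bottom row they vacated."""
    screen[0:SIZE - ROW] = screen[ROW:SIZE]
    screen[SIZE - ROW:SIZE] = bytes([BLANK]) * ROW


def scroll_left_clear(screen):
    """Shift each row left by one column (no wrap), then blank column 31."""
    for start in range(0, SIZE, ROW):
        screen[start:start + ROW - 1] = screen[start + 1:start + ROW]
        screen[start + ROW - 1] = BLANK


s = bytearray(i // ROW for i in range(SIZE))
scroll_up_clear(s)
print(s[0], s[SIZE - ROW - 1], hex(s[SIZE - 1]))   # 1 15 0x60
```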

Can we do better? I think so.

Next time … let’s make another pass over this screen scrolling routine and see if we can make it do something more useful.

5 thoughts on “Interfacing assembly with BASIC via DEFUSR, part 4”

  1. L. Curtis Boyle

    You could optimize the parameter check like this:

    start jsr INTCNV * get incoming param in D
    decb
    beq up
    decb
    beq down
    decb
    beq left
    decb
    beq right
    bne error
    (could also move “error” routine here to eliminate bne)
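    Curtis's chain can be modeled in a few lines of Python (a sketch; decb_dispatch is a made-up name). Each DECB subtracts one and sets the zero flag, so the number of decrements needed to reach zero selects the routine. The model also shows the edge cases: 0 wraps to $FF and never reaches zero in four steps, and 5 runs out of tests, so both fall through to the error path. In the 6809 encoding DECB is a single byte ($5A) while CMPB #n takes two, so the chain is also smaller.

```python
def decb_dispatch(b):
    """Model the DECB/BEQ chain: how many decrements reach zero picks the routine."""
    for routine in ("up", "down", "left", "right"):
        b = (b - 1) & 0xFF          # DECB wraps $00 around to $FF
        if b == 0:                  # BEQ is taken when B reaches zero
            return routine
    return "error"                  # fell through all four tests


for value in range(6):
    print(value, decb_dispatch(value))   # 0 error, 1 up, 2 down, 3 left, 4 right, 5 error
```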

    And, as John Strong mentioned in an earlier reply, you could use 16 bit load/stores.

    1. Allen Huffman Post author

      Nice! DEC is faster than a CMP, I take it? If I follow, it’s basically counting how many times it takes to DEC to get to 0. If one, must have been 1, so up. If not, do it again. If it’s zero then, it did it twice, so must be down, etc. COOL! I guess CMP would only be needed when using arbitrary values (100, 150, 200, etc.). Thus, planning the values to use can help make the code faster.

      I have John’s STD ,X++ added in a follow up. Did I miss a different comment?

      Can you improve my screen moving in Part 4? I need to make it no wrap (so anything pushed off the left does not show up on the right).

      But I have a real cool idea coming up.

  2. William Astle

    You can improve the vertical screen scrolling by copying two bytes at a time (with D), though that might not be as practical for sideways scrolling done properly. You can also eliminate the LEAX in the scroll loops through creative use of how you pick your offsets and start/end points and the post-increment and pre-decrement indexing operations. A post-inc/pre-dec is faster than a separate LEA instruction.

    1. Allen Huffman Post author

      Cool. I think this has it:

      up ldx #SCREEN+32
      loopup ldd ,x++ * load D with 2-bytes at X, inc++
      std -34,x * store
      cmpx #END
      ble loopup
      bra return

      down ldx #END-33
      loopdown ldd ,--x
      std 32,x
      cmpx #SCREEN+2
      bge loopdown
      bra return

      I added some +32/-32 for testing, to make sure I wasn’t overwriting anything before or after the screen.
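      William's two-bytes-at-a-time suggestion can be sanity-checked with a small Python model of the UP loop above (a hand translation of that instruction sequence for illustration, not a 6809 emulator). It runs the loop on a patterned 64K memory image and compares the result with the byte-at-a-time version.

```python
SCREEN, END, ROW = 0x0400, 0x05FF, 32


def up_words(mem):
    """Model LDX #SCREEN+32 / LDD ,X++ / STD -34,X / CMPX #END / BLE loopup."""
    x = SCREEN + ROW
    passes = 0
    while True:
        a, b = mem[x], mem[x + 1]         # LDD ,X++ reads X and X+1...
        x += 2                            # ...then adds 2 to X
        mem[x - 34], mem[x - 33] = a, b   # STD -34,X: one row above what was read
        passes += 1
        if x > END:                       # BLE keeps looping while X <= END
            return passes


def up_bytes(mem):
    """The one-byte-at-a-time scroll from the article, for comparison."""
    for x in range(SCREEN + ROW, END + 1):
        mem[x - ROW] = mem[x]


pattern = bytes((i * 7 + 3) % 256 for i in range(0x10000))
fast, slow = bytearray(pattern), bytearray(pattern)
print(up_words(fast))   # 240 (the byte version makes 480 passes)
up_bytes(slow)
print(fast == slow)     # True
```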

      Ideally, I wouldn’t wrap the left/right bytes when shifting that way, but I wasn’t sure how to approach that.

    2. Allen Huffman Post author

      I think I have it now. For left/right, whatever byte falls off the edge will reappear at the opposite corner. I am not sure how to do that with up/down yet, unless I copy vertically or something.


      up ldx #SCREEN+32
      loopup ldd ,x++ * load D with 2-bytes at X, inc++
      std -34,x * store
      cmpx #END
      ble loopup

      bra return

      down ldx #END-33
      loopdown ldd ,x
      std 32,x
      leax -2,x
      cmpx #SCREEN
      bge loopdown
      bra return

      left ldx #SCREEN
      lda ,x+
      pshs a * push A on user stack
      loopleft ldd ,x++
      std -3,x
      cmpx #END-1
      ble loopleft
      lda ,x
      sta -1,x
      puls a * pull A from user stack
      sta ,x
      bra return

      right ldx #END
      lda ,x
      pshs a
      leax -2,x
      loopright ldd ,x
      std 1,x
      leax -2,x
      cmpx #SCREEN+1
      bge loopright
      leax 1,x
      puls a
      ldb ,x
      std ,x
      bra return
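      A Python model of the left and right snippets (a hand translation of the instructions, for illustration only) confirms that they rotate the whole 512-byte screen by one byte in each direction. Like the block-copy BASIC version, this still spills across rows: a byte pushed off one row shows up at the far end of the neighboring row, and only the corner bytes wrap around the screen.

```python
SCREEN, END = 0x0400, 0x05FF


def left_wrap(mem):
    """Model the 'left' snippet: shift left one byte, the first byte wraps to the end."""
    x = SCREEN
    saved = mem[x]                      # LDA ,X+ / PSHS A
    x += 1
    while True:
        a, b = mem[x], mem[x + 1]       # LDD ,X++
        x += 2
        mem[x - 3], mem[x - 2] = a, b   # STD -3,X
        if x > END - 1:                 # CMPX #END-1 / BLE loopleft
            break
    mem[x - 1] = mem[x]                 # LDA ,X / STA -1,X
    mem[x] = saved                      # PULS A / STA ,X


def right_wrap(mem):
    """Model the 'right' snippet: shift right one byte, the last byte wraps to the start."""
    x = END
    saved = mem[x]                      # LDA ,X / PSHS A
    x -= 2                              # LEAX -2,X
    while True:
        a, b = mem[x], mem[x + 1]       # LDD ,X
        mem[x + 1], mem[x + 2] = a, b   # STD 1,X
        x -= 2                          # LEAX -2,X
        if x < SCREEN + 1:              # CMPX #SCREEN+1 / BGE loopright
            break
    x += 1                              # LEAX 1,X
    first = mem[x]                      # LDB ,X
    mem[x], mem[x + 1] = saved, first   # PULS A, then STD ,X
```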
