Category Archives: 6809 Assembly

Interfacing assembly with BASIC via DEFUSR, part 7

See also: part 1, part 2, part 3, part 4, part 5, part 6, part 7 and part 8.

UPDATE: After I wrote this up, Sean Conner replied on the CoCo mailing list about his extensive website article diving into how all the DEF USR stuff works. If you really want to know how this works, you should read his writeup. Unlike me, he actually seems to understand it! I am just playing around trying to make stuff work: https://boston.conman.org/2024/11/26.2-3

Also, he has summarized some things I missed (in this article) in a post, so I will have to do a Part 8 to this article and tie everything together. Kudos to Sean Conner for figuring all this out and sharing it with all of us! I wish I had remembered he was the one who did this before I wrote all this up. BUT, I had fun, and I learned (including stuff that is wrong). To be continued…


Meanwhile, back in 2017 (!) I shared some posts about how to use DEF USR to call assembly language routines from Color BASIC. Today I present something new I learned, mostly thanks to folks like William “Lost Wizard” Astle pointing me in useful directions.

First, here is a refresher on calling assembly language from BASIC.

Method 1 – EXEC

You can load (or POKE) some assembly into memory at some address and run that code using the EXEC command. Even CoCo folks who don’t program assembly have probably used the EXEC command. It is used to start a machine language program you loaded from tape (using CLOADM) or disk (using LOADM). Those load commands set the EXEC address to the program’s entry point, so you just type “EXEC” and go.

You can also specify a memory location after the EXEC command, such as “EXEC 40999”. That will execute whatever code is in memory at 40999. You may recognize that one, since 40999 is the address of the “reset” routine in Color BASIC. Typing “EXEC 40999” is like hitting the hardware RESET button on the back of the CoCo.

But I digress.

You could create a set of assembly routines that start at different locations in memory, then run those routines using EXEC and the address of the routine. This is cumbersome, since you have to know where each routine starts. For example, here are some stupid routines that change the screen color to red, green, or blue:

* lwasm cls.asm -fbasic -ocls.bas --map

* Test of EXEC

ORGADDR equ $3f00

VIDRAM  equ $400        VIDEO DISPLAY AREA

    org ORGADDR

clsred
    lda #191            * Red block.
    bsr clear
    rts

clsgreen
    lda #143            * Green block.
    bsr clear
    rts

clsblue
    lda #175            * Blue block.
    bsr clear
    rts

clear
    ldx #VIDRAM         * Top left of 32 column screen.
loop
    sta ,x+             * Store at X and increment X.
    cmpx #VIDRAM+512    * Past the bottom right of the screen yet?
    bne loop
    rts

    end

This cheesy program has three routines at the start – clsred, clsgreen and clsblue. If I compile this and load it into memory, I could then EXEC the starting location of each of those routines to see the screen clear to those colors. But how do I know where they start?

One way is to use an assembler that tells you. I am using the Lost Wizard lwasm assembler. It has an option that shows the locations of stuff in the program. When I build, it shows me this output:

allenh@Mac asm % lwasm cls.asm -fbasic -ocls.bas --map
Symbol: clear (cls.bas) = 3F0F
Symbol: clsblue (cls.bas) = 3F0A
Symbol: clsgreen (cls.bas) = 3F05
Symbol: clsred (cls.bas) = 3F00
Symbol: loop (cls.bas) = 3F12
Symbol: ORGADDR (cls.bas) = 3F00
Symbol: VIDRAM (cls.bas) = 0400

This tells me I can EXEC &H3F00 to run the clsred code, EXEC &H3F05 to run clsgreen, and EXEC &H3F0A to run clsblue. To test, I use the compile option that generates a BASIC program that loads the code into memory using POKE commands. I built it and then loaded that .bas into the XRoar emulator to try it out, and it works. I’d show a screen shot, but since it clears the screen when I run it, it wouldn’t really show anything useful.

But the problem is any changes to the routines that alter their size can change the locations of any routines after them. If I made the first routine, clsred, five bytes longer than it is now, the addresses of the following routines clsgreen and clsblue would now be five bytes higher in memory.
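To see why the later routines move, here is a small host-side Python sketch of the bookkeeping lwasm does when it prints the --map. It is not CoCo code, and the per-routine sizes are hand-counted 6809 instruction lengths (an assumption, not parsed from cls.asm). With those sizes it reproduces the addresses in the map output above, and then shows what happens when clsred grows by five bytes.

```python
# Host-side sketch (not CoCo code): mimic an assembler's location counter.
# Sizes are hand-counted 6809 instruction lengths, assumed for illustration:
#   colour routine = LDA #imm (2) + BSR (2) + RTS (1) = 5 bytes
#   clear          = LDX #imm (3)
#   loop           = STA ,X+ (2) + CMPX #imm (3) + BNE (2) + RTS (1) = 8 bytes

def layout(blocks, org=0x3F00):
    """Return {label: address} for a list of (label, size_in_bytes) blocks."""
    addresses = {}
    pc = org
    for label, size in blocks:
        addresses[label] = pc
        pc += size
    return addresses

CLS = [("clsred", 5), ("clsgreen", 5), ("clsblue", 5), ("clear", 3), ("loop", 8)]

for name, addr in layout(CLS).items():
    print(f"{name} = {addr:04X}")

# Make clsred five bytes longer: every routine after it moves up by five.
GROWN = [("clsred", 10)] + CLS[1:]
print(f"clsgreen moves to {layout(GROWN)['clsgreen']:04X}")
```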

I’d have to keep making notes of all the locations I wanted to use in my BASIC program and update the BASIC program each time things changed in the assembly code.

Yuck.

One solution to this problem is to make some entries at the top of the assembly code that will never change. They could just jump to the specific routines, and nothing else. For example:

* lwasm cls2.asm -fbasic -ocls2.bas --map

* Test of EXEC

ORGADDR equ $3f00

VIDRAM  equ $400        VIDEO DISPLAY AREA

    org ORGADDR

clsred   bra clearred
clsgreen bra cleargreen
clsblue  bra clearblue

clearred
    lda #191            * Red block.
    bsr clear
    rts

cleargreen
    lda #143            * Green block.
    bsr clear
    rts

clearblue
    lda #175            * Blue block.
    bsr clear
    rts

clear
    ldx #VIDRAM         * Top left of 32 column screen.
loop
    sta ,x+             * Store at X and increment X.
    cmpx #VIDRAM+512    * Past the bottom right of the screen yet?
    bne loop
    rts

    end

I added some stuff at the top of the program that branches to the actual routines later in the code. Now the routines later in the code can grow, shrink, or move, but those initial branch instructions will not need to change:

lwasm cls2.asm -fbasic -ocls2.bas --map
Symbol: clear (cls2.bas) = 3F15
Symbol: clearblue (cls2.bas) = 3F10
Symbol: cleargreen (cls2.bas) = 3F0B
Symbol: clearred (cls2.bas) = 3F06
Symbol: clsblue (cls2.bas) = 3F04
Symbol: clsgreen (cls2.bas) = 3F02
Symbol: clsred (cls2.bas) = 3F00
Symbol: loop (cls2.bas) = 3F18
Symbol: ORGADDR (cls2.bas) = 3F00
Symbol: VIDRAM (cls2.bas) = 0400

I suppose we would call this a “dispatch table.” This also has the advantage of being able to add new entries at the end as more routines are added. Only if you add something new would the BASIC program using it have to change:

clsred   bra clearred
clsgreen bra cleargreen
clsblue  bra clearblue
clswhite bra clearwhite

You could have a program with routines to clear the screen, or scroll the screen Up, Down, Left and Right (ah, a call back to the earlier parts of this article series) and use EXEC to get to each one.

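NOTE: I used the “bra” (branch always) op code, which uses a signed 8-bit offset and can only reach code within -128 to +127 bytes of the instruction that follows it. As the program grows, it may not be able to reach all the routines. Using “lbra” (long branch always) instead may be a better approach if your code might grow in the future. Keep in mind that lbra is three bytes instead of two, so the table entries would then be three bytes apart (&H3F00, &H3F03, &H3F06…).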
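Here is a small host-side Python sketch of the two numbers that matter for this kind of table: whether a short BRA can reach its target, and how far apart the entries end up. It assumes the standard 6809 encodings (BRA is two bytes, opcode $20 plus a signed 8-bit offset measured from the next instruction; LBRA and an extended JMP are three bytes). The example addresses come from laying out cls2.asm at $3F00 and are only for illustration.

```python
# Host-side sketch: BRA reach and dispatch-table entry spacing on the 6809.
BRA_OPCODE = 0x20   # BRA: opcode byte followed by a signed 8-bit offset

def bra_offset(at, target):
    """Offset for a BRA located at `at`, or None if `target` is out of reach.

    The offset is relative to the address right after the 2-byte BRA.
    """
    offset = target - (at + 2)
    return offset if -128 <= offset <= 127 else None

def bra_bytes(at, target):
    """The two bytes an assembler would emit, or None if the branch cannot reach."""
    offset = bra_offset(at, target)
    return None if offset is None else [BRA_OPCODE, offset & 0xFF]

def entry_addresses(org, count, entry_size):
    """Start of each table entry: 2 bytes apart for BRA, 3 for LBRA or JMP."""
    return [org + i * entry_size for i in range(count)]

print(bra_bytes(0x3F00, 0x3F06))      # clsred -> clearred in cls2.asm
print(bra_offset(0x3F00, 0x3F90))     # too far for a short branch
print([hex(a) for a in entry_addresses(0x3F00, 4, 3)])
```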

EXEC might be enough to get the job done, but that is not what this series is about.

Method 2 – DEF USR

Using DEFUSR to call assembly from BASIC.

You can have up to ten (DEF USR0 to DEF USR9) assembly routines defined and call them using the USR command. You still have the same problem of needing to know where the functions are. If I wanted to use the three cls routines, I’d have something like:

DEF USR0=&H3F00
DEF USR1=&H3F05
DEF USR2=&H3F0A

…and then I could call them in the program using:

A=USR0(0) 'CLS RED
A=USR1(0) 'CLS GREEN
A=USR2(0) 'CLS BLUE

…but that really is not much better. The real use of DEF USR is that you can pass a parameter into the assembly language (the value in the parens) and return a value back to BASIC (the variable before the equals sign). Instead of needing separate routines to clear the screen, as mentioned earlier in this series, you could simply have one and pass in the parameter that tells it what to do:

A=USR0(1) '1=RED
A=USR0(2) '2=GREEN
A=USR0(3) '3=BLUE

…or whatever you want to do. Now you just have one DEF USR define and can run many different routines based on the number you pass in to it. One entry point (“DEF USR0=&H3F00”) could have dozens or hundreds of functions which run based on what number is passed in.
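On the assembly side, the number passed in becomes an index. Here is a minimal host-side Python sketch of one way to do that. It assumes a hypothetical table of three-byte JMP entries starting at $3F00, numbered from 1, and uses -1 as the “unknown function” return value; those are my choices for the example, not anything BASIC requires.

```python
# Hypothetical model of "one DEF USR entry point, many functions".
TABLE_BASE = 0x3F00   # made-up start of a table of JMP entries
JMP_SIZE = 3          # $7E opcode plus a 16-bit address
ENTRIES = 3           # 1=red, 2=green, 3=blue, as in the USR0() examples

def dispatch_address(n, base=TABLE_BASE, entries=ENTRIES):
    """Address of the JMP to run for USR0(n), or None if n is out of range."""
    if not 1 <= n <= entries:
        return None
    return base + (n - 1) * JMP_SIZE

def usr0(n):
    """Value handed back to BASIC: 0 for success, -1 for an unknown function."""
    return 0 if dispatch_address(n) is not None else -1

for n in (1, 2, 3, 4):
    print(n, dispatch_address(n), usr0(n))
```

Checking the range before indexing matters on the real machine: an out-of-range number would otherwise jump into whatever bytes follow the table.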

And now the reason for this new part…

In part 3 of this series I said this:

Unfortunately, the USRx() command only allows you to pass in a numeric value, and not a string, so we can’t simply do something like:

A$=USR0("Convert this to all uppercase.") 'THIS WILL NOT WORK!

So imagine my surprise when I saw — somewhere — someone post that you actually could do that. I have yet to track down where I saw this, but I stashed it in the back of my mind until I had time to revisit this subject.

And that time is now.

While it is true that Color BASIC cannot do this, Extended Color BASIC can indeed pass in a string. The Extended BASIC code that processes a USR call confirms this: it checks whether the argument is a number or a string:

* PROCESS A USR CALL
L892C BSR L891C GET STORAGE LOC OF EXEC ADDRESS FOR USR N
LDX ,X * GET EXEC ADDRESS AND
PSHS X * PUSH IT ONTO STACK
JSR >LB262 SYNTAX CHECK FOR ‘(‘ & EVALUATE EXPR
LDX #FP0EXP POINT X TO FPA0
LDA VALTYP GET VARIABLE TYPE
BEQ L8943 BRANCH IF NUMERIC, STRING IF <> 0
JSR >LB657 GET LENGTH & ADDRESS OF STRING VARIABLE
LDX FPA0+2 GET POINTER TO STRING DESCRIPTOR

LDA VALTYP GET VARIABLE TYPE
L8943 RTS JUMP TO USR ROUTINE (PSHS X ABOVE)

That code snippet comes from the Toolshed repository on GitHub, where you can find disassemblies of the CoCo ROMs:

https://github.com/n6il/toolshed/blob/master/cocoroms/extbas.asm

Many things once thought impossible start happening once people learn they are not actually impossible.

My mistake. It is possible.

All that is needed is to check VALTYP at the start of your assembly routine. The ROM code above loads it into register A right before jumping to your routine, so A is 0 for a number and non-zero (255) for a string. If it is numeric, you can call the INTCNV routine to convert the passed-in number to register D and use it like before. If it is a string, X will point to the string descriptor (the VARPTR location of the string variable), and you can parse it to get the length of the string and the location of the string data.

See this post about Color BASIC and VARPTR for an explanation.
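As a host-side illustration of that parsing, here is a Python sketch that decodes a descriptor from a 64K memory image. It uses the layout the test code below relies on (byte 0 is the length, bytes 2 and 3 are the address of the string data, high byte first as with all 16-bit values on the 6809) and ignores byte 1. The addresses and the “HELLO” string are made up for the example. Note that an empty string (length 0) should produce no bytes at all.

```python
# Host-side sketch: decode a string descriptor from a 64K memory image.
# Layout as used by the DEF USR test code: byte 0 = length,
# bytes 2-3 = address of the string data (big-endian). Byte 1 is ignored.

def read_string_descriptor(mem, desc_addr):
    """Return (length, data_address, data) for the descriptor at desc_addr."""
    length = mem[desc_addr]
    data_addr = (mem[desc_addr + 2] << 8) | mem[desc_addr + 3]
    return length, data_addr, bytes(mem[data_addr:data_addr + length])

# Made-up example: a descriptor at $2000 pointing to "HELLO" at $7E00.
mem = bytearray(0x10000)
mem[0x2000] = 5
mem[0x2002:0x2004] = (0x7E00).to_bytes(2, "big")
mem[0x7E00:0x7E05] = b"HELLO"
print(read_string_descriptor(mem, 0x2000))
```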

Here is my test code:

* lwasm defusr.asm -fbasic -odefusr.bas --map

* Test of DEF USR.

ORGADDR equ $3f00

VIDRAM  equ $400        VIDEO DISPLAY AREA
CHROUT  EQU $A002
INTCNV  EQU $B3ED       * 46061
GIVABF  EQU $B4F4       * 46324
REGDOUT EQU $BDCC       * convert the value in ACCD into a decimal number
*                         and send it to CONSOLE OUT.

    org ORGADDR

start
checknumber
    cmpa #0             * 0=number
    beq donumber

    cmpa #255           * 255=string
    beq dostring

    ldx #msgunknown
    bsr print

error
    ldd #-1             * Return -1 as an error code
return
    jmp GIVABF          * Value in reg D will be returned to BASIC.

donumber
    jsr INTCNV          * Load D with number
    jsr REGDOUT         * Display number
    ldd #0              * D=0 (no error code)
    bra return

* X will point to the string descriptor (VARPTR)
dostring
    ldb ,x              * B=string length
    ldy 2,x             * Y=string data address
    tstb                * Empty string? (LDY set Z for Y, not for B)
    beq stringdone
loop
    lda ,y+             * A=byte of string data, increment Y
    jsr [CHROUT]        * Output character in A
    decb                * Decrement B (length of string)
    bne loop            * If not 0, go back to loop
stringdone
    bsr printspace

    tfr x,d             * Transfer X into D register
    jsr printd          * Print VARPTR address
    bsr printspace

    clra                * A=0
    ldb ,x              * B=string len (making D)
    bsr printd          * Print string length
    bsr printspace

    ldd 2,x             * Load D with string address
    bsr printd          * Print the number

    ldd #0              * D=0 (no error code)
    bra return

* PRINT integer in D
printd
    pshs a,b,x,u
    jsr REGDOUT
    puls a,b,x,u
    rts

* PRINT space
printspace
    pshs a
    lda #32
    jsr [CHROUT]
    puls a
    rts

* PRINT subroutine. Prints the string pointed to by X.
print
    lda ,x+
    beq printdone
    jsr [CHROUT]
    bra print
printdone
    lda #13
    jsr [CHROUT]
    rts

msgunknown
    fcc "UNKNOWN"
    fcb 0

    end

And here it is running with some examples:

For this demo, my assembly code detects if the passed-in parameter is a number or a string. If a number, it prints out that number using a ROM routine. If a string, it figures out where the string is in memory then prints that string, followed by the VARPTR address of the string variable, the string size, and the address of the string data in memory (just so you can see how it all works).

I will try to find time to clean this up a bit. I want to use this in my Hacking PRINT series.

Until then…

Hacking the Color BASIC PRINT command – part 6

See Also: part 1, part 2, part 3, part 4, part 5 and part 6 (and maybe more to come…)

In part 5, I presented an update to the “PRINT can move the cursor” hack which would turn that off when you were typing from outside a running program. It did this by checking a Color BASIC “variable” that contained the current line being processed. When the program is not running, that value is set to 65535 (&HFFFF in hex). My simple check should have been enough to skip processing the special characters when in this direct mode:

* Do this only if NOT in Direct mode. Problem: After a BREAK, CURLIN
* has not been updated yet, so the very first line you type will be
* processing the special characters. Lines after that will not. Trying
* to find a different way to detect this.
    pshs a              save A
    lda CURLIN          GET CURRENT LINE NUMBER (CURLIN)
    inca                TEST FOR DIRECT MODE
    puls a              restore A
    beq continue        if 0, in direct mode.

I quickly learned that when a program stops running, this value is not updated to &HFFFF until AFTER the next line is entered. This snippet is from the GitHub archive of tomctomc:

https://github.com/tomctomc/coco_roms/blob/master/bas.asm

; THIS IS THE MAIN LOOP OF BASIC WHEN IN DIRECT MODE
LAC73 JSR >LB95C ; MOVE CURSOR TO START OF LINE
LAC76 LDX #LABEE-1 ; POINT X TO OK, CR MESSAGE
JSR >LB99C ; PRINT OK, CR
LAC7C JSR >LA390 ; GO GET AN INPUT LINE
LDU #$FFFF ; THE LINE NUMBER FOR DIRECT MODE IS $FFFF
STU CURLIN ; SAVE IT IN CURLIN

BASIC does not update the value until after the first line is entered, which means my attempt to ignore cursor movements when typing would not work for the first line you typed after a program stopped (BREAK, END, STOP, ?SN ERROR, etc.).

William “Lost Wizard” Astle pointed me to another vector I could use to determine when a program stopped running: RVEC12. This is called the “line input” routine, which confused me at first since LINE INPUT did not exist until Extended Color BASIC ROMs were added. But, the naming intent appears to just be “input a line” versus “for the LINE INPUT command”.

It looks like this (again, from the tomctomc disassembly):

; THIS IS THE ROUTINE THAT GETS AN INPUT LINE FOR BASIC
; EXIT WITH BREAK KEY: CARRY = 1
; EXIT WITH ENTER KEY: CARRY = 0
LA38D JSR >CLRSCRN ; CLEAR SCREEN
LA390 JSR >RVEC12 ; HOOK INTO RAM
CLR IKEYIM ; RESET BREAK CHECK KEY TEMP KEY STORAGE
LDX #LINBUF+1 ; INPUT LINE BUFFER
LDB #1 ; ACCB CHAR COUNTER: SET TO 1 TO ALLOW A
; BACKSPACE AS FIRST CHARACTER
LA39A JSR >LA171 ; GO GET A CHARACTER FROM CONSOLE IN

The code at LA390 is called when BASIC wants to input a line. The first thing it does is JSR to the RAM hook at RVEC12, so a patch there can run anything it needs to before the line is read, such as new code that changes CURLIN to FFFF right then.

I added a new bit of code to my program to save whatever is in RVEC12, then make it point to my new code:

* Hijack the BASIC line input routine.
    lda RVEC12          get op code
    sta savedrvec12     save it
    ldx RVEC12+1        get address
    stx savedrvec12+1   save it

    lda #$7e            op code for JMP
    sta RVEC12          store it in RAM hook
    ldx #newcode2       address of new code
    stx RVEC12+1        store it in RAM hook
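The same save-and-patch pattern, written as a host-side Python sketch over a 64K memory image. RVEC12 ($182) comes from the listing; $7F84 is where newcode2 ends up in consmove4 (the ldx #newcode2 bytes in the loader DATA later in this post are 142,127,132, which is LDX #$7F84). The RTS ($39) that the hook holds before patching is an assumption for the example, and uninstall_hook is a hypothetical helper, not something the assembly does yet.

```python
# Host-side sketch of the RAM hook patch: save three bytes, write JMP <addr>.
JMP_EXTENDED = 0x7E    # 6809 JMP with a 16-bit address

def install_hook(mem, hook_addr, new_code_addr):
    """Patch the hook at hook_addr to JMP new_code_addr; return the old bytes.

    The assembly copies these saved bytes into its own 'rmb 3' slot, so the
    patched code can fall through into whatever the hook did before.
    """
    saved = bytes(mem[hook_addr:hook_addr + 3])
    mem[hook_addr] = JMP_EXTENDED
    mem[hook_addr + 1:hook_addr + 3] = new_code_addr.to_bytes(2, "big")
    return saved

def uninstall_hook(mem, hook_addr, saved):
    """Hypothetical: put the original three bytes back."""
    mem[hook_addr:hook_addr + 3] = saved

RVEC12 = 0x182
mem = bytearray(0x10000)
mem[RVEC12] = 0x39                       # assume the hook held a plain RTS
saved = install_hook(mem, RVEC12, 0x7F84)
print(mem[RVEC12:RVEC12 + 3].hex())      # 7e7f84
```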

Then, in my program, I added a “newcode2” routine:

* William Astle:
* "RVEC12 would be right. You can clobber X in this case. You would check 4,s
* to see if it's $AC7F. If it is, you just set CURLIN to $FFFF. This works
* around the unfortunate ordering of the instructions in the immediate mode
* loop."
newcode2
    ldx 2,s             get what called us
    cmpx #$ac7f
    bne continue2
    ldx #$ffff
    stx CURLIN

continue2
savedrvec12 rmb 3       call regular RAM hook
    rts                 just in case...

The “ldx 2,s” loads the 16-bit value two bytes up from the top of the stack, which is the return address of the JSR that called LA390. (I think the 4 in William’s comment may be a typo; I checked there and did not get an address match, but I do at 2,s.)

AC7F is this bit in BASIC:

; THIS IS THE MAIN LOOP OF BASIC WHEN IN DIRECT MODE
LAC73 JSR >LB95C ; MOVE CURSOR TO START OF LINE
LAC76 LDX #LABEE-1 ; POINT X TO OK, CR MESSAGE
JSR >LB99C ; PRINT OK, CR
LAC7C JSR >LA390 ; GO GET AN INPUT LINE
LDU #$FFFF ; THE LINE NUMBER FOR DIRECT MODE IS $FFFF
STU CURLIN ; SAVE IT IN CURLIN
BCS LAC7C ; BRANCH IF LINE INPUT TERMINATED BY BREAK
TST CINBFL ; CHECK CONSOLE INPUT BUFFER STATUS

At label LAC7C is “jsr >LA390”. This does a jump subroutine to code that calls the RAM hook:

; THIS IS THE ROUTINE THAT GETS AN INPUT LINE FOR BASIC
; EXIT WITH BREAK KEY: CARRY = 1
; EXIT WITH ENTER KEY: CARRY = 0
LA38D JSR >CLRSCRN ; CLEAR SCREEN
LA390 JSR >RVEC12 ; HOOK INTO RAM
CLR IKEYIM ; RESET BREAK CHECK KEY TEMP KEY STORAGE
LDX #LINBUF+1 ; INPUT LINE BUFFER
LDB #1 ; ACCB CHAR COUNTER: SET TO 1 TO ALLOW A

My “newcode2” is reached through the JMP I stored at RVEC12, and a JMP does not push anything on the stack. So the top of the stack (0,s) holds the return address pushed by the JSR >RVEC12 at LA390, which is $A393 (the CLR IKEYIM right after it). Below that, at 2,s, is the return address pushed by the JSR >LA390 at LAC7C. Since that extended JSR is three bytes long, that return address is $AC7F, which is what my code compares against. I did not understand this at first, so I wrote some debug code to put the stack values on the 32 column screen bytes and PEEKed them out to see what was there. That is how I found it at “2,s”.
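A host-side Python sketch of the stack can show why the match is at 2,s. It models only what matters here: two extended JSRs (three bytes each) pushing their return addresses onto a downward-growing stack, low byte first so the address reads high byte first at ,s. The starting stack pointer is an arbitrary assumption; the instruction addresses come from the ROM listing above.

```python
# Host-side sketch of the 6809 hardware stack during the direct-mode line input.
# S grows downward; JSR pushes the return address low byte first, then high,
# so the address reads high-byte-first at ,S.

class Stack6809:
    def __init__(self, mem, s):
        self.mem = mem
        self.s = s

    def jsr(self, instr_addr, instr_len):
        """Push the return address for a JSR of instr_len bytes at instr_addr."""
        ret = instr_addr + instr_len
        self.s -= 1
        self.mem[self.s] = ret & 0xFF
        self.s -= 1
        self.mem[self.s] = ret >> 8

    def read16(self, offset):
        """What 'ldx offset,s' would load."""
        return (self.mem[self.s + offset] << 8) | self.mem[self.s + offset + 1]

stack = Stack6809(bytearray(0x10000), s=0x7000)   # starting S is arbitrary
stack.jsr(0xAC7C, 3)   # LAC7C  JSR >LA390
stack.jsr(0xA390, 3)   # LA390  JSR >RVEC12
# RVEC12 holds JMP newcode2, and a JMP pushes nothing.
print(f"0,s = {stack.read16(0):04X}  2,s = {stack.read16(2):04X}")
```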

But I digress… The point seems to be when I am running my code, IF I can tell it was called from this block:

LAC7C JSR >LA390 ; GO GET AN INPUT LINE
LDU #$FFFF ; THE LINE NUMBER FOR DIRECT MODE IS $FFFF
STU CURLIN ; SAVE IT IN CURLIN

…then I know it is the correct spot where I can safely (?) store FFFF in CURLIN, then run whatever code was in the original RAM hook and return to the ROM, which does the actual line input (now running with FFFF in CURLIN). After the input, the ROM stores FFFF in CURLIN itself (which my newcode2 has already done).
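The before-and-after behavior can be modeled in a few lines of host-side Python. This is a simplified sketch, not the ROM: it only tracks CURLIN, uses a made-up last program line of 30, and treats any return address other than $AC7F as some other caller that should be left alone.

```python
# Host-side sketch: CURLIN while the first direct-mode line is typed,
# with and without the RVEC12 patch.
DIRECT_MODE = 0xFFFF
DIRECT_LOOP_RETURN = 0xAC7F   # return address of JSR >LA390 at LAC7C

def newcode2(curlin, caller):
    """RVEC12 patch: force direct mode only when called from the direct-mode loop."""
    return DIRECT_MODE if caller == DIRECT_LOOP_RETURN else curlin

def first_direct_line(last_line_run, patched):
    """Return (CURLIN while the line is typed, CURLIN after the ROM's STU CURLIN)."""
    curlin = last_line_run
    if patched:
        curlin = newcode2(curlin, DIRECT_LOOP_RETURN)   # runs at the start of LA390
    while_typing = curlin                               # what the PRINT hook sees
    curlin = DIRECT_MODE                                # LDU #$FFFF / STU CURLIN
    return while_typing, curlin

print(first_direct_line(30, patched=False))
print(first_direct_line(30, patched=True))
```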

This seems to work, but perhaps William can chime in and explain what I missed with my stack stuff.

Here is the new version of my program:

* lwasm consmove4.asm -fbasic -oconsmove4.bas --map

* Allow embedded characters to move the cursor in a PRINT.

UP equ 'u character for up
DOWN equ 'd character for down
LEFT equ 'l character for left
RIGHT equ 'r character for right

CURLIN equ $68 *PV CURRENT LINE # OF BASIC PROGRAM, $FFFF = DIRECT
DEVNUM equ $6f device number being used for I/O
CURPOS equ $88 location of cursor position in RAM
RVEC3 equ $167 console out RAM hook
RVEC12 equ $182 inputting a BASIC line
VIDRAM equ $400 VIDEO DISPLAY AREA

    org $7f00

init
* Hijack the CONOUT routine.
    lda RVEC3           get op code
    sta savedrvec       save it
    ldx RVEC3+1         get address
    stx savedrvec+1     save it

    lda #$7e            op code for JMP
    sta RVEC3           store it in RAM hook
    ldx #newcode        address of new code
    stx RVEC3+1         store it in RAM hook

* Hijack the BASIC line input routine.
    lda RVEC12          get op code
    sta savedrvec12     save it
    ldx RVEC12+1        get address
    stx savedrvec12+1   save it

    lda #$7e            op code for JMP
    sta RVEC12          store it in RAM hook
    ldx #newcode2       address of new code
    stx RVEC12+1        store it in RAM hook

    rts                 done

uninstall
* TODO

newcode
* Do this only if DEVNUM is 0 (console)
    tst DEVNUM          is DEVNUM 0?
    bne continue        not device #0 (console)

* Do this only if NOT in Direct mode. newcode2 (below) now sets CURLIN
* to $FFFF before a direct mode line is read, so this also works for
* the first line typed after a BREAK.
    pshs a              save A
    lda CURLIN          GET CURRENT LINE NUMBER (CURLIN)
    inca                TEST FOR DIRECT MODE
    puls a              restore A
    beq continue        if 0, in direct mode.

    leas 2,s            remove PC from stack since we won't be returning there.

* Now this is the start of what Color BASIC ROM does for PUTCHR:
* PUT A CHARACTER ON THE SCREEN
LA30A
    PSHS X,B,A          SAVE REGISTERS
    LDX CURPOS          POINT X TO CURRENT CHARACTER POSITION

checkup
    cmpa #UP
    bne checkdown
    CMPX #VIDRAM+32     second line or lower?
    blt goLA35D         disallow if on top line.
    leax -32,x          move up one line
    bra done

checkdown
    cmpa #DOWN
    bne checkleft
    cmpx #VIDRAM+512-32
    bge goLA35D         disallow if on bottom line.
    leax 32,X           move down one line
    bra done

checkleft
    cmpa #LEFT
    bne checkright
    cmpx #VIDRAM        top left of screen?
    beq goLA35D
    leax -1,X           move left one character
    bra done

checkright
    cmpa #RIGHT
    bne goLA30E
    cmpx #VIDRAM+511    bottom right of screen
    beq goLA35D
    leax 1,x            increment X, skipping that location.
    bra done

goLA30E
    jmp $A30E           jump back into Color BASIC ROM code.

done
    stx CURPOS          update cursor position
goLA35D
    jmp $A35D           jump back into Color BASIC ROM code.

continue
savedrvec rmb 3         call regular RAM hook
    rts                 just in case...


* William Astle:
* "RVEC12 would be right. You can clobber X in this case. You would check 4,s
* to see if it's $AC7F. If it is, you just set CURLIN to $FFFF. This works
* around the unfortunate ordering of the instructions in the immediate mode
* loop."
newcode2
    ldx 2,s             get what called us
    cmpx #$ac7f
    bne continue2
    ldx #$ffff
    stx CURLIN

continue2
savedrvec12 rmb 3       call regular RAM hook
    rts                 just in case...

    end

And this now lets me hit BREAK (or whatever) in my program and then type those “u”, “d”, “l” and “r” characters and see them as lowercase as I type them:

But there are still issues…

But there are still issues. One thing I did not consider is that now I cannot “test” an embedded PRINT from the command line. Typing this:

PRINT "XXXlllYYY";

…should print “XXX”, then move left three times and print “YYY”, so it only shows YYY. But with the PRINT hack not processing cursor moves in direct mode, you just get:

So, depending on your preference, you may want to leave this extra code out, so cursor movements are processed even when you type them in direct mode.

Thoughts? Let me know in the comments.

Here is the current BASIC loader:

5 CLEAR 200,&H7F00
10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 32512,32639,182,1,103,183,127,128,190,1,104,191,127,129,134,126,183,1,103,142,127,47,191,1,104,182,1,130,183,127,144,190,1,131,191,127,145,134,126,183,1,130,142,127,132,191,1,131,57,13,111,38,77,52,2,150,104,76,53,2,39,68,50,98,52,22
90 DATA 158,136,129,117,38,10,140,4,32,45,50,48,136,224,32,43,129,100,38,10,140,5,224,44,36,48,136,32,32,29,129,108,38,9,140,4,0,39,22,48,31,32,16,129,114,38,9,140,5,255,39,9,48,1,32,3,126,163,14,159,136,126,163,93,32643,32655,57,174,98,140,172
100 DATA 127,38,5,142,255,255,159,104,32659,32659,57,-1,-1

I added the CLEAR 200,&H7F00 at the top. Just load this, RUN it, then EXEC &H7F00 and then you have the new PRINT stuff with cursor movements.
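If you want to check what a loader like this will POKE without running it, here is a host-side Python sketch that replays the DATA format used by lines 10-60 (start address, end address, the bytes, and -1 to stop). It is a helper I am assuming for illustration, not part of lwasm. Here it is fed the tail of the DATA above:

```python
# Host-side sketch: rebuild memory from the loader's DATA values.
# Format (matching BASIC lines 10-60): start, end, then end-start+1 bytes,
# repeated until a start of -1.

def decode_loader(values):
    """Return {address: byte} for the POKEs the loader would do."""
    memory = {}
    i = 0
    while values[i] != -1:
        start, end = values[i], values[i + 1]
        i += 2
        for addr in range(start, end + 1):
            memory[addr] = values[i]
            i += 1
    return memory

def gaps(memory):
    """Address ranges (first, last) that the loader skips between POKEd bytes."""
    addrs = sorted(memory)
    return [(lo + 1, hi - 1) for lo, hi in zip(addrs, addrs[1:]) if hi - lo > 1]

# Tail of the part 6 DATA: 13 bytes at 32643-32655, then one byte at 32659.
tail = [32643, 32655, 57, 174, 98, 140, 172, 127, 38, 5, 142, 255, 255, 159, 104,
        32659, 32659, 57, -1, -1]
mem = decode_loader(tail)
print(len(mem), gaps(mem))
```

The gap at 32656-32658 ($7F90-$7F92) is the savedrvec12 rmb 3 slot. RMB only reserves space, so there are no bytes to POKE there; the init code fills it in at run time. The same thing explains the skip from 32639 to 32643, which is savedrvec.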

What next? I’d like to add the ability to assign which characters it uses by making the routine work with DEF USR so you could do something like:

X=USR0("udlr")

Then you could pass in whatever four characters you wanted for the cursor movements. Maybe this could also be used to disable it with something like X=USR0("") that did not specify anything to use.

Again, thoughts?

Until next time…

Hacking the Color BASIC PRINT command – part 5

See Also: part 1, part 2, part 3, part 4, part 5 and part 6 (and maybe more to come…)

Version 2 of this code added support for moving the cursor UP, DOWN, LEFT and RIGHT. BUT, it was processing these special characters at all times, including when you typed them in. This was confusing to me, so I wanted to add some way to make them only process when the program is PRINTing them, not when you are just typing them from a command line.

I was unsure how to approach this but ended up looking at CURLIN. This holds the line number BASIC is processing, or &HFFFF if in “direct mode” (where you type in program lines).

From checking the disassembly, I should be able to add a quick check when deciding to process the special characters, and skip them when CURLIN is &HFFFF. The code I added looked like this:

newcode
...snip...

* Do this only if NOT in Direct mode. Problem: After a BREAK, CURLIN
* has not been updated yet, so the very first line you type will be
* processing the special characters. Lines after that will not. Trying
* to find a different way to detect this.
    pshs a              save A
    lda CURLIN          GET CURRENT LINE NUMBER (CURLIN)
    inca                TEST FOR DIRECT MODE
    puls a              restore A
    beq continue        if 0, in direct mode.

...snip...

I was unsure what registers I could use, so I saved A and used it, then restored it. If you look at the ROM code, you will see sometimes it will load the full 16-bit value and check against FFFF, and other times it just loads the first byte and checks that. I decided to do it the second way.

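This code loads the first byte, then increments it. If it was FF, the increment wraps around to 0, which sets the Z (zero) flag, so the BEQ (branch if equal) is taken. PULS A does not change the condition codes, so the BEQ still sees the result of the INCA. Only direct mode can have FF in that first byte, since the largest line number BASIC allows is 63999 (&HF9FF). I expect there is a better way to do this, but this is what I came up with.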
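Here is the same test as a host-side Python sketch. It assumes CURLIN is stored high byte first (so LDA CURLIN gets the high byte) and that program line numbers top out at 63999 (&HF9FF), so only direct mode has &HFF in that byte.

```python
# Host-side sketch of the LDA CURLIN / INCA / BEQ test.
def inca_sets_zero(curlin):
    """True when BEQ would branch, i.e. when the high byte of CURLIN is $FF."""
    a = curlin >> 8              # LDA CURLIN reads the first (high) byte only
    a = (a + 1) & 0xFF           # INCA: $FF wraps around to $00
    return a == 0                # Z flag; PULS A does not change the flags

for curlin in (0xFFFF, 63999, 10):
    print(f"{curlin:5d} ({curlin:04X}) -> direct mode: {inca_sets_zero(curlin)}")
```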

Now, typing the lowercase “u”, “d”, “l” and “r” do not move the cursor around the screen, but PRINTing them in a program will. The downside is if you try to type a PRINT from the command line, it will not be processed — only the PRINTs from the program when it is RUNning.

Is this good enough?

Ah, but there was still a problem. When BASIC returns from running a program, I found it would still process those characters! Then it would stop. I eventually figured out it would process the FIRST line after returning to direct mode, then would behave as I wanted it after that.

I found this code in the ROM:

* THIS IS THE MAIN LOOP OF BASIC WHEN IN DIRECT MODE
LAC73 JSR LB95C MOVE CURSOR TO START OF LINE
LDX #LABE1-1 POINT X TO ‘OK’, CR MESSAGE
JSR LB99C PRINT ‘OK’, CR
LAC7C JSR LA390 GO GET AN INPUT LINE
LDU #$FFFF THE LINE NUMBER FOR DIRECT MODE IS $FFFF
STU CURLIN SAVE IT IN CURLIN

For “some reason”, the routine first calls (JSR) the code to read a line, THEN it sets the direct mode line number to FFFF. This means the first line you type still has CURLIN set to the last line the program executed. I am sure there is a reason for this, but that explains the problem.

I now present the full version of the code, but will show a way William Astle suggested I make it more better in the next installment:

* lwasm consmove3.asm -fbasic -oconsmove3.bas --map

* Allow embedded characters to move the cursor in a PRINT.

UP equ 'u character for up
DOWN equ 'd character for down
LEFT equ 'l character for left
RIGHT equ 'r character for right

CURLIN equ $68 *PV CURRENT LINE # OF BASIC PROGRAM, $FFFF = DIRECT
DEVNUM equ $6f device number being used for I/O
CURPOS equ $88 location of cursor position in RAM
RVEC3 equ $167 console out RAM hook
VIDRAM equ $400 VIDEO DISPLAY AREA

    org $7f00

init
    lda RVEC3           get op code
    sta savedrvec       save it
    ldx RVEC3+1         get address
    stx savedrvec+1     save it

    lda #$7e            op code for JMP
    sta RVEC3           store it in RAM hook
    ldx #newcode        address of new code
    stx RVEC3+1         store it in RAM hook

    rts                 done

uninstall
* TODO

newcode
* Do this only if DEVNUM is 0 (console)
    tst DEVNUM          is DEVNUM 0?
    bne continue        not device #0 (console)

* Do this only if NOT in Direct mode. Problem: After a BREAK, CURLIN
* has not been updated yet, so the very first line you type will be
* processing the special characters. Lines after that will not. Trying
* to find a different way to detect this.
    pshs a              save A
    lda CURLIN          GET CURRENT LINE NUMBER (CURLIN)
    inca                TEST FOR DIRECT MODE
    puls a              restore A
    beq continue        if 0, in direct mode.

    leas 2,s            remove PC from stack since we won't be returning there.

* Now this is the start of what Color BASIC ROM does for PUTCHR:
* PUT A CHARACTER ON THE SCREEN
LA30A
    PSHS X,B,A          SAVE REGISTERS
    LDX CURPOS          POINT X TO CURRENT CHARACTER POSITION

checkup
    cmpa #UP
    bne checkdown
    CMPX #VIDRAM+32     second line or lower?
    blt goLA35D         disallow if on top line.
    leax -32,x          move up one line
    bra done

checkdown
    cmpa #DOWN
    bne checkleft
    cmpx #VIDRAM+512-32
    bge goLA35D         disallow if on bottom line.
    leax 32,X           move down one line
    bra done

checkleft
    cmpa #LEFT
    bne checkright
    cmpx #VIDRAM        top left of screen?
    beq goLA35D
    leax -1,X           move left one character
    bra done

checkright
    cmpa #RIGHT
    bne goLA30E
    cmpx #VIDRAM+511    bottom right of screen
    beq goLA35D
    leax 1,x            increment X, skipping that location.
    bra done

goLA30E
    jmp $A30E           jump back into Color BASIC ROM code.

done
    stx CURPOS          update cursor position
goLA35D
    jmp $A35D           jump back into Color BASIC ROM code.

continue
savedrvec rmb 3         call regular RAM hook
    rts                 just in case...

    end

And here is a BASIC loader:

5 CLEAR 200,&H7F00
10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 32512,32616,182,1,103,183,127,105,190,1,104,191,127,106,134,126,183,1,103,142,127,24,191,1,104,57,13,111,38,77,52,2,150,104,76,53,2,39,68,50,98,52,22,158,136,129,117,38,10,140,4,32,45,50,48,136,224,32,43,129,100,38,10,140,5,224,44,36
90 DATA 48,136,32,32,29,129,108,38,9,140,4,0,39,22,48,31,32,16,129,114,38,9,140,5,255,39,9,48,1,32,3,126,163,14,159,136,126,163,93,32620,32620,57,-1,-1

To be continued…

Hacking the Color BASIC PRINT command – part 4

See Also: part 1, part 2, part 3, part 4, part 5 and part 6 (and maybe more to come…)

One thing that bugs me about this series is the title. We are not actually hacking the PRINT command. Instead, we are changing the CHROUT character output routine. Close enough, I guess, since PRINT uses this routine…

Now let’s take a tangent to my “Attract Screen” series of posts from a few years ago, specifically the final installment:

Color BASIC Attract Screen – part 6

In that series, I was playing with ways to create the classic CoCo attract screens with the colored blocks that went around the screen:

War by James Garon (title screen)

In that article, Robert Gault had pointed me to this specific “WAR!” program sold by Radio Shack. It had embedded stuff in a PRINT command. I dissected those lines. I found some contained embedded CHR$ graphics characters, and another contained an assembly language routine. Very cool BASIC hacking.

In BASIC, there is an “uncrunch” routine that converts tokenized BASIC back to full text for the LIST command. LIST does not have a good time LISTing such programs: any byte with the high bit set (a value of 128-255) is treated as a token, and the uncrunch routine converts that one (or two) byte sequence into a keyword.

Instead of showing a CHR$(128) graphics block, LIST will show the keyword for token 128 (FOR).
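To make the problem concrete, here is a host-side Python sketch (a simplification, not the ROM routine) of turning a tokenized line back into text, with and without quote tracking. The token values in it are a hypothetical subset (FOR as $80, PRINT as $87); check the ROM’s dictionary tables for the real list. It also ignores the two-byte $FF secondary tokens and the colon handling before ELSE and the remark token.

```python
# Host-side sketch of a quote-aware uncrunch. TOKENS is a tiny, hypothetical
# subset for illustration; the real keyword dictionary is in the BASIC ROM.
TOKENS = {0x80: "FOR", 0x87: "PRINT"}
QUOTE = 0x22

def uncrunch(line, quote_aware):
    """Turn a tokenized line into text; optionally keep bytes inside quotes as-is."""
    out = []
    quoted = False
    for b in line:
        if b == QUOTE:
            quoted = not quoted
            out.append('"')
        elif b >= 0x80 and not (quote_aware and quoted):
            out.append(TOKENS.get(b, "!"))    # '!' stands in for an illegal token
        elif b >= 0x80:
            out.append(f"<{b}>")              # stand-in for a graphics block
        else:
            out.append(chr(b))
    return "".join(out)

line = bytes([0x87, 0x20, QUOTE, 0x80, 0x80, QUOTE])   # PRINT, space, "<2 blocks>"
print(uncrunch(line, quote_aware=False))
print(uncrunch(line, quote_aware=True))
```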

This led me to wonder if I could create a patch for BASIC that would allow it to properly LIST lines with embedded characters like this. And I did. Here is the test program I wrote back then and completely forgot about until a few days ago.

UPPERCASE stuff is taken from the Color BASIC ROM code. You will see I had to clone most of the token parsing code in my program so I could modify its behavior.

* lwasm uncrunch.asm -fbasic -ouncrunch.bas --map
*
* 0.00 2022-07-04 allenh - initial klunky version.
*

* Allow LIST to display graphics characters inside of quoted strings.

RVEC24  equ $1A6     UNCRUNCH BASIC LINE RAM hook

COMVEC  EQU $0120    Some BASIC locations we need.
LINBUF  EQU $02DC
SKP2    EQU $8C
LBUFMX  EQU 250

    org $3f00

init
    lda RVEC24      get op code
    sta savedrvec   save it
    ldx RVEC24+1    get address
    stx savedrvec+1 save it

    lda #$7e        op code for JMP
    sta RVEC24      store it in RAM hook
    ldx #newcode    address of new code
    stx RVEC24+1    store it in RAM hook
    rts             done

newcode
* UNCRUNCH A LINE INTO BASIC'S LINE INPUT BUFFER
LB7C2
    clr     AREWEQUOTED
    *JSR    >RVEC24     HOOK INTO RAM
    LEAS    2,S         Remove JSR from stack
    LEAX    4,X         MOVE POINTER PAST ADDRESS OF NEXT LINE AND LINE NUMBER
    LDY     #LINBUF+1   UNCRUNCH LINE INTO LINE INPUT BUFFER
LB7CB
    LDA     ,X+         GET A CHARACTER
    LBEQ    LB820       BRANCH IF END OF LINE

    * Check for quote/unquote
    cmpa    #34         Is A a quote character?
    bne     quotedone

togglequote
    tst     AREWEQUOTED
    bne     quoteoff
quoteon
    inc     AREWEQUOTED
    bra     quotedone
quoteoff
    clr     AREWEQUOTED Toggle quote mode off.

quotedone
    tst     AREWEQUOTED
    beq     notquoted

quoted
    * If we are quoted, just store whatever it is.
    lda     -1,x

    CMPY    #LINBUF+LBUFMX  TEST FOR END OF LINE INPUT BUFFER
    BCC     LB820   BRANCH IF AT END OF BUFFER
    *ANDA   #$7F    MASK OFF BIT 7
    STA     ,Y+     * SAVE CHARACTER IN BUFFER AND
    CLR     ,Y      * CLEAR NEXT CHARACTER SLOT IN BUFFER
    BRA     LB7CB   GET ANOTHER CHARACTER

notquoted
    lda     -1,x

    LBMI    LB7E6   BRANCH IF IT'S A TOKEN
    CMPA    #':     CHECK FOR END OF SUB LINE
    BNE     LB7E2   BRNCH IF NOT END OF SUB LINE
    LDB     ,X      GET CHARACTER FOLLOWING COLON
    CMPB    #$84    TOKEN FOR ELSE?
    BEQ     LB7CB   YES - DON'T PUT IT IN BUFFER
    CMPB    #$83    TOKEN FOR REMARK?
    BEQ     LB7CB   YES - DON'T PUT IT IN BUFFER
    FCB     SKP2    SKIP TWO BYTES
LB7E0
    LDA     #'!     EXCLAMATION POINT
LB7E2
    BSR     LB814   PUT CHARACTER IN BUFFER
    BRA     LB7CB   GET ANOTHER CHARACTER
* UNCRUNCH A TOKEN
LB7E6
    LDU     #COMVEC-10  FIRST DO COMMANDS
    CMPA    #$FF    CHECK FOR SECONDARY TOKEN
    BNE     LB7F1   BRANCH IF NON SECONDARY TOKEN
    LDA     ,X+     GET SECONDARY TOKEN
    LEAU    5,U     BUMP IT UP TO SECONDARY FUNCTIONS
LB7F1
    ANDA    #$7F    MASK OFF BIT 7 OF TOKEN
LB7F3
    LEAU    10,U    MOVE TO NEXT COMMAND TABLE
    TST     ,U      IS THIS TABLE ENABLED?
    BEQ     LB7E0   NO - ILLEGAL TOKEN
LB7F9
    SUBA    ,U      SUBTRACT THE NUMBER OF TOKENS FROM THE CURRENT TOKEN NUMBER
    BPL     LB7F3   BRANCH IF TOKEN NOT IN THIS TABLE
    ADDA    ,U      RESTORE TOKEN NUMBER RELATIVE TO THIS TABLE
    LDU     1,U     POINT U TO COMMAND DICTIONARY TABLE
LB801
    DECA            DECREMENT TOKEN NUMBER
    BMI     LB80A   BRANCH IF THIS IS THE CORRECT TOKEN
* SKIP THROUGH DICTIONARY TABLE TO START OF NEXT TOKEN
LB804
    TST     ,U+     GRAB A BYTE
    BPL     LB804   BRANCH IF BIT 7 NOT SET
    BRA     LB801   GO SEE IF THIS IS THE CORRECT TOKEN
LB80A
    LDA     ,U      GET A CHARACTER FROM DICTIONARY TABLE
    BSR     LB814   PUT CHARACTER IN BUFFER
    TST     ,U+     CHECK FOR START OF NEXT TOKEN
    BPL     LB80A   BRANCH IF NOT DONE WITH THIS TOKEN
    BRA     LB7CB   GO GET ANOTHER CHARACTER
LB814
    CMPY    #LINBUF+LBUFMX  TEST FOR END OF LINE INPUT BUFFER
    BCC     LB820   BRANCH IF AT END OF BUFFER
    ANDA    #$7F    MASK OFF BIT 7
    STA     ,Y+     * SAVE CHARACTER IN BUFFER AND
    CLR     ,Y      * CLEAR NEXT CHARACTER SLOT IN BUFFER
LB820
    RTS

* Unused at the moment.

savedrvec   rmb 3   call regular RAM hook
    rts             just in case...

AREWEQUOTED rmb 1

    end     $3f00

And here is a BASIC loader for this routine:

10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 EXEC 16128
80 DATA 16128,16290,182,1,166,183,63,163,190,1,167,191,63,164,134,126,183,1,166,142,63,24,191,1,167,57,127,63,167,50,98,48,4,16,142,2,221,166,128,16,39,0,121,129,34,38,13,125,63,167,38,5,124,63,167,32,3,127,63,167,125,63,167,39,14,166,31,16,140
90 DATA 3,214,36,91,167,160,111,164,32,214,166,31,16,43,0,21,129,58,38,13,230,132,193,132,39,198,193,131,39,194,140,134,33,141,48,32,187,206,1,22,129,255,38,4,166,128,51,69,132,127,51,74,109,196,39,231,160,196,42,246,171,196,238,65,74,43,6,109
100 DATA 192,42,252,32,247,166,196,141,6,109,192,42,248,32,141,16,140,3,214,36,6,132,127,167,160,111,164,57,16294,16294,57,-1,-1

If you RUN that, the loader POKEs the code into memory, and line 70 runs EXEC 16128 (&H3F00) to install it. You can now LIST and see characters embedded in strings:

I bring this up now because 1) I forgot to post about it back in 2022, and 2) I think I want to do something similar with my cursor movement PRINT patch. Ideally, you should be able to LIST a program and see the original characters in it, and have them move the cursor only when the running program PRINTs them. This matches how the VIC-20 worked with embedded characters inside a PRINT:

Screenshot from the Javascript Vic20 Emulator: https://www.mdawson.net/vic20chrome/vic20.php

Since I do not remember how this worked, I thought we could go through this program and see what it does.

* lwasm uncrunch.asm -fbasic -ouncrunch.bas --map
*
* 0.00 2022-07-04 allenh - initial klunky version.
*

* Allow LIST to display graphics characters inside of quoted strings.

RVEC24  equ $1A6     UNCRUNCH BASIC LINE RAM hook

COMVEC  EQU $0120    Some BASIC locations we need.
LINBUF  EQU $02DC
SKP2    EQU $8C
LBUFMX  EQU 250

The start is very similar to the consmove.asm code presented in the first parts of this series. RVEC24 is the RAM hook for the UNCRUNCH routine. Microsoft made this available so future BASICs (like Extended, Disk, etc.) could support LISTING their new tokens, I assume.

LINBUF is the BASIC input buffer. This is where things go when you are typing at the OK prompt.

SKP2 is a trick Microsoft used in the ROM to skip over two bytes. $8C is the 6809 opcode for “CMPX #” (compare X with a 16-bit immediate value). When execution falls into an FCB SKP2 byte, the CPU treats the next two bytes (here, the two-byte LDA #'! instruction at LB7E0) as the CMPX operand. Only the condition code flags change, so the LDA is effectively skipped and A keeps its ':' character. Code that branches directly to LB7E0 still executes the LDA normally. It does the same job as a BRA over those two bytes, in one byte instead of two.
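To make the byte-skipping concrete, here is a toy Python model (not a real emulator) that decodes only the two opcodes involved: $8C (CMPX immediate, 3 bytes) and $86 (LDA immediate, 2 bytes). The `run` function and the three-byte `CODE` array are made up for illustration; they stand in for the `FCB SKP2` / `LDA #'!` pair at LB7E0.

```python
# Toy model of the SKP2 trick; only two 6809 opcodes are decoded.
CMPX_IMM = 0x8C   # CMPX #imm16: opcode plus a 2-byte operand
LDA_IMM = 0x86    # LDA #imm8: opcode plus a 1-byte operand

# FCB SKP2 followed by LB7E0: LDA #'!
CODE = bytes([CMPX_IMM, LDA_IMM, ord("!")])


def run(start: int, a: int) -> int:
    """Execute CODE from offset `start` and return the final value of A."""
    pc = start
    while pc < len(CODE):
        op = CODE[pc]
        if op == CMPX_IMM:
            pc += 3          # the LDA bytes become CMPX's operand; A untouched
        elif op == LDA_IMM:
            a = CODE[pc + 1]
            pc += 2
        else:
            raise ValueError(f"opcode {op:#04x} is not modelled")
    return a


# Falling through from the colon check keeps ':' in A ...
print(chr(run(0, ord(":"))))   # :
# ... while BEQ LB7E0 (illegal token) lands on the LDA and loads '!'.
print(chr(run(1, ord(":"))))   # !
```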

And lastly, LBUFMX defines the maximum size of the line buffer: 250 bytes. I have not looked into why it was 250 instead of, say, 255, but this is as many characters as you can type before input stops and you can only backspace or press ENTER.

    org $3f00

init
    lda RVEC24      get op code
    sta savedrvec   save it
    ldx RVEC24+1    get address
    stx savedrvec+1 save it

    lda #$7e        op code for JMP
    sta RVEC24      store it in RAM hook
    ldx #newcode    address of new code
    stx RVEC24+1    store it in RAM hook
    rts             done

This is the code that installs the RAM hook. It should look close to what I have in the previous routine, just with a different vector.
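The install step boils down to saving three bytes and writing a 3-byte JMP. Here is a Python sketch of that byte shuffling on a simulated 64K memory. `install_hook` is a hypothetical helper name. $39 is the 6809 RTS opcode, and $3F18 is the `newcode` address that the loader's `142,63,24` bytes (LDX #$3F18) load.

```python
JMP_EXT = 0x7E   # 6809 JMP extended opcode
RTS = 0x39       # 6809 RTS opcode (what an unused Color BASIC hook holds)


def install_hook(mem: bytearray, hook: int, new_addr: int) -> bytes:
    """Save the 3-byte RAM hook at `hook`, then patch it to JMP new_addr."""
    saved = bytes(mem[hook:hook + 3])       # lda/sta + ldx/stx savedrvec
    mem[hook] = JMP_EXT                     # lda #$7e / sta RVEC24
    mem[hook + 1] = (new_addr >> 8) & 0xFF  # 6809 is big-endian: high byte first
    mem[hook + 2] = new_addr & 0xFF         # ldx #newcode / stx RVEC24+1
    return saved


mem = bytearray(0x10000)
mem[0x1A6:0x1A9] = bytes([RTS] * 3)          # RVEC24 on plain Color BASIC
old = install_hook(mem, 0x1A6, 0x3F18)
print(old.hex(), mem[0x1A6:0x1A9].hex())     # 393939 7e3f18
```

If another ROM or program had already hooked the vector, `old` would hold its JMP instead, which is why the original code keeps a copy in savedrvec.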

Now we get to the new code. For this I needed to duplicate some code in the BASIC ROM. The code in UPPERCASE represents code I brought in from the Unravelled disassembly listing. The ROM code has no provision for doing something different inside a quoted string, so I needed to use the original code, with some extra code around it that turns the detokenizer on or off when we are listing something within quotes.

Things are about to get messy…

newcode
* UNCRUNCH A LINE INTO BASIC'S LINE INPUT BUFFER
LB7C2
    clr     AREWEQUOTED
    *JSR    RVEC24      HOOK INTO RAM
    LEAS    2,S         Remove JSR from stack
    LEAX    4,X         MOVE POINTER PAST ADDRESS OF NEXT LINE AND LINE NUMBER
    LDY     #LINBUF+1   UNCRUNCH LINE INTO LINE INPUT BUFFER
LB7CB
    LDA     ,X+         GET A CHARACTER
    LBEQ    LB820       BRANCH IF END OF LINE

This code is mostly from the BASIC ROM except for that first “clr” and the “LEAS 2,S”. I reserve a byte of memory to use as a flag for when we are or are not parsing something in quotes. The flag is cleared at the start of each line, set when a quote is seen, and cleared again by the next quote. The commented-out JSR is in the original ROM; it is the instruction that jumps to this hook. I kept it in for clarity, but it is not used in my function. Because the ROM reached this hook with that JSR, “LEAS 2,S” throws away its return address, so the final RTS goes back to whoever called the ROM's un-crunch routine instead of into the ROM loop I replaced.

Now we have my custom code. If we are not at the end of line, I check for quotes and turn the flag on or off:

    * Check for quote/unquote
    cmpa    #34         Is A a quote character?
    bne     quotedone

togglequote
    tst     AREWEQUOTED
    bne     quoteoff
quoteon
    inc     AREWEQUOTED
    bra     quotedone
quoteoff
    clr     AREWEQUOTED Toggle quote mode off.
quotedone
    tst     AREWEQUOTED
    beq     notquoted

This flag will be used in a moment. I clone some of the ROM code that outputs the un-crunched line but will bypass the un-crunch for items within quotes.

quoted
    * If we are quoted, just store whatever it is.
    lda     -1,x

    CMPY    #LINBUF+LBUFMX  TEST FOR END OF LINE INPUT BUFFER
    BCC     LB820   BRANCH IF AT END OF BUFFER
    *ANDA   #$7F    MASK OFF BIT 7
    STA     ,Y+     * SAVE CHARACTER IN BUFFER AND
    CLR     ,Y      * CLEAR NEXT CHARACTER SLOT IN BUFFER
    BRA     LB7CB   GET ANOTHER CHARACTER

For quoted items, A is reloaded with the character just read (at -1,X). Then the ROM's store-in-buffer code runs, but I commented out the ANDA that would mask off bit 7. Tokens have that bit set, but inside quotes I wanted to leave high-bit bytes (such as graphics characters) alone.
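Putting the pieces together, here is a hedged Python model of what the patched loop does with one crunched line. The token table is a tiny illustrative subset (only $87 as PRINT, which I believe matches Color BASIC but treat as an assumption), and the colon/ELSE/' special case and the 250-byte buffer limit are left out.

```python
# Hypothetical model of the quote-aware UNCRUNCH loop (not ROM code).
# Assumption: $87 is the PRINT token; other tokens print as '!'.
TOKENS = {0x87: "PRINT"}
QUOTE = 0x22


def uncrunch(line: bytes) -> bytes:
    """Expand tokens outside quotes; copy bytes inside quotes untouched."""
    out = bytearray()
    quoted = False                          # AREWEQUOTED, cleared per line
    for b in line:
        if b == 0:                          # LBEQ LB820: end of line
            break
        if b == QUOTE:                      # togglequote
            quoted = not quoted
        if quoted:
            out.append(b)                   # quoted: keep bit 7 as-is
        elif b & 0x80:
            out += TOKENS.get(b, "!").encode("ascii")  # LB7E6 (simplified)
        else:
            out.append(b)                   # LB814 (ANDA #$7F is a no-op here)
    return bytes(out)


# PRINT"<$87><$AF>": the bytes inside the quotes are not detokenized.
print(uncrunch(bytes([0x87, QUOTE, 0x87, 0xAF, QUOTE, 0])))
# b'PRINT"\x87\xaf"'
```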

This batch of code is more Color BASIC ROM code I duplicated here after that first “lda” line:

notquoted
    lda     -1,x

    LBMI    LB7E6   BRANCH IF IT'S A TOKEN
    CMPA    #':     CHECK FOR END OF SUB LINE
    BNE     LB7E2   BRNCH IF NOT END OF SUB LINE
    LDB     ,X      GET CHARACTER FOLLOWING COLON
    CMPB    #$84    TOKEN FOR ELSE?
    BEQ     LB7CB   YES - DON'T PUT IT IN BUFFER
    CMPB    #$83    TOKEN FOR REMARK?
    BEQ     LB7CB   YES - DON'T PUT IT IN BUFFER
    FCB     SKP2    SKIP TWO BYTES
LB7E0
    LDA     #'!     EXCLAMATION POINT
LB7E2
    BSR     LB814   PUT CHARACTER IN BUFFER
    BRA     LB7CB   GET ANOTHER CHARACTER

* UNCRUNCH A TOKEN
LB7E6
    LDU     #COMVEC-10  FIRST DO COMMANDS
    CMPA    #$FF    CHECK FOR SECONDARY TOKEN
    BNE     LB7F1   BRANCH IF NON SECONDARY TOKEN
    LDA     ,X+     GET SECONDARY TOKEN
    LEAU    5,U     BUMP IT UP TO SECONDARY FUNCTIONS
LB7F1
    ANDA    #$7F    MASK OFF BIT 7 OF TOKEN
LB7F3
    LEAU    10,U    MOVE TO NEXT COMMAND TABLE
    TST     ,U      IS THIS TABLE ENABLED?
    BEQ     LB7E0   NO - ILLEGAL TOKEN
LB7F9
    SUBA    ,U      SUBTRACT THE NUMBER OF TOKENS FROM THE CURRENT TOKEN NUMBER
    BPL     LB7F3   BRANCH IF TOKEN NOT IN THIS TABLE
    ADDA    ,U      RESTORE TOKEN NUMBER RELATIVE TO THIS TABLE
    LDU     1,U     POINT U TO COMMAND DICTIONARY TABLE
LB801
    DECA            DECREMENT TOKEN NUMBER
    BMI     LB80A   BRANCH IF THIS IS THE CORRECT TOKEN
* SKIP THROUGH DICTIONARY TABLE TO START OF NEXT TOKEN
LB804
    TST     ,U+     GRAB A BYTE
    BPL     LB804   BRANCH IF BIT 7 NOT SET
    BRA     LB801   GO SEE IF THIS IS THE CORRECT TOKEN
LB80A
    LDA     ,U      GET A CHARACTER FROM DICTIONARY TABLE
    BSR     LB814   PUT CHARACTER IN BUFFER
    TST     ,U+     CHECK FOR START OF NEXT TOKEN
    BPL     LB80A   BRANCH IF NOT DONE WITH THIS TOKEN
    BRA     LB7CB   GO GET ANOTHER CHARACTER
LB814
    CMPY    #LINBUF+LBUFMX  TEST FOR END OF LINE INPUT BUFFER
    BCC     LB820   BRANCH IF AT END OF BUFFER
    ANDA    #$7F    MASK OFF BIT 7
    STA     ,Y+     * SAVE CHARACTER IN BUFFER AND
    CLR     ,Y      * CLEAR NEXT CHARACTER SLOT IN BUFFER
LB820
    RTS

* Unused at the moment.

savedrvec   rmb 3   call regular RAM hook
    rts             just in case...

AREWEQUOTED rmb 1

    end     $3f00

Looking at this now, I expect I had to duplicate all this ROM code because I need to stay in the un-crunch loop for the complete line. If you look at the ROM code and see a better approach, please leave a comment.
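For anyone reading the cloned ROM code, the LB801-LB80A loop is the interesting part: keywords are stored back to back with bit 7 set on the last character of each, so the code can count its way to the Nth keyword. (The LB7F3-LB7F9 part, which first picks the right table, is not modelled.) A Python sketch with a made-up three-word table; `make_table` and `keyword` are hypothetical names:

```python
def make_table(words):
    """Pack keywords the way the ROM dictionary stores them."""
    table = bytearray()
    for word in words:
        packed = bytearray(word.encode("ascii"))
        packed[-1] |= 0x80          # bit 7 marks the last character
        table += packed
    return bytes(table)


def keyword(table: bytes, n: int) -> str:
    """Return keyword n (0-based) from a packed dictionary table."""
    i = 0
    for _ in range(n):              # LB801/LB804: skip n keywords
        while not table[i] & 0x80:
            i += 1
        i += 1                      # step past the flagged byte (TST ,U+)
    chars = []
    while True:                     # LB80A: copy until the flagged byte
        c = table[i]
        chars.append(chr(c & 0x7F)) # LB814 masks off bit 7
        i += 1
        if c & 0x80:
            return "".join(chars)


table = make_table(["FOR", "GO", "REM"])   # made-up three-keyword table
print(keyword(table, 1))                   # GO
```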

I also think this may be wrong, since I do not see it calling the original vector when complete. This may not work for tokens added by Extended or Disk BASIC. We will have to fix that at some point.

For now, I will just leave this here as an example, and then figure out how I could do something similar with my cursor move codes so they LIST showing the codes, and only move around the screen when PRINTing them.

This should be interesting… The way I patched CHROUT, it has no idea if the character is from LIST or PRINT. I will have to figure out how to solve that, next.

Until then…

Hacking the Color BASIC PRINT command – part 3

See Also: part 1, part 2, part 3, part 4, part 5 and part 6 (and maybe more to come…)

Just because it works, doesn’t mean it’s correct.

Keep this in mind as I continue my exploration of using Color BASIC RAM hooks to change how the PRINT output works.

Previously, I created a small assembly language routine that would intercept BASIC’s output to the 32 column screen and then treat a lowercase “r” (as in “right”) as a transparent space. My proof-of-concept seemed to work.

Today, I present updated code that now supports “u” for up, “d” for down, “l” for left, and “r” for right, semi-simulating how you can embed cursor movements in a PRINT command on a VIC-20 (and, I assume, the C64 and maybe even the PET before it).

Let me walk you through what I came up with:

* lwasm consmove2.asm -fbasic -oconsmove2.bas --map

* Treat lowercase u, d, l and r written to the
* screen (device #0) as cursor movement characters.

UP      equ     'u      character for up
DOWN    equ     'd      character for down
LEFT    equ     'l      character for left
RIGHT   equ     'r      character for right

I start with some definitions of the characters I want to use to move the cursor UP, DOWN, LEFT or RIGHT. I wanted to be able to easily change these characters. In a future installment, I may try to implement this using “DEF USR” so you could do something like X=USR0(“abcd”) and pass in a string containing four characters to use for up, down, left and right. For now, hard-coded.

DEVNUM  equ     $6f     device number being used for I/O
CURPOS  equ     $88     location of cursor position in RAM
RVEC3   equ     $167    console out RAM hook
VIDRAM  equ     $400    VIDEO DISPLAY AREA

These next defines are things that the Color BASIC Unravelled disassembly listings had defined. The first is the memory location that contains the current device number for output (0 for the screen, -2 for the printer, etc.). After that is where BASIC tracks the cursor position on the 32 column screen. Next is the address of the RAM vector for the console out routine. Lastly, VIDRAM is the start of the 32-column screen in memory (1024, or &H400 in hex).
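Since the 32-column screen is just 512 bytes starting at VIDRAM (32 columns by 16 rows), CURPOS is simply VIDRAM plus row*32 plus column. A small Python helper (hypothetical names) shows the addresses the bounds checks below rely on:

```python
VIDRAM = 0x400          # start of the 32-column text screen
COLS, ROWS = 32, 16     # 32 x 16 = 512 bytes


def screen_addr(row: int, col: int) -> int:
    """CURPOS value for a given row and column."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("position is off the 32x16 screen")
    return VIDRAM + row * COLS + col


def row_col(addr: int) -> tuple:
    """Inverse of screen_addr: (row, column) for a CURPOS value."""
    return divmod(addr - VIDRAM, COLS)


print(hex(screen_addr(1, 0)))    # 0x420, VIDRAM+32: first cell of line 2
print(hex(screen_addr(15, 0)))   # 0x5e0, VIDRAM+512-32: start of bottom line
print(hex(screen_addr(15, 31)))  # 0x5ff, VIDRAM+511: bottom right
```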

    org $7f00

init
    lda RVEC3       get op code
    sta savedrvec   save it
    ldx RVEC3+1     get address
    stx savedrvec+1 save it

    lda #$7e        op code for JMP
    sta RVEC3       store it in RAM hook
    ldx #newcode    address of new code
    stx RVEC3+1     store it in RAM hook

    rts             done

uninstall
    * TODO

Here, I changed the start address of this program to &H7F00 instead of &H3F00. If trying this on a 16K machine, you would have to change it back. Once I know how big the final assembly code is, I’ll probably move it as close as possible to the end of the 32K RAM used by BASIC so it can leave as much RAM for the BASIC program as possible.

This init routine pulls out the three bytes at RVEC3 (which are either three “rts” instructions for Color BASIC, or a 3-byte “JMP xxxx” instruction if other BASIC ROMs such as Extended, Disk, or CoCo 3 are installed) and saves them to 3 bytes of reserved memory at the end of the program.

I then replace them with a JMP (&H7E) instruction followed by the two-byte address of my “newcode” that I want to run any time a character is output.

For now, there is no way to uninstall this routine, but I left myself a reminder I should create an uninstall routine. I think I may just make it so if you EXEC the first time, it installs, and if you EXEC again, it uninstalls, rather than using two different memory locations the user would need to know. We shall see.

newcode
    * Do this only if DEVNUM is 0 (console)
    tst     DEVNUM      is DEVNUM 0?          
    bne     continue    not device #0 (console)

    leas    2,s         remove PC from stack since we won't be returning there.

* Now this is the start of what Color BASIC ROM does for PUTCHR:
* PUT A CHARACTER ON THE SCREEN
LA30A
    PSHS    X,B,A       SAVE REGISTERS
    LDX     CURPOS      POINT X TO CURRENT CHARACTER POSITION

When BASIC needs to output a character, it ends up at the PUTCHR routine in the ROM, starting at address A282. The very first thing it does is “jsr RVEC3” which means it will now jump to this “newcode” routine. The character to output will be in register A.

The first thing I do is check the DEVNUM value to see which device number is being used. I use “tst” to check for zero. If the device is not zero (screen), I branch to a “continue” routine which will be the code that was originally in RVEC3. (Thus, on Color BASIC it will just reach an “rts” return, or it will run whatever “jmp xxxx” was in the vector before I hijacked it.) More on this in a moment.

If I get past that, I will be doing some work and then jumping back into the ROM so the rest of BASIC's output routine can run. In that situation, I will not be doing an “rts” from my custom code. I may be wrong on this, but my understanding is that I need to remove the return address from the stack (since I will not be returning), which I do by moving the S stack pointer up by 2. Is this what I need to be doing?
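One way to check the stack reasoning is a toy model that tracks only return addresses. It assumes that PUTCHR reaches LA30A with a branch rather than a JSR, and that the routine ends at LA35D with a PULS that includes PC (as described further down). The labels are strings for readability, the three saved registers are lumped into one entry, and `putchr_via_hook` is a hypothetical name; this is not an emulator.

```python
def putchr_via_hook(drop_return: bool):
    """Return (where PULS ...,PC resumes, what is left on the stack)."""
    stack = ["BASIC caller"]      # pushed by JSR [$A002] into PUTCHR
    stack.append("PUTCHR+3")      # pushed by JSR >RVEC3 into the hook
    if drop_return:
        stack.pop()               # leas 2,s
    stack.append("X,B,A")         # pshs x,b,a (cloned from LA30A)
    stack.pop()                   # at LA35D: puls a,b,x ...
    resume = stack.pop()          # ... ,pc
    return resume, stack


print(putchr_via_hook(True))    # ('BASIC caller', [])
print(putchr_via_hook(False))   # ('PUTCHR+3', ['BASIC caller'])
```

Without the LEAS, execution would resume inside PUTCHR right after the hook call, as if the hook had returned normally, and the ROM would then go on to print the character as well.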

Since BASIC already checks for special characters like backspace and ENTER, I was able to pattern my code after what the ROM does for those and leave out the part where it puts a character on the screen. I take the first two lines that BASIC would have run (from LA30A in the Unravelled listing) and put them into my code. This then lets me have my custom blocks check for special characters the same way BASIC does for the ones it handles.

checkup
    cmpa    #UP
    bne     checkdown
    CMPX    #VIDRAM+32  second line or lower?
    blt     goLA35D     disallow if on top line.
    leax    -32,x       move up one line
    bra     done

The first check is for the UP character. If not seen, we branch to the “checkdown” routine just after this one. If it was UP, then we check whether X is at least 32 bytes past the start of screen memory (the start of the second line). If you are on the top line, moving UP is ignored: we branch to goLA35D, which jumps to the end of the ROM's output routine (LA35D), so nothing is printed.

Else, decrement X (cursor position) by 32 to move up one line, then branch to the “done” routine which will get us back into BASIC ROM code.

checkdown
    cmpa    #DOWN
    bne     checkleft
    cmpx    #VIDRAM+512-32
    bge     goLA35D     disallow if on bottom line.
    leax    32,X        move down one line
    bra     done

The check for DOWN follows a similar pattern, this time disallowing down if you are on the bottom line of the screen. This is different than how Commodore handles it, since down on the Commodore will cause the screen to scroll up. I plan to fix this later to make it act more like the VIC-20 does.

If we cannot move down, we branch to goLA35D, which jumps to the end of the ROM's output routine so the character is ignored. If we can move down, the cursor position in X is increased by 32.

NOTE: During my testing, I found a VIC-20 bug. If you cursor to the bottom right of the screen then press RIGHT, the first time you do this it scrolls up and then the cursor appears one line before the last line. If you then cursor to the bottom right position again and go RIGHT, it scrolls up and now the cursor is on the bottom left position. Odd. I should find a VIC-20 BASIC disassembly and see if I can figure out what causes it to misbehave, but only the first time.

checkleft
    cmpa    #LEFT
    bne     checkright
    cmpx    #VIDRAM     top left of screen?
    beq     goLA35D
    leax    -1,X        move left one character
    bra     done

The process repeats again for LEFT. This is much like what BASIC does with backspace (BS), where it disallows it if you are in the top left position of the screen. If we cannot move left, we branch to goLA35D, which jumps to the end of the ROM's output routine so the character is ignored. If we can, the cursor position is decremented by 1.

checkright
    cmpa    #RIGHT
    bne     goLA30E
    cmpx    #VIDRAM+511 bottom right of screen
    beq     goLA35D
    leax    1,x         increment X, skipping that location.
    bra     done

Lather, rinse, repeat. For RIGHT, I disallow moving right from the bottom right character position. This is another difference from VIC-20 which I plan to fix later. If we cannot, we branch to the same label that gets us back into the BASIC ROM routine. If we can move right, the cursor position is incremented by 1.
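The four checks can be summarized in a few lines of Python. This models the bounds logic only; `move` is a hypothetical name, the constants mirror the assembly equates, and None stands in for the jump back to $A30E for any other character.

```python
VIDRAM = 0x400
UP, DOWN, LEFT, RIGHT = "u", "d", "l", "r"


def move(curpos: int, ch: str):
    """New cursor position for a movement character (unchanged when the move
    is disallowed), or None for any other character (the goLA30E path)."""
    if ch == UP:
        return curpos - 32 if curpos >= VIDRAM + 32 else curpos
    if ch == DOWN:
        return curpos + 32 if curpos < VIDRAM + 512 - 32 else curpos
    if ch == LEFT:
        return curpos - 1 if curpos != VIDRAM else curpos
    if ch == RIGHT:
        return curpos + 1 if curpos != VIDRAM + 511 else curpos
    return None


print(hex(move(0x420, UP)))     # 0x400: second line moves to the top line
print(hex(move(0x400, UP)))     # 0x400: top line, UP is ignored
print(move(0x400, "X"))         # None: ROM prints it normally
```

Python integers sidestep any sign questions. The assembly uses blt/bge, which are signed branches, but they behave correctly here because every screen address is below $8000.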

And now some of the labels…

goLA30E
    jmp     $A30E       jump back into Color BASIC ROM code.

done
    stx     CURPOS      update cursor position
goLA35D
    jmp     $A35D       jump back into Color BASIC ROM code.

A30E in the disassembly is the first instruction right after the two lines I cloned into my routine at my LA30A label. Thus, I start out with those first two lines, then do my code and either jump back in to let the ROM code continue, OR I modify the cursor position then jump to DONE so it can finish up.

“done” stores the updated cursor position from the X register into CURPOS, then jumps to A35D, which is the final line of the output ROM routine where the registers that were pushed/saved at the top of the routine are pulled/restored (including PC, which acts as an rts to whoever called the routine).

continue
savedrvec rmb 3         call regular RAM hook
    rts                 just in case...

    end

Lastly, if the original check for console (device 0) was not true, that code jumps to this end part so it can continue on to whatever the vector was pointing to before we hijacked it.

Here is a BASIC loader for this program. I added line 5 to reserve memory for the assembly, else BASIC will clobber it with string storage. This loader ends with END, so after RUNning it, type EXEC &H7F00 to install the hook:

5 CLEAR 200,&H7F00
10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 32512,32607,182,1,103,183,127,96,190,1,104,191,127,97,134,126,183,1,103,142,127,24,191,1,104,57,13,111,38,68,50,98,52,22,158,136,129,117,38,10,140,4,32,45,50,48,136,224,32,43,129,100,38,10,140,5,224,44,36,48,136,32,32,29,129,108,38,9
90 DATA 140,4,0,39,22,48,31,32,16,129,114,38,9,140,5,255,39,9,48,1,32,3,126,163,14,159,136,126,163,93,32611,32611,57,-1,-1
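The loader format is simple: a start and end address, then one byte per address, repeated until a -1,-1 pair. Here is a Python sketch of what lines 10-60 do (`load` is a hypothetical name, and memory is modeled as a dict):

```python
def load(data):
    """Apply loader DATA the way lines 10-60 READ and POKE it."""
    mem = {}
    values = iter(data)
    while True:
        a, b = next(values), next(values)   # 10 READ A,B
        if a == -1:                         # 20 IF A=-1 THEN 70
            return mem
        for c in range(a, b + 1):           # 30 FOR C = A TO B
            mem[c] = next(values)           # 40 READ D:POKE C,D


# The tail of line 90: one byte (57 = RTS) at 32611, then the end marker.
print(load([32611, 32611, 57, -1, -1]))    # {32611: 57}
```

The separate 32611 block exists because `savedrvec rmb 3` only reserves 32608-32610 ($7F60-$7F62) without emitting any bytes, so the rts after it starts a new block.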

And here is the complete assembly listing:

* lwasm consmove2.asm -fbasic -oconsmove2.bas --map

* Treat lowercase u, d, l and r written to the
* screen (device #0) as cursor movement characters.

UP      equ     'u      character for up
DOWN    equ     'd      character for down
LEFT    equ     'l      character for left
RIGHT   equ     'r      character for right

DEVNUM  equ     $6f     device number being used for I/O
CURPOS  equ     $88     location of cursor position in RAM
RVEC3   equ     $167    console out RAM hook
VIDRAM  equ     $400    VIDEO DISPLAY AREA

    org $7f00

init
    lda RVEC3       get op code
    sta savedrvec   save it
    ldx RVEC3+1     get address
    stx savedrvec+1 save it

    lda #$7e        op code for JMP
    sta RVEC3       store it in RAM hook
    ldx #newcode    address of new code
    stx RVEC3+1     store it in RAM hook

    rts             done

uninstall
    * TODO

newcode
    * Do this only if DEVNUM is 0 (console)
    tst     DEVNUM      is DEVNUM 0?          
    bne     continue    not device #0 (console)

    leas    2,s         remove PC from stack since we won't be returning there.

* Now this is the start of what Color BASIC ROM does for PUTCHR:
* PUT A CHARACTER ON THE SCREEN
LA30A
    PSHS    X,B,A       SAVE REGISTERS
    LDX     CURPOS      POINT X TO CURRENT CHARACTER POSITION
    
checkup
    cmpa    #UP
    bne     checkdown
    CMPX    #VIDRAM+32  second line or lower?
    blt     goLA35D     disallow if on top line.
    leax    -32,x       move up one line
    bra     done

checkdown
    cmpa    #DOWN
    bne     checkleft
    cmpx    #VIDRAM+512-32
    bge     goLA35D     disallow if on bottom line.
    leax    32,X        move down one line
    bra     done

checkleft
    cmpa    #LEFT
    bne     checkright
    cmpx    #VIDRAM     top left of screen?
    beq     goLA35D
    leax    -1,X        move left one character
    bra     done

checkright
    cmpa    #RIGHT
    bne     goLA30E
    cmpx    #VIDRAM+511 bottom right of screen
    beq     goLA35D
    leax    1,x         increment X, skipping that location.
    bra     done

goLA30E
    jmp     $A30E       jump back into Color BASIC ROM code.

done
    stx     CURPOS      update cursor position
goLA35D
    jmp     $A35D       jump back into Color BASIC ROM code.

continue
savedrvec rmb 3         call regular RAM hook
    rts                 just in case...

    end

And now I can write a program like this:

As I typed that in, after the four Xs I typed a lowercase “d” for down, then four lowercase “l” for left, then X and two “r” then another X, then “dllll” followed by the four Xs again. It looks weird as it moves the cursor around as I type, but I will be fixing that with more additions to this routine in a future installment.

For now, it lets me put this 4×3 thing on the screen without erasing stuff around or behind it:

Progress!

Please, folks, let me know in the comments if you see something I am doing wrong. Am I handling the stack pointer thing properly? I am just experimenting and … well, “it works for me.”

To be continued…

Hacking the Color BASIC PRINT command – part 2

See Also: part 1, part 2, part 3, part 4, part 5 and part 6 (and maybe more to come…)

Here is what I have learned so far…

When BASIC wants to output a character, it jumps to a ROM routine called CHROUT. This is one of a few documented ROM calls in Color BASIC. Here is how it is documented in the EDTASM+ manual:

I have used this ROM call in previous blog posts, showing how to output “HELLO WORLD” or whatever using assembly:

That small program makes the X register point to the first byte of the MSG data. It then loads that byte into register A and increments X. If A is 0 (the end of string marker), it branches to DONE. If not, it jumps to the subroutine pointed to at $A002 which outputs whatever is in A. It then goes back to do it again.
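That loop is easy to mirror in Python, with a callback standing in for the JSR through the CHROUT vector at $A002 (the function and callback names are hypothetical):

```python
def print_msg(msg: bytes, chrout) -> None:
    """Send bytes of a zero-terminated message to chrout, one at a time."""
    x = 0
    while True:
        a = msg[x]          # LDA ,X+
        x += 1
        if a == 0:          # BEQ DONE
            return
        chrout(a)           # JSR [$A002]


out = []
print_msg(b"HELLO WORLD\x00", out.append)
print(bytes(out))           # b'HELLO WORLD'
```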

I wrote a two-part post about Color BASIC RAM hooks a while ago. There I explored these extra spots where later BASIC ROMs (like Extended, DISK, and the CoCo 3 ROM) can patch in and make the original Color BASIC call new code first. That new code did things like add support for DISK devices or the (gone in CoCo 3) DLOAD command. I assume it is also used to patch in the CoCo 3’s high resolution 40/80 column text screens.

The idea is that before Color BASIC does its thing with the CHROUT routine, it will jump to “new code” and let it run first, and that code may then return back to let the normal code process. When disk devices were added as device #1 to device #15, this jump goes to code in Disk Extended BASIC which checks the incoming device number. If it is 1-15, new code is called that takes care of writing data to a disk file. If not, it returns back and Color BASIC handles the devices it knows about (device 0 is the console, device -1 is the cassette, and device -2 is the serial/printer port).
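Here is a Python sketch of that routing. The disk check stands in for Disk BASIC's RAM hook; the rest mimics the INCB trick the ROM uses (PSHS B / LDB DEVNUM / INCB / PULS B / BMI, as in the PUTCHR code from bas.asm), where adding 1 to the device byte turns -2 negative, -1 zero, and 0 positive. Treating the zero result as cassette and the positive result as screen is my reading of that code, so take it as an assumption; `route` is a hypothetical name.

```python
def route(devnum: int) -> str:
    """Which output routine handles a DEVNUM value (-2..15)."""
    if 1 <= devnum <= 15:
        return "disk"               # Disk BASIC's hook takes these first
    b = (devnum + 1) & 0xFF         # INCB on the device byte
    if b & 0x80:                    # BMI: -2 ($FE) + 1 = $FF
        return "printer"
    if b == 0:                      # -1 ($FF) + 1 wraps to $00
        return "cassette"
    return "screen"                 # 0 + 1 = 1


print([route(d) for d in (0, -1, -2, 1)])
# ['screen', 'cassette', 'printer', 'disk']
```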

For the task Erico was inquiring about, I thought I could patch in a new routine that would check for device 0 (writing to the screen) and then if it was, look for a special character that now means “transparent”. Instead of putting the character on the screen then moving the cursor position over by one, it would just move the cursor position and leave the screen alone.

Here is what I came up with in a quick and dirty hack:

* lwasm consmove.asm -fbasic -oconsmove.bas --map

* Treat a lowercase 'r' written to the
* screen (device #0) as a transparent space.

DEVNUM equ $6f      device number being used for I/O
CURPOS equ $88      location of cursor position in RAM
RVEC3 equ $167      console out RAM hook

    org $3f00

init
    lda RVEC3       get op code
    sta savedrvec   save it
    ldx RVEC3+1     get address
    stx savedrvec+1 save it

    lda #$7e        op code for JMP
    sta RVEC3       store it in RAM hook
    ldx #newcode    address of new code
    stx RVEC3+1     store it in RAM hook

    rts             done

newcode
    * Do this only if DEVNUM is 0 (console)
    tst DEVNUM      is DEVNUM 0?          
    bne continue    not device #0 (console)

    * If here, device #0 (console)
    cmpa #'r        compare A to lowercase 'r'
    bne continue    if not, continue

    leas 2,s        remove PC from stack since we won't be returning there.

    * Now this is the start of what Color BASIC ROM does for PUTCHR:
LA30A
    pshs x,b,a
    ldx CURPOS      X points to current cursor position
    leax 1,x        increment X, skipping that location.
    jmp $a344       jump back into Color BASIC ROM code.

continue
savedrvec rmb 3     call regular RAM hook
    rts             just in case...

    end

The “init” code saves out whatever is in the console out RAM hook (three bytes, either “rts” if not being used, or “jmp abcd” to jump to new code). It then patches in a “jmp” to the new code in this program, then returns back. This installs the new code into the RAM hook.

Now when anything tries to print through CHROUT, BASIC first calls whatever the RAM hook points to. That will be the “newcode” here.

This new code first checks if the device is 0, indicating we are printing to the screen. If it is not, it branches to the “continue” spot where the original RAM hook jump was copied, allowing it to jump there and proceed as normal.

Next it checks to see if the character to print is a lowercase ‘r’ (I chose ‘r’ for “right”). If not, it also branches to “continue”, so the character is handled normally.

Now things get weird. This is not a normal RAM hook because I can’t just “do stuff” then return to the original ROM code. That original code would still want to output whatever is in register A to the screen. To make mine work, I need to take over some of the functions of the ROM and then jump back into the normal ROM code so it can continue after I have done my thing.

In a normal RAM hook, BASIC would use a JSR (jump subroutine) to get there, then have that code return back. But for this hack to work, my new code has to jump directly back into Color BASIC. Because of this, there is a return address (where RTS will return to) on the stack that we won’t be using. Doing “leas 2,s” moves the stack pointer so it skips that. This allows me to jump directly out of this new code back into the ROM.

Now we have to do whatever code the ROM would have done before processing the character. I looked at the disassembly and saw that it pushes three registers to save them, then at the end it pulls them back off the stack to restore them.

So I do that in my code.

Next, all I need to do is increment the cursor position then jump back into the Color BASIC ROM just past where it would normally put the character on the screen. From there on, everything will be normal.

I may not have explained this very well, but I am sure someone can help me in the comments.

Patching BASIC with BASIC

Thanks to LWTools, here is a simple BASIC loader for this test code:

10 READ A,B
20 IF A=-1 THEN 70
30 FOR C = A TO B
40 READ D:POKE C,D
50 NEXT C
60 GOTO 10
70 END
80 DATA 16128,16170,182,1,103,183,63,43,190,1,104,191,63,44,134,126,183,1,103,142,63,24,191,1,104,57,13,111,38,15,129,114,38,11,50,98,52,22,158,136,48,1,126,163,68,16174,16174,57,-1,-1

If you load this into a CoCo or emulator, then RUN it, the code will be loaded at address &H3F00. Type “EXEC &H3F00” to install the new RAM hook and now any print of lowercase ‘r’ skips to the right without erasing anything.

10 CLS0
20 PRINT@0,"THISrrrrWILLrrrrWORK"
30 GOTO 30

That simple program will clear to a black screen, then you will see “THIS” then four blocks to the right “WILL” then four blocks past that “WORK”. Normally it would erase those blocks with spaces, but the lowercase ‘r’ now means “just skip to the right”.
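To see the effect without a CoCo, here is a Python model of a PRINT where one character is transparent: it advances the cursor without writing anything. `print_at` and the list-of-characters screen are hypothetical, and scrolling is not modelled. The same function also covers Erico's original OOOXOOO example by making O the transparent character.

```python
def print_at(screen: list, pos: int, text: str, transparent: str = "r") -> int:
    """Write text into screen from pos; the transparent character only moves
    the cursor. Returns the final cursor position (no scrolling)."""
    for ch in text:
        if ch != transparent:
            screen[pos] = ch
        pos += 1
    return pos


screen = ["."] * 32                   # one line; "." marks the black background
print_at(screen, 0, "THISrrrrWILLrrrrWORK")
print("".join(screen[:20]))           # THIS....WILL....WORK

line = ["."] * 7
print_at(line, 0, "ABCDEFG")
print_at(line, 0, "OOOXOOO", transparent="O")
print("".join(line))                  # ABCXEFG
```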

It’s not perfect, but it does prove the concept.

More to come…

Hacking the Color BASIC PRINT command – part 1

Over in the CoCo Facebook group, Erico Patricio Monteiro asked an interesting question:

“Would it be possible to PRINT a transparent character?”

– Erico Patricio Monteiro

His idea was for putting stuff on the CoCo’s text screen without wiping out the background. He used an example of PRINT”OOOOX” where “O” represented the transparent character. The way I see it, if you had the first line printed like this:

PRINT @0,"ABCDEFG";

…and you had such a transparent character and did this:

PRINT @0,"OOOXOOO";

…your top line would then look like this:

ABCXEFG

This reminded me of how my Commodore VIC-20 had cursor movement characters you could embed in a PRINT statement. Once you typed the quote, you could now embed typeable escape codes to change colors, clear the screen, home the cursor, insert or delete characters, or just move the cursor up, down, left or right. This made VIC-20 program listings require a graphical printer to represent those characters. Here is something from the VIC-20 manual:

This is many pages into the manual, after it had explained how you can embed things like color changes or reverse video. The program listings in the VIC-20 manual had graphical characters in them, and you had to learn what to type to recreate them:

Program listings for the VIC looked weird ;-)

At some point, Commodore BASIC listings were represented with text strings instead of the graphical characters, making a modern listing look like this:

25 print"{home}{down}{right}{black}abc{down}{left*3}def{down}{left*3}ghi{down}{left*3}jkl"

Then you just had to know what key was “black” or “left” (three times, as the “left*3” indicates).

But I digress…

Since there was no PRINT@ or LOCATE on the VIC-20, any time you wanted to print something in a particular spot on the screen you had to print the HOME (move cursor to top left of the screen) character then use a bunch of cursor controls to move to where you wanted to print.
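To show why this got verbose, here is a hypothetical Python helper (my own illustration, not anything from Commodore) that builds the cursor string needed to reach a given row and column, written in the {home}{down}{right} notation used above. It assumes the VIC-20's 22-column by 23-row text screen and only moves down and right from HOME.

```python
# Hypothetical helper (not from any Commodore manual): build the cursor
# string needed to PRINT at a given row/column on a VIC-20, written in the
# "{home}{down*3}{right*5}" notation used for modern listings.

VIC_COLUMNS = 22   # VIC-20 text screen width
VIC_ROWS = 23      # VIC-20 text screen height


def control(name: str, count: int) -> str:
    """One {name} token, {name*count} for repeats, or '' when count is 0."""
    if count <= 0:
        return ""
    if count == 1:
        return "{" + name + "}"
    return "{" + name + "*" + str(count) + "}"


def cursor_to(row: int, col: int) -> str:
    """HOME the cursor, then move down `row` lines and right `col` columns."""
    if not (0 <= row < VIC_ROWS and 0 <= col < VIC_COLUMNS):
        raise ValueError("position is off the 22x23 VIC-20 text screen")
    return "{home}" + control("down", row) + control("right", col)


if __name__ == "__main__":
    # Top row, one column in, similar to the Sky-Ape-Er timer line
    print(cursor_to(0, 1) + "time:")
    # Halfway down the screen takes a lot more keystrokes
    print(cursor_to(11, 8) + "hello")
```

Each {down*n} or {right*n} stands for n separate keypresses in the real listing, which is why printing anywhere but the top row cost so much typing.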

This was … not optimal. And thus, most BASIC VIC-20 programs would print their information (lives left, etc.) on the top of the screen since it was shorter code just to home and print there:

VIC-20 Sky-Ape-Er, screen 3.

My Sky-Ape-Er VIC-20 game had a timer, and I got it in that top corner like this:

105 print"{home}{blue}{reverse on}{right}time:";t:t=t+1

You will notice the above snippet says “reverse on” and has “time” in lowercase, but on my screenshot it is uppercase without being reversed. That is due to the character sets of the VIC-20 where some modes were upper and lower, some were upper with reverse, and other combinations. For the mode I was in, reverse was getting the uppercase characters (and uppercase characters typed with SHIFT would be the graphical characters for those keys).

“But that’s really not important to this story…”

If you look at the Color BASIC Unraveled book you can find the Console Out routine (PUTCHR) on page 84. I did not want to type it all in here, but I did find this GitHub repository by tomctomc that has this already in a text file:

coco_roms/bas.asm at master · tomctomc/coco_roms

From the “bas.asm”, here is the code in question:

; CONSOLE OUT
PUTCHR          JSR         >RVEC3          ; HOOK INTO RAM
                PSHS        B               ; SAVE ACCB
                LDB         DEVNUM          ; GET DEVICE NUMBER
                INCB                        ;  SET FLAGS
                PULS        B               ; RESTORE ACCB
                BMI         LA2BF           ; SEND TO LINE PRINTER
                BNE         LA30A           ; SEND TO SCREEN

                ...snip...

; PUT A CHARACTER ON THE SCREEN
LA30A           PSHS        X,B,A           ; SAVE REGISTERS
                LDX         CURPOS          ; POINT X TO CURRENT CHARACTER POSITION
LA30E           CMPA        #BS             ; IS IT BACKSPACE?
                BNE         LA31D           ; NO
                CMPX        #VIDRAM         ; AT TOP OF SCREEN?
                BEQ         LA35D           ; YES - DO NOT ALLOW BACKSPACE
                LDA         #$60            ; BLANK
                STA         ,-X             ; PUT IN PREVIOUS POSITION
                BRA         LA344           ; SAVE NEW CURPOS
LA31D           CMPA        #CR             ; ENTER KEY?
                BNE         LA32F           ; BRANCH IF NOT
                LDX         CURPOS          ; GET CURRENT CHAR POSITION
LA323           LDA         #$60            ; BLANK
                STA         ,X+             ; PUT IT ON SCREEN
                TFR         X,D
                BITB        #$1F            ; TEST FOR BEGINNING OF NEW LINE
                BNE         LA323           ; PUT OUT BLANKS TILL NEW LINE
                BRA         LA344           ; CHECK FOR SCROLLING
LA32F           CMPA        #SPACE
                BCS         LA35D           ; BRANCH IF CONTROL CHARACTER
                TSTA                        ;  SET FLAGS
                BMI         LA342           ; IT IS GRAPHIC CHARACTER
                CMPA        #$40
                BCS         LA340           ; BRANCH IF NUMBER OR SPECIAL CHARACTER
                CMPA        #$60            ; UPPER/LOWER CASE?
                BCS         LA342           ; BRANCH IF UPPER CASE ALPHA
                ANDA        #$DF            ; CLEAR BIT 5, FORCE ASCII LOWER CASE TO BE UPPER CASE
LA340           EORA        #$40            ; INVERT BIT 6, CHANGE UPPER CASE TO LOWER & VICE VERSA
LA342           STA         ,X+             ; STORE CHARACTER TO SCREEN
LA344           STX         CURPOS          ; SAVE CURRENT CHAR POSITION
                CMPX        #VIDRAM+511     ; END OF SCREEN BUFFER?
                BLS         LA35D           ; RETURN IF NO NEED TO SCROLL

You can see at LA30A the code begins checking for things like backspace and enter. Eventually at LA342 it puts the character on the screen and increments X, which holds the current screen location. It then has code (not shown) that detects being at the bottom of the screen and scrolls up a line if needed.

To patch this PUTCHR routine to support a “transparent” character, I think we’d just have to intercept the code at LA342 and decide if it should put a character on the screen (STA ,X+) or just increment X (LEAX 1,X or something) without putting anything there.

And that would be a really simple patch. A CoCo 1/2 with 64K could run a program that copies the ROM into RAM and switches over to it, and then this code could easily be patched.
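Here is a rough Python model of the printable-character path shown above (LA32F through LA342), with a hypothetical transparent character that just advances the position. It is not ROM code: CR, backspace and scrolling are left out, and the transparent character is assumed to be lowercase “r” like the earlier demo. The model checks the raw ASCII character for simplicity; a real patch at LA342 would see the already-translated byte, so “r” would arrive there as $12.

```python
# Model of Color BASIC's PUTCHR printable-character path (LA32F..LA342) with a
# hypothetical transparent character. Not ROM code: CR, BS and scrolling are
# left out, and the text is assumed to fit on the 32x16 screen.
from typing import Optional

VIDRAM_SIZE = 512      # 32 columns x 16 rows
TRANSPARENT = "r"      # assumption: lowercase 'r', as in the earlier demo


def screen_code(ch: str) -> Optional[int]:
    """Translate ASCII to a VDG screen byte the way LA32F..LA340 does.

    Returns None for control characters, which the ROM skips (BCS LA35D).
    """
    a = ord(ch) & 0xFF
    if a < 0x20:                # CMPA #SPACE / BCS: control character
        return None
    if a & 0x80:                # TSTA / BMI: graphics block, stored as-is
        return a
    if a < 0x40:                # CMPA #$40 / BCS: digits and punctuation
        return a ^ 0x40         # EORA #$40
    if a < 0x60:                # CMPA #$60 / BCS: upper case, stored as-is
        return a
    return (a & 0xDF) ^ 0x40    # ANDA #$DF then EORA #$40: lower case


def print_at(screen: bytearray, pos: int, text: str) -> int:
    """Like PRINT@pos,text; returns the cursor position afterwards."""
    for ch in text:
        if pos >= VIDRAM_SIZE:
            raise ValueError("scrolling is not modeled")
        if ch == TRANSPARENT:
            pos += 1            # the patch: LEAX 1,X instead of STA ,X+
            continue
        code = screen_code(ch)
        if code is None:
            continue
        screen[pos] = code      # STA ,X+
        pos += 1
    return pos


if __name__ == "__main__":
    screen = bytearray([0x80] * VIDRAM_SIZE)   # CLS0 fills with black blocks
    end = print_at(screen, 0, "THISrrrrWILLrrrrWORK")
    print(end, screen[:20].hex())
```

Running the “THISrrrrWILLrrrrWORK” line from the demo leaves positions 4 to 7 and 12 to 15 as the original black blocks. The model also shows why lowercase looks inverted on a stock CoCo: “a” ($61) becomes $01, an inverse-video “A”.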

And while we were there, maybe it could be extended to support cursor movements as well, replicating how the VIC-20 output works.

But I am getting ahead of myself…

To be continued…

Dragon User magazine and typing in assembly

Over on YouTube, user @ms-ex8em pointed me to the November 1988 issue of Dragon User magazine and a little program that helps you type in machine code program listings accurately.

I recall spending countless hours typing in machine code programs from Rainbow Magazine “back in the day”. These programs would be a few lines of BASIC followed by dozens or hundreds of lines of DATA statements with numbers that represented the bytes of the machine language program.

One such program I specifically remember was Zonx, a cool game with background sound effects. It appeared in the October 1986 issue. You can see what I typed in by checking out page 65. And page 67. And page 68. And page 70. And page 71.

https://colorcomputerarchive.com/repo/Documents/Magazines/Rainbow,%20The/The%20Rainbow%20Vol.%2006%20No.%2003%20-%20October%201986.pdf

And, somehow, I managed to type all of those numbers in and get a working program!

Rainbow did have one thing to help, and perhaps I used it. They had a small program you could run; then, as you typed in your program listing, you could hit a key when you got to specific lines and it would spit out a number (a checksum?) that you could compare against the numbers they printed:

Above, you can see what number should be expected when you got to line 150, 310, 440 and so on. After all the code was typed in, the final number it should give you was 238. This required you to type in the program exactly as it appeared including spaces and any other formatting.

It was not optimal, but it was better than nothing.

I also saw some BASIC loaders for machine language that took matters into their own hands, and had built-in checksum values in their DATA statements. For example, they would put a fixed number of values on each line, with a checksum of those values at the end. Say that number was five. A line might look like this:

1500 DATA 1,1,1,1,1,5

In that silly example, you could add up every five bytes you READ and then compare that to the sixth number on each line. If it matched, continue. If it didn’t, the loader code would print some kind of message, hopefully indicating what line had a mismatch.
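A minimal sketch of that per-line check in Python, assuming the layout from the silly example: five data values followed by their plain sum. The function name and the dictionary of line numbers are my own for illustration; real loaders varied (some kept the sum modulo 256, for example).

```python
# Sketch of the "checksum at the end of each DATA line" idea: five values
# followed by their sum. The layout is an assumption for illustration.
from typing import Dict, List

VALUES_PER_LINE = 5


def load(data_lines: Dict[int, List[int]]) -> List[int]:
    """Return all data bytes in line order, or raise naming the bad line."""
    program: List[int] = []
    for line_number in sorted(data_lines):
        values = data_lines[line_number]
        data, checksum = values[:-1], values[-1]
        if len(data) != VALUES_PER_LINE or sum(data) != checksum:
            raise ValueError(f"CHECKSUM ERROR IN LINE {line_number}")
        program.extend(data)
    return program


if __name__ == "__main__":
    print(load({1500: [1, 1, 1, 1, 1, 5], 1510: [2, 4, 6, 8, 10, 30]}))
```

Reporting the line number is the important part: it tells you exactly which DATA line to go back and proofread.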

I always appreciated program listings that did that.

Now, back to @ms-ex8em… The code they shared was quite different. You can find it on page 12:

https://colorcomputerarchive.com/repo/Documents/Magazines/Dragon%20User/Dragon%20User%20-%208811%20-%20November%201988.pdf

I will do my best to type it in here, accurately.

10 REM HEX LOADER
20 CLEAR 200,31599
30 INPUT"START";ST
40 INPUT"END";ED
50 FOR J=ST TO ED STEP 8
60 PRINT USING"##### : ";J;
70 INPUT A$
80 CS=0
90 FOR K=1 TO LEN(A$)
100 CS=CS+K*VAL("&H"+MID$(A$,K,1))
110 NEXT K
120 INPUT" = ";C
130 IF S<>CS THEN PRINT"CHECKSUM ERROR-TRY AGAIN":SOUND 1,1:GOTO 60
140 FOR K=0 TO 7
150 POKE J+K,VAL("&H"+MID$(K*2+1,2)
160 NEXT K,J

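First comment… There appear to be a few mistakes. Line 150 should have two closing parens at the end of the line, and its MID$ is missing the string it should be reading from: it should be MID$(A$,K*2+1,2). Also, line 130 checks S, but line 120 puts the checksum the user typed into C, so that line should probably read IF C<>CS.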

Now let’s see what this code is doing.

In line 20, CLEAR 200,31599 reserves 200 bytes for strings and sets the top of BASIC’s memory to 31599, so BASIC will not touch anything from 31600 up. That is where the machine language program will load.

In lines 30 and 40, the user is asked to input the start and end addresses of the program they will be typing in.

In line 50, there is a FOR/NEXT that goes from start to end, stepping by 8. This tells me they expect 8 bytes of data on each line the user types in. The first input should be at 31600, then the next at 31608 and so on.

The user types in the 16 characters (representing 8 hex bytes) shown in the box below. They look like this:

31600 : 8E7B6F9F74BE012B  = 1016

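In lines 80 to 110, a checksum is calculated from the hex digits typed in on that line: each digit’s value is multiplied by its position (1 to 16) and the results are added up, so even swapping two different digits changes the checksum.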

In line 120, the user is asked to type in the checksum shown in the data listing.

In line 130, if the number the user typed in (from the data listing) does not match the value the program calculated, an error message prints and it goes back to let the user retype the entire line.

If it matched, lines 140 to 160 will then go through those 16 characters, create the byte value for each of the 8 data bytes, and POKE them into memory at the proper place.

This repeats until the end address is reached. At that point, some new assembly language program has been loaded into memory starting at 31600.
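To double-check how the loader works, here is the per-line logic rendered in Python: the position-weighted checksum from lines 80 to 110, and the byte decoding that lines 140 and 150 are meant to do (with A$ passed to MID$). The function names are mine, not the magazine's.

```python
# Python rendering of the Dragon User loader's per-line logic: the
# position-weighted checksum from lines 80-110 and the byte decoding that
# lines 140-150 are meant to do (with A$ passed to MID$).

def line_checksum(hex_digits: str) -> int:
    """CS = sum of K * value of the K-th hex digit, with K counting from 1."""
    return sum(k * int(ch, 16) for k, ch in enumerate(hex_digits, start=1))


def line_bytes(hex_digits: str) -> list:
    """Two hex digits per byte, like MID$(A$,K*2+1,2) for K = 0 to 7."""
    return [int(hex_digits[k * 2:k * 2 + 2], 16) for k in range(len(hex_digits) // 2)]


if __name__ == "__main__":
    entry = "8E7B6F9F74BE012B"
    print(line_checksum(entry))                 # 1016, matching the listing
    print(" ".join(f"{b:02X}" for b in line_bytes(entry)))
```

The sample line from the listing does come out to 1016. Because each digit is weighted by its position, typing “8E” as “E8” gives 30 instead of 36 for those two positions, which a plain sum of the digits would not catch.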

Pretty cool!

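The problem @ms-ex8em was seeing with this program was a ?TM ERROR. That is most likely coming from line 150: as listed, MID$ is given a number (K*2+1) where it expects a string, which is a type mismatch. Fix that (plus the missing paren, and the S/C mix-up in line 130), then maybe the rest will work ;-)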

I thought this was a neat way to enter in machine code. What other neat ways did you see back then?

Comment away…

ChatGPT AI can analyze 6809 assembly language!

I gave it this:

clear
    lda #96
clearA
    ldx #1024
loop
    sta ,x+
    cmpx #1536
    bne loop
    rts

…and ChatGPT gave me this:

The clear subroutine in this code is used to clear the screen and initialize it with a grid of green squares. The subroutine begins by loading the value 96 into the accumulator register A using the LDA instruction. This value represents the color value for green, which will be used to fill the screen with green squares.

Next, the LDX instruction is used to load the value 1024 into the X register. This value is the memory address of the top left corner of the screen, and it will be used as the starting point for the POKE operations that will be used to fill the screen with green squares.

The loop label marks the start of a loop that will be used to fill the screen with green squares. Inside the loop, the STA ,X+ instruction is used to store the value in the accumulator register A at the memory location pointed to by the X register, and then increment the X register by one. This has the effect of setting the value at the current memory location to the color value for green, and then moving to the next memory location on the screen.

The CMPX instruction is then used to compare the value in the X register to the value 1536. If the X register is less than 1536, it means that there are still memory locations on the screen that need to be filled with the color value for green. In this case, the BNE instruction is used to branch to the loop label, causing the loop to be repeated and the next memory location on the screen to be filled with the color value for green.

If the X register is equal to 1536, it means that all of the memory locations on the screen have been filled with the color value for green, and the loop can be exited. In this case, the BNE instruction will not be executed, and the RTS instruction will be executed, causing the subroutine to return to the main program. This completes the execution of the clear subroutine.

– ChatGPT AI

Now, it may have figured some of this out because I previously fed it a longer (and fully commented) bit of 6809 code that contained that routine as one of its several functions. That’s the only way I can see it knew the screen location and a green block.

https://chat.openai.com/chat, y’all.

Counting 6809 cycles with LWASM

Followers of my ramblings know that I enjoy benchmarking BASIC. It is interesting to see how minor changes can produce major speed differences. And while BASIC is supposedly “completely predictable,” there are so many items that must be considered — line numbers, line length, number of variables, amount of strings, etc. — you can’t really look at any bit of code and know how fast it will run unless it’s something self contained like this:

10 FOR A=1 TO 1000
20 NEXT

Beyond things like that, there’s a lot of trial-and-error needed. Code like this:

...
100 FOR A=1 TO 100
110 Z=Z+1
120 NEXT
...

…can have dramatically different speeds depending on how many other variables there are, and where Z was declared in the list of them.

6809 assembly language is far more predictable. Every machine language instruction takes a known number of CPU cycles, which depends on the addressing mode it uses. For example, loading register A with a value:

lda #42

…should be 100% predictable simply by looking up the cycle count for “load A” in a 6809 reference guide. The Motorola data sheet for the 6809 tells me that “LDA” with an immediate value takes 2 cycles.

So, really, there’s no benchmarking needed. You just have to look up the instructions (and their addressing mode: immediate, direct, indexed, extended, etc.) and add up some numbers.

But where’s the fun in that?

William “Lost Wizard” Astle’s LWTOOLS provides us with the lwasm assembler for Mac, Windows, Linux, etc. One of its features is cycle counting. It can generate a listing of all the machine language bytes that the assembly source code turns into and, optionally, include the cycle count for each line of assembly code.

I just learned about this and had to experiment…

Here is a simple assembly loop that clears the 32-column screen. I’ll add some comments that explain what it does, as if it were BASIC…

clear
    lda #96     * A=96 (green space)
clearwitha
    ldx #1024   * X=1024 (top left of screen)
loop
    sta ,x+     * POKE X,A:X=X+1
    cmpx #1536  * Compare X to 1536 
    bne loop    * If X<>1536, GOTO loop
    rts         * RETURN

To clear the screen to spaces (character 96), it is called with:

 bsr clear

To clear the screen with a different value, such as 128 for black, it can be called like this:

 lda #128
 bsr clearwitha

LWASM is able to tell me how many CPU cycles each instruction will take. To generate this, you have to include a special pragma command in the source code, or pass it in on the command line. In source code, it is done by using the special “opt” keyword followed by the pragma. The ones we are interested in are listed in the manual:

opt c  - enable cycle counts: [8]
opt cd - enable detailed cycle counts breaking down addressing modes: [5+3]
opt ct - show a running subtotal of cycles
opt cc - clear the running subtotal

Adding “ opt c” at the top of the source code will enable it, and then you would use the “-l” command line option to generate the list file, which will now contain cycle counts. (You can also send the list output to a file using -lfilename if you prefer.)

You can also pass in this pragma on the command line using “--pragma=c”, like this:

lwasm clear.asm -fbasic -oclear.bas --pragma=c -l

Above, I am assembling the program into a BASIC loader which I can load from BASIC, and then RUN to load the machine language program into memory. Here is what that command displays for me:

allenh@alsmacbookpro asm % lwasm clear.asm -fbasic -oclear.bas --pragma=c -l
0000                  (        clear.asm):00001         clear
0000 8660             (        clear.asm):00002 (2)         lda #96     * A=96 (green space)
0002                  (        clear.asm):00003         clearwitha
0002 8E0400           (        clear.asm):00004 (3)         ldx #1024   * X=1024 (top left of screen)
0005                  (        clear.asm):00005         loop
0005 A780             (        clear.asm):00006 (5)         sta ,x+     * POKE X,A:X=X+1
0007 8C0600           (        clear.asm):00007 (3)         cmpx #1536  * Compare X to 1536 
000A 26F9             (        clear.asm):00008 (3)         bne loop    * If X<>1536, GOTO loop
000C 39               (        clear.asm):00009 (4)         rts         * RETURN
                      (        clear.asm):00010         
                      (        clear.asm):00011             END

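That’s a bit too wide of a listing for my comfort, so from now on I’ll just include the right portion of it, starting with the number in parentheses. That is the cycle count.

One caution: not every number in that listing matches the Motorola 6809 data sheet. The data sheet gives 6 cycles for STA ,X+, 4 for CMPX with an immediate value and 5 for RTS, while lwasm shows (5), (3) and (4). Those smaller numbers match the Hitachi 6309 running in native mode, and I believe lwasm assembles for the 6309 unless it is given its --6809 option. So the totals in the rest of this post are best read as 6309 native-mode figures. They are still useful for comparing one approach against another, but a stock 6809 CoCo will take somewhat longer.

If you look at the “lda #96” line you will see it confirms “lda” with an immediate value takes two cycles (the same on both CPUs):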

(2)         lda #96     * A=96 (green space)
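The counts lwasm printed can be checked against the Motorola 6809 data sheet. This small table puts the figures from the listings in this post next to the 6809 figures as I read them from the data sheet (base cycles plus the indexed-mode extras), then totals the one-byte clear loop both ways. The 6809 column is my own lookup, not lwasm output.

```python
# Per-instruction cycle counts: (figure shown in this post's lwasm listings,
# figure from the Motorola 6809 data sheet as I read it).
CYCLES = {
    "lda #imm":   (2, 2),
    "ldx #imm":   (3, 3),
    "sta ,x+":    (5, 6),    # 6809: STA indexed 4, plus 2 for ,R+
    "std ,x++":   (7, 8),    # 6809: STD indexed 5, plus 3 for ,R++
    "cmpx #imm":  (3, 4),
    "bne":        (3, 3),
    "tfr a,b":    (4, 6),
    "pshu d,x,y": (10, 11),  # 6809: PSHU 5, plus 1 per byte pushed
    "cmpu #imm":  (4, 5),
    "rts":        (4, 5),
}

LISTING, MC6809 = 0, 1       # column indexes into each tuple


def loop_cycles(body, column):
    """Add up one column of CYCLES over the instructions in a loop body."""
    return sum(CYCLES[name][column] for name in body)


if __name__ == "__main__":
    byte_loop = ["sta ,x+", "cmpx #imm", "bne"]
    for label, col in (("listing", LISTING), ("6809", MC6809)):
        per_pass = loop_cycles(byte_loop, col)
        print(f"{label:8} {per_pass} cycles per pass, {per_pass * 512} for 512 bytes")
```

For the one-byte loop that works out to 11 versus 13 cycles per pass (5632 versus 6656 in total). The ranking of the different approaches comes out the same either way.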

Another pragma of interest is one that will start counting the total number of cycles code takes.

opt ct - show a running subtotal of cycles

If you just turned it on, it would not be very useful since it would just be adding up all the instructions from top to bottom of the source, not taking into consideration branching or subroutines or loops. But, we can clear that counter and start it at any point by including the “opt” keyword in the code around routines we are interested in.

opt cc - clear the running subtotal

And we can turn them off by putting “no” in front:

opt noct - STOP showing a running subtotal of cycles

In the case of my clear.asm program, I would want to clear the counter and turn it on right at the start of the loop, and turn it off at the end of the loop. This would show me a running count of how many cycles that loop takes:

        clear
(2)         lda #96
        clearA
(3)         ldx #1024
            opt ct,cc
                loop
(5)     5           sta ,x+
(3)     8           cmpx #1536
(3)     11          bne loop
                    opt noct
(4)         rts

The numbers to the right of the (cycle) count numbers are the sum of all instructions from the moment the counter was enabled.

The code from “loop” to “bne loop” takes 11 cycles. Since each loop sets one byte on the screen, and since there are 512 bytes on the screen, clearing the screen this way will take 11 * 512 = 5632 cycles (plus a few extra before the loop, setting up X and A).

Instead of clearing the screen 8-bits at a time, I learned that using a 16-bit register would be faster. I changed the code to use a 16-bit D register instead of the 8-bit A register, like this:

clear16
    lda #96     * A=96
    tfr a,b     * B=A (D=A*256+B)
clearA16
    ldx #1024   * X=1024 (top left of screen)
    opt ct,cc   * Clear counter, turn it on.
loop16
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    cmpx #1536  * Compare X to 1536.
    bne loop16  * If X<>1536, GOTO loop16
    opt noct    * Turn off counter.
    rts         * RETURN

Since 16-bit register D is made up of 8-bit registers A and B, I simply transfer whatever is in A to B and that makes both bytes of D the same as A. Then in the loop, I store D at X, and increment it by 2 (to get to the next two bytes). Looking at cycles again…

        clear16
(2)         lda #96
(4)         tfr a,b
        clearA16
(3)         ldx #1024
            opt ct,cc
                loop16
(7)     7           std ,x++
(3)     10          cmpx #1536
(3)     13          bne loop16

The code from “loop16” to “bne loop16” takes 13 cycles, which is longer than the original. But, each loop does two bytes instead of one. Instead of needing 512 times through the loop, it only needs 256. 13 * 256 = 3328 cycles. Progress!

And, if we can do 16-bits at a time, why not 32? Currently, D is the value to store, and X is where to store it. We could just store D twice in a row…

clear32
    lda #96     * A=96
    tfr a,b     * B=A (D=A*256+B)
clearA32
    ldx #1024   * X=1024 (top left of screen)
    opt ct,cc   * Clear counter, turn it on.
loop32
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    cmpx #1536  * Compare X to 1536.
    bne loop32  * If X<>1536, GOTO loop32
    opt noct    * Turn off counter.
    rts         * RETURN

Let’s see what that does…

        clear32
(2)         lda #96     * A=96
(4)         tfr a,b     * B=A (D=A*256+B)
        clearA32
(3)         ldx #1024   * X=1024 (top left of screen)
            opt ct,cc   * Clear counter, turn it on.
                loop32
(7)     7           std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(7)     14          std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(3)     17          cmpx #1536  * Compare X to 1536.
(3)     20          bne loop32  * If X<>1536, GOTO loop32
                    opt noct    * Turn off counter.
(4)         rts         * RETURN

Above, the “loop32” to “bne loop32” takes 20 cycles. Each loop does four bytes, so only 128 times through the loop to clear all 512 bytes of the screen. 20 * 128 = 2560 cycles. More than double the speed of the original one byte version.
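Here is the same arithmetic as a quick Python sketch: cycles per pass (taken from the lwasm listings above) times the passes needed for the 512-byte screen. The setup instructions before each loop are left out, as in the text.

```python
# Total cycles for the clear loops so far: cycles per pass (from the lwasm
# listings above) times the passes needed for the 512-byte screen.
# The few setup cycles before each loop are left out.

SCREEN_BYTES = 512

LOOPS = {
    # name: (bytes stored per pass, cycles per pass)
    "sta ,x+ (8-bit)": (1, 11),
    "std ,x++ (16-bit)": (2, 13),
    "std ,x++ x2 (32-bit)": (4, 20),
}


def total_cycles(bytes_per_pass: int, cycles_per_pass: int) -> int:
    passes, leftover = divmod(SCREEN_BYTES, bytes_per_pass)
    if leftover:
        raise ValueError("the screen size is not a multiple of bytes_per_pass")
    return passes * cycles_per_pass


if __name__ == "__main__":
    baseline = total_cycles(*LOOPS["sta ,x+ (8-bit)"])
    for name, (size, cycles) in LOOPS.items():
        total = total_cycles(size, cycles)
        print(f"{name:22} {total:5} cycles ({baseline / total:.2f}x)")
```

The refusal to handle a leftover is deliberate: a six-byte pass does not divide 512 evenly, which is exactly the wrinkle the 48-bit version has to deal with.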

We could do 48-bits at a time by storing three times, but that math doesn’t work out since 512 is not divisible by 6 (I get 85.33333333). Perhaps we could do the loop 85 times to clear the first 510 bytes (6 * 85 = 510), then manually do one last 16-bit store to complete it. Maybe like this:

clear48
    lda #96     * A=96
    tfr a,b     * B=A (D=A*256+B)
clearA48
    ldx #1024   * X=1024 (top left of screen)
    opt ct,cc   * Clear counter, turn it on.
loop48
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    cmpx #1534  * Compare X to 1534 (1024+510).
    bne loop48  * If X<>1534, GOTO loop48
    opt noct    * Turn off counter.
    std ,x      * POKE X,A:POKE X+1,B (last two bytes)
    rts         * RETURN

And LWASM shows me:

        clear48
(2)         lda #96     * A=96
(4)         tfr a,b     * B=A (D=A*256+B)
        clearA48
(3)         ldx #1024   * X=1024 (top left of screen)
            opt ct,cc   * Clear counter, turn it on.
                loop48
(7)     7           std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(7)     14          std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(7)     21          std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(3)     24          cmpx #1534  * Compare X to 1534 (1024+510).
(3)     27          bne loop48  * If X<>1534, GOTO loop48
                    opt noct    * Turn off counter.
(5)                 std ,x      * POKE X,A:POKE X+1,B (last two bytes)
(4)         rts         * RETURN

We have jumped to 27 cycles per loop. Each loop stores 6 bytes, and it takes 85 times to get 510 bytes, plus 5 extra after it is over for the last two bytes. 27 * 85 = 2295 cycles + 5 = 2300 cycles! We are still moving in the right direction.
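One thing to watch with a six-byte step: BNE only leaves the loop when X equals the compare value exactly. This Python sketch follows X the way loop48 does and reports how many passes it takes to land on a given end value, or None if X jumps past it. On the real CPU, missing the end value means X sails past the screen and the loop keeps storing D into whatever memory follows.

```python
# BNE only exits when X equals the compare value exactly, so a loop that adds
# 6 to X on every pass needs an end value X can actually land on. This sketch
# follows X the way loop48 does (three std ,x++ per pass).
from typing import Optional


def passes_to_land(start: int, step: int, end: int) -> Optional[int]:
    """Passes until X == end, or None if X jumps past end without hitting it."""
    x, passes = start, 0
    while x < end:
        x += step
        passes += 1
    return passes if x == end else None


if __name__ == "__main__":
    print(passes_to_land(1024, 6, 1536))   # None: X goes 1534, then 1540
    print(passes_to_land(1024, 6, 1534))   # 85 passes, 510 bytes stored
```

So comparing against 1534 (1024 + 510) is what gives the 85 passes, leaving the final two bytes for the single std ,x after the loop.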

Just for fun, what if we did four stores, 8 bytes at a time?

clear64
    lda #96     * A=96
    tfr a,b     * B=A (D=A*256+B)
clearA64
    ldx #1024   * X=1024 (top left of screen)
    opt ct,cc   * Clear counter, turn it on.
loop64
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    std ,x++    * POKE X,A:POKE X+1,B:X=X+2
    cmpx #1536  * Compare X to 1536.
    bne loop64  * If X<>1536, GOTO loop64
    opt noct    * Turn off counter.
    rts         * RETURN

And that gives us:

        clear64
(2)         lda #96     * A=96
(4)         tfr a,b     * B=A (D=A*256+B)
        clearA64
(3)         ldx #1024   * X=1024 (top left of screen)
            opt ct,cc   * Clear counter, turn it on.
                loop64
(7)     7           std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(7)     14          std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(7)     21          std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(7)     28          std ,x++    * POKE X,A:POKE X+1,B:X=X+2
(3)     31          cmpx #1536  * Compare X to 1536.
(3)     34          bne loop64  * If X<>1536, GOTO loop64
                    opt noct    * Turn off counter.
(4)         rts         * RETURN

34 cycles stores 8 bytes. 64 times through the loop to do all 512 screen bytes, so 64 * 34 = 2176 cycles.

By now, I think you can see where this is going. This is called “loop unrolling”. If you wanted the fewest cycles, you could just code 256 “std ,x++” in a row (7 * 256) for 1792 cycles, which would be fast but bulky code (each std ,x++ takes two bytes, so 512 bytes just for this clear routine).

There is always some balance between code size and speed. Larger programs took longer to load from tape or disk. But, if you didn’t mind load time, and you had extra memory available, tricks like this could really speed things up.

Blast it…

I have also read about “stack blasting” where you load values into registers and then, instead of storing each register, you set a stack pointer to the destination and just push the registers onto the stack. I’ve never done that before. Let’s see if we can figure it out.

There are two stacks in the 6809. One is the hardware stack (register S), which the CPU uses for subroutine calls and interrupts, and the other is the User Stack (register U). If we aren’t using U for a stack, we can use it as a 16-bit register, too.

The stack grows downward in memory, so if the stack pointer is 5000 and you push an 8-bit register, the pointer will move to 4999 (pointing to the most recent register pushed). If you then push a 16-bit register, it will move to 4997. This means it will have to work in reverse from our previous examples. By pointing the stack register just past the end of the screen, we should be able to push registers onto the stack and have it grow “up” the screen toward the top.
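A toy Python model of that pointer movement, just to make the direction concrete. The class is hypothetical; it only mimics the rule that a push decrements the pointer first and then stores, with 16-bit values ending up high byte first (big-endian) like the 6809.

```python
# A toy model of a 6809-style stack pointer: each push decrements the pointer
# first, then stores, so the pointer always addresses the last byte pushed.


class ToyStack:
    def __init__(self, pointer: int):
        self.pointer = pointer
        self.memory = {}

    def push8(self, value: int) -> None:
        self.pointer -= 1
        self.memory[self.pointer] = value & 0xFF

    def push16(self, value: int) -> None:
        # Low byte goes in first (higher address), so the high byte ends up
        # at the pointer and the value reads big-endian in memory.
        self.push8(value & 0xFF)
        self.push8((value >> 8) & 0xFF)


if __name__ == "__main__":
    s = ToyStack(5000)
    s.push8(0x42)
    print(s.pointer)            # 4999
    s.push16(0x1234)
    print(s.pointer)            # 4997
```

After both pushes the 16-bit value sits at 4997 and 4998 ($12 then $34), with the earlier byte still at 4999.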

At first glance, it doesn’t look promising, since pushing D onto the user stack (U) takes more cycles than storing D at U:

(5)         std ,u

(6)         pshu d

But, it seems we make that up when pushing multiple registers since the cycle count does not grow as much as multiple stores do:

(7)         std ,U++
(7)         stx ,U++
(8)         sty ,U++
        
(10)        pshu d,x,y

I also see that STY is one cycle longer than STD or STX, likely because STY needs an extra prefix byte in its opcode. This tells me to maybe avoid using Y like this…?

It looks good, though. 22 cycles compared to 10 seems quite the win. Let me see if I can do a clear routine using the User stack pointer and three 16-bit registers. We’ll compare this to the 48-bit clear shown earlier.

clear48s
    lda #96     * A=96
clearA48s
    tfr a,b     * B=A (D=A*256+B)
    tfr d,x     * X=D
    tfr d,y     * Y=D
    ldu #1536   * U=1536 (1 past end of screen)
    opt ct,cc   * Clear counter, turn it on.
loop48s
    pshu d,x,y
    cmpu #1026  * Compare U to 1026 (two bytes from start).
    bgt loop48s * If U>1026, GOTO loop48s.
    opt noct    * Turn off counter.
    pshu d      * Final 2 bytes.
    rts         * RETURN

And the results are…

        clear48s
(2)         lda #96     * A=96
        clearA48s
(4)         tfr a,b     * B=A (D=A*256+B)
(4)         tfr d,x     * X=D
(4)         tfr d,y     * Y=D
(3)         ldu #1536   * U=1536 (1 past end of screen)
            opt ct,cc   * Clear counter, turn it on.
                loop48s
(10)    10          pshu d,x,y
(4)     14          cmpu #1026  * Compare U to 1026 (two bytes from start).
(3)     17          bgt loop48s * If U>1026, GOTO loop48s
                    opt noct    * Turn off counter.
(6)                 pshu d      * Final 2 bytes.
(4)         rts         * RETURN

From “loop48s” to “bgt loop48s” we end up with 17 cycles compared to 27 using the std method. 85 * 17 = 1445 cycles + 6 final cycles = 1451 cycles. It looks like using stack push/pulls might be a really nice way to do this type of thing, provided the user stack is available, of course.
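To check both the result and the total, here is a Python simulation of clear48s: U starts one past the screen, each pass pushes D, X and Y (six bytes), the loop repeats while U is above 1026, and one final push of D fills the first two bytes. Cycle counts are the ones lwasm printed above. BGT is a signed compare, but every address here is below 32768, so Python's ordinary > gives the same answer.

```python
# Simulate clear48s with the cycle counts from the listing above:
# pshu d,x,y = 10, cmpu # = 4, bgt = 3, final pshu d = 6.

SCREEN_START, SCREEN_END = 1024, 1536   # end is one past the last byte


def clear48s(value: int = 96):
    memory = bytearray(2048)
    d = x = y = (value << 8) | value     # tfr a,b / tfr d,x / tfr d,y
    u = SCREEN_END                       # ldu #1536
    cycles = 0

    def pshu(*regs):
        nonlocal u
        # PSHU stores Y, then X, then D walking down memory, so D ends up at
        # the lowest address; each word is stored high byte first.
        for reg in reversed(regs):
            u -= 2
            memory[u] = reg >> 8
            memory[u + 1] = reg & 0xFF

    while True:
        pshu(d, x, y)
        cycles += 10 + 4 + 3             # pshu d,x,y + cmpu #1026 + bgt
        if u <= 1026:                    # bgt falls through
            break
    pshu(d)                              # final two bytes
    cycles += 6
    return memory, u, cycles


if __name__ == "__main__":
    memory, u, cycles = clear48s()
    print(u, cycles)                     # U ends at 1024, the top of screen
```

The simulated run fills exactly 1024 through 1535, leaves the bytes on either side alone, and totals 1451 cycles.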

Side Note: here is a fantastic writeup of this and the techniques on the 6809, as used in some unnamed CoCo 3 game back in the day: https://blog.moertel.com/posts/2013-12-14-great-old-timey-game-programming-hack.html

The fastest way to zero

But wait! There’s more…

When setting a register to zero, I have been told to use “CLR” instead of “LDx #0”. Let’s see what that is all about…

(2)         lda #0
(1)         clra

(3)         ldd #0
(2)         clrd

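Ah, now I know: in this listing, CLRA takes half the cycles of LDA #0, and CLRD is one cycle faster than LDD #0. Nice. One catch: CLRD is a Hitachi 6309 instruction, and the plain 6809 does not have it. On a 6809, the data sheet lists both CLRA and LDA #0 at 2 cycles, though CLRA is still one byte shorter.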

Other 16-bit registers such as X, Y, and U do not have a CLR op code, so LDx will have to be used there, I suppose.

I then wondered if it made more sense to CLR a memory location, or clear a register then store that register there.

(6)         clr 1024
        
(1)         clra
(4)         sta 1024

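It appears in this case that it takes fewer cycles to clear a register and then store it in memory. Interesting. (On a stock 6809, the data sheet gives 7 cycles for CLR extended and 2 + 5 for CLRA followed by STA extended, so there it would be a tie.) And using a 16-bit value: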

(3)         ldd #0
(5)         std 1024

That is one cycle faster than doing a “clra / sta 1024 / sta 1025” it seems. It is also one byte less in size, so win win.

There is a lot to learn here, and from these experiments, I’m already seeing some things are not like I would have guessed.

I hope this inspires you to play with these LWASM options and see what your code is doing. During the writing of this article, I learned how to use that User Stack, and I expect that will come in handy if I decide to do any updates to my Invaders09 game some day…

Until next time…