More CoCo MC6847 VDG chip “draw black” challenge responses.

See also: challenge, responses, and more responses.

Today Sebastian Tepper submitted a solution to the “draw black” challenge. He wrote:

I think this is much faster and avoids unnecessary SETs. Instruction 100 will do the POKE only once per character block.

– Sebastian Tepper, 7/5/2022

The routine he presented (seen in lines 100 and 101) looked like this:

10 CLS
20 FOR A=0 TO 31
30 X=A:Y=A:GOSUB 100
40 NEXT
50 GOTO 50
100 IF POINT(X,Y)<0 THEN POKE 1024+Y*16+X/2,143
101 RESET(X,Y):RETURN

It did meet the criteria of the challenge, correctly drawing a diagonal line from (0,0) down to (31,31) on the screen. And it was fast.

POINT() will return -1 if the location is not a graphics character. On the standard CLS screen, the screen is filled with character 96, a space. (That’s the value you POKE to the screen; when printing, it would be CHR$(32) instead.) His code simply figures out which screen character contains the target pixel and, if that character is not yet a graphics block, POKEs it to 143 (a solid green graphics block) before resetting the pixel.

So I immediately tried to break it. I wondered what would happen if it was setting two pixels next to each other in the same block. What would RESET do?

I added a few lines to the original test program so it drew the diagonal line in both directions plus a box (with no overlapping corners). My intent was to make it draw horizontal lines on both an even and an odd pixel row, and vertical lines on both an even and an odd pixel column. It looks like this (and the original article has been updated):

10 CLS
20 FOR A=0 TO 15

30 X=A:Y=A:GOSUB 100
31 X=15-A:Y=16+A:GOSUB 100

32 X=40+A:Y=7:GOSUB 100
33 X=40+A:Y=24:GOSUB 100

34 X=39:Y=8+A:GOSUB 100
35 X=56:Y=8+A:GOSUB 100

40 NEXT
50 GOTO 50

And this did break Sebastian’s routine… and he immediately fixed it:

100 IF POINT(X,Y)<0 THEN POKE 1024+INT(Y/2)*32+INT(X/2),143
101 RESET(X,Y):RETURN

I haven’t looked closely at what changed, but I see it calculates the character memory location by dividing Y by two (making sure the result is an integer with no fractional part, so 15 becomes 7 rather than 7.5), multiplying that by 32 (the number of characters per screen row), and then adding half of X. (There are half as many screen blocks as SET/RESET pixels in each direction.)

And it works. And it works well — all cases are satisfied.
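To see where the first version went wrong, it can help to compare the two address formulas side by side. The following is a small Python sketch (Python is only standing in for BASIC here, and the function names are made up for illustration). It assumes that POKE truncates a fractional address, as Sebastian notes in the comments.

```python
SCREEN_BASE = 1024  # start of the 32x16 text screen (&H400)


def original_address(x, y):
    """Line 100 as first submitted: 1024+Y*16+X/2, truncated the way POKE would."""
    return int(SCREEN_BASE + y * 16 + x / 2)


def fixed_address(x, y):
    """Corrected line 100: 1024+INT(Y/2)*32+INT(X/2)."""
    return SCREEN_BASE + (y // 2) * 32 + (x // 2)


def mismatched_rows(width=64, height=32):
    """Return the pixel rows on which the two formulas disagree."""
    return sorted({y for y in range(height) for x in range(width)
                   if original_address(x, y) != fixed_address(x, y)})


if __name__ == "__main__":
    print(mismatched_rows())                                 # every odd row, 1 through 31
    print(original_address(40, 7) - fixed_address(40, 7))    # 16
```

On even rows the two formulas agree, but on odd rows the original lands 16 bytes (half a screen row) too far. That would explain why the plain diagonal test passed: each odd-row pixel on the diagonal shares a character block with the even-row pixel drawn just before it, so the POKE is skipped, while the horizontal line at Y=7 starts on fresh text characters.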

And if that wasn’t enough, some optimizations came next:

And for maximum speed you could change line 100 from:

100 IF POINT(X,Y)<0 THEN POKE 1024+INT(Y/2)*32+INT(X/2),143

to:

100 IFPOINT(X,Y)<.THEN POKE&H400+INT(Y/2)*&H20+INT(X/2),&H8F

To time the difference, I added these extra lines:

15 TIMER=0

and:

45 PRINT TIMER

This lowers execution time from 188 to 163 timer units, i.e., down to 87% of the original time.

– Sebastian Tepper, 7/5/2022

Any time I see TIMER in the mix, I get giddy.

Spaces had been removed, 0 was changed to . (which BASIC will see as a much faster-to-parse zero), and integer values were changed to base-16 hex values.

Also, in doing speed tests about the number format I verified that using hexadecimal numbers was more convenient only when the numbers in question have two or more digits.

– Sebastian Tepper, 7/5/2022

Awesome!

Perhaps a final improvement could be to replace the screen memory location (1024/&H400), the multiplier (32/&H20), and the block value (143/&H8F) with variables set to those values. Looking up a variable, if there are not too many of them in the list before the ones you’re looking up, can be even faster than parsing a constant.

Using the timer value of 163 as our speed to beat, first I removed that extra space after THEN just to see if it mattered. No change.

Then I declared three new variables, and used DIM to put them in the order I wanted them (the A in the FOR/NEXT loop initially being the last):

11 DIM S,C,W,A
12 S=1024:W=32:C=143
...
100 IFPOINT(X,Y)<.THENPOKES+INT(Y/2)*W+INT(X/2),C
101 RESET(X,Y):RETURN

No change. I still got 163. So I moved A to the start. A is used more than any other variable, so maybe that will help:

11 DIM A,S,C,W

No change — still 163.

Are there any other optimizations we could try? Let us know in the comments.

Thank you for this contribution, Sebastian. I admire your attention to speed.

Until next time…

19 thoughts on “More CoCo MC6847 VDG chip “draw black” challenge responses.”

  1. Sebastian Tepper

    I found two additional optimizations:

    The POKE statement truncates decimals anyway, so we can use X/2 instead of INT(X/2). This gives an execution time of 159.

    100 IF POINT(X,Y)<. THEN POKE &H400+INT(Y/2)*&H20+X/2,&H8F

    It is possible to replace INT(Y/2)*&H20 with (Y AND &H1E)*&H10, which avoids the division. This further lowers the execution time to 151.

    100 IF POINT(X,Y)<. THEN POKE &H400+(Y AND &H1E)*&H10+X/2,&H8F

    So, 151 is the number to beat (for now).
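The AND trick works because masking Y with &H1E clears its lowest bit, which leaves 2*INT(Y/2) for rows 0 to 31; multiplying by &H10 then gives the same offset as INT(Y/2)*&H20. A brute-force check in Python (used here only as a calculator; the function names are illustrative, not from the original program) suggests the two expressions agree over the whole valid range:

```python
def offset_with_division(y):
    """Row offset as in INT(Y/2)*&H20."""
    return (y // 2) * 0x20


def offset_with_mask(y):
    """Row offset as in (Y AND &H1E)*&H10."""
    return (y & 0x1E) * 0x10


def formulas_agree(max_row=31):
    """True if both expressions match for every row from 0 to max_row."""
    return all(offset_with_division(y) == offset_with_mask(y)
               for y in range(max_row + 1))


if __name__ == "__main__":
    print(formulas_agree())      # True for the valid rows 0..31
    print(formulas_agree(32))    # False: the mask drops bit 5 of Y
```

The mask only covers five bits, so the shortcut is safe only while Y stays within 0 to 31.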

    1. Allen Huffman Post author

      I’ll post another follow-up soon, with these changes. I also did some extra benchmarking, running through it 100 times and using the average. I got a slightly different number — what are you running this on? CoCo 3 BASIC is slower (more RAM hooks it jumps through), and I think having DISK BASIC is slower than just having EXTENDED and such. I wasn’t aware of that when I started writing benchmark programs.

      POINT is slow. SET is slow. I need to look at the code and see why. I guess it’s all the parsing of parameters and such.

  2. Sebastian Tepper

    I am using a COCO 1 in XROAR with the following configuration:

    xroar.exe -machine cocous -bas bas11.rom -extbas extbas10.rom -ram 32 -kbd-translate -lp-file printer.txt

    with RS disk cartridge enabled.

  3. Sebastian Tepper

    Previous benchmark was 151, but I realized that you can also replace the numbers in lines 30-35 with their hexadecimal equivalents–after all, they are inside a loop that runs 16 times!

    Now my benchmark is 145 timer units. In general I get small variations, like +1 or -1 unit when running the code several times. I also got 146 and 144, so I am giving the value in the middle.

    10 CLS
    15 TIMER=0
    20 FOR A=0 TO 15
    30 X=A:Y=A:GOSUB 100
    31 X=&HF-A:Y=&H10+A:GOSUB 100
    32 X=&H28+A:Y=7:GOSUB 100
    33 X=&H28+A:Y=&H18:GOSUB 100
    34 X=&H27:Y=8+A:GOSUB 100
    35 X=&H38:Y=8+A:GOSUB 100
    40 NEXT
    45 PRINT TIMER
    50 GOTO 50
    100 IF POINT(X,Y)<. THEN POKE &H400+(Y AND &H1E)*&H10+X/2,&H8F
    101 RESET(X,Y):RETURN

  4. Sebastian Tepper

    I wondered how far we can push this with an assembly helper routine, but without leaving the BASIC environment.

    I benchmarked 82 timer units with the following code:

    10 CLS:GOSUB 1000
    15 TIMER=0
    20 FOR A=0 TO &HF
    30 X=A:Y=A:GOSUB 100
    31 X=&HF-A:Y=&H10+A:GOSUB 100
    32 X=&H28+A:Y=7:GOSUB 100
    33 X=&H28+A:Y=&H18:GOSUB 100
    34 X=&H27:Y=8+A:GOSUB 100
    35 X=&H38:Y=8+A:GOSUB 100
    40 NEXT
    45 PRINT TIMER
    50 GOTO 50
    100 Z=USR0(Y*&H100+X):RETURN
    999 'ENHANCED RESET(X,Y)
    1000 Z$="BDB3ED340644C6203D8E0400308BE661543A350684015649C610544A2AFC536D842B04C48F2002E484E78439"
    1010 M=474:DEFUSR0=M 'USE CASSETTE BUFFER AREA (44 OUT OF 256 BYTES)
    1020 FOR Z=1 TO LEN(Z$) STEP 2
    1030 POKE M,VAL("&H"+MID$(Z$,Z,2)):M=M+1
    1040 NEXT Z
    1050 RETURN
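The loader in lines 1020 to 1040 walks the hex string two characters at a time and POKEs each byte into memory starting at 474. A rough Python equivalent (a sketch only; the names `decode_routine` and `poke_plan` are hypothetical) makes it easy to confirm the byte count and the address range the routine occupies:

```python
# Hex string copied from line 1000, split in two only to keep the line short.
ROUTINE_HEX = ("BDB3ED340644C6203D8E0400308BE661543A350684015649"
               "C610544A2AFC536D842B04C48F2002E484E78439")
LOAD_ADDRESS = 474  # value of M in line 1010


def decode_routine(hex_text=ROUTINE_HEX):
    """Turn the hex string into raw bytes, two hex digits per byte."""
    return bytes.fromhex(hex_text)


def poke_plan(code, start=LOAD_ADDRESS):
    """List the (address, value) pairs that the BASIC loop would POKE."""
    return [(start + i, value) for i, value in enumerate(code)]


if __name__ == "__main__":
    plan = poke_plan(decode_routine())
    print(len(plan))                        # 44 bytes
    print(hex(plan[0][0]), hex(plan[-1][0]))  # 0x1da 0x205
```

The 44 bytes match the count given in the comment on line 1010, and the first byte (&HBD) is the JSR opcode that starts the routine.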

    Some comments:

    1) For those trying to break the code :-) you can try replacing CLS with CLS 4 or any other color in line 10 and it will also work as intended. On the other hand, I decided NOT to check the sanity of the Y and X coordinates, so that’s a possible way to get unexpected results and eventually mess with memory outside the video area.

    2) I based the code on the disassembly in Color Basic Unravelled, though with some changes in the register usage and avoiding external memory addresses for intermediate values. Someone more proficient than me might find additional ways to optimize the code.

    3) I tested the assembly code in the 6809.uk online assembler, a great tool! I think I found a bug in it, though, because the instructions BMI (branch on minus) and BPL (branch on plus) were assembled with their opcodes interchanged, which of course broke the code. It took me a lot of time debugging in EDTASM before I found that these branch instructions had been assembled wrong (0x2A exchanged with 0x2B). After manually patching this, the code worked. Something to keep in mind.

    1. Allen Huffman Post author

      When I first learned 6809 assembly, I don’t think it dawned on me that one could pass in two 8-bit values as a single 16-bit value for USR0. It’s just like using the A and B registers to make D. I used to POKE them into memory spots and then call a routine. “Wish I knew then what I know now!”

      1. Sebastian Tepper

        In fact you could even pass three values and implement a sort of SET(X,Y,C) function that accepts C between 0 and 8.

        This is possible if you pack all three values using 15 bits like this:

        X: 0-63 (64 values) -> 6 bits -> lower 6 bits of ACCB
        Y: 0-31 (32 values) -> 5 bits -> lower 3 bits of ACCA : upper 2 bits of ACCB
        C: 0-8 (9 values) -> 4 bits -> bits 3 to 6 of ACCA (counting from bit 0)
        TOTAL: 15 bits.

        Packing function: 2048*C+64*Y+X
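As a rough illustration of this packing scheme, here is a Python sketch of the packing function together with the matching unpacking step as it might look from the ACCA/ACCB side. The function names are hypothetical, and bit positions are counted from bit 0.

```python
def pack(x, y, c):
    """Pack X (0-63), Y (0-31) and C (0-8) into one value: 2048*C + 64*Y + X."""
    if not (0 <= x <= 63 and 0 <= y <= 31 and 0 <= c <= 8):
        raise ValueError("coordinate or color out of range")
    return 2048 * c + 64 * y + x


def unpack(value):
    """Split a packed value the way a routine would, via the two 8-bit halves."""
    acca = value >> 8       # high byte of D
    accb = value & 0xFF     # low byte of D
    x = accb & 0x3F                          # lower 6 bits of ACCB
    y = ((acca & 0x07) << 2) | (accb >> 6)   # lower 3 bits of ACCA : upper 2 bits of ACCB
    c = (acca >> 3) & 0x0F                   # next 4 bits of ACCA
    return x, y, c


if __name__ == "__main__":
    packed = pack(63, 31, 8)
    print(packed)           # 18431, well inside the positive 16-bit integer range
    print(unpack(packed))   # (63, 31, 8)
```

In BASIC the call would then pass the packed expression as the single USR argument, in the same way the two-value version passes Y*&H100+X.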

  5. Sebastian Tepper

    This is the listing I got initially after assembling the code in 6809.uk:

    START:
    4000: BD B4 F4 JSR $B3ED
    4003: 34 06 PSHS A,B
    4005: 44 LSRA
    4006: C6 20 LDB #$20
    4008: 3D MUL
    4009: 8E 04 00 LDX #$0400
    400C: 30 8B LEAX D,X
    400E: E6 61 LDB $01,S
    4010: 54 LSRB
    4011: 3A ABX
    4012: 35 06 PULS A,B
    4014: 84 01 ANDA #$01
    4016: 56 RORB
    4017: 49 ROLA
    4018: C6 10 LDB #$10
    LOOP:
    401A: 54 LSRB
    401B: 4A DECA
    401C: [2B] FC BPL LOOP
    401E: 53 COMB
    401F: 6D 84 TST ,X
    4021: [2A] 04 BMI FINAL1
    4023: C4 8F ANDB #$8F
    4025: 20 02 BRA FINAL2
    FINAL1:
    4027: E4 84 ANDB ,X
    FINAL2:
    4029: E7 84 STB ,X
    402B: 39 RTS

    Notice [2B] and [2A] machine codes should be interchanged, which I manually corrected in the BASIC implementation.

  6. Sebastian Tepper

    This commented version of the assembly listing might be of aid in understanding what it does:

    01DA BD B3ED 00110 START JSR $B3ED ;GET VER:HOR COORDS IN ACCA:ACCB
    01DD 34 06 00120 PSHS A,B ;SAVE VER:HOR FOR LATER USE
    01DF 44 00130 LSRA ;TWO VER PIXELS/CHAR
    01E0 C6 20 00140 LDB #32 ;32 BYTES/ROW
    01E2 3D 00150 MUL ;GET ROW OFFSET OF CHAR POSITION
    01E3 8E 0400 00160 LDX #$400 ;SCREEN BUFFER ADDRESS
    01E6 30 8B 00170 LEAX D,X ;ADD ROW OFFSET TO SCREEN BUFFER ADDRESS
    01E8 E6 61 00180 LDB 1,S ;GET HOR COORD
    01EA 54 00190 LSRB ;TWO HOR PIXELS/CHAR
    01EB 3A 00200 ABX ;ADD HOR OFFSET TO FORM CHAR ADDRESS
    01EC 35 06 00210 PULS A,B ;GET VER:HOR COORDS IN ACCA:ACCB
    01EE 84 01 00220 ANDA #1 ;KEEP ONLY LSB OF VER COORD
    01F0 56 00230 RORB ;LSB OF HOR COORD TO CARRY FLAG
    01F1 49 00240 ROLA ;LSB OF HOR COORD TO BIT 0 OF ACCA
    01F2 C6 10 00250 LDB #$10 ;MAKE A BIT MASK-TURN ON BIT 4
    01F4 54 00260 LOOP LSRB ;SHIFT IT RIGHT ONCE
    01F5 4A 00270 DECA ;SHIFTED IT ENOUGH?
    01F6 2A FC 00280 BPL LOOP ;NOT YET
    01F8 53 00290 COMB ;COMPLEMENT TO FORM BIT MASK
    01F9 6D 84 00300 TST ,X ;CHECK IF CHAR ON SCREEN IS GRAPHIC
    01FB 2B 04 00310 BMI FINAL1 ;IT’S ALREADY A GRAPHIC CHAR
    01FD C4 8F 00320 ANDB #$8F ;FORCE ALL-GREEN GRAPHIC CHAR
    01FF 20 02 00330 BRA FINAL2 ;READY TO APPLY BIT MASK
    0201 E4 84 00340 FINAL1 ANDB ,X ;APPLY BIT MASK
    0203 E7 84 00350 FINAL2 STB ,X ;STORE MODIFIED CHARACTER
    0205 39 00360 RTS ;BACK TO BASIC
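For readers who do not follow 6809 assembly, the logic of the listing above can be modeled in a few lines of Python. This is only a sketch of what the routine appears to do, run against a simulated 512-byte screen; the function name and the bytearray stand-in for screen memory are assumptions made for illustration.

```python
SCREEN_COLS = 32   # bytes per text row
SCREEN_SIZE = 512  # 32 x 16 character cells starting at $0400


def enhanced_reset(screen, x, y):
    """Clear pixel (x, y) in a simulated screen and return the cell offset."""
    offset = (y >> 1) * SCREEN_COLS + (x >> 1)   # LSRA, MUL by 32, then LSRB and ABX
    quadrant = ((y & 1) << 1) | (x & 1)          # ANDA #1, RORB, ROLA
    pixel_bit = 0x10 >> (quadrant + 1)           # LDB #$10, LSRB repeated quadrant+1 times
    keep_mask = ~pixel_bit & 0xFF                # COMB
    if screen[offset] & 0x80:                    # TST ,X / BMI: already a graphics char
        screen[offset] &= keep_mask              # ANDB ,X keeps color and other pixels
    else:
        screen[offset] = keep_mask & 0x8F        # ANDB #$8F: green block minus one pixel
    return offset


if __name__ == "__main__":
    screen = bytearray([96] * SCREEN_SIZE)       # CLS fills the screen with 96
    enhanced_reset(screen, 0, 0)
    print(hex(screen[0]))                        # 0x87: green block, top-left pixel off
    enhanced_reset(screen, 1, 0)
    print(hex(screen[0]))                        # 0x83: top-right pixel now off too
```

Like the assembly version, this model does no range checking on the coordinates.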

  7. Sebastian Tepper

    The process was as follows:

    1) Copied the SET, RESET and POINT code from Color Basic Unravelled to a text editor
    2) Deleted unneeded parts of the code, rearranged, and tried to optimize register usage
    3) Copied this assembly code into the 6809 emulator window and compiled, then tested a few cases by manually changing A and B registers (I skipped the first opcode that calls INTCNV).
    4) Copied assembled code from the 6809 emulator to the text editor again and manually assembled all the hexadecimal opcodes into the Z$ string.
    5) Saved the BASIC program with the assembly loader and tried it in XROAR.
    6) The code did not work, so I loaded the EDTASM cartridge in XROAR and used ZBUG to inspect the assembly code in memory, following its execution step by step.
    7) At this point I realized the disassembly showed BMI where BPL was needed and vice versa.
    8) Went back to the text editor and manually patched the hex values to get the proper branch instructions and reloaded in XROAR.

    Bottom line: it seems there is a bug in the 6809 emulator, but it is a VERY powerful tool which is much easier to use than ZBUG, so I’ll keep using it to develop and test code snippets.

