Benchmarking the CoCo keyboard – part 3

See also: part 1, part 2, part 3, part 4, part 5, part 6, part 7 and more (coming “soon”).

Before I get started today, I wanted to share a comment about part 2 left by Paul Fiscarelli on the Sub-Etha Software Facebook page:

Allen – one minor optimization in your assembly routine. You can remove line 130 CMPA #0. The zero flag will be set if your call to POLCAT [$A000] returns a no-keypress in the A-register, so the CMPA is redundant.

Paul Fiscarelli

Awesome! Thanks, Paul! It can be like this:

      ORG  $3F00
START LDX  #1024
LOOP  JSR  [$A000]
      BEQ  LOOP
      STA  ,X+
      CMPX #1536
      BNE  LOOP
      BRA  START

And now back to the article…

After a few digressions, today I will finally get back to the original purpose of this article: finding the fastest way to read the keyboard in Color BASIC. Specifically, reading things like arrow keys that repeat when you hold them down. This is useful for game programs where you probably want the most speed.

We start with some code from Jim McClellan that enables INKEY$ to keep reporting an arrow key as long as it is held down. Normally, INKEY$ reports one key then won’t give another until a new key is pressed (or the same key is released then re-pressed).

0 REM keyread.bas
10 CLS
20 POKE 341,255:POKE 342,255:POKE 343,255:POKE 344,255
30 I$=INKEY$:IF I$="" THEN GOTO 20
40 PRINT ASC(I$)
50 GOTO 20

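The POKEs in line 20 reset the keyboard rollover table entries that cover the four arrow keys (addresses 341 to 344, or &H155 to &H158), so the ROM treats a held key as a fresh press and INKEY$ keeps reporting it. Parsing four POKE statements every time through a loop is time consuming, so I will present a few alternatives.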

It’s benchmark time!

First, a quick-and-dirty benchmark. This will reset the BASIC timer, then do those four pokes 1000 times and print the value of TIMER:

0 REM keybench.bas
10 TIMER=0
20 FORA=1TO1000
30 POKE341,255:POKE342,255:POKE343,255:POKE344,255
40 NEXT
50 PRINTTIMER

I removed unneeded spaces in line 30, and when I run this in Xroar using Color BASIC 1.1, it prints 1812.

The first optimization I did was change decimal values to HEX values:

10 TIMER=0
20 FORA=1TO1000
30 POKE&H155,&HFF:POKE&H156,&HFF:POKE&H157,&HFF:POKE&H158,&HFF
40 NEXT
50 PRINTTIMER

By changing the decimal values (341, 342, 343, 344 and 255) into HEX values, the result prints 862. This is over twice as fast! Nice.

I was curious if parsing four values was faster or slower than doing all four inside a FOR/NEXT loop, so I tried that:

10 TIMER=0
20 FORA=1TO1000
30 FORZ=&H155 TO&H158:POKEZ,&HFF:NEXT
40 NEXT
50 PRINTTIMER

Note that the space in the FOR command is required when typing it in by hand because the tokenizer doesn’t know when the HEX value ends and the next keyword, TO, begins. This method uses more memory since it needs an extra variable and some overhead for the FOR loop.

It also turns out to be slower. This one shows me 1148. Okay, so it’s faster to brute force through four POKEs than put them in a loop, but I expect at some point the loop is faster. (i.e., maybe it’s faster to FOR/NEXT 100 POKEs than do 100 separate POKEs… Or maybe not. Maybe some day I’ll try. But I digress…)

In my benchmarking BASIC series, I shared how using a variable can be faster than constant values. It can be much quicker to look up a variable value than parse characters and turn that into a value. I tried this:

1 V=341:W=342:X=343:Y=344:Z=255
10 TIMER=0
20 FORA=1TO1000
30 POKEV,Z:POKEW,Z:POKEX,Z:POKEY,Z
40 NEXT
50 PRINTTIMER

This uses even more memory than the FOR loop since it now takes five extra variables, but the payoff may be worth it. It prints 653! That is a little over a third of the time the original decimal version took (1812).

However, the more variables a program uses, the longer it takes to look up the ones near the end of the variable table. You could always do this with V, W, X, Y, Z being the first variables in the list, assuming you’d look them up every time through the main program loop, but if you have other variables that need to be looked up more often, you might want those first, slowing down these… Does that make sense?

Thus, “your mileage may vary.” You can declare variables in the order they should be on the variable stack, with the ones you look up the most at the front of the list, and the ones you rarely use at the end:

DIM V,W,X,Y,Z,A

Looking at the previous example, I notice that Z (the value 255) is used four times on that line. I wonder what happens if I declare it first? I’ll just define it manually at the start of line 1:

1 Z=255:V=341:W=342:X=343:Y=344
10 TIMER=0
20 FORA=1TO1000
30 POKEV,Z:POKEW,Z:POKEX,Z:POKEY,Z
40 NEXT
50 PRINTTIMER

With this, the four lookups for Z should be faster. Indeed, this prints 632! Yep, changing the position of that one Z variable sped it up ever so slightly.
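
For comparison, here is a minimal sketch of getting the same ordering with DIM instead of relying on assignment order. It assumes DIM creates the variables in the order listed, as described above. This variant is not timed here, so no result is claimed for it.

```basic
1 DIM Z,V,W,X,Y,A:REM Z FIRST, LOOP COUNTER A LAST
2 Z=255:V=341:W=342:X=343:Y=344
10 TIMER=0
20 FORA=1TO1000
30 POKEV,Z:POKEW,Z:POKEX,Z:POKEY,Z
40 NEXT
50 PRINTTIMER
```

Changing the order of the names in line 1 is then the only edit needed to try a different lookup order.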

Does that matter? In a game, every bit of time you can save in the main loop speeds the game up. So it might.

My vote would be to start with the HEX version, and once the game is written, start playing with variable order and see if moving the POKE values into variables will help.
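
As a starting point, the key-reading loop from the top of the article with its POKEs switched to HEX might look like the sketch below. It is simply keyread.bas combined with the HEX benchmark line. The file name is made up for illustration, and &H assumes a BASIC that accepts HEX constants, as the benchmarks above do.

```basic
0 REM keyhex.bas (hypothetical name)
10 CLS
20 POKE&H155,&HFF:POKE&H156,&HFF:POKE&H157,&HFF:POKE&H158,&HFF
30 I$=INKEY$:IF I$="" THEN 20
40 PRINT ASC(I$)
50 GOTO 20
```

Holding an arrow key should keep printing its ASCII code, just as in the original decimal version.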

But is this the fastest way to read repeating keys in Color BASIC? Is there another way to do it that will work with all variations of Color BASIC?

Comment if you know a faster way I should look at.

Until next time…

7 thoughts on “Benchmarking the CoCo keyboard – part 3”

  1. William Astle

    Generally speaking, a loop will always be slower than just completely unrolling it into the resulting statements. That’s because the loop control stuff has to be run for each iteration of the loop (variable update, comparison, etc.) and you also have the overhead of the initial loop setup. There may be some cases where you get a benefit to using a loop depending on the specific internals of the interpreter and other factors, but I can’t think of any case where unrolling a loop would be slower than a FOR loop.

    1. MiaM

      Generally unrolling loops makes the code faster, but in Basic a loop can create more compact code which will make GOTO and GOSUB find their lines faster. (In assembler a similar thing can happen, jumps/branches can end up needing a longer addressing mode. Not sure how 6809 does, but for example 68000 and x86 have different length variations of the various jumps. For 6502 the conditional jumps are always short and when code takes up more space you can end up needing a conditional (short) jump that jumps to an absolute (long) jump.)

  2. Jim

    Does the following help? Tested on MAME emulating a coco3 – I think it should work on other systems/versions of COLOR BASIC.

    0 REM keyread.bas
    10 CLS
    15 GOSUB1000
    20 I=USR(I)
    30 I$=INKEY$:IF I$="" THEN 20
    40 PRINT ASC(I$)
    50 GOTO 20
    999 END
    1000 DATA &H8E, &H01, &H55, &HCC, &HFF, &HFF
    1002 DATA &HED, &H81, &HED, &H81
    1004 DATA &H39
    1010 'Assembly code
    1011 ' 8E 01 55 LDX #341
    1012 ' CC FF FF LDD #$FFFF
    1013 ' ED 81 STD ,X++
    1014 ' ED 81 STD ,X++
    1015 ' 39 RTS
    1020 FOR I=32700 TO 32710:READ T$:POKE I,VAL(T$):NEXT
    1030 POKE 275,127:POKE 276,188
    1040 RETURN

    1. Allen Huffman Post author

      A USR assembly routine that does the POKE x,255s? Nice! I’ll throw this in the mix and see what it does. I am exploring all of this for a BASIC game project I am going to be writing about. I think, ultimately, it might make sense to have some USR routines to do keyboard read, screen display, etc. It would be neat to have a toolkit of simple routines any BASIC programmer could easily use for creating similar projects.

    2. Allen Huffman Post author

      From 653 to 187! I=USR(I) for the win! Since it’s not parsing any input variable (I), I tried using period (faster way to parse zero) thinking it would take less time than looking up a variable. That dropped it slightly to 179. Sweet!

