Adding gravity to a BASIC bouncing ball, follow-up.

“Let’s build a worksheet to calculate these values.”

This is a quote that will stay with me. It comes from a video response from Paul Fiscarelli that demonstrates his solution for adding gravity to the bouncing ball demo. He suggests pre-calculating the positions so nothing has to be calculated at run-time.

In part 3 of my bouncing ball series, I discussed the idea of using DATA for the positions rather than doing math simply because it would be faster. But, the math still has to be done somewhere to generate the values for those DATA statements.

But math is hard.

Fortunately, Paul provides the math.

And, he does it with style, using real equations that calculate an object in free-fall. Impressive! His video explains how the formulas work. He then uses an Excel spreadsheet to do all the math, filling it with values representing each Y position of the ball, all designed to fit the 32-column screen perfectly. That’s something my brute-force simulated gravity attempt did not do.

He then uses some fancy Excel formulas to make it generate all the data together as numbers separated by commas, ready to be copy-and-pasted into an editor with the rest of the BASIC program.

Screen shot from Paul Fiscarelli’s YouTube video showing Excel spreadsheet data being turned into BASIC DATA statements.

Well done, Paul! I’m impressed!

Your demo sent me back to my benchmarking to find out which was faster:

Array or DATA?

Paul’s pre-calculated data is stored in DATA statements, such as:

DATA 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,-1

During the initialization code, he loads the data up in an array, such as:

FOR I=0 TO 20:READ Y(I):NEXT

He then has all his Y positions available just by using an index counter (he re-purposes my YM Y-movement variable) that loops through them. In his example, he places a -1 at the end of the DATA statement, and uses that as a dynamic way to indicate the end of the data. When he’s retrieving his Y value from the array, he checks for that to know when he needs to start back at the beginning of the array:

YM=YM+1:Y=Y(YM):IF Y<0 THEN YM=0:Y=Y(YM)

In part 8 of my Optimizing Color BASIC series, I looked into arrays and discovered they were significantly slower than non-array variables. Surely an array is faster than calculating that gravity equation in real time, but is it faster than just READing each item each time?

READ/RESTORE versus Arrays

I present a new benchmark… which is faster? Using READ/RESTORE, or using pre-loaded arrays? Let’s find out!

First, the test program is going to load 20 values into an array, then in the benchmark loop it’s going to just assign the next array value to a variable and increment an index variable until it gets to the end of the data. It will then reset the index to 0 and start over. This is a scaled-down version of Paul’s demo looping through the array values.

0 REM ARRAYS.BAS
5 DIM TE,TM,B,A,TT
6 DIMZ(20),Y,X:FORA=0TO19:READZ(A):NEXT:X=0
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
40 Y=Z(X):X=X+&H1:IFX>&H14 THENX=0:GOTO40
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A/60:END
100 DATA 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,-1

Running this in the Xroar emulator gives 12.65 seconds.

Next, I removed the array and just did a READ each time, looking for a -1, which indicates we need to RESTORE back to the start of the data. It looks like this:

0 REM RESTORE.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 READ Z:IFZ=-1THENRESTORE:GOTO30
40 Y=Z
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A/60:END
100 DATA 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,-1

This produces 10.51 seconds! A bit faster. But it’s not quite the same.

In the READ/RESTORE example, I have to keep checking for -1. In the array version, there is an index value (needed for the array) that can be checked. Is it faster to do that, rather than looking for -1?

0 REM RESTIDX.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 READ Z:X=X+&H1:IFX>&H14 THENRESTORE:X=&H0:GOTO30
40 Y=Z
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A/60:END
100 DATA 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,-1

Yipes! This one slows down to 14.15 seconds. It seems incrementing a variable, comparing it, and then resetting it adds a lot of overhead. Just checking for -1 appears to be the faster way.

Let’s make it a tad bit faster.

In part 6 of my BASIC series, I looked at using READ on integers, hex, and strings. I found HEX to be almost twice as fast. I’ll also replace the -1 with some HEX value we won’t be using. Let’s try that here:

0 REM DATAHEX.BAS
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 READ Z:IFZ=&HFF THENRESTORE:GOTO30
40 Y=Z
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A/60:END
100 DATA &H1,&H2,&H3,&H4,&H5,&H6,&H7,&H8,&H9,&HA,&HB,&HC,&HD,&HE,&HF,&H10,&H11,&H12,&H13,&H14,&HFF

9.27 seconds!!!

It looks like using Paul’s technique to pre-calculate all the gravity values, and then reading them dynamically from DATA statements in the demo, should give us a great speed boost even over my simple attempt to do things like “Y=Y+.25”.

Thanks, Paul, for providing us with a way to do great “real” gravity without having to do a bit of math… on the CoCo. Hopefully I can dissect those equations and replicate them.

Paul’s Video

This is the first time I have ever enjoyed watching a presentation on Excel. I hope you enjoy it, too. Here’s Paul’s video:

Until next time…

Jerry Stratton’s SuperBASIC for the CoCo

Floating point and 902.1 follow-up

Yesterday, I wrote a short bit about how I wasted a work day trying to figure out why we would tell our hardware to go to “902.1 MHz” but it would not.

The root cause was a limitation in the floating point used in the C program I was working with. A float cannot represent every value exactly, and it turns out some of the values we were using were among those it cannot. I showed a sample like this:

#include <stdio.h>
#include <stdlib.h>
 
int main()
{
   float f = 902.1;
   printf ("float 902.1  = %f\r\n", f);
 
   double d = 902.1;
   printf ("double 902.1 = %f\r\n", d);
 
   return EXIT_SUCCESS;
}

The float representation of 902.1 would display as 902.099976, and this caused a problem when we multiplied it by one million to turn it into hertz and send it to the hardware.

I played WHAC-A-MOLE trying to find all the places in the code that needed to change floats to doubles, and then played more WHAC-A-MOLE to update the user interface to change fields that use floats to use doubles…

Then, I realized we don’t actually care about precision past one decimal point. Here is the workaround I decided to go with, which means I can stop whacking moles.

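#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main()
{
    float MHz;
    uint32_t Hertz;
    uint32_t Hertz2;
    for (MHz=902.0; MHz < 902.9; MHz+=.1)
    {
        // Existing conversion: the float error survives into Hertz.
        Hertz = (MHz * 1000000);

        // Round to the nearest 100000 Hertz (one tenth of a MHz).
        Hertz2 = (((Hertz + 50000) / 100000)) * 100000;

        printf ("%f * 1000000 = %" PRIu32 " -> %" PRIu32 "\n", MHz, Hertz, Hertz2);
    }

    return EXIT_SUCCESS;
}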

I kept the existing conversion (which can be off by a couple of hundred hertz after being multiplied by one million) and then did a quick-and-dirty hack to round that Hertz value to the nearest 100,000 Hz, which is one tenth of a MHz.

Here are the results, showing the MHz float, the converted Hertz, and the new rounded Hertz we actually use:

902.000000 * 1000000 = 902000000 -> 902000000
902.099976 * 1000000 = 902099975 -> 902100000
902.199951 * 1000000 = 902199951 -> 902200000
902.299927 * 1000000 = 902299926 -> 902300000
902.399902 * 1000000 = 902399902 -> 902400000
902.499878 * 1000000 = 902499877 -> 902500000
902.599854 * 1000000 = 902599853 -> 902600000
902.699829 * 1000000 = 902699829 -> 902700000
902.799805 * 1000000 = 902799804 -> 902800000
902.899780 * 1000000 = 902899780 -> 902900000

There. Much nicer. It’s not perfect, but it’s better.

Now I can get back to my real project.

Until next time…

VIC-20 Super Expander and the CHUG logo

My first computer was a Commodore VIC-20. I used it to do TV titles for my dad’s fishing videos (he shot and edited video that would run at trade shows and such). I can’t find the tape of those VIC-20 graphics, but I did find this:

This was my version of the CHUG (Commodore Houston User’s Group) logo, done on the VIC-20 Super Expander cartridge. That cartridge added 3K of extra RAM, and had a ROM that gave new commands to do things like draw lines, play music, etc.

I also found a few other things, but I don’t think I had anything to do with them. (Unless I did the face graphic. That one, and a second version of it I found later, seem very familiar.)

The face one would draw the circles around the eyes, then un-draw them, over and over. Not really “blinking” but…

My quest for recovering my early VIC-20 games continues…

Floating point and 902.1

I wasted most of my work day trying to figure out why some hardware was not going to the proper frequency. In my case, it looked fine going to 902.0 MHz, but failed on various odd values such as 902.1 MHz, 902.3 MHz, etc. But, I was told, there was an internal “frequency sweep” function that went through all those frequencies and it worked fine.

I finally ended up looking at the difference between what our host system sent (“Go to frequency X”) and what the internal function was doing (“Scan from frequency X to frequency Y”).

Then I saw it.

The frequency was being represented in hertz as a large 32-bit value, such as 902000000 for 902 MHz, or 902100000 for 902.1 MHz. The host program was taking its 902.1 floating point value and converting it to a 32-bit integer by multiplying it by 1,000,000… which resulted in it sending 902099975… which was then fed into some formula, and being slightly off caused enough drift that the end result was also off.

902099975 is not what I expected from “multiply 902.1 by 1,000,000”.

I keep forgetting how bad floating point is. Try this:

#include <stdio.h>
#include <stdlib.h>
int main()
{
   float f = 902.1;
   printf ("float 902.1  = %f\r\n", f);
   double d = 902.1;
   printf ("double 902.1 = %f\r\n", d);
   return EXIT_SUCCESS;
}

It prints:

float 902.1 = 902.099976
double 902.1 = 902.100000

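A double precision floating point cannot represent 902.1 exactly either, but it gets close enough that the six decimal places printed by %f show 902.100000. A single precision float only has about seven significant digits, so its error shows up right away.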

The Windows GUI was correctly showing “902.1”, though, probably because it was taking the actual value and rounding it to one decimal place. Thank you, GUI, for hiding the problem.

I guess now I have to go through and change all those floats to doubles so the user gets what the user wants.

Until next time…

Hitchhiker’s Guide to the Galaxy is 42 years old?

Color BASIC optimization challenge – scaling

Prerequisite: Optimizing Color BASIC series

Here is a simple Color BASIC program that will scale a blue box on the screen from small to large and back, going through the scaling process 100 times.

0 REM scale.bas
10 SW=32/4 ' SCALE WIDTH
20 SH=16/3 ' SCALE HEIGHT
30 SM=.1   ' SCALE INC/DEC
40 S=.5    ' SCALE FACTOR
70 TM=TIMER:FOR Z=1 TO 100
80 W=INT(SW*S)
90 H=INT(SH*S)
100 P=15-INT(W/2)+(7-INT(H/2))*32
110 CLS
120 FOR A=1 TO H
130 PRINT@P+A*32,STRING$(W,175)
140 NEXT A
150 S=S+SM
160 IF H<1 OR H>15 THEN SM=-SM:S=S+(SM*2)
170 NEXT Z
180 ' 60=NTSC 50=PAL
190 PRINT:PRINT (TIMER-TM)/60;"SECONDS"

After this runs, it will report the approximate number of seconds it took. It does this by saving the TIMER value at the start, then printing the elapsed TIMER count divided by 60 (since the CoCo timer is based on the NTSC video interrupt that happens 60 times a second).

NOTE: If you run this on a PAL system, you will need to change the 60 to a 50 in line 190. (edit: thanks, George P., for catching my typo.)

On the Xroar emulator running on my Mac it reports 25.25 seconds.

Your challenge, should you decide to accept it, is to take this code and make it run faster.

Rules

You must leave the basic algorithm intact (the SW, SH, SM and S stuff with all the math). You can rename variables, change the representation of values, speed up PRINTing, etc. but the core program flow should remain the same.
For bonus points, you are welcome to rewrite the program (in BASIC) to improve upon the algorithm in any way that makes sense, provided it achieves the same results (including the 1 to 100 benchmark loop).

There are some very (very!) simple things that can be done to dramatically improve the speed of this code.

Feel free to share your efforts in the comments. If you post your code, be sure to post the resulting time, too.

Good luck!

Adding gravity to a BASIC bouncing ball.

Using this Color BASIC code as a reference:

0 REM gravity.bas
10 CLS
20 X=1:Y=1:XM=1:YM=1
30 PRINT@P," ";:P=X+Y*&H20:PRINT@P,"O";
50 X=X+XM:IF X<&H1 OR X>&H1E THEN XM=-XM
60 Y=Y+YM:IF Y<&H1 OR Y>&HE THEN YM=-YM
80 GOTO 30

…how would you add simulated gravity to the bounce? When I was a teen, I did this on my CoCo 3. I forget how I did it, but here is what I tried tonight:

0 REM gravity.bas
10 CLS
20 X=1:Y=1:XM=1:YM=.25
30 PRINT@P," ";:P=X+INT(Y)*&H20:PRINT@P,"O";
50 X=X+XM:IF X<&H1 OR X>&H1E THEN XM=-XM
60 Y=Y+YM:IF Y<&H1 OR Y>&HE THEN YM=-YM:Y=Y+YM
70 YM=YM+.25
80 GOTO 30

But on a text screen, the “jump” near the bottom is large enough that the ball never actually reaches the bottom of the screen. On the CoCo 3’s high-resolution screen, this wasn’t an issue, but with only 16 vertical positions, the text screen is quite limited.

I’m sure there’s a real clever way to do this. Any thoughts?

CoCo bouncing BASIC ball, part 5

See also: part 1, part 2, part 3, part 4 and part 5.

It seems any time I touch BASIC these days, it turns into a benchmarking session to see if I can do something faster. My Jim Gerrie-inspired bouncing ball program has gone off on quite a tangent, and today is not going to change that.

MC-10 has its advantages

As previously discussed, PRINT@ seems to be the fastest way to put characters on the screen. But what if you want something that’s not just text? In Jim’s MC-10 demo, he uses the semi graphics characters in his ball. The MC-10’s BASIC allowed you to type those characters similarly to how Commodore computers let you type in their PETSCII characters. The excellent MC-10 Javascript Emulator has this image showing the keyboard layout:

MC-10 keyboard layout (image from MC-10.com).

If you look at the keys, you will see some contain graphics blocks next to the letter (Q and a solid block, F and checkerboard, etc.). You can generate them with SHIFT-Letter. You also see some keywords above the keys, which you could generate by doing CONTROL-Letter. This let users type graphics characters directly into a PRINT statement:

MC-10.com emulator showing how to “type” semigraphics characters.

Advantage MC-10. We have no way to do that on the CoCo.

PRINT CHR$()

So how do we print the graphics characters? We use CHR$() which will print whatever character we tell it to. For example, letter “A” is ASCII 65. We could type:

PRINT CHR$(65)

…and it would print the letter A.

Our graphics characters start at 128 and go to 255, looping the same basic shapes through the 8 available colors (color + black). We can see them all by typing:

FOR A=128 TO 255:PRINT CHR$(A);:NEXT A

Using PRINT CHR$() to print the CoCo semi graphics characters.

If I knew which characters to use to draw a ball, I could print them using CHR$(). Unfortunately, I don’t, and I have no idea where my old 1980s CoCo “quick guide” that listed them all ended up. Fortunately, Simon Jonassen has a website that lets us design semi graphics:

http://cocobotomy.roust-it.dk/sgedit/

Using my previous text ball for reference, I want to make a semi graphics ball that is 10×7 characters (so it appears round on the 32×16 4:3 aspect ratio display…). Using Simon’s tool, I came up with this:

It’s not a great ball, but it gives me something to start with.

On the bottom right of this web page are buttons to spit out the assembly, BASIC or CSV “code” to display this. But, it’s the whole screen, and looks like this:

10 CLEAR2000:DIMT,A:CLS
20 FORT=1024TO1535:READA:POKET,A:NEXT
100 A$=INKEY$:IFA$="" THEN100
1000 DATA 128,161,166,172,172,172,172,169,162,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1010 DATA 161,168,128,128,128,128,128,128,164,162,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1020 DATA 170,128,128,128,128,128,128,128,128,165,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1030 DATA 170,128,128,128,128,128,128,128,128,165,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1040 DATA 170,128,128,128,128,128,128,128,128,165,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1050 DATA 164,162,128,128,128,128,128,128,161,168,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1060 DATA 128,164,169,163,163,163,163,166,168,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1070 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1080 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1090 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1100 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1110 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1120 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1130 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1140 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128
1150 DATA 128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128

That program, when run, would draw the VDG semi graphics screen. But I only want the ball at the top left, so I should be able to find the values for that.

Side Note: Hey, Simon! I don’t think the CLEAR 2000 is necessary. That’s only used for strings. And since you aren’t using A$ for anything (you don’t DIM it either, I notice), you could do 100 IF INKEY$=”” THEN 100 instead and eliminate that variable. Also, generating the values as HEX would make it draw the screen faster. (Heh, force-of-habit when I write these articles. Simon is one of the most amazing CoCo programmers out there, and in one of my earlier articles, he contributed enhancements to my attempts at assembly code. This is about as “helpful” as I could ever be for someone as talented as Simon.)

There seem to be 16 DATA statements, each containing 32 values. Thus, the first seven DATAs look to be the first seven lines of the screen, and the first 10 values of each of those should be the 10 values for my ball. This gives me the following values:

1000 DATA 128,161,166,172,172,172,172,169,162,128
1010 DATA 161,168,128,128,128,128,128,128,164,162
1020 DATA 170,128,128,128,128,128,128,128,128,165
1030 DATA 170,128,128,128,128,128,128,128,128,165
1040 DATA 170,128,128,128,128,128,128,128,128,165
1050 DATA 164,162,128,128,128,128,128,128,161,168
1060 DATA 128,164,169,163,163,163,163,166,168,128

Those are the numbers I’d use to PRINT CHR$() that ball. I’ll first try it like this:

30 PRINT@P+0,CHR$(128);CHR$(161);CHR$(166);CHR$(172);CHR$(172);CHR$(172);CHR$(172);CHR$(169);CHR$(162);CHR$(128);
31 PRINT@P+32,CHR$(161);CHR$(168);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(164);CHR$(162);
32 PRINT@P+64,CHR$(170);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(165);
33 PRINT@P+96,CHR$(170);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(165);
34 PRINT@P+128,CHR$(170);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(165);
35 PRINT@P+160,CHR$(164);CHR$(162);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(161);CHR$(168);
36 PRINT@P+192,CHR$(128);CHR$(164);CHR$(169);CHR$(163);CHR$(163);CHR$(163);CHR$(163);CHR$(166);CHR$(168);CHR$(128);

Using P as the starting PRINT@ (0 for the top left of the screen), each line will print the ten CHR$() values of the ball, then the next line will print at “P+32”, making it the next line down, and so on.

0 REM BALLVDG.BAS
5 DIM TE,TM,B,A,TT
10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000
30 PRINT@P+0,CHR$(128);CHR$(161);CHR$(166);CHR$(172);CHR$(172);CHR$(172);CHR$(172);CHR$(169);CHR$(162);CHR$(128);
31 PRINT@P+32,CHR$(161);CHR$(168);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(164);CHR$(162);
32 PRINT@P+64,CHR$(170);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(165);
33 PRINT@P+96,CHR$(170);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(165);
34 PRINT@P+128,CHR$(170);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(165);
35 PRINT@P+160,CHR$(164);CHR$(162);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(128);CHR$(161);CHR$(168);
36 PRINT@P+192,CHR$(128);CHR$(164);CHR$(169);CHR$(163);CHR$(163);CHR$(163);CHR$(163);CHR$(166);CHR$(168);CHR$(128);
70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

When this runs, you can SEE it PRINT the ball character-by-character! This is very slow. My benchmark took “forever” to run, reporting 26133 (at 60 ticks per second, that’s over 7 minutes to draw that 1001 times). If you take 26133 / 1001 (I really need to change that loop to be 0 to 999) you get 26.10 “ticks” per time. Divide that by 60 (ticks per second) and you get .43. So it’s taking almost half a second to draw seven lines of ten characters each using CHR$() for each character (plus the overhead of PRINT@ and such).

We need our ball to bounce faster than that.

We have discussed how switching from decimal to hex will speed things up, so let’s try that:

30 PRINT@P+&H0,CHR$(&H80);CHR$(&HA1);CHR$(&HA6);CHR$(&HAC);CHR$(&HAC);CHR$(&HAC);CHR$(&HAC);CHR$(&HA9);CHR$(&HA2);CHR$(&H80);
31 PRINT@P+&H20,CHR$(&HA1);CHR$(&HA8);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&HA4);CHR$(&HA2);
32 PRINT@P+&H40,CHR$(&HAA);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&HA5);
33 PRINT@P+&H60,CHR$(&HAA);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&HA5);
34 PRINT@P+&H80,CHR$(&HAA);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&HA5);
35 PRINT@P+&HA0,CHR$(&HA4);CHR$(&HA2);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&H80);CHR$(&HA1);CHR$(&HA8);
36 PRINT@P+&HC0,CHR$(&H80);CHR$(&HA4);CHR$(&HA9);CHR$(&HA3);CHR$(&HA3);CHR$(&HA3);CHR$(&HA3);CHR$(&HA6);CHR$(&HA8);CHR$(&H80);

This is noticeably faster. The benchmark reports 14790 (which breaks down to .24 seconds each time). That’s almost twice as fast (.24 versus .43), but still not fast enough.

The only other thing we could do would be to remove all the in-between semicolons, since they aren’t actually needed to print the characters side-by-side (except for the last one, since we don’t want it to clear the rest of the screen line):

30 PRINT@P+&H0,CHR$(&H80)CHR$(&HA1)CHR$(&HA6)CHR$(&HAC)CHR$(&HAC)CHR$(&HAC)CHR$(&HAC)CHR$(&HA9)CHR$(&HA2)CHR$(&H80);
31 PRINT@P+&H20,CHR$(&HA1)CHR$(&HA8)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&HA4)CHR$(&HA2);
32 PRINT@P+&H40,CHR$(&HAA)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&HA5);
33 PRINT@P+&H60,CHR$(&HAA)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&HA5);
34 PRINT@P+&H80,CHR$(&HAA)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&HA5);
35 PRINT@P+&HA0,CHR$(&HA4)CHR$(&HA2)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&H80)CHR$(&HA1)CHR$(&HA8);
36 PRINT@P+&HC0,CHR$(&H80)CHR$(&HA4)CHR$(&HA9)CHR$(&HA3)CHR$(&HA3)CHR$(&HA3)CHR$(&HA3)CHR$(&HA6)CHR$(&HA8)CHR$(&H80);

This removes the extra time it takes BASIC to parse (and skip) NINE semicolons on each line (times seven lines). That adds up, but removing them only lowers the benchmark to 14542, which is barely measurable.

I don’t see a faster way to do this using PRINT CHR$() over and over and over and over again.

Definition of insanity…

To avoid all the time it takes for BASIC to parse each CHR$() over and over and over and over, we can do that just once and store the result in a string and print that string instead. I’ll use one-letter string names, unique for each line, for speed. It would be “easier” to use an array (like BL$(7) or something) but I’ve previously explored that and found array access to be slower.

Since this is a demo, we’ll do it the silly way, like this:

A$=CHR$(&H80)+CHR$(&HA1)+CHR$(&HA6)+CHR$(&HAC)+CHR$(&HAC)+CHR$(&HAC)+CHR$(&HAC)+CHR$(&HA9)+CHR$(&HA2)+CHR$(&H80)

B$=CHR$(&HA1)+CHR$(&HA8)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA4)+CHR$(&HA2)

C$=CHR$(&HAA)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA5)

D$=CHR$(&HAA)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA5)

E$=CHR$(&HAA)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA5)

F$=CHR$(&HA4)+CHR$(&HA2)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA1)+CHR$(&HA8)

G$=CHR$(&H80)+CHR$(&HA4)+CHR$(&HA9)+CHR$(&HA3)+CHR$(&HA3)+CHR$(&HA3)+CHR$(&HA3)+CHR$(&HA6)+CHR$(&HA8)+CHR$(&H80)

Now we can just print those strings and, instead of BASIC having to parse and generate seventy CHR$()s each time, it will just have to look up and print seven strings. This should be much faster!

I made some room in my BENCH.BAS program to fit these strings in at the top, and it now looks like this:

0 REM BALLVDG3.BAS
1 DIM TE,TM,B,A,TT
2 A$=CHR$(&H80)+CHR$(&HA1)+CHR$(&HA6)+CHR$(&HAC)+CHR$(&HAC)+CHR$(&HAC)+CHR$(&HAC)+CHR$(&HA9)+CHR$(&HA2)+CHR$(&H80)
3 B$=CHR$(&HA1)+CHR$(&HA8)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA4)+CHR$(&HA2)
4 C$=CHR$(&HAA)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA5)
5 D$=CHR$(&HAA)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA5)
6 E$=CHR$(&HAA)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA5)
7 F$=CHR$(&HA4)+CHR$(&HA2)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&H80)+CHR$(&HA1)+CHR$(&HA8)
8 G$=CHR$(&H80)+CHR$(&HA4)+CHR$(&HA9)+CHR$(&HA3)+CHR$(&HA3)+CHR$(&HA3)+CHR$(&HA3)+CHR$(&HA6)+CHR$(&HA8)+CHR$(&H80)

10 FORA=0TO3:TIMER=0:TM=TIMER
20 FORB=0TO1000

30 PRINT@P+&H0,A$;
31 PRINT@P+&H20,B$;
32 PRINT@P+&H40,C$;
33 PRINT@P+&H60,D$;
34 PRINT@P+&H80,E$;
35 PRINT@P+&HA0,F$;
36 PRINT@P+&HC0,G$;

70 NEXT
80 TE=TIMER-TM:PRINTA,TE
90 TT=TT+TE:NEXT:PRINTTT/A:END

The benchmark now reports 3200! That’s a huge improvement from the original 26133. I think this is about as fast as we can get… at least using strings.

But, for this demo, I wanted to have several frames of animation. If I make a bunch of strings, that’s going to slow things down since there will be more strings to search through. Also, I’ll run out of single-character variable names and might have to do things like A1$, A2$, A3$, etc. for the first frame, and B1$, B2$, B3$, etc. for the second frame, and so on. This would STILL be faster than doing a bunch of PRINT CHR$()s, and I expect it would also be faster than using arrays like F$(0) through F$(6).

If I were less impatient, I’d test that, and even try a double dimension array like F$(frame, line), which would be really easy and look really nice… but would probably be slower than anything but PRINT CHR$()s…

Note to Self: Write an article about using multi-dimensioned arrays to do simple animation in BASIC.

But I’m impatient, so now let’s circle back to the start of this post where I mentioned the MC-10 being able to embed characters in its PRINT statements directly without needing variables.

We can’t type those graphics characters on the CoCo, but if we don’t mind cheating a bit, we can modify the contents of a program so that the quoted string contains special graphic characters. It makes lines that do that impossible to edit or print (on most printers), but if it’s just a demo (or some program you write for people to JUST run and not mess with), it works.

Self-modifying BASIC code

Suppose we had a print statement like this:

10 PRINT "**********";

…and we wanted to change those ten asterisks to be a graphics character. If we knew where they were located in BASIC memory, we could POKE the values and change them from a 42 (ASCII for ‘*’) to something else, like a 128 for the solid black block graphics character.

As a kid, I did this in a pretty brute-force way, blindly PEEKing through program memory and changing values to what I wanted them to be. This sometimes had dangerous side effects if the value I was PEEKing for (like 42) appeared as part of a BASIC keyword token or something not inside the quoted string.

So, let’s try to be a bit smarter and scan through the program memory but only look for values between quotes.

If I recall, memory locations 25 and 26 contain the start of the BASIC program. You can get that address like this:

PRINT PEEK(25)*256+PEEK(26)

The end of the program is at 27 and 28:

PRINT PEEK(27)*256+PEEK(28)

It should be super easy (barely an inconvenience) to make a program that PEEKs memory within that range, looks for a 34 (the quote character), and, while between quotes, replaces our source character (42, the asterisk) with our target character (128, a black block).

10 PRINT "**********"
20 END
100 ST=PEEK(25)*256+PEEK(26)
110 EN=PEEK(27)*256+PEEK(28)
120 QF=0 'QUOTE FOUND FLAG
130 FOR A=ST TO EN
140 IF PEEK(A)=34 THEN QF=1-QF
150 IF QF=1 THEN IF PEEK(A)=42 THEN POKE A,128
160 NEXT

If you load this program and RUN it, it will print a line of ten asterisks.

Then if you RUN 100, it will do the search/replace looking for asterisks between quotes and changing them to 128s (black block).

Once that is done, RUN again and you will see it now prints a row of ten black blocks!

From this point on, that line 10 has been forever changed. If you LIST it, you will see weirdness. In this case, line 10 says:

10 PRINT "FORFORFORFORFORFORFORFORFORFOR"

…because apparently that byte of 128 is the token for the keyword FOR.

If you try to EDIT LINE 10, BASIC turns the tokens back into ASCII text for you to edit. Thus, as soon as you edit, you are changing that line to say “FORFORFORFOR…” instead of the graphics characters.

Thus, don’t edit a line after you do this trick!

I think this is a good stopping point for today. My goal here is to switch from printing my ball with ASCII “X” characters to graphics blocks, and retain the speed of raw PRINTs rather than using a ton of variables that have to be looked up each time (and the more variables, the slower that lookup gets).

But there’s more ASCII fun to be had before I start doing this.

To be continued…

CoCo bouncing BASIC ball, part 4

See also: part 1, part 2, part 3, part 4 and part 5.

Since I like to jump around, let’s do just that.

CoCo cross development revisited

A while ago, I posted about Microsoft Visual Studio Code and CoCo cross development. I just remembered this was a thing, so I have been doing some experiments tonight using the Xroar emulator and Visual Studio Code to quickly type up BASIC code and then load it into the emulator via an ASCII file (simulating a cassette tape).

My process is this:

With the Color BASIC extension for Visual Studio Code installed, I type up my BASIC program and save it out to a file called “ASCII.BAS”. (The filename doesn’t matter.)
In XRoar, I select Load (from the cassette menu, or Apple-L on my Mac) and browse to the ASCII.BAS file I just saved.
In XRoar, I type CLOAD and watch it quickly load my text file as if it were an ASCII BASIC program on tape.
I type RUN and see if it worked…

This lets me write code quickly on my Mac and then test it on the emulated CoCo without much effort.

With that out of the way, let’s return to discussing this bouncing ball project…

Discussing the ball project

My earlier experiments show that the fastest way to print a block of characters on the screen is to calculate the position and then use PRINT@. For example, here is a 10 x 7 block of text that sorta looks like a ball. With variable P being the top left corner to PRINT@ to, I just add offset values to get to each line. I use hex values because that’s faster than using decimal:

100 REM BALL
110 PRINT@P+&H00,"  XXXXXX  ";
120 PRINT@P+&H20," X      X ";
130 PRINT@P+&H40,"X        X";
140 PRINT@P+&H60,"X        X";
150 PRINT@P+&H80,"X        X";
160 PRINT@P+&HA0," X      X ";
170 PRINT@P+&HC0,"  XXXXXX  ";
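The hex offsets are just row and column positions on the 32-column screen. Each row is 32 (&H20) positions, so the offset of any cell is row times 32 plus column. Here is a quick Python check of the offsets used above. The `offset` helper is hypothetical and only does that arithmetic.

```python
COLS = 32  # the CoCo text screen is 32 columns by 16 rows

def offset(row, col):
    """PRINT@ offset of (row, col) relative to the ball's top-left corner P."""
    return row * COLS + col

# Each row of the 10 x 7 ball starts at column 0 in the listing above.
for row in range(7):
    print(f"row {row}: &H{offset(row, 0):02X}")

# The same arithmetic turns an X,Y corner into P (P = X + Y*&H20).
X, Y = 5, 3
P = X + Y * 0x20
print(P, divmod(P, COLS))  # divmod recovers (row, column)
```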

I can change the value of P and have this print the ball anywhere I want on the screen. But since it’s a ball, it doesn’t really make sense to print those empty corners, so I removed them and adjusted the offsets:

100 REM BALL
110 PRINT@P+&H02,  "XXXXXX";
120 PRINT@P+&H21, "X      X";
130 PRINT@P+&H40,"X        X";
140 PRINT@P+&H60,"X        X";
150 PRINT@P+&H80,"X        X";
160 PRINT@P+&HA1, "X      X";
170 PRINT@P+&HC2,  "XXXXXX";

Just so I could see the ball better, I added spaces before the string quotes in lines 110, 120, 160 and 170. To make it faster, I’d remove those, and put all these PRINTs on one line (if it fits). Every little bit helps, but we’ll optimize for speed later.
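Trimming the corners only works if every X still lands in the same column as before. One way to confirm that is to draw both versions onto a blank 10 x 7 grid and compare. This Python sketch does that. The (offset, text) pairs are transcribed from the two listings, with the trimmed rows written at the widths that line up with the full version. `render` is a hypothetical helper that imitates PRINT@ onto a cleared screen.

```python
FULL = [  # (offset, text) from the full 10 x 7 listing
    (0x00, "  XXXXXX  "),
    (0x20, " X      X "),
    (0x40, "X        X"),
    (0x60, "X        X"),
    (0x80, "X        X"),
    (0xA0, " X      X "),
    (0xC0, "  XXXXXX  "),
]
TRIMMED = [  # corners removed, offsets shifted right to compensate
    (0x02, "XXXXXX"),
    (0x21, "X      X"),
    (0x40, "X        X"),
    (0x60, "X        X"),
    (0x80, "X        X"),
    (0xA1, "X      X"),
    (0xC2, "XXXXXX"),
]

def render(rows, width=10, height=7):
    """Place each (offset, text) pair on a blank grid, like PRINT@ on a cleared screen."""
    grid = [[" "] * width for _ in range(height)]
    for off, text in rows:
        r, c = divmod(off, 32)  # 32 columns per screen row
        for i, ch in enumerate(text):
            grid[r][c + i] = ch
    return ["".join(line) for line in grid]

print("\n".join(render(TRIMMED)))
print(render(FULL) == render(TRIMMED))
```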

Now, Jim Gerrie’s demo had different frames of animation, which he did using an array of strings. But printing from an array is slower, since BASIC has to look up the values each time. I decided I’d try raw PRINTs and make each “frame” of the ball a subroutine. It takes time to GOSUB to that routine, but it will RETURN quickly.

I could then use a variable to represent which frame to print, and use ON/GOSUB to get to it (at the cost of searching forward in the program to find that line number).

40 ON F GOSUB 100,200,300,400,500,600,700

We’d need to benchmark to see if looking up array elements is faster than searching for line numbers. (Since each array string would have to be looked up, versus one search for a line number, I expect the GOSUB approach will be faster unless the program is huge and it has to search through tons of lines.)
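To get a feel for how the line-number search scales, here is a simplified Python model. It assumes the usual Microsoft BASIC behavior: if the target line number is higher than the current one, the search walks forward from the next line; otherwise it starts over at the top of the program. It counts lines visited, not time, so it is an illustration rather than a benchmark. `lines_scanned` is a hypothetical name.

```python
def lines_scanned(program, current, target):
    """Count lines visited while looking for `target`, starting from line `current`.

    Simplified model: search forward from the line after `current` when the
    target is higher, otherwise restart from the first line of the program.
    """
    start = program.index(current) + 1 if target > current else 0
    count = 0
    for number in program[start:]:
        count += 1
        if number == target:
            return count
    raise ValueError(f"UNDEFINED LINE {target}")  # BASIC would report ?UL ERROR

# Line numbers of the bouncing-ball program: main loop, seven frames, erase routine.
main = [10, 20, 30, 40, 50, 60, 70, 80]
frames = [n for f in range(100, 800, 100) for n in range(f, f + 90, 10)]
erase = [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800]
program = main + frames + erase

for target in (100, 400, 700):
    print("ON F GOSUB ->", target, lines_scanned(program, 40, target))
print("GOSUB 1000 ->", lines_scanned(program, 30, 1000))
```

In this model, reaching later frames costs more than reaching earlier ones. That suggests putting the most-used routines close after the main loop.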

Now I can do my X and Y movement calculations, convert that to a PRINT@ location, and then GOSUB to the appropriate frame routine to display it.

To erase the ball, I could just clear the entire screen (CLS), or I could make a subroutine that just PRINTs over the old ball:

1000 REM ERASE
1100 PRINT@P+&H02,  "      ";
1200 PRINT@P+&H21, "        ";
1300 PRINT@P+&H40,"          ";
1400 PRINT@P+&H60,"          ";
1500 PRINT@P+&H80,"          ";
1600 PRINT@P+&HA1, "        ";
1700 PRINT@P+&HC2,  "      ";
1800 RETURN

This should greatly reduce the amount of flicker.

Let’s see what the program looks like now:

10 CLS
20 X=0:Y=0:XM=1:YM=1:F=1:FM=1
30 GOSUB 1000:P=X+Y*&H20
40 ON F GOSUB 100,200,300,400,500,600,700
50 X=X+XM:IF X<&H1 OR X>&H15 THEN XM=-XM:FM=-FM
60 Y=Y+YM:IF Y<&H1 OR Y>&H8 THEN YM=-YM
70 F=F+FM:IF F>7 THEN F=1 ELSE IF F<1 THEN F=7
80 GOTO 30

In line 10, I clear the screen. Right now, I’m just using text on the green screen, but ultimately I’ll want to clear the screen to some background color, and “erase” by printing that color over the old ball.

Line 20 initializes the variables I will be using:

X – X position of the top left corner of the ball.
Y – Y position of the top left corner of the ball.
XM – value to add to X for the next X movement (1 to move to the right, -1 to move to the left).
YM – value to add to Y for the next Y movement (1 to move down, -1 to move up).
F – frame of the ball to display. Since ON GOTO/GOSUB counts from 1, frames are numbered starting at 1 (1 through 7 here).
FM – value to add to F to get to the next frame. When moving left to right, I’ll add 1 and increment the frame. When the ball bounces off the right side of the screen, I’ll start adding -1 and reverse the animation.

Line 30 erases the ball at the current position. This doesn’t make sense the first time we RUN, but it will have something to erase every time after that. It then calculates the PRINT@ position P from the X and Y values.

Line 40 does the ON GOSUB to the routine to print whatever frame we are supposed to display. If F is 1, it GOSUBs to 100. If F is 4, it GOSUBs to 400.

Line 50 adds the XM value to X, giving us our next X position. It then checks to see if X has gone too far left, or too far right, and reverses the XM value if so.

Line 60 is the same as above, but for the Y value.

Line 70 is similar, but either increments or decrements the frame, then checks to see if it needs to wrap around to the frame at the other side.
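To double-check the bounce limits in lines 50 and 60 without an emulator, here is a Python replay of the main loop. Drawing is replaced by recording where the ball would be printed. `simulate` is a hypothetical name. Running the replay shows X stays within 0 to 22 and Y within 0 to 9, so the 10 x 7 ball always fits on the 32 x 16 screen. It also shows F cycles through all seven frames.

```python
def simulate(steps):
    """Replay lines 20-80 and record every (X, Y, F) the ball is drawn at."""
    x, y, xm, ym, f, fm = 0, 0, 1, 1, 1, 1  # line 20
    drawn = []
    for _ in range(steps):
        drawn.append((x, y, f))             # lines 30-40: P=X+Y*&H20, draw frame F
        x += xm                             # line 50
        if x < 0x1 or x > 0x15:
            xm, fm = -xm, -fm               # bounce, and reverse the spin
        y += ym                             # line 60
        if y < 0x1 or y > 0x8:
            ym = -ym
        f += fm                             # line 70: wrap frames 1..7
        if f > 7:
            f = 1
        elif f < 1:
            f = 7
    return drawn

positions = simulate(1000)
xs = [p[0] for p in positions]
ys = [p[1] for p in positions]
print("X range:", min(xs), max(xs))  # ball is 10 columns wide
print("Y range:", min(ys), max(ys))  # ball is 7 rows tall
```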

After this, we just need the routines that print the ball frames and erase the ball frame:

100 REM FRAME 1
110 PRINT@P+&H02,  "XXXXXX";
120 PRINT@P+&H21, "X      X";
130 PRINT@P+&H40,"X        X";
140 PRINT@P+&H60,"X        X";
150 PRINT@P+&H80,"X        X";
160 PRINT@P+&HA1, "X      X";
170 PRINT@P+&HC2,  "XXXXXX";
180 RETURN

200 REM FRAME 2
210 PRINT@P+&H02,  "XXXXXX";
220 PRINT@P+&H21, "XX     X";
230 PRINT@P+&H40,"XX       X";
240 PRINT@P+&H60,"XX       X";
250 PRINT@P+&H80,"XX       X";
260 PRINT@P+&HA1, "XX     X";
270 PRINT@P+&HC2,  "XXXXXX";
280 RETURN

300 REM FRAME 3
310 PRINT@P+&H02,  "XXXXXX";
320 PRINT@P+&H21, "XXX    X";
330 PRINT@P+&H40,"XXX      X";
340 PRINT@P+&H60,"XXX      X";
350 PRINT@P+&H80,"XXX      X";
360 PRINT@P+&HA1, "XXX    X";
370 PRINT@P+&HC2,  "XXXXXX";
380 RETURN

400 REM FRAME 4
410 PRINT@P+&H02,  "XXXXXX";
420 PRINT@P+&H21, "X XX   X";
430 PRINT@P+&H40,"X XXX    X";
440 PRINT@P+&H60,"X XXX    X";
450 PRINT@P+&H80,"X XXX    X";
460 PRINT@P+&HA1, "X XX   X";
470 PRINT@P+&HC2,  "XXXXXX";
480 RETURN

500 REM FRAME 5
510 PRINT@P+&H02,  "XXXXXX";
520 PRINT@P+&H21, "X  XX  X";
530 PRINT@P+&H40,"X  XXXX  X";
540 PRINT@P+&H60,"X  XXXX  X";
550 PRINT@P+&H80,"X  XXXX  X";
560 PRINT@P+&HA1, "X  XX  X";
570 PRINT@P+&HC2,  "XXXXXX";
580 RETURN

600 REM FRAME 6
610 PRINT@P+&H02,  "XXXXXX";
620 PRINT@P+&H21, "X   XX X";
630 PRINT@P+&H40,"X    XXX X";
640 PRINT@P+&H60,"X    XXX X";
650 PRINT@P+&H80,"X    XXX X";
660 PRINT@P+&HA1, "X   XX X";
670 PRINT@P+&HC2,  "XXXXXX";
680 RETURN

700 REM FRAME 7
710 PRINT@P+&H02,  "XXXXXX";
720 PRINT@P+&H21, "X    XXX";
730 PRINT@P+&H40,"X      XXX";
740 PRINT@P+&H60,"X      XXX";
750 PRINT@P+&H80,"X      XXX";
760 PRINT@P+&HA1, "X    XXX";
770 PRINT@P+&HC2,  "XXXXXX";
780 RETURN

1000 REM ERASE
1100 PRINT@P+&H02,  "      ";
1200 PRINT@P+&H21, "        ";
1300 PRINT@P+&H40,"          ";
1400 PRINT@P+&H60,"          ";
1500 PRINT@P+&H80,"          ";
1600 PRINT@P+&HA1, "        ";
1700 PRINT@P+&HC2,  "      ";
1800 RETURN

(After typing this, I realized I need a FRAME 8, but I’ll fix that later.)
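The erase routine only clears the cells it prints spaces into. Every frame uses the same offsets as the erase routine, so each frame row must be exactly as wide as the matching erase row, or a stray X would be left behind. Here is a quick Python check of the seven frames. The frame table is transcribed from the listings (rows 0 and 6 are always "XXXXXX"), and `frame_rows` is a hypothetical helper.

```python
ERASE = [" " * n for n in (6, 8, 10, 10, 10, 8, 6)]  # lines 1100-1700

# Each frame: (text for rows 1 and 5, text for rows 2-4), from lines 100-780.
FRAMES = {
    1: ("X      X", "X        X"),
    2: ("XX     X", "XX       X"),
    3: ("XXX    X", "XXX      X"),
    4: ("X XX   X", "X XXX    X"),
    5: ("X  XX  X", "X  XXXX  X"),
    6: ("X   XX X", "X    XXX X"),
    7: ("X    XXX", "X      XXX"),
}

def frame_rows(n):
    """Expand a frame into its seven printed rows."""
    edge, middle = FRAMES[n]
    top = "XXXXXX"
    return [top, edge, middle, middle, middle, edge, top]

for n in FRAMES:
    widths = [len(s) for s in frame_rows(n)]
    ok = widths == [len(s) for s in ERASE]
    print(f"frame {n}: widths {widths} {'match' if ok else 'DO NOT match'} erase")
```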

You can see my simple attempt at making the ball “spin” in action here:

With this proof-of-concept done, I can now get back to making it faster by optimizing the BASIC code for speed.

To be continued…

Sub-Etha Software

"In Support of the CoCo and OS-9 since 1990!"
