Optimizing BASIC for speed versus size

Normally, I would say that “optimizing” a program could mean two things:

Making the program as fast as possible (even if it was larger)
Making the program as small as possible (even if it was slower)

There is always a compromise.

Loop Unrolling

In BASIC, this will print 100 times:

10 FOR I=1 TO 100
20 PRINT I
30 NEXT

But this might be a faster way to print 100 times:

10 PRINT 1
20 PRINT 2
30 PRINT 3
...
1000 PRINT 1000

This is a poor example because PRINTing a variable is generally faster than printing a number due to the BASIC interpreter needing to parse all the TEXT digits of the number and convert it to a string to PRINT. To my surprise, when I ran this test, I found they were basically the same speed. The overhead of processing a line and converting a string of digits seemed to offset the time saved of the loop that was printing a variable that doesn’t need parsing. (FOR/NEXT loop fast, PRINTing variable fast versus a bunch of lines that might be faster but have slower parsing for the text digits.)

However, if the series of individual PRINT lines were printing a variable (“PRINT X”) it would be more than twice as fast.

10 PRINT X
20 PRINT X
...
1000 PRINT X

Apples to apples (difficult to do in this example), blasting out a ton of lines is faster than doing them in a loop. I’ve heard assembly language programmers refer to this as “loop unrolling” and it can work in BASIC as well.

A better example might be how you draw a background screen. Consider a 32 column text screen that is cleared to black, and a border is printed around it:

Here is some code that would do this:

10 CLS 0
20 PRINT CHR$(128);STRING$(30,207);CHR$(128);
30 FOR I=1 TO 14
40 PRINT CHR$(207);STRING$(30," ");CHR$(207);
50 NEXT
60 PRINT CHR$(128);STRING$(30,207);
70 GOTO 70

Or, you could do it using loop unrolling and change that loop of 14 steps to be 14 PRINTS:

10 CLS 0
20 PRINT CHR$(128);STRING$(30,207);CHR$(128);
30 PRINT CHR$(207);STRING$(30," ");CHR$(207);
40 PRINT CHR$(207);STRING$(30," ");CHR$(207);
50 PRINT CHR$(207);STRING$(30," ");CHR$(207);
60 PRINT CHR$(207);STRING$(30," ");CHR$(207);
70 PRINT CHR$(207);STRING$(30," ");CHR$(207);
80 PRINT CHR$(207);STRING$(30," ");CHR$(207);
90 PRINT CHR$(207);STRING$(30," ");CHR$(207);
100 PRINT CHR$(207);STRING$(30," ");CHR$(207);
110 PRINT CHR$(207);STRING$(30," ");CHR$(207);
120 PRINT CHR$(207);STRING$(30," ");CHR$(207);
130 PRINT CHR$(207);STRING$(30," ");CHR$(207);
140 PRINT CHR$(207);STRING$(30," ");CHR$(207);
150 PRINT CHR$(207);STRING$(30," ");CHR$(207);
160 PRINT CHR$(207);STRING$(30," ");CHR$(207);
170 PRINT CHR$(128);STRING$(30,207);
180 GOTO 180

When I time those two, there is a teeny speed improvement in the second version (around 1/60th of a second). Is that enough to justify the overhead of the extra memory needed? Probably not in this example, but it might be if it was in code that was redrawing a screen for a game or something.

Side Note: Of course, we could speed this up by pre-generating the strings and the PRINTing them. That would avoid all the parsing of CHR$() and the numbers and building temporary strings over and over and over.

10 CLS 0
20 PRINT CHR$(128);STRING$(30,207);CHR$(128);
25 L$=CHR$(207)+STRING$(30," ")+CHR$(207);
30 FOR I=1 TO 14
40 PRINT L$;
50 NEXT
60 PRINT CHR$(128);STRING$(30,207);
70 GOTO 70

That is about 40% faster than the original, and it could be made even faster, but that’s not the point currently.

Inlining

Another way to speed things up is eliminate as many GOTOs and GOSUBs as you can. Every GO has to search either forward, line by line, looking for the target line (if going to a line after the current one), or start at the very first line and search forward (if starting at an earlier line).

A 10,000 line program will find it quite slow to be on line 10000 and type GOTO 9999. Likewise, line 1 saying “GOTO 10000” will be quite slow.

If there is enough memory, inlining subroutine code will speed things up every time the code is used. Consider this subroutine from my old *ALL RAM* BBS:

15 BR$="*==============*==============*"
...
25 A$="Welcome To *ALL RAM* BBS!":GOSUB1055
...
50 'New User
55 A$="Password Application Form":GOSUB1055
...
100 'Main Menu
105 A$="*ALL RAM* BBS Master Menu":GOSUB1055
...
150 'Call Sysop
155 A$="Calling the Sysop":GOSUB1055
...
200 'Goodbye
205 A$="Thank you for calling":GOSUB1055
...
250 'Userlog
255 A$="List of Users":GOSUB1055
...
1050 'Function Border
1055 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$:RETURN

The routine at line 1050 takes a string in A$. It will clear the screen, print a border string, print the A$ centered, then print the border string again and return.

The time it takes to run this code should be the same no matter where it appears in the program, but the time it takes to get to this code will vary depending on where it is called from. If you were calling it from line 1040, it would be quite fast. If you were calling it from line 20, it has to scan forward through every line from 20-1050 to find it, which would be slower. At least, because it is a GOSUB routine, it will return quickly. (That’s a big advantage of using GOSUB over GOTO.)

This routine could be inlined in to the code, and each use would be slightly faster.

15 BR$="*==============*==============*"
...
25 A$="Welcome To *ALL RAM* BBS!"
26 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
50 'New User
55 A$="Password Application Form"
56 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
100 'Main Menu
105 A$="*ALL RAM* BBS Master Menu"
106 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
150 'Call Sysop
155 A$="Calling the Sysop"
156 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
200 'Goodbye
205 A$="Thank you for calling"
206 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...
250 'Userlog
255 A$="List of Users"
256 CLS:PRINTBR$:PRINTTAB((32-LEN(A$))/2)A$:PRINTBR$
...

Again, this is a pretty poor example, but if this routine was something that required speed (a game, animation, etc.) every little bit would help.

Other time-saving techniques

I have covered many other optimizations to code, such as using “&H” hex numbers rather than decimal and using “.” instead of 0, and all of these things can combine to make a dramatically faster program — at the expense of code size. (A=9 takes three bytes, while A=&H9 is faster but takes four bytes.)

But if we were back in 1980 and trying to program on a 4K CoCo, perhaps we would have more things to worry about than speed.

Size matters

On that 1980 4K CoCo, a “PRINT MEM” on startup shows 2343 bytes free for a program. If you didn’t plan to use any strings or string functions, a CLEAR 0 would give back 200 bytes for a total of 2534 bytes free. You probably won’t be doing any loop unrolling or inlining in this environment.

Instead, focusing on the smallest way to do something makes more sense.

Subroutine everything!

If any bit of code is used more than once, it may make sense to make it a subroutine as long as the overhead of having to add “GOSUB xxx” and “RETURN” do not take more than the duplicate code would.

10 X=0:Y=0:P=(Y*32)+X:PRINT@P,"HELLO"

Above, some X and Y coorindates (representing the CoCo’s 32×16 text screen) are converted to a PRINT@ screen position (0-511). This is the type of code that might appear many places where something is printed at the screen. This makes it a good candidate for being a subroutine:

10 X=0:Y=0:GOSUB 1000:PRINT@P,"HELLO"
...
1000 P=(Y*32)+X:RETURN

Each time “P=(Y*32)+X” is used, that takes 10 bytes. The overhead for “GOSUB 1000” is 8 bytes, then the subroutine itself adds 5 bytes for the line overhead and 2 bytes for “:RETURN”. That’s 15 bytes extra we just added, but if it were used more than a few times, it starts saving memory. (See note below about line numbers and how you can make this save even more.)

And, if the routine was using X/Y to a P position to print something, you might as well put that in the subroutine as well and eliminate the P variable completely:

10 X=0:Y=0:A$="HELLO":GOSUB 1000
...
1000 PRINT@(Y*32)+X,A$:RETURN

With just a bit of reworking code, many bytes can be saved.

But that’s not the best example of saving bytes, because “GOSUB 1000” takes up a lot of memory. Let’s look at how to make it take up less.

Small line numbers save bytes

When a program is tokenized (keywords like PRINT replaced with one and two byte tokens representing the command), the line number is changed in to a two-byte representation of the line. It doesn’t matter if you are at line 5 or line 55000, the line number will take up two bytes of the tokenized line.

But, when you have a line number in the line itself, such as as the target of GOTO, GOSUB if an IF/THEN, those digits will be stored in full just as you typed them:

10 GOTO 10
20 GOTO 100
30 GOTO 1000
40 GOTO 10000

Above, line 20 will take up one more byte than line 10, because “100” takes one more byte than “10”. Because of this, using shorter line numbers will save space. Instead of this:

10 PRINT "ENTER YOUR NAME";:GOSUB 10000
20 PRINT "PHONE NUMBER";:GOSUB 10000
30 PRINT "FAVORITE COLOR";:GOSUB 10000
...
1000 INPUT A$:RETURN

…you could move the subroutine to the top and use single digit line numbers like this:

0 GOTO 10
1 INPUT A$:RETURN
...
10 PRINT "ENTER YOUR NAME";:GOSUB 1
20 PRINT "PHONE NUMBER";:GOSUB 1
30 PRINT "FAVORITE COLOR";:GOSUB 1
...

In this example, you have immediately added extra overhead by having line 0, but as long as you save enough to offset that, the program will be smaller. In this example, the three “GOSUB 10000” were turned in to “GOSUB 1” saving four bytes every time they were used.

As a bonus, having the subroutines at the start of the program will make them faster, since every GOSUB will be to a higher number, and BASIC will just start at the first line and scan forward, finding the subroutine much quicker. Smaller and faster!

And, of course, don’t forget to number your program by 1 — or, if using Extended Color BASIC with the RENUM command, you can do this:

RENUM newline, startline, increment
Renumbers a program.
newline is the first new renumbered line. If you omit newline, BASIC uses 10.
startline is where the renumbering starts. If you omit start/ine, BASIC renumbers the entire program.
increment is the increment between each renumberedline. If you omit increment, BASIC uses 10.
Note: RENUM does not rearrange the order of lines.
– Getting Started With Extended Color BASIC (1984 edition), page 58

RENUM 0,0,1

That will renumber the program to start at line 0, beginning at line line 0, and incrementing by 1s. If you started with:

100 PRINT
200 PRINT
300 PRINT
400 PRINT

…a RENUM 0,0,1 will give you:

0 PRINT
1 PRINT
2 PRINT
3 PRINT

You may save a few bytes every time a GOTO/GOSUB or IF/THEN is used.

Side Note: I believe it was Alex “Basic Utils” Evans who recently pointed out to me that GOTO and GOSUB each used two bytes. They are actually three token words — GO, TO and SUB. You can even write them separated with a space like “GO TO 100” or “GO SUB 100“. I hadn’t seen that since I was first learning BASIC back in 1982 or so, in some books a schoolmate was loaning me.

Avoid gratuitous use of lines

Every line in Color BASIC has five bytes of overhead. If you typed in a line that was just:

10 A

…you would see memory goes down by six bytes. Five bytes for the line overhead, and one byte for the “A”.

Side Note: Those five bytes are a 2-byte address of the next line, a 2-byte line number, and a trailing 0 terminator at the end of the line. In the above example, the bytes would look like “[xx][xx][yy][yy][A][0]” for that line.

Because of this, writing things out on separate lines consumes more memory, and is also slower since BASIC has more line number data to parse (included when doing a GOTO/GOSUB).

10 PRINT
20 PRINT
30 PRINT
40 PRINT
50 PRINT

…will be 15 bytes larger than doing:

10 PRINT:PRINT:PRINT:PRINT:PRINT

The first example would be tokenized as:

[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes
[xx][xx][yy][yy][print_token][0] <- 6 bytes (30 bytes total)

…versus this for the second example:

[xx][xx][yy][yy][print_token][:][print_token][:][print_token][:][print_token][:][print_token][:][0] <- 15 bytes total

In Color BASIC, the input buffer will allow you to type up to 249 characters for one line, so you can really pack a lot of commands together and save space (and speed).

Side Note: In Extended BASIC, the EDIT command can be used to get a few extra characters on a line. If you type a long line that features BASIC keywords up to the max of 249 characters, then press ENTER, the line gets tokenized and words that took up five characters like “PRINT” get turned in to one or two byte tokens. This can free up enough space to allow doing an EDIT on the line, then an Xtend to go to the end of the line, and type a few more characters. “When every last byte matters…”

Other space-saving techniques

REM – If space matters, leave out any REMarks in the code as they both take up extra space and slow the program down. I would sometimes comment my programs pretty heavily, then save the commented version out, and then delete the remmarks and save out the smaller version.
Spaces – Removing any unnecessary spaces (“GO TO 1000” becomes “GOTO1000”) helps.
Semicolons – A semi-colon is only required at the end of a PRINT line if you want to avoid having the carriage return printed, or when separating variables that BASIC cannot tell apart form keywords. For example, “PRINT A;B;C” is obviously needed because “PRINT ABC” would be a different variable. But, the semicolon is assumed between items you can print such as “PRINT VAL(A$);B$;CHR$(128)” which works just fine as “PRINT VAL(A$)B$CHR$(128)”. Heck, even when printing string variables like “PRINT A$B$C$D$” the semicolons are not needed. BUT, you still need it if BASIC can’t tell. If you wanted to print A$, B (number) and C$, you would need “PRINT A$B;C$” since BASIC needs to know where the B non-string variable ends.
Trailing Quotes – If a line ends in a quote, the quote can be left off. This goes for things like PRINT”SOMETHING” or even A$=”SOMETHING”. BASIC will figure it out when it gets to the end of the line.
Parenthesis – There are times when parenthesis are required to make math work (order of operations), but sometimes it is just done to make the code easier to read (like spaces and splitting up one instruction per line). In my X/Y example, I used “(Y*32)+X” but the parens don’t matter. The multiplication will happen first, then the addition, so “Y*32+X” will save two bytes. TEST FIRST! Some things look like they should work, but will require parens. (“1*256+1 – 1*256+1” feels like it should be zero, but it will be 2. For this you do need “(1*256+1)-(1*256+1)”.

And, of course, figuring out how to reduce code and combine things is always part of the process.

See also “Carl England’s CRUNCH program” which will do those steps for you.

There are many other tips and tricks for optimizing BASIC for speed or size, but hopefully these examples get you started in experimenting.

Until next time…

3 thoughts on “Optimizing BASIC for speed versus size”

allenhuffman May 11, 2023 at 5:23 pm

Testing…

Loading...

Reply ↓
Johann Klasek May 12, 2023 at 6:37 am

Ad subroutines: The order matters: Place often used subroutines called from all over the program at the start of a program (the line search is expensive). This reduces probably the need for inlining.

Probably already mentioned in some of the other great optimization articles, but these are my favorites which I think might gain the most in time savings … (addressing the 3 main reasons for time losses: variable search, line search, value conversion).

Ad constants: Avoid constants numbers in critical areas (loops). The conversion into the floating point representation takes very much time (depending on digit count). Put the values into variables. But there is a trade-off to consider. Variable a searched linearly and variable definition should be ordered according to their reference usage. This saves

Do variable ordering: DIM allows early definition to do it in the order you need. Loop variables (which are typically referenced frequently) might be defined at the beginning with DIM I,J,K (which ensures to be placed a the beginning of the variable list which grows dynamically).

Reconstruct GOTO loops into FOR-NEXT loops. A construction like FOR S=-1 TO 0: do something: S=(WHILE_DO_CONDITION):NEXT
(just one of many possible forms)
The loop terminates as soon the while-condition fails.
This ensures that the loop is time stable on every place in the program (no overhead for back-jumping searches raised by GOTOs).

Loading...

Reply ↓
1. Allen Huffman Post authorMay 12, 2023 at 2:04 pm
  
  Rewriting code to use STEP0 GOTOs is something I have yet to try, but I need to. These are great notes! I’ll add!
  
  Loading...
  
  Reply ↓

Sub-Etha Software

"In Support of the CoCo and OS-9 since 1990!"