See also: part 1, part 2, part 3, part 4, part 5, part 6, part 7, part 8 and part 9.
- 2/14/2017 – Fixed numeric typo (thanks, Geroge P!).
HEX versus DECimal Numbers
As Barbie once said*…
Math is hard! – Barbie
While Mattel’s Math-Is-Hard Barbie never quite made the splash the marketing team had hoped for, her sentiment lives on.
Side Note: *This is in reference to the Teen Talk Barbie doll released in 1992, and out of the 270 phrases the doll could say, that was not one of them. The real quote was “Math class is tough!”
Earlier in this series, I touched on the fact that dealing with numbers is time consuming for BASIC. Something as simple as B=65535 takes time to process as the interpreter translates that base-10 decimal number into an internal floating point value. The more digits, the more work. For instance:
0 REM NUMBERS.BAS
10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 B=1
40 NEXT
50 PRINT TIMER-TM
That prints a value of 183. If you change line 30 to read “B=12345” the number jumps to 485. You can see the increase:
- B=1 – 183
- B=12 – 262
- B=123 – 337
- B=1234 – 408
- B=12345 – 485
Obviously, the more digits there are to parse and convert, the more time it will take. It also seems to matter if the value has a decimal point in it:
- B=1.0 – 403
- B=1.1 – 476
Even though that is only three characters to process, it takes longer than B=123. Clearly, more work is being done when a constant contains a decimal point. Even though all Color BASIC numbers are represented internally as floating point, it still makes sense to avoid writing decimal points in your constants unless you really need them.
You can also write a number in base-16 (hexadecimal) by using the &H prefix. For the value of 1, it feels like parsing “&H1” should take longer than parsing “1”. Let’s try:
- B=&H1 – 180
- B=&H12 – 175
- B=&H123 – 200
- B=&H1234 – 203
It seems that parsing a hexadecimal value is much faster than dealing with base-10 values. Using this, you could speed up a program just by switching to hex, provided that your numbers are whole numbers between 0 and 65535 (&H0 to &HFFFF, the range a hex constant can represent here). I was surprised to see that negative values also work:
- B=&HFFFF – 201
- B=-&HFFFF – 230
It seems dealing with the negative takes a bit more time, though, so it makes sense to avoid using them unless you really need them. ;-)
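If you would rather not convert your constants by hand, Extended Color BASIC's HEX$ function can produce the hex digits for you. Here is a minimal sketch of a converter; the program, its prompt text and the variable N are only an illustration and are not part of the benchmark:

```basic
10 REM PRINT THE &H FORM OF A DECIMAL NUMBER
20 INPUT "DECIMAL (0-65535)";N
30 REM REJECT VALUES OUTSIDE THE 16-BIT RANGE AND ASK AGAIN
40 IF N<0 OR N>65535 THEN 20
50 PRINT "&H";HEX$(N)
60 GOTO 20
```

Entering 1000 prints &H3E8, and 9999 prints &H270F. Press BREAK to exit the loop.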
With this in mind, let’s test a FOR/NEXT loop:
10 TIMER=0:TM=TIMER
20 FOR A=&H1 TO &H3E8
30 B=&H1
40 NEXT
50 PRINT TIMER-TM
This prints 182, which is basically the same speed as the original that used 1 TO 1000. I guess hexadecimals don’t really help out FOR/NEXT.
Why? Because the FOR/NEXT statement is only parsed once, then the loop counters are set up and done. It is probably a tad faster to use hex, but that savings only happens once in the “do it 1000 times” test.
But, as you saw, USING numbers inside the loop is where hex gets faster. Any place we use a number, it seems using a hex version of that number may speed it up:
10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 IF A>&HFF THEN REM
40 NEXT
50 PRINT TIMER-TM
This prints 278. Doing it with A>255 prints 427! Imagine if you could speed up every time you used a number in your code:
10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 PRINT@&H20,"HELLO"
40 NEXT
50 PRINT TIMER-TM
That prints 391, but changing it to PRINT@32 prints 469! If you use a bunch of PRINT@s in your code, you can speed them up just by switching to hex!
Math could be accelerated, too, simply due to the number conversion being faster. The more digits, the bigger the advantage hex has:
- B=A+&H270F – 285
- B=A+9999 – 483
And the more numbers, the more time you can save by using hex. A common PRINT thing is to use the length of a string to figure out how to center it on the screen:
0 REM NUMBERS.BAS
10 TIMER=0:TM=TIMER
15 CLS:A$="HELLO, WORLD!":LN=LEN(A$)
20 FOR A=1 TO 1000
25 PRINT@32*8+16-LN/2,A$
40 NEXT
50 PRINT TIMER-TM
That prints 1284. Converting the constants in line 25 to HEX:
25 PRINT@&H20*&H8+&H10-LN/&H2,A$
And now it prints 1097.
In a game where you might be PRINTing things on the screen constantly, those savings could really add up.
Pity that math is hard, else we could just use hex in our programs and get a free speed boost.
Until next time…
One place where a decimal point will give an improvement is when using the value 0 in an expression. Basic will accept a stand-alone decimal point as the number 0, but it will process it faster than the ‘0’ character.
Try comparing the speed of:
IF N < 0 THEN …
with that of:
IF N < . THEN …
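One way to time the comparison is to reuse the benchmark loop from the article. This is only a sketch: N is an otherwise unused variable, and the REM after THEN is a placeholder so that the comparison is all that gets measured.

```basic
10 TIMER=0:TM=TIMER
20 FOR A=1 TO 1000
30 IF N<. THEN REM
40 NEXT
50 PRINT TIMER-TM
```

Run it once as shown, then change line 30 to IF N<0 THEN REM and run it again to compare the two TIMER counts.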
You want something to really blow your mind? You can put spaces in the middle of numbers. You can also put spaces in the middle of variable names.
There is absolutely no way that is true. ;)
A B = 10
PRINT A B
A = 1 2 3 4 5
That’s awesome. Why the heck does that work?
The CHRGET subroutine that MS-based interpreters use to parse the program text skips every space character. The ones that are needed to prevent the tokenizer from misinterpreting keywords or variable names aren’t necessary at all after a line is tokenized. These blanks are usually kept for ease of editing.
Spaces in string constants are directly handled by the expression evaluation without using CHRGET.
It’s just a side-effect due to the lack of a strong lexical analysis.
Thanks, Johann! I don’t know your name. How did you come to know the internals of the interpreter?
Did a lot on Commodore based systems, digging into the interpreter to merge Basic and machine code stuff, to accomplish parameter passing and return values, extending the interpreter, improved string garbage collection and so on. Later I got a Dragon32, which fascinated me (my preferred CPU). I stumbled into the ECB and saw all the similarities … I could do the same interfacing, nearly the same data structures, just other addresses (and of course the endianness …). ;)
I started on a VIC-20, and preferred the Extended BASIC on the CoCo. It wasn’t until recent years that I found out CBM BASIC was Microsoft. Any idea why it used GET$ instead of INKEY$?
The GET command is typical for the 6502 branch of the early MS Basic. I think GET was simply part of the unified I/O commands (based on logical file numbers on top of static device numbers), which allowed reading a single character from the standard input device (the keyboard). Output redirection with CMD can easily be achieved.
By contrast, INKEY$ is a string function, not a command, which had to be invented to read the keyboard because the above-mentioned file number layer is missing. The basic I/O commands are reduced to PRINT and INPUT. Single-byte input was not a necessity. But Commodore computers (mainly business oriented) with their IEEE interface needed the ability to read a single byte in order to communicate with devices in a very distinct and controlled manner (later on, in home computer times, this device was only the floppy, which demanded this kind of operation).
I hope this meets the point. ;)
So in my VIC days, GET A$ was kind of an implied GET #stdin,A$ read?
> So in my VIC days, GET A$ was kind of an implied GET #stdin,A$ read?
Exactly, if you open the keyboard device (number 0) you can do this
10 OPEN 1,0
20 GET#1,A$:IF A$="" GOTO20
30 PRINT A$
Alas, there is no concept of STDIN as opposed to STDOUT, which is controlled by the command CMD. So GET A$ (without hash argument) is always bound to the keyboard. But this is just a limitation of Basic itself; the underlying Kernal on CBMs actually does keep a “current input device”.
I understand. I only had the Datasette and a cheesy thermal printer in my VIC days, so I never learned much about I/O. It all makes much more sense to me today than it did back then :)
Thanks for commenting! I am learning much.
Son of a gun. A quick test in my benchmark program using “Z=0” showed 178, and “Z=.” showed 141.
It looks like you can do PRINT@.,"HELLO" too. Wild.
How did you figure this out???
I read about it someplace, but don’t remember where. It was probably part of a discussion on the CoCo mailing list.
The reason -&HFFFF is slower is actually pretty straightforward yet also counterintuitive. It actually evaluates the minus as unary negation and the &HFFFF is converted independently. Now negation is practically instant in the floating point representation used. However, it does require an extra trip through the expression evaluator.
Also, as a side note, on the Coco3, &H (and &O) can be used for 24 bit values. They expanded it so that LPOKE and LPEEK could be used with hex addresses across the whole address space.