Counting characters in a string in Color BASIC

And so it begins…

I want to visualize some playing card statistics, and the easiest way I could think of to do this was to write a program in BASIC. Of course.

There are many approaches I can take, so as I figure it out I will “show my work” and the experiments that help me decide which one will work best.

For my task, I will greatly simplify a deck of cards by ignoring suits (I just care if it’s an Ace, a Five, a King or whatever). I figure I can store it as a 52-character string representing each card:

A123456789JQKA123456789JQKA123456789JQKA123456789JQK
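As a minimal sketch, that string could be built by repeating one 13-character run four times rather than typing all 52 characters. The variable names D$ and I, and the CLEAR 500 to reserve extra string space, are assumptions for illustration and not part of any final program:

```basic
10 ' BUILD A 52-CARD DECK STRING (SKETCH)
20 ' CLEAR 500 IS AN ASSUMED, GENEROUS STRING SPACE
30 CLEAR 500
40 D$=""
50 FOR I=1 TO 4
60 D$=D$+"A123456789JQK"
70 NEXT
80 PRINT LEN(D$)
90 PRINT D$
```

The length printed should be 52, since each of the four passes appends 13 characters.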

I could “shuffle” this “deck” and end up with a string in “random” order. I believe I have touched on randomizing a list in the past, and had some great suggestions in comments on how to do it better. Forgetting all of that, let’s just say I end up with a string that gets randomized.

But I digress…

The next question will be: “How many Kings are left in the deck?”

Some of you will already see I am heading down a poor path for doing this, but let’s start there anyway.

One way of counting a specific character in a string is to loop through it and use MID$ to pull out an individual character. Something like this:

FOR A=1 TO LEN(A$)
IF MID$(A$,A,1)="S" THEN C=C+1
NEXT

That would count all the “S” characters that appear in the string. Since every time MID$ is used it has to build a new string representing that portion of the original string, this should be our slowest way to do this. On a system with tons of strings in use, string manipulation gets real slow. For such a simple program, this might be fast enough.

Another approach, which was originally shown to me in a reader submission during a word wrap article, would be to use VARPTR to get the memory location of the string’s 5-byte descriptor, then go to the memory where the string bytes are stored and use PEEK to look at them. You can find details in my earlier article on VARPTR.

The memory location that VARPTR returns will have the length of the string as the first byte (byte 0), then an empty byte (byte 1, always 0), then the address where the string is stored in the next two bytes (bytes 2 and 3), followed by a zero (byte 4). Knowing this, something like this would do the same thing as MID$:

A=VARPTR(A$)
SL=PEEK(A)
SS=PEEK(A+2)*256+PEEK(A+3)
FOR A=SS TO SS+SL-1
IF PEEK(A)=ASC("S") THEN C=C+1
NEXT

And do it faster.
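To make the descriptor layout concrete, here is a small sketch that prints the length and address fields and then rebuilds the string one byte at a time with PEEK. The names V, AD and I are placeholders, and it assumes Extended Color BASIC, where VARPTR is available:

```basic
10 ' PEEK AT A STRING DESCRIPTOR (SKETCH)
20 A$="HELLO"
30 V=VARPTR(A$)
40 ' BYTE 0 OF THE DESCRIPTOR IS THE LENGTH
50 PRINT "LENGTH:";PEEK(V)
60 ' BYTES 2 AND 3 ARE THE ADDRESS OF THE TEXT
70 AD=PEEK(V+2)*256+PEEK(V+3)
80 PRINT "ADDRESS:";AD
90 ' REBUILD THE STRING FROM MEMORY
100 FOR I=0 TO PEEK(V)-1
110 PRINT CHR$(PEEK(AD+I));
120 NEXT
130 PRINT
```

This should report a length of 5 and echo HELLO on the last line. The address will vary with where the program sits in memory.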

VARPTR is a legal BASIC function, but it still seems nasty to reach into string memory to do this. Thus, I came up with the idea of using INSTR. This function returns the start location of a matching string in another string, or 0 if not found:

PRINT INSTR("ABCDEFG","D")

That should print 4, since a “D” is located at the 4th position in the string.

You can also add an additional parameter which is where in the string to start searching. Doing this:

PRINT INSTR(5,"ABCDEFG","D")

…would print 0, because it starts scanning at the 5th character of the string (just past the D), and then won’t find any more matches.

I could start using INSTR with a position of 1 (first character), and if it comes back with a value other than 0, I found one. That value will be the position of the found character. I could then loop back and use that position + 1 to scan again at the character after the match. Repeat until a 0 (no more found) is returned. That lets the scan for characters be done by the assembly code of the BASIC ROM and is even faster. It looks like this:

F=0
xxx F=INSTR(F+1,A$,T$):IF F>0 THEN C=C+1:GOTO xxx
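Applied to the original question, a sketch of the same loop counting the Kings in an unshuffled deck might look like this. The deck variable D$ is an assumption for illustration, and I am relying on INSTR returning 0 once the start position runs past the end of the string, which matters here because the deck ends with a K:

```basic
10 ' COUNT KINGS WITH INSTR (SKETCH)
20 D$="A123456789JQKA123456789JQKA123456789JQKA123456789JQK"
30 T$="K":C=0:F=0
40 ' EACH PASS RESUMES ONE PAST THE LAST MATCH
50 F=INSTR(F+1,D$,T$):IF F>0 THEN C=C+1:GOTO 50
60 PRINT "KINGS LEFT:";C
```

With the full deck this should report 4 Kings. The GOTO targets the INSTR line itself, so the pseudo label xxx simply becomes that line's number.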

And we put them all together in a benchmark test program…

10 ' COUNTSTR.BAS
20 A$="THIS IS A STRING I AM GOING TO USE FOR TESTING. I WANT IT TO BE VERY LONG SO IT TAKES A LONG TIME TO PARSE."
30 T$="S":T=ASC(T$)
40 '
50 ' MID$
60 '
70 PRINT "MID$:";TAB(9);
80 TIMER=0:C=0
90 FOR A=1 TO LEN(A$)
100 IF MID$(A$,A,1)=T$ THEN C=C+1
110 NEXT
120 PRINT C,TIMER

130 '
140 ' VARPTR
150 '
160 PRINT "VARPTR:";TAB(9);
170 TIMER=0:C=0
180 A=VARPTR(A$)
190 SL=PEEK(A)
200 SS=PEEK(A+2)*256+PEEK(A+3)
210 FOR A=SS TO SS+SL-1
220 IF PEEK(A)=T THEN C=C+1
230 NEXT
240 PRINT C,TIMER

250 '
260 ' INSTR
270 '
280 PRINT "INSTR:";TAB(9);
290 TIMER=0:C=0:F=0
300 F=INSTR(F+1,A$,T$):IF F>0 THEN C=C+1:GOTO 300
310 PRINT C,TIMER

Running it prints the count and the TIMER value for each method.

Wow, using INSTR is six times faster than MID$? And four times faster than VARPTR. Nice.

Now you know a bit about what I need to do. I need to represent cards in a deck (and be able to “draw” cards from that deck) and calculate how many of a specific card remain in the deck.

Since I do not need to track or show the suits (hearts, spades, clubs, diamonds), I figure I could use one byte in a string.
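One possible way to “draw” a card from such a string, offered only as a sketch, is to pick a random position, copy that character, and rebuild the deck from the pieces on either side of it. The names D$, P and C$ and the CLEAR 500 are assumptions for illustration:

```basic
10 ' DRAW ONE CARD FROM THE DECK (SKETCH)
20 ' CLEAR 500 IS AN ASSUMED, GENEROUS STRING SPACE
30 CLEAR 500
40 D$="A123456789JQKA123456789JQKA123456789JQKA123456789JQK"
50 ' RND(N) RETURNS A WHOLE NUMBER FROM 1 TO N
60 P=RND(LEN(D$))
70 C$=MID$(D$,P,1)
80 ' KEEP EVERYTHING BEFORE AND AFTER POSITION P
90 D$=LEFT$(D$,P-1)+MID$(D$,P+1)
100 PRINT "DREW: ";C$
110 PRINT "CARDS LEFT:";LEN(D$)
```

After one draw the deck should hold 51 characters, and any of the counting loops above can then be run against D$ to see how many of a given card remain.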

To be continued … but in the meantime, do you have a better approach? Comment away!

2 thoughts on “Counting characters in a string in Color BASIC”

  1. RogelioP

    Nice to see INSTR getting some attention. Been trying to write up a Roman Numerals parser to have the program detect an invalid Roman Numeral entry, perhaps I should give INSTR a chance…

    1. Allen Huffman Post author

      I would be interested in seeing that. A few years ago, when I fell down the Mandela Effect rabbit hole, I read about some clocks using IIII for IV. I had never noticed that my entire life, and now I see it everywhere. I didn’t even know that was a valid representation. So clearly, I don’t know the rules.
