Could the CoCo run a tiny Large Language Model (AI) thing?

An interesting post appeared on the Color Computer Facebook group the other day… Youngstown Ken posted:

AI on the CoCo? This is a simple language experiment that I think can be done on the coco if anyone would like to try the code. I asked Grok to write a basic program to read a 10MB corpus of sentences from a dictionary and then build a simple Markov graph db to predict the next word in a sequence.

– Youngstown Ken, via Facebook CoCo group

The attached link went to a Grok-generated conversation about the subject. You can read the short interaction here, and see the generated Python and BASIC-ish code:

https://x.com/i/grok/share/sXRGgHcIROsl7PPNXnF8D34MC?fbclid=IwY2xjawI15sRleHRuA2FlbQIxMQABHZVbpT6JU-bIDIQJoVd4KxBf4gJFb2oSJmYX9t9Y3sIShpw0nK6nmP8laA_aem_Q4MMZzgAmLsauqsrU5ID6g

So naturally, someone needs to try this and see if it even works. In my experience, AI is terrible at BASIC, though I did have some success talking ChatGPT through some simple 32-column text screen game code when it first made a big public splash a few years ago.
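Before digging into the listing, it helps to see what a trigram Markov predictor actually does: count how often each word follows a given two-word pair, then predict the most frequent follower. A minimal sketch in Python (my own illustration, with a made-up toy corpus, not code from the Grok conversation):

```python
from collections import Counter, defaultdict

def build_trigrams(text):
    """Count how often word w3 follows the pair (w1, w2)."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        model[(w1, w2)][w3] += 1
    return model

def predict(model, w1, w2):
    """Return the most frequent word seen after (w1, w2), or None."""
    followers = model.get((w1, w2))
    return followers.most_common(1)[0][0] if followers else None

corpus = "the quick brown fox jumps over the lazy dog the quick brown cat"
model = build_trigrams(corpus)
print(predict(model, "the", "quick"))  # "brown" follows "the quick" twice
```

That is the whole trick: no neural network, just a lookup table of counts, which is why it is even plausible on an 8-bit machine.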

Let’s look at the second code variation it produced:

10 REM Two-Level Markov Chain Predictor for TRS-80 CoCo
20 REM Vocabulary: 1000 words, Trigram model fits in 512KB

30 DIM V$(999): REM Vocabulary array (1000 words)
40 DIM F(999): REM Frequency array for vocab building
50 DIM M(499,3): REM Model: 500 trigrams max (w1,w2,w3,prob)
60 N=0: REM Number of trigrams loaded

70 REM Pass 1: Build vocabulary from corpus.txt
80 OPEN "I",#1,"CORPUS.TXT"
90 FOR I=0 TO 999: F(I)=0: NEXT I: REM Initialize frequency
100 IF EOF(1) THEN 140
110 LINE INPUT #1,L$: REM Read line
120 GOSUB 500: REM Process line for word frequencies
130 GOTO 100
140 CLOSE #1

150 REM Sort top 1000 words by frequency (simplified)
160 FOR I=0 TO 999: V$(I)="<UNK>": NEXT I: REM Default to <UNK>
170 W$="":C=0: REM Temp word and count
180 OPEN "I",#1,"CORPUS.TXT" : REM Reopen for word assignment
190 IF EOF(1) THEN 230
200 LINE INPUT #1,L$
210 GOSUB 600: REM Assign top 1000 words
220 GOTO 190
230 CLOSE #1

240 REM Pass 2: Build trigram model
250 OPEN "I",#1,"CORPUS.TXT"
260 IF EOF(1) THEN 300
270 LINE INPUT #1,L$
280 GOSUB 700: REM Build trigrams
290 GOTO 260
300 CLOSE #1

310 REM Input sequence and predict
320 PRINT "Enter two-word sequence (e.g., 'the quick'):"
330 INPUT S$
340 GOSUB 800: REM Extract last two words and predict
350 PRINT "Next word:";W3$
360 GOTO 320: REM Loop for more predictions

500 REM Subroutine: Count word frequencies
510 W$="":P=1
520 IF P>LEN(L$) THEN RETURN
530 C$=MID$(L$,P,1)
540 IF C$=" " OR P=LEN(L$) THEN GOSUB 550: GOTO 570
550 IF W$<>"" THEN FOR I=0 TO 999: IF V$(I)=W$ THEN F(I)=F(I)+1: GOTO 560: NEXT I
560 W$="":P=P+1:GOTO 520
570 W$=W$+C$:P=P+1:GOTO 520

600 REM Subroutine: Assign top 1000 words (simplified)
610 W$="":P=1
620 IF P>LEN(L$) THEN RETURN
630 C$=MID$(L$,P,1)
640 IF C$=" " OR P=LEN(L$) THEN GOSUB 650: GOTO 670
650 IF W$<>"" THEN FOR I=0 TO 999: IF V$(I)="" AND F(I)>C THEN V$(I)=W$:C=F(I): GOTO 660: NEXT I
660 W$="":P=P+1:GOTO 620
670 W$=W$+C$:P=P+1:GOTO 620

700 REM Subroutine: Build trigrams
710 W1$="":W2$="":W3$="":P=1:W=0
720 IF P>LEN(L$) THEN RETURN
730 C$=MID$(L$,P,1)
740 IF C$=" " OR P=LEN(L$) THEN GOSUB 750: GOTO 770
750 IF W$<>"" THEN W=W+1:IF W=1 THEN W1$=W$ ELSE IF W=2 THEN W2$=W$ ELSE IF W=3 THEN W3$=W$:GOSUB 900
760 W$="":P=P+1:GOTO 720
770 W$=W$+C$:P=P+1:GOTO 720

900 REM Subroutine: Add trigram to model
910 IF N>=500 THEN RETURN: REM Limit to 500 trigrams
920 FOR I=0 TO 999: IF V$(I)=W1$ THEN W1=I:GOTO 930: NEXT I:W1=999
930 FOR I=0 TO 999: IF V$(I)=W2$ THEN W2=I:GOTO 940: NEXT I:W2=999
940 FOR I=0 TO 999: IF V$(I)=W3$ THEN W3=I:GOTO 950: NEXT I:W3=999
950 FOR I=0 TO N-1: IF M(I,0)=W1 AND M(I,1)=W2 THEN M(I,3)=M(I,3)+1:RETURN
960 M(N,0)=W1:M(N,1)=W2:M(N,2)=W3:M(N,3)=1:N=N+1
970 RETURN

800 REM Subroutine: Predict next word
810 W1$="":W2$="":P=1
820 IF P>LEN(S$) THEN GOTO 850
830 C$=MID$(S$,P,1)
840 IF C$=" " THEN W1$=W2$:W2$="":P=P+1:GOTO 820 ELSE W2$=W2$+C$:P=P+1:GOTO 820
850 IF W1$="" THEN W1$=W2$:W2$="": REM Handle single word
860 FOR I=0 TO 999: IF V$(I)=W1$ THEN W1=I:GOTO 870: NEXT I:W1=999
870 FOR I=0 TO 999: IF V$(I)=W2$ THEN W2=I:GOTO 880: NEXT I:W2=999
880 W3$="<UNK>":P=0
890 FOR I=0 TO N-1: IF M(I,0)=W1 AND M(I,1)=W2 AND M(I,3)>P THEN W3$=V$(M(I,2)):P=M(I,3)
892 NEXT I
894 RETURN
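The prediction step (the 800 subroutine) is just a linear scan over the trigram table, keeping the third word of the best-counted entry whose first two word IDs match. Here is that logic as I read it, sketched in Python with a made-up table for illustration (each tuple mirrors a row of the M() array):

```python
# Each entry mirrors a row of M(): (w1_id, w2_id, w3_id, count)
trigrams = [
    (0, 1, 2, 3),   # e.g. "the quick" -> "brown", seen 3 times
    (0, 1, 4, 1),   # "the quick" -> "fox", seen once
    (2, 5, 6, 2),
]

def predict_id(trigrams, w1, w2, unk=999):
    """Scan the table; return the w3 ID with the highest count for (w1, w2)."""
    best_id, best_count = unk, 0          # default to <UNK>, as line 880 does
    for t1, t2, t3, count in trigrams:
        if t1 == w1 and t2 == w2 and count > best_count:
            best_id, best_count = t3, count
    return best_id

print(predict_id(trigrams, 0, 1))  # 2 -- the higher-count entry wins
```

A pair with no matching entry falls through to ID 999, the `<UNK>` slot, which matches how line 880 initializes W3$.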

This is surprising to me, since it does appear to honor the two-character limit on Color BASIC variable names, which is something my early interactions with ChatGPT would not do.

The original prompt told it to write something that could run on a CoCo within 512K. Even Super Extended Color BASIC on a 512K CoCo 3 still only gives you 24K of program space on startup. And, as I recently learned, if you try to use a CoCo 3’s 40- or 80-column screen and CLEAR more than about 16K of string space, you get a crash.

I think I may give this a go. I do not understand how any of this works, so I do not expect good results, BUT I do want to at least try to make the program functional. If it expects to use 512K of RAM for string storage, it is doomed to fail. But 24K would be room for 1000 24-byte words, at least.
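As a sanity check on that, here is the back-of-the-envelope arithmetic for the listing's data structures, assuming Color BASIC's five-byte numeric variable format and the 24-byte average word length above (the five-byte figure is my assumption about storage, not something stated in the generated code):

```python
# Rough memory estimate for the arrays in the listing,
# assuming 5 bytes per numeric array element and 24 bytes per word string.
vocab_strings = 1000 * 24       # V$() contents: 24,000 bytes of string space
freq_array    = 1000 * 5        # F(): 5,000 bytes
trigram_table = 500 * 4 * 5     # M(499,3): 10,000 bytes
total = vocab_strings + freq_array + trigram_table
print(total)  # 39000 -- tight against the memory limits described above
```

So even before counting the program text itself, the data wants more room than a stock BASIC environment comfortably offers.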

To be continued … maybe.

In the meantime, check out that Grok link and see what you can come up with. And let me know in the comments.

2 thoughts on “Could the CoCo run a tiny Large Language Model (AI) thing?”

  1. L. Curtis Boyle

    Maybe try it in BASIC09 – use a hard drive file of 512K for “virtual memory”. Might help.

