Author Archives: Allen Huffman

About Allen Huffman

Co-founder of Sub-Etha Software.

UnderColor’s spiral challenge from 1984 – part 3

ALERT! ALERT! We are doing this wrong. It has been pointed out in a comment to an earlier installment that we missed an important part about what this contest was supposed to produce!

More on that in a moment… But first, let’s look at a faster version of the challenge, created by Dillon Teagan:

10 TIMER=0
20 PMODE4,1:PCLS1:SCREEN1,1
30 L=&HFF:I=-3:DIMR,L,U,D
40 D$="D=D;R=R;U=U;L=L;
50 DRAW"BM0,191C0R=L;U191L=L;
60 FORD=188TO3STEP-6:R=L+I:U=D+I:L=R+I:DRAWD$:NEXT
70 PRINTTIMER/60

This one clocks in at 2.7 seconds, and does some wonderful optimizations!

First, Dillon clearly understands how the BASIC interpreter works. He is doing things like using hex &HFF instead of decimal 255, which makes that bit parse a tad faster. Next, you see him declare a variable, followed by a DIM which pre-declares R, L, U and D. In this example, that DIM probably does not help, but in a real program, you can declare your variables up front and do them in the order of “most accessed” to least. This sets their order in the variable table, so when you try to access one, the one you access the most can be at the top of the list. I’ve posted about this before, but consider this:

10 DIM A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z
20 PRINT Z
30 Z=Z+1
40 IF Z<100 THEN 20

If we truly did have 26 variables in use, and declared them like this, Z would be at the end of the variable table. EVERY time Z is needed, BASIC has to scan through all 26 variables trying to match Z so it can be used. If you knew Z was being used the most often, it would be MUCH faster to do this instead:

10 DIM Z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y
20 PRINT Z
30 Z=Z+1
40 IF Z<100 THEN 20

With that one minor change (declaring Z first), Z is now the first variable in the table and will be found much quicker. Try it sometime.
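To make the cost of that scan concrete, here is a rough C sketch of a linear variable table. It is only a model of the idea described above, not the actual Color BASIC implementation; the `VarTable` type and both function names are made up for illustration.

```c
#include <string.h>

/* Hypothetical stand-in for BASIC's variable table: names are stored
   in the order they were first declared and searched from the top. */
#define MAX_VARS 26

typedef struct {
    char names[MAX_VARS][3]; /* one- or two-character names */
    int  count;
} VarTable;

/* Add a name to the end of the table, the way DIM order would. */
void var_declare(VarTable *table, const char *name)
{
    if (table->count < MAX_VARS) {
        strncpy(table->names[table->count], name, 2);
        table->names[table->count][2] = '\0';
        table->count++;
    }
}

/* Return how many entries had to be compared to find the name,
   or 0 if the name is not in the table. */
int var_lookup_cost(const VarTable *table, const char *name)
{
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->names[i], name) == 0) {
            return i + 1;
        }
    }
    return 0;
}
```

In this model, a variable declared last in a table of 26 costs 26 comparisons on every access, while the same variable declared first costs 1.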

But I digress…

The next cool optimization Dillon does is by drawing the initial bottom line (bottom left to bottom right, 256 pixels wide) and right side (bottom right to top right, 192 pixels tall) in line 50, along with the first left (top right to top left, 256 pixels wide) before entering the loop.

The loop itself is using a FOR/NEXT loop which is faster than using GOTO since no line scanning has to be done. BASIC stores where the FOR is, and when NEXT is encountered it pops back to that location rather than scanning forward to find the target line, or starting at the first line and scanning forward from there. Nice.

With the first three “full width/height” lines out of the way, the loop is just doing the “minus 3” size of all four lines. That’s clever. The entire draw string is in a string (D$) and I am unsure if this speeds things up versus having it directly in the line itself (line 60).

Impressive. I wish I’d thought of that!

However… we have been doing it wrong, so far, it seems.

We’ve been doing it wrong so far, it seems.

In a comment left on part 1 of this series, Paul Fiscarelli pointed out something that I completely got wrong:

Hello Allen – I sent you a DM on FB, but I don’t think you’ve seen it yet. What you have posted above is not exactly an accurate reproduction of what is being asked in the challenge question. You are using an offset decrease of 3 in each of your iterations, which produces a gap of only 2-pixels in both height and width. The challenge is indicating a gap of 3-pixels between lines, which requires an offset decrease of 4 in each iteration. This is further evident in the challenge’s diagram of the spiral, which indicates a line-length of 188-pixels for the line on the far left-hand side of the screen. If you perform a screen grab in XRoar (pixel perfect geometry of 640×480 window and 512×384 screen resolution), you will find your code generates a line length of 189 pixels (scale the 512×384 in half).

If you change the offset decrease in your code to 4 instead of 3, you will achieve a render time of roughly 2.43 seconds. This is due to the fact that you are rendering about 23% fewer lines with the DRAW statement.

You can reduce this time even further if you were to use only a single offset variable for drawing both the horizontal and vertical lines, and by adding a separate width to the horizontal lines with a value of 64 = (256w – 192v). This method will shave roughly 0.10 seconds off your render time, down to approximately 2.33 seconds.

10 TIMER=0
20 PMODE4,1:PCLS:SCREEN1,1
30 OF=191:DRAW"BM0,191"
40 DRAW"R=OF;R64"
50 DRAW"U=OF;L=OF;L64;"
60 OF=OF-4
70 DRAW"D=OF;R=OF;R64;"
80 OF=OF-4
90 IF OF>0 THEN 50
100 TM=TIMER
110 IF INKEY$="" THEN 110
120 PRINT TM/60

As an additional optimization step, you can replace the IF/THEN boundary check with a FOR/NEXT loop, to shave another 0.05 seconds from your render time – getting it down to roughly 2.28 seconds.

10 TIMER=0
20 PMODE4,1:PCLS:SCREEN1,1
30 OF=191:DRAW"BM0,191"
40 DRAW"R=OF;R64"
50 FOR N=0 TO 23
60 DRAW"U=OF;L=OF;L64;"
70 OF=OF-4
80 DRAW"D=OF;R=OF;R64;"
90 OF=OF-4
100 NEXT
110 TM=TIMER
120 IF INKEY$="" THEN 120
130 PRINT TM/60

There are probably some other optimizations that can be done here – but it’s a start. Oh, I also tested these examples with VCC 2.1.9.2-pre2. I’m sure there will be slight variations in timing with the different emulators, and probably even with different versions of VCC.

Paul Fiscarelli
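The effect of the offset on the amount of drawing can be checked with a little arithmetic. Here is a rough C sketch (not part of Paul’s comment) that mimics the loop structure of the BASIC listings above and counts how many line segments get drawn for a given offset decrease. The function name is made up for illustration, and it only counts segments; it says nothing about timing.

```c
/* Count the line segments drawn by a spiral loop shaped like the
   BASIC programs above: one initial bottom line, then repeated
   "up, left, shrink, down, right, shrink" passes while the height
   is still positive. */
int count_segments(int height, int step)
{
    int segments = 1; /* the initial line along the bottom */
    int h = height;

    do {
        segments += 2; /* up and left */
        h -= step;
        segments += 2; /* down and right */
        h -= step;
    } while (h > 0);

    return segments;
}
```

Starting from a height of 191, this gives 129 segments for an offset decrease of 3 and 97 segments for an offset decrease of 4. The 24 passes in the second case match the FOR N=0 TO 23 loop in Paul’s listing.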

So, uh … oops. It seems obvious now:

Let’s regroup, and take a look at Paul’s versions in the next installment.

Until then… RTFM!

Generating C functions and prototypes using macros – part 1

See Also: part 1 and part 2.

There is always a first time for everything, and my first time doing this was a few years ago with a day job task. I was going to create a Windows DLL (dynamic link library) that would handle hundreds of messages (write/read response) via the I2C protocol.

The message protocol featured a header that contained a Message Type and Command Code. There were several types of messages, but for this blog post let’s just look at two types:

  • DATA – a message sent to do something, with or without payload data. These messages get a response back that is either an ACKnowledgement (it worked) or a NAK (it did not work). For example “set the time to X”.
  • QUERY – a message sent to request information, such as “what time is it?”

The Command Code was a byte, which meant there could be up to 256 (0-255) commands per message type. For example, some DATA messages might be defined like this:

#define DATA_PING        0 // ACK if there, NAK if not
#define DATA_RESET_CPU   1 // Reset CPU
#define DATA_EXPLODE     2 // Cause the penguin on the TV set to explode.

And QUERY messages might be like:

#define QUERY_STATUS      0 // Returns a status response
#define QUERY_VOLTAGE     1 // Returns the voltage
#define QUERY_TEMPERATURE 2 // Returns the temperature

These are, of course, made up examples, but you get the idea.

We had a number of different board types on our system that could receive these messages. Some messages were only applicable to specific boards (like, you couldn’t get temperature from a board that did not have a thermometer circuit or whatever). My idea was to create simple C functions that represented the message sent to the specific board, like this:

resp = AlphaBoardPING ();
resp = BetaBoardPING ();
resp = DeltaBoardPING ();
...
resp = BetaBoardRESET_CPU ();
...
resp = DeltaBoardQUERY_STATUS (...);

…and so on. I thought it would make a super simple way to write code to send messages and get back the status or response payload or whatever.

This is what prompted me to write a post about returning values as full structures in C. That technique was used there, making it super simple to use (and no chance of a wrong pointer crashing things).
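As a reminder of what returning a full structure looks like, here is a minimal sketch. The `PingResponse` type, its fields, and `ExampleBoardPING` are hypothetical names for illustration, not the code from the day job.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical response type; the field names are made up. */
typedef struct {
    bool    ack;    /* true = ACK, false = NAK */
    uint8_t status; /* made-up status byte */
} PingResponse;

/* Stand-in for a real board call. The whole structure is returned
   by value, so the caller never passes in a pointer to fill. */
PingResponse ExampleBoardPING(void)
{
    PingResponse resp = { .ack = true, .status = 0 };
    return resp;
}
```

A caller would simply write `PingResponse resp = ExampleBoardPING ();` and then check `resp.ack`.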

I experimented with these concepts on my own time to make sure this idea would work. Some of the things I did ended up on my GitHub page:

  • https://github.com/allenhuffman/GetPutData – routines that let you put bytes or other data types in a buffer. Code similar to this was used to create the message bytes (header, payload, checksum at the end, etc.)
  • https://github.com/allenhuffman/StructureToFromBuffer – routines that would let me define how bytes in a buffer should be copied into a C structure. I was proud of this approach since it let me pass a pointer to a buffer of bytes along with some tables and have it return a fully populated C structure ready to use. This greatly simplified the amount of work needed to use these messages.

But, as I like to point out, I am pretty lazy and hate doing so much typing. The thought of creating hundreds of functions by hand was not how I wanted to spend my time. Instead, I wanted to find a way to automate the creation of these functions. After all, they all followed the same pattern:

  • Header – containing Message Type, Command Code, number of bytes in a payload, etc.
  • Payload – any payload bytes
  • Checksum – at the end

The only thing custom would be what Message Type and Command Code to put in, and if there was a payload to send, populating those bytes with the appropriate data.

When a response was received, it would be parsed based on the Message Type and Command Code, and return a C structure matching the response payload.
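To picture the header/payload/checksum pattern, here is a rough sketch of building one message into a buffer. The real protocol is not described in this post, so the three-byte header layout, the additive checksum, and the function names are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE 3 /* assumed layout: type, command, payload length */

/* Assumed checksum: low 8 bits of the sum of all preceding bytes. */
uint8_t checksum8(const uint8_t *data, size_t length)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < length; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Build header + payload + checksum into out.
   Returns the total bytes written, or 0 if out is too small. */
size_t build_message(uint8_t *out, size_t out_size,
                     uint8_t type, uint8_t command,
                     const uint8_t *payload, uint8_t payload_len)
{
    size_t total = HEADER_SIZE + (size_t)payload_len + 1;

    if (out_size < total) {
        return 0;
    }

    out[0] = type;
    out[1] = command;
    out[2] = payload_len;

    for (size_t i = 0; i < payload_len; i++) {
        out[HEADER_SIZE + i] = payload[i];
    }

    out[total - 1] = checksum8(out, total - 1);
    return total;
}
```

Only the type, command, and payload change from message to message, which is what makes the functions such good candidates for automation.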

A program to write a program?

Initially, I thought about making a program or script that would spit out hundreds of functions. But this was not lazy enough. Sure, I could have done this and created hundreds of functions, but what if those functions needed a change later? I’d have to update the program that created the functions and regenerate all the functions all over again.

There has to be a better lazier way.

Macros: a better lazier way

I realized I could make a set of #define macros that could insert proper C code or prototypes. Then, if I ever needed to change something, I only had to change the macro. There would be no regeneration needed, since the next compile would use the updated macro. Magic!

It worked very well, and created hundreds and hundreds of functions without me ever having to type more than the ones in the macro.
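For a taste of the general idea, here is a minimal sketch of macros that stamp out a prototype and a function body using token pasting (`##`). This is not the code from the day job; the macro names, `ExampleSend`, and the message type value are hypothetical.

```c
#include <stdint.h>

/* Hypothetical message type and command codes for illustration. */
#define MSG_TYPE_DATA  1
#define DATA_PING      0
#define DATA_RESET_CPU 1

/* Stub transport. A real version would send over I2C; this one
   just records what was requested and reports success (0). */
uint8_t g_lastType;
uint8_t g_lastCommand;

int ExampleSend(uint8_t type, uint8_t command)
{
    g_lastType = type;
    g_lastCommand = command;
    return 0;
}

/* Generates a prototype such as: int AlphaBoardPING(void) */
#define DECLARE_DATA_FUNCTION(board, name) \
    int board##name(void)

/* Generates the matching function body. DATA_##name pastes the
   command name onto the DATA_ prefix to pick the command code. */
#define DEFINE_DATA_FUNCTION(board, name) \
    int board##name(void) \
    { \
        return ExampleSend(MSG_TYPE_DATA, DATA_##name); \
    }

DECLARE_DATA_FUNCTION(AlphaBoard, PING);
DECLARE_DATA_FUNCTION(BetaBoard, RESET_CPU);

DEFINE_DATA_FUNCTION(AlphaBoard, PING)
DEFINE_DATA_FUNCTION(BetaBoard, RESET_CPU)
```

Changing how every generated function works then means editing `DEFINE_DATA_FUNCTION` once and recompiling.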

It worked so well that I ended up using this approach very recently for another similar task I was far too lazy to do the hard way. I thought I would share that much simpler example in case you are lazy as well.

A much simpler example

At work we use LabWindows/CVI, a Windows C compiler with its own GUI. It has a GUI editor where you create your window with buttons and check boxes and whatever, then you use functions to load the panel, display it, and hide it when done. They look like this:

int panelHandle = LoadPanel (0, "PanelMAIN.uir", PANEL_MAIN);

DisplayPanel (panelHandle);

// Do stuff...

HidePanel (panelHandle);

DiscardPanel (panelHandle);

Then, when you interact with the panel, you have callback functions (if the user clicks the “OK” button, it jumps to a function you might name “UserClickedOKButtonCallback()” or whatever).

If you need to manipulate the panel, such as changing the status of a Control (checkbox, text box, or whatever), you can set Values or Attributes of those Controls.

SetCtrlVal (panelHandle, PANEL_MAIN_OK_BUTTON, 1);
SetCtrlAttribute (panelHandle, PANEL_MAIN_CANCEL_BUTTON, ATTR_DIMMED, 1);

It is a really simple system and one that I, as a non-windows programmer who had never worked with GUIs before, was able to pick up and start using quickly.

Simplicity can get complicated quickly…

One of the issues with this setup is that you had to have the panel handle in order to do something. If a message came in from a board indicating there was a fault, that code might need to toggle on some “RED LED” graphics on a GUI panel to indicate the faulted condition. But, that callback function may not have any of the panel IDs. The designer created a lookup function to work around this:

int mainPanelHandle = LookUpPanelHandle(MAIN_PANEL);
SetCtrlVal (mainPanelHandle, PANEL_MAIN_FAULT_LED, 1);

A function similar to that was in the same C file where all the panels were loaded. Their handles were saved in variables, and the LookUp function would go through a huge switch/case with special defines for every panel and return the actual panel handle that matched the define passed in.
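A stripped-down sketch of that kind of lookup might look like the following. The panel IDs other than MAIN_PANEL, the static variables, and `ExampleStoreHandles` are invented for illustration; the real function had a case for every panel in the system.

```c
/* Hypothetical panel IDs. */
enum { MAIN_PANEL = 1, SETTINGS_PANEL, FAULT_PANEL };

/* Handles saved when the panels were loaded (zero = not loaded). */
static int s_mainHandle;
static int s_settingsHandle;
static int s_faultHandle;

/* Stand-in for the code that loaded the panels and saved handles. */
void ExampleStoreHandles(int mainH, int settingsH, int faultH)
{
    s_mainHandle = mainH;
    s_settingsHandle = settingsH;
    s_faultHandle = faultH;
}

/* Map a panel ID to the handle that was saved for it. */
int LookUpPanelHandle(int panelId)
{
    switch (panelId)
    {
        case MAIN_PANEL:     return s_mainHandle;
        case SETTINGS_PANEL: return s_settingsHandle;
        case FAULT_PANEL:    return s_faultHandle;
        default:             return 0; /* Not a valid panel handle. */
    }
}
```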

It worked great but it was slower since it had to scan through that list every time we wanted to look up a panel. At some point, all the panel handles were just changed to global variables so they could be accessed quickly without any lookup:

SetCtrlVal (g_MainPanelHandle, PANEL_MAIN_FAULT_LED, 1);

This also worked great, but did not work from threads that did not have access to the main GUI context. Since I am not a Windows programmer, and have never used threads on any embedded systems, I do not actually understand the problem (but I hear there are “thread safe” variables that can be used for this purpose).

Self-contained panel functions for the win!

Instead of learning those special “thread safe” techniques, I decided to create a set of self-contained panel functions so you could do things like this:

int mainPanelHandle = PanelMainInit (); // Load/init the main panel.
PanelMainDisplay (); // Display the panel.
SetCtrlVal (mainPanelHandle, PANEL_MAIN_FAULT_LED, 1);
...
PanelMainHide ();
...
PanelMainTerm (); // Unload main panel and release the memory.

When I needed to access a panel from another routine, I would use a special function that returned the handle:

int mainPanelHandle = PanelMainGetHandle ();
SetCtrlVal (mainPanelHandle, PANEL_MAIN_FAULT_LED, 1);

I even made these functions automatically Load the panel if needed, meaning a user could just start using a panel and it would be loaded on-demand if it was not already loaded. Super nice!

Here is a simple version of what that code looks like:

static int S_panelHandle = 0; // Zero is not a valid panel handle.

int PanelMainInit (void)
{
    int panelHandle = 0;

    if (S_panelHandle <= 0) // Zero is not a valid panel handle. 
    {
        panelHandle = LoadPanel (0, "PanelMAIN.uir", PANEL_MAIN);

        // Only set our static global if this was successful.
        if (panelHandle > 0) // Zero is not a valid panel handle. 
        {
            S_panelHandle = panelHandle; 
        }
    }
    else // S_panelHandle was valid.
    {
        panelHandle = S_panelHandle;
    }	
    
    // Return handle or status in case anyone wants to error check.
    return panelHandle;
}

int PanelMainGetHandle (void)
{
    // Return handle or status in case anyone wants to error check.
    return PanelMainInit ();
}

int PanelMainTerm (void)
{
    int status = UIEHandleInvalid;

    if (S_panelHandle > 0) // Zero is not a valid panel handle.
    {
        status = DiscardPanel (S_panelHandle);
        if (status == UIENoError)
        {
            S_panelHandle = 0; // Zero is not a valid panel handle. 
        }
    }

    // Return status in case anyone wants to error check.
    return status;
}

int PanelMainDisplay (void)
{
    int status = UIEHandleInvalid;
    int panelHandle;
	
    panelHandle = PanelMainInit (); // Init if needed.
	
    if (panelHandle > 0) // Zero is not a valid panel handle.
    {
        status = DisplayPanel (panelHandle);
    }
	
    // Return status in case anyone wants to error check.
    return status;
}

int PanelMainHide (void)
{
    int status = UIEHandleInvalid;

    if (S_panelHandle > 0) // Zero is not a valid panel handle.
    {
        status = HidePanel (S_panelHandle);
    }
	
    // Return status in case anyone wants to error check.
    return status;
}

This greatly simplified dealing with the panels. Now they could “just be used” without worrying about loading, etc. There was no long Look Up table, and no global variables. The only place the panel handles were kept was inside the file where the panel’s functions were.

Nice and simple, and it worked even greater than the first two attempts.

…until you have to make a hundred of these functions…

…and then decide you need to change something and have to make that change in a hundred functions.

My solution was to use #define macros to generate the code and prototypes, then I would only have to change the macro to alter how all the panels work. (Spoiler: This worked even greater than the previous greater.)

In part 2, I will share a simple example of how this works. If you are lazy enough, you might actually find it interesting.

Until then…

C, I can be taught! At least about calloc.

A long, long time ago, I learned about malloc() in C. I could make a buffer like this:

char *buffer = malloc (1024);

…use it, then release it when I was done like this:

free (buffer);

I have discussed malloc here in the past, including a recent post about an error checking malloc I created to find a memory leak.

But today, the Bing CoPilot A.I. suggested I use calloc instead.

And I wasn’t even sure I even remembered this was a thing. And it has been there since before C was standardized…

void* calloc (size_t num, size_t size);

malloc() will return a block of memory and, depending on the operating system and implementation in the compiler, that memory may have old data in it. Hackers were known to write programs that would allocate blocks of memory then inspect it to see what they could find left over from another program previously using it.

Hey, everybody’s gotta have a hobby…

calloc() is a “clear allocation” where it will initialize the memory it returns to zero before returning it to you. It also takes two parameters instead of just one. While malloc() wants to know the number of bytes you wish to reserve, calloc() wants to know how many things (bytes, structures, etc.) you want to reserve, and the size of each thing.

To allocate 1024 bytes using calloc() you would use:

char *buffer = calloc (1, 1024);

…and get one thing of 1024 bytes. Or maybe you prefer reversing that:

char *buffer = calloc (1024, 1);

…so you get 1024 things that are 1 byte each.

Either way, what you get back is memory all set to zeros.

calloc() was suggested by the A.I. because I was allocating a set of structures like this:

// Typedefs
typedef struct
{
    int x;
    int y;
    int color;
} MyStruct;

MyStruct *array = malloc (sizeof(MyStruct) * 42);

The A.I. saw that, and suggested calloc() instead, like this:

MyStruct *array = calloc (42, sizeof(MyStruct));

I do think that looks a bit cleaner and more obvious, if you are familiar with calloc(), and as long as you don’t need the extra speed (setting that memory to 0 should take more time than not doing that), it seems like something to consider.
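Here is a small sketch that puts the calloc() version to work and confirms the memory really does come back zeroed. The helper names `alloc_points` and `all_zero` are made up for this example.

```c
#include <stdlib.h>

typedef struct
{
    int x;
    int y;
    int color;
} MyStruct;

/* Allocate count zero-initialized structures.
   Returns NULL if the allocation fails. */
MyStruct *alloc_points(size_t count)
{
    return calloc(count, sizeof(MyStruct));
}

/* Return 1 if every member of every element is zero, else 0. */
int all_zero(const MyStruct *array, size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        if (array[i].x != 0 || array[i].y != 0 || array[i].color != 0)
        {
            return 0;
        }
    }
    return 1;
}
```

As with malloc(), the result still needs a NULL check before use and a free() when done.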

And maybe that will break me (and other programmers who wrote code before me that I may one day maintain) from doing it manually like…

char *ptr = malloc (1024);
memset (ptr, 0x0, 1024);

I wonder if I will even remember this the next time I need to malloc() something.

I mean calloc() something.

Whatever.

Until then…

UnderColor’s spiral challenge from 1984 – part 2

See Also: part 1, part 2, with part 3 and part 4 coming (and maybe more).

These types of posts are probably my favorite. Someone posts something with a coding challenge, and folks start submitting their ideas, and I get to play collector and just repost without doing any work… (Well, except I am submitting ideas as well, this time.)

This all kicked off when Michael Pittsley shared a contest he found in a 1984 issue of UnderColor magazine that challenged BASIC programmers to draw a specific spiral pattern:

Michael tackled it with this code:

10 ' SPIRAL THING
20 '
30 XL=3:YT=0:XR=252:YB=188:PX=3
40 PMODE 4,1
50 PCLS
60 SCREEN 1,1
65 LINE (XL,YB) - (XR-PX,YB),PSET
200 ' RIGHT
210 XR=XR-PX
220 LINE - (XR,YB),PSET
300 ' UP
306 YT=YT+PX
307 IF YT=96 THEN 600
310 LINE - (XR,YT), PSET
400 ' LEFT
410 XL=XL+PX
420 LINE - (XL,YT),PSET
500 ' DOWN
510 YB=YB-PX
520 LINE - (XL,YB),PSET
550 GOTO 200
600 GOTO 600

A nicely formatted and documented (with REMmarks) example of the power of BASIC. He was even aware that LINE does not need both a start and end point each time you use it. If you just do “LINE-(x,y),PSET” it will draw from the last point to the new coordinates. This greatly reduces the amount of parsing BASIC has to do versus if you wrote the program always having a pair of start/end points.

Let’s modify it with some timing stuff.

10 ' SPIRAL THING
20 '
25 TIMER=0
30 XL=3:YT=0:XR=252:YB=188:PX=3
40 PMODE 4,1
50 PCLS
60 SCREEN 1,1
65 LINE (XL,YB) - (XR-PX,YB),PSET
200 ' RIGHT
210 XR=XR-PX
220 LINE - (XR,YB),PSET
300 ' UP
306 YT=YT+PX
307 IF YT=96 THEN 600
310 LINE - (XR,YT), PSET
400 ' LEFT
410 XL=XL+PX
420 LINE - (XL,YT),PSET
500 ' DOWN
510 YB=YB-PX
520 LINE - (XL,YB),PSET
550 GOTO 200
600 PRINT TIMER/60

Running it reports 3.16666667 seconds at the end, so really close to the time my version using DRAW got. AND, that is with REMs and spaces and multiple lines to parse. Let’s see what tricks we can do to speed it up by removing spaces, REMs, and packing lines together:

25 TIMER=0:XL=3:YT=0:XR=252:YB=188:PX=3::PMODE4,1:PCLS:SCREEN1,1:LINE(XL,YB)-(XR-PX,YB),PSET
200 XR=XR-PX:LINE-(XR,YB),PSET:YT=YT+PX:IFYT=96THEN600
310 LINE-(XR,YT),PSET:XL=XL+PX:LINE-(XL,YT),PSET:YB=YB-PX:LINE-(XL,YB),PSET:GOTO200
600 PRINT TIMER/60

This drops it down to 3.03333334! Parsing LINE may be faster than parsing a DRAW string. Also, longer line numbers take longer to parse, so we could RENUM0,0,1:

0 TIMER=0:XL=3:YT=0:XR=252:YB=188:PX=3:PMODE4,1:PCLS:SCREEN1,1:LINE(XL,YB)-(XR-PX,YB),PSET
1 XR=XR-PX:LINE-(XR,YB),PSET:YT=YT+PX:IFYT=96THEN3
2 LINE-(XR,YT),PSET:XL=XL+PX:LINE-(XL,YT),PSET:YB=YB-PX:LINE-(XL,YB),PSET:GOTO1
3 PRINTTIMER/60

However, this did not seem to make any consistent measurable difference.

A further optimization could be done by changing the 2-letter variables to 1-letter. But, beyond that, it would take adjusting the logic to find more improvements. Parsing integers is slow, such as “IF YT=96” as is parsing line numbers (thus, lines 1, 2 and 3 should be faster to “GOTO 1” than 100, 200, 300 and “GOTO 100”).

And, one question: Michael started in from the corner of the screen. Here is a screen shot done in the Xroar emulator with artifact colors turned off so we can see the actual lines better:

This means this version is drawing a few lines less than the version I made. I need to go back and re-read the article and see if I got the details wrong, or if Michael is just keeping the 3-pixel border more consistent. I like the 3-pixel padding, but perhaps the first line to the right on the bottom should have gone further (3 pixel padding) and then at the top left a bit more (3 pixel padding).

Meanwhile, in response to Michael Pittsley’s post, Andrew Ayers took the version I did and modified it to work on the CoCo 3. The original contest was a few years before the CoCo 3 even existed, but one would imagine if the article had appeared later it might have had a contest for fastest CoCo 1/2 version, and fastest CoCo 3 version:

A couple more slightly improved CoCo 3 versions of Allen’s code above:

Both of these take longer to run than the original Allen posted, even with the high-speed poke…at least, running on XRoar Online (no idea about real hardware).

– Andrew Ayers

CoCo 3 RGB b/w 320×192:

5 POKE65497,0
10 TIMER=0
20 HSCREEN2:PALETTE0,0:PALETTE1,63:HCOLOR1
30 W=320:H=191:HDRAW"BM0,191"
40 HDRAW"R=W;"
50 HDRAW"U=H;L=W;"
60 H=H-3:W=W-3
70 HDRAW"D=H;R=W;"
80 H=H-3:W=W-3
90 IF H>0 THEN 50
100 TM=TIMER
110 IF INKEY$="" THEN 110
120 PRINT TM/60
130 POKE65496,0

CoCo 3 RGB b/w 640×192:

5 POKE65497,0
10 TIMER=0
20 HSCREEN4:PALETTE0,0:PALETTE1,63:HCOLOR1
30 W=640:H=191:HDRAW"BM0,191"
40 HDRAW"R=W;"
50 HDRAW"U=H;L=W;"
60 H=H-3:W=W-3
70 HDRAW"D=H;R=W;"
80 H=H-3:W=W-3
90 IF H>0 THEN 50
100 TM=TIMER
110 IF INKEY$="" THEN 110
120 PRINT TM/60
130 POKE65496,0

It has been so long since I worked in BASIC on the CoCo 3 that I had forgotten about HDRAW and such being there. And, as expected, it takes longer to draw 320 or 640 pixel lines than it would to draw the 256 pixel lines on the CoCo 1/2 PMODE 4 display.

This does make me wonder how it would compare if limited to the same 256×192 resolution of PMODE 4. I expect the overhead of banking in and out the CoCo 3 high resolution graphics screens will add some extra time, and likely working with a screen with more than just 1-bit color (on/off pixels) is more memory to plot through.

Anyone want to work on that experiment?

Make it faster!

Revisiting my version from part 1, I changed the screen to PCLS1 (to make it white) and set COLOR 0 (to make the drawing black) so it would look like the printout in the magazine. This looks nice, but now we cannot tell how close to the original border of the image my results are:

I also see that because of the 256×191, the very last line does not appear to have a 3 pixel padding. Maybe we can look at the math later.

I simply took my version and combined lines and removed any spaces:

0 'SPIRAL2.BAS
5 FORA=1TO1000:NEXT
10 TIMER=0:PMODE4,1:PCLS1:SCREEN1,1:COLOR0:W=255:H=191:DRAW"BM0,191R=W;"
50 DRAW"U=H;L=W;":H=H-3:W=W-3:DRAW"D=H;R=W;":H=H-3:W=W-3:IFH>0THEN50
100 TM=TIMER
110 IF INKEY$="" THEN 110
120 PRINT TM/60

I could also do the “RENUM0,0,1” trick, but all of this only gets me to 3.016666667 seconds.

NOTE: I put a FOR/NEXT at the start so when I type “RUN” I have a moment to release the ENTER key before the timing starts. If you hit a key during the drawing, BASIC tries to handle that key and it will slow down the program. Run it and pound on the keyboard while it is drawing and you can see it slow down quite a bit (I got 3.7 seconds doing that).

But I digress…

Let’s collect some more examples, and see what other methods folks come up with. Now I want to get the same logic, just DRAW versus LINE to draw the lines, and see which one is faster.

To be continued…

UnderColor’s spiral challenge from 1984 – part 1

See Also: part 1, part 2, with part 3 and part 4 coming (and maybe more).

And now back to CoCo …

– Michael Pittsley posted in the TRS-80 Color Computer (CoCo) group on Facebook:

Many of us have our CoCos and have memories or how good we once were writing basic programs on it. Including myself. I found this article in the first UnderColor magazine. It was a contest to see who could write an ECB program that created a spiral. — Write an Extended Basic program that draws a spiral figure on graphics screen 0 on PMODE 4. The figure, when done should look like the picture. Use any combination of Basic commands, but no assembly language. The winner will be the person whose program executes in the shortest possible time. (Entries that simply list a series of LINE commands will be disqualified). I took a stab at it and realized how much I had forgotten about basic, so this was fun for me. I put my results as the first comment. Feel free to try your hand at it, post a screen shot and the time it took to complete.

– Michael Pittsley

This caught my attention.

UnderColor magazine (1984-1985) was one I never saw, though the name sounds familiar so I may have at least read a reference to it, or seen an ad for it somewhere. You can find the issues preserved here:

Search – TRS-80 Color Computer Archive

The article in question appeared in the first issue, which you can read here:

UNDERCOLOR Vol1 No 1 Dec 10, 1984

The article, by Bill Barden, presented a contest to see who could write a program in BASIC (no assembly allowed) that would generate a spiral as demonstrated by this graphic:

The winner would be the program that could do this in the least amount of time.

I have discussed drawing spirals on the CoCo in the past, but not like this. I also wrote about spirals for a text mode attract screen, but not like this.

So now let’s spiral like this.

LINE versus DRAW

The most obvious approach would be to use the LINE command. It takes a set of X and Y coordinates and draws a line between them, like this:

LINE (0,0)-(255,191),PSET

However, with what I know about BASIC these days (and wish I knew back then), that is a lot of parsing of numbers and characters and such. That makes it slower than it might need to be.

One shortcut is that LINE remembers where it left off, so you can start a new line just by specifying the destination:

LINE-(127,0),PSET

Doing this trick should speed up a spiral program, since you only need to give the starting point once, then you can just “draw to the next spot” from then on out.

But I did not attempt this. Instead, I thought about DRAW.

The DRAW command is very powerful, and does allow you to draw to specific coordinates. You can do a “blank” draw just to move the starting point, like this:

DRAW"BM0,191"

That will do a Blank Move to (0,191), which is the lower left corner of the screen and the location where the spiral is supposed to start.

You can then do things like…

DRAW"R10"

…and that will draw a line 10 pixels to the right. (Well, the coordinates are scaled, I think, so it is 10 pixels on a PMODE 4 screen, but at other lower resolutions, that number of pixels will be scaled down.)

How can we spiral like that? One way would be to build a string and append it:

X=100
X$=STR$(X)
DRAW"R"+X$

That works, but all that parsing and creating strings and such would certainly be slower than using a built-in feature of DRAW which lets you use a variable inside the quotes! You just put “=” before the variable name, and a “;” after it.

X=100
DRAW"R=X;"

That will draw to the right however many pixels X is set to!

Could this be faster than LINE?

Here is what I came up with:

0 'SPIRAL1.BAS
10 TIMER=0
20 PMODE4,1:PCLS:SCREEN1,1
30 W=255:H=191:DRAW"BM0,191"
40 DRAW"R=W;"
50 DRAW"U=H;L=W;"
60 H=H-3:W=W-3
70 DRAW"D=H;R=W;"
80 H=H-3:W=W-3
90 IF H>0 THEN 50
100 TM=TIMER
110 IF INKEY$="" THEN 110
120 PRINT TM/60

This sets a TIMER variable at the start, draws the spiral, then reads the value of the TIMER again. When you press any key, the program exits and prints the time (TIMER/60) it took to run the main code.

Here is what I get:

And pressing a key shows me:

3.03333334

Three seconds.

I expect I can optimize my program to speed it up. In the meantime, what can you come up with? Is there a faster way?

Let’s play!

C and VLAs (Variable Length Arrays)

When you are old (or “experienced” if you prefer), you begin to realize how much of what you learned is wrong. Even if it was “right” when you learned it. I think of all my peers who went through computer courses at colleges back in the late 1980s or 1990s, learning now-obsolete languages and being taught methods and approaches that are today considered wrong.

When I learned C, it was on a pre-ANSI K&R C compiler. I learned it on my Radio Shack Color Computer 3 under the OS-9 operating system, with assistance from a friend of mine who had learned C on his Commodore Amiga.

I had a lot of new things to learn in 1995 when I took a job with Microware Systems Corporation (creator of OS-9 and the K&R compiler I had learned on). Their Ultra-C compiler was an ANSI compiler, and it did things quite differently.

In that era of the C89/C90 standard, arrays were just arrays and we liked it that way:

int array[42];

If you wanted things to be more flexible, you had to malloc() memory yourself.

int *array = malloc (sizeof(int)*42);

…and remember to stay within your boundaries and clean up/free that memory when you were done with it.

But C99 changed this, somewhat, with the introduction of VLAs (Variable Length Arrays). Now you could declare an array using a variable like this:

int x=42;

int array[x];

Neat. I do not think I have ever used this. One downside is you cannot do this with static variables, since those are created/reserved at compile time. But it is still neat.

But today I learned you couldn’t rely on VLAs if you were using C11. Apparently, they became optional that year. A compiler would define a special macro if it did not support them:

__STDC_NO_VLA__

But at least for twelve years of the standard, you could rely on them, before not being able to rely on them.
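Here is a minimal sketch of how code might guard its use of a VLA with that feature-test macro. The function name and the 1024-element cap are arbitrary choices for this example; the cap is there because a VLA lives on the stack and an unchecked size could overflow it.

```c
/* Sum the squares 1*1 + 2*2 + ... + n*n using a scratch array.
   Returns -1 if n is out of the range this sketch allows. */
int sum_of_squares(int n)
{
    if (n <= 0 || n > 1024)
    {
        return -1; /* Keep stack usage bounded. */
    }

#ifndef __STDC_NO_VLA__
    int squares[n];    /* VLA: sized at run time. */
#else
    int squares[1024]; /* Fixed-size fallback when VLAs are missing. */
#endif

    int total = 0;

    for (int i = 0; i < n; i++)
    {
        squares[i] = (i + 1) * (i + 1);
    }

    for (int i = 0; i < n; i++)
    {
        total += squares[i];
    }

    return total;
}
```

Note that a VLA cannot have an initializer, so the array is filled in by the first loop.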

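And then C23 happened, which I just learned made them mandatory again. Sort of. From what I can tell, variably modified types (like pointers to VLAs and VLA-style function parameters) are required again, but VLAs declared as local variables are still optional.

So, uh, I guess if you have the latest and greatest, you can use some of them. For now. Until some future change makes them optional again. Or removes them. Or whatever.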

Still neat.

But I doubt any of the embedded C compilers I use for my day job support them.

Ciaran “Xroar” Anscomb’s PCLEAR 0 without assembly!

On the CoCo mailing list, Ciaran (author of the Xroar emulator) said this:

FWIW this is the bodge I have saved out for doing PCLEAR0 without such
ROM patching:

POKE183,PEEK(183)-6:POKE188,PEEK(188)-6:PCLEAR1:POKE183,PEEK(183)+6:POKE188,PEEK(188)+6

Annoyingly verbose, but should account for DOS, and works on the
Dragon too.

..ciaran

https://pairlist5.pair.net/mailman/listinfo/coco

This topic came up because of Juan Castro’s experiments with updating HDB-DOS to add new functionality on a CoCo 1 and 2 (but that is a discussion for a dedicated blog post sometime). Juan had recently “fixed” Extended Color BASIC to allow using “PCLEAR 0” to remove all graphics memory and give more RAM to BASIC. I have discussed PCLEAR 0 in the past.

This mysterious line performs a PCLEAR 0 without needing to load and run a program of assembly code!

POKE183,PEEK(183)-6:POKE188,PEEK(188)-6:PCLEAR1:POKE183,PEEK(183)+6:POKE188,PEEK(188)+6

And it works!

But … how does it work!?!

Ciaran, you’ve got some ‘splainin’ to do…

Until then…

A safer memcpy with very limited use cases

Here is a quick one… At my day job, I found lines of code like this:

memcpy(systemDataPtr->serialNumber, resp.serialNumber, 16);

A quick peek at systemDataPtr->serialNumber shows it defined as this:

unsigned char serialNumber[MAX_SERIAL_NUMBER_LENGTH];

…with that constant defined as:

#define MAX_SERIAL_NUMBER_LENGTH        16

So while 16 is correct, the use of hard-coded “magic numbers” (hat tip to a previous manager, Pete S., who introduced me to that term) is probably best avoided. Change that #define, and things could go horribly wrong with a memory overrun or massive nuclear explosion or something.

One simple fix is to use the #define in the memcpy:

memcpy(systemDataPtr->serialNumber, resp.serialNumber, MAX_SERIAL_NUMBER_LENGTH);

This, of course, assumes that resp.serialNumber is also 16. Let’s see:

char serialNumber[16];

Ah, magic number! In this case, it comes from a DLL header file that does not share that #define, and the header file for the DLL was made by someone who had never made a Windows DLL before (me) and did not make #defines for these various lengths.

What if the DLL value ever got out-of-sync? If it became larger, the worst case is that not all of the data would be copied (only 16 bytes). That seems fine. But if the DLL value became smaller, like 10, then the memcpy would still copy 16 bytes, copying the 10 from the DLL buffer plus 6 bytes of whatever is in memory after it. That is a buffer over-read.

In this case, since the destination buffer can hold 16 bytes, and we only copy up to 16 bytes, the worst case is we could get some unintended data in that buffer.

sizeof() exists for a reason.

One thing I tend to do is use sizeof() instead of hard-coded numbers or the #define, since it will continue to work if the destination buffer ever got changed from using the #define:

memcpy(systemDataPtr->serialNumber, resp.serialNumber, sizeof(systemDataPtr->serialNumber));

But this still has the same issue if the source resp.serialNumber ever changed size.
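One way to catch a size mismatch before it ever runs is a compile-time check. This is only a sketch, assuming a C11 compiler (for _Static_assert). The SystemData and DllResponse types and the copySerialNumber function are hypothetical stand-ins for the real structures, which are not shown here:

```c
#include <assert.h>
#include <string.h>

#define MAX_SERIAL_NUMBER_LENGTH 16

// Hypothetical stand-ins for the application and DLL structures.
typedef struct
{
    unsigned char serialNumber[MAX_SERIAL_NUMBER_LENGTH];
} SystemData;

typedef struct
{
    char serialNumber[16];  // magic number, as in the DLL header
} DllResponse;

void copySerialNumber (SystemData *systemDataPtr, const DllResponse *respPtr)
{
    // Refuse to compile if the two arrays ever get out of sync.
    _Static_assert (sizeof(systemDataPtr->serialNumber) ==
                    sizeof(respPtr->serialNumber),
                    "serialNumber size differs between app and DLL header");

    assert ((systemDataPtr != NULL) && (respPtr != NULL));

    memcpy (systemDataPtr->serialNumber, respPtr->serialNumber,
            sizeof(systemDataPtr->serialNumber));
}
```

If someone changes either array size, the build fails with that message instead of the program quietly copying the wrong number of bytes.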

A safer, and more ridiculous, memcpy

Naturally, I came up with a ridiculous “solution”: A safer memcpy() that is much more of a pain to use because you have to know the size of each buffer and tell it the size of each buffer so it can make sure not to copy something larger than will fit into the destination buffer.

Here is the prototype of memcpy():

void * memcpy ( void * destination, const void * source, size_t num );

It will blindly copy “num” bytes from “source” to “destination”. But a ridiculous safer memcpy might look like this:

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num );

Just think of the extra overhead to add two more parameters for every use! Plus, it is a longer function name so you get to type even more! Wonderful.

Here is a quick implementation:

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    // Use whichever size is the smallest.
    if ((num > sizeOfDestination) || (num > sizeOfSource))
    {
        if (sizeOfDestination < sizeOfSource)
        {
            num = sizeOfDestination;
        }
        else
        {
            num = sizeOfSource;
        }
    }
    
    return memcpy ( destination, source, num);
}

Now that the function knows the size of source and destination, it can do a simple check to make sure to use whichever is smallest for the max number of bytes to copy — regardless (or irregardless, depending on which you prefer) of how many bytes you specified.

If both buffers are larger than the “num” passed in, num is used. But if “num” is larger than either buffer, it is reduced to match the smaller of the two buffers.

Note that this does not do any NULL checks, so it is relying on the behavior of memcpy() which would likely be a crash if NULL is passed in. To improve, NULL checks could be added.

I decided to pass this function into Bing CoPilot and see what it “thought” about this:

Limitations & Risks

Silent truncation: If num is too large, the function silently reduces it. This can lead to subtle bugs if the caller expects all bytes to be copied.

No error signaling: There’s no return code or logging to indicate that truncation occurred.

Still not “safe” in all cases: If destination or source are NULL, or if the buffers overlap, memcpy may still cause undefined behavior.

No runtime buffer validation: It assumes the caller provides correct buffer sizes, which may not be guaranteed.

– Bing CoPilot

And, of course, the A.I. offered up these changes:

#include <string.h>
#include <stddef.h>
#include <stdio.h>

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    if (!destination || !source) {
        fprintf(stderr, "Error: NULL pointer passed to memcpy_safer.\n");
        return NULL;
    }

    if (num > sizeOfDestination || num > sizeOfSource) {
        size_t adjusted = (sizeOfDestination < sizeOfSource) ? sizeOfDestination : sizeOfSource;
        fprintf(stderr, "Warning: Truncating copy from %zu to %zu bytes.\n", num, adjusted);
        num = adjusted;
    }

    return memcpy(destination, source, num);
}

That version adds NULL checks, returns a NULL if either buffer passed in was NULL, and adds prints to standard error if a NULL happens or if the value was truncated.

Not bad, predictive language model.
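Printing to stderr is not much use on a system without a console, so here is one more sketch of how the “no error signaling” complaint could be handled. It is an assumption on my part, not anything CoPilot produced: the function name memcpy_checked and its copiedPtr parameter are made up for this example. It returns false when the copy was cut short (or a pointer was NULL) and reports how many bytes were actually copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical variant: copies at most the smaller of the two buffer
// sizes. Returns true only if all "num" bytes were copied. If copiedPtr
// is not NULL, it receives the number of bytes actually copied.
bool memcpy_checked (void *destination, size_t sizeOfDestination,
                     const void *source, size_t sizeOfSource,
                     size_t num, size_t *copiedPtr)
{
    size_t limit = (sizeOfDestination < sizeOfSource) ? sizeOfDestination
                                                      : sizeOfSource;
    size_t count = (num < limit) ? num : limit;

    if (copiedPtr != NULL)
    {
        *copiedPtr = 0;
    }

    if ((destination == NULL) || (source == NULL))
    {
        return false;
    }

    // Sanity check: never copy more than either buffer can hold.
    assert ((count <= sizeOfDestination) && (count <= sizeOfSource));

    memcpy (destination, source, count);

    if (copiedPtr != NULL)
    {
        *copiedPtr = count;
    }

    return (count == num);
}
```

The caller can then decide whether a truncated copy is an error or just something to log. Like memcpy(), this still assumes the buffers do not overlap and that the sizes passed in are correct.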

My ridiculous test program

Here is my test program, which I wrote using the Online GDB C compiler:

/******************************************************************************

Welcome to GDB Online.
  GDB online is an online compiler and debugger tool for C, C++, Python, PHP, Ruby, 
  C#, OCaml, VB, Perl, Swift, Prolog, Javascript, Pascal, COBOL, HTML, CSS, JS
  Code, Compile, Run and Debug online from anywhere in world.

*******************************************************************************/
#include <stdint.h> // for uint8_t
#include <stdio.h>  // for printf()
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for memcpy()

/*---------------------------------------------------------------------------*/
// PROTOTYPES
/*---------------------------------------------------------------------------*/

void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num );

void * memcpy_safer2 ( void * destination, size_t sizeOfDestination,
                       const void * source, size_t sizeOfSource,
                       size_t num );

void initializeBuffer (void *dataPtr, size_t dataSize, uint8_t value);

void dumpBuffer (const char* prefix, void *dataPtr, size_t dataSize);

/*---------------------------------------------------------------------------*/
// MAIN
/*---------------------------------------------------------------------------*/

int main()
{
    uint8_t smallerBuffer[10];
    uint8_t largerBuffer[15];
    
    // Test 1: copy longer buffer into smaller buffer.
    
    printf ("\nInitialized buffers:\n\n");    
    
    // Initialize buffers with something we can identify later.
    initializeBuffer (smallerBuffer, sizeof(smallerBuffer), 0x1);
    dumpBuffer ("smallerBuffer", smallerBuffer, sizeof(smallerBuffer));

    initializeBuffer (largerBuffer, sizeof(largerBuffer), 0x2);
    dumpBuffer ("largerBuffer ", largerBuffer, sizeof(largerBuffer));

    printf ("\nTest 1: Copying largerBuffer into smallerBuffer...\n\n");

    memcpy_safer (smallerBuffer, sizeof(smallerBuffer), largerBuffer, sizeof(largerBuffer), 42);

    dumpBuffer ("smallerBuffer", smallerBuffer, sizeof(smallerBuffer));

    // Test 2: copy smaller buffer into larger buffer.

    printf ("\nInitialized buffers:\n\n");

    // Initialize buffers with something we can identify later.
    initializeBuffer (smallerBuffer, sizeof(smallerBuffer), 0x1);
    dumpBuffer ("smallerBuffer", smallerBuffer, sizeof(smallerBuffer));

    initializeBuffer (largerBuffer, sizeof(largerBuffer), 0x2);
    dumpBuffer ("largerBuffer ", largerBuffer, sizeof(largerBuffer));

    printf ("\nTest 2: Copying smallerBuffer into largerBuffer...\n\n");

    memcpy_safer (largerBuffer, sizeof(largerBuffer), smallerBuffer, sizeof(smallerBuffer), 42);

    dumpBuffer ("largerBuffer ", largerBuffer, sizeof(largerBuffer));

    return EXIT_SUCCESS;
}


/*---------------------------------------------------------------------------*/
// FUNCTIONS
/*---------------------------------------------------------------------------*/

/*---------------------------------------------------------------------------*/
// My ridiculous "safer" memcpy.
/*---------------------------------------------------------------------------*/
void * memcpy_safer ( void * destination, size_t sizeOfDestination,
                      const void * source, size_t sizeOfSource,
                      size_t num )
{
    // Use whichever size is the smallest.
    if ((num > sizeOfDestination) || (num > sizeOfSource))
    {
        if (sizeOfDestination < sizeOfSource)
        {
            num = sizeOfDestination;
        }
        else
        {
            num = sizeOfSource;
        }
    }
    
    return memcpy ( destination, source, num);
}


/*---------------------------------------------------------------------------*/
// Bing CoPilot changes.
/*---------------------------------------------------------------------------*/
void * memcpy_safer2 ( void * destination, size_t sizeOfDestination,
                       const void * source, size_t sizeOfSource,
                       size_t num )
{
    if (!destination || !source) {
        fprintf(stderr, "Error: NULL pointer passed to memcpy_safer.\n");
        return NULL;
    }

    if (num > sizeOfDestination || num > sizeOfSource) {
        size_t adjusted = (sizeOfDestination < sizeOfSource) ? sizeOfDestination : sizeOfSource;
        fprintf(stderr, "Warning: Truncating copy from %zu to %zu bytes.\n", num, adjusted);
        num = adjusted;
    }

    return memcpy(destination, source, num);
}


/*---------------------------------------------------------------------------*/
// Utility function to initialize a buffer to a set value.
/*---------------------------------------------------------------------------*/
void initializeBuffer (void *dataPtr, size_t dataSize, uint8_t value)
{
    if (NULL != dataPtr)
    {
        memset (dataPtr, value, dataSize);
    }
}


/*---------------------------------------------------------------------------*/
// Utility function to dump bytes in a buffer, with an optional prefix.
/*---------------------------------------------------------------------------*/
void dumpBuffer (const char* prefix, void *dataPtr, size_t dataSize)
{
    if (NULL != dataPtr)
    {
        if (NULL != prefix)
        {
            printf ("%s: ", prefix);
        }

        for (size_t idx=0; idx<dataSize; idx++)
        {
            printf ("%02x ", ((uint8_t*)dataPtr)[idx]);
        }
        printf ("\n");
    }
}

// End of memcpy_safer.c

If you want to run it there, you can use this link:

https://onlinegdb.com/Eu7FToIcQ

But of course, I am not using this code. It is ridiculous and requires extra typing.

Besides, I know exactly what I am doing in C and never make any mistakes… Really.

Until next time…

Google Street View scripts and A.I. emojis

When capturing video for Google Street View, Google recommends using 1 frame per second video for walking, and 5 frames per second for biking and lower speeds. A full 30 or even 60 fps video is unnecessarily huge and will take much longer to upload and process … and most of the frames will be discarded by Google anyway.

I had one of the A.I.s (probably CoPilot) automate using the ffmpeg open source command line tool so I could batch convert files in a directory. A very rough work-in-progress version is on my GitHub now:

allenhuffman/GoogleStreetViewScripts: Scripts for converting videos before uploading to Google Street View

I have noticed the A.I.s are starting to put emojis in things — including code and scripts they generate!

I don’t even know how to type emojis in uMacs or VI ;-) but apparently they are supported these days.

Have you noticed the increase in emojis in A.I. responses lately?

I’d end this post with an emoji, but I do not know how to type one in WordPress . . .