BASIC INSTR revisited with special guest C strstr

In a recent post about BASIC INSTR, a few comments were left trying to clarify, justify, or explain why “” is reported as matching at the first character of the search string.

PRINT INSTR("ABC","A")
1

PRINT INSTR("ABC","C")
3

PRINT INSTR("ABC","X")
0

PRINT INSTR("ABC","")
1

Copilot AI at least pretends to know the actual reason, which involves saving a few precious bytes in the early 4K Microsoft ROM. The behavior was then retained for backwards compatibility as Microsoft BASIC expanded. Copilot even reports the behavior continued into VB.NET, though I did not fact check this answer.

LuisCOCO commented:

This is logical, if you search for three letters it searches for that piece but when searching for nothing it starts in the first position and searches for zero characters in the destination since it is useless to search for more letters if you search for fewer letters, that brings a null string as the first data found and compares it with the null to search and gives true

– LuisCOCO

Indeed, without a check for an empty string, the search would stop there. “Compare an X byte string to a Y byte string”. As the scan is performed, I suspect the moment a character does not match, it sets the return value to 0. Without looking at the code (yet), my assumption is that it starts with 1 as the return value and just exits when no more bytes are available to compare (immediately, with a zero byte target string), leaving the 1 there. Since code exists to flag it with a zero if a mismatch is encountered, perhaps they did just want to save a few instructions rather than doing a check and then branching to that zero return.

Unless there truly is an intentional reason it works this way.

William “Lost Wizard” Astle, who wrote the 6809 assembler (LWasm) I use quite often, explained some math stuff:

Put another way, the empty string is self-evidently a valid substring of every string at every position within that string so it will naturally match the position where the search starts. This is also mathematically correct when you look at set theory. The null set is a subset of all sets.

– William Astle

It seems math folks are fine with how INSTR works. Though math set theory is not where my mind would have been as a child learning BASIC back in junior high school.

William Astle then sends me on a side quest with this comment…

Incidentally, strstr() in C does exactly the same thing with an empty target string.

– William Astle

Let me C what I can find…

strstr() works similarly to BASIC’s INSTR, except it returns a pointer to where the match was found in memory. If no match was found, it returns a NULL pointer (which is 0, matching INSTR).

const char * strstr ( const char * str1, const char * str2 );

BASIC gives back a base-1 index, so 1 for a match at the first character of the search string, or 5 for a match starting at the 5th character of the search string. Strings in MS BASIC used “normal” base-1 counting numbers for the index. If you use MID$(), LEFT$(), RIGHT$() and similar string functions, they all use 1 for the first character in a string.

C uses base-0 for indexes, so if you have a string in C:

char *st = "ABC";

…the first character would be index 0, st[0], while in BASIC:

ST$="ABC"

…the first character would be index 1, MID$(ST$,1,1).

strstr

A quick test program to confirm…

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char **argv)
{
    const char *search = "ABC";
    const char *target = NULL;
    const char *ptr = NULL;

    // const char * strstr ( const char * str1, const char * str2 );
    // A pointer to the first occurrence in str1 of the entire sequence of
    // characters specified in str2, or a null pointer if the sequence is
    // not present in str1.

    // %p expects a void pointer, so cast the char pointer.
    printf ("search = '%s' (%p)\n", search, (void *) search);

    target = "A";
    ptr = strstr (search, target);
    if (NULL == ptr)
    {
        printf ("target '%s' not found in '%s'\n", target, search);
    }
    else
    {
        // A pointer difference is a ptrdiff_t, which prints with %td.
        printf ("target '%s' found in '%s' at index %td\n", target, search,
                (ptr - search));
    }

    target = "C";
    ptr = strstr (search, target);
    if (NULL == ptr)
    {
        printf ("target '%s' not found in '%s'\n", target, search);
    }
    else
    {
        printf ("target '%s' found in '%s' at index %td\n", target, search,
                (ptr - search));
    }

    target = "X";
    ptr = strstr (search, target);
    if (NULL == ptr)
    {
        printf ("target '%s' not found in '%s'\n", target, search);
    }
    else
    {
        printf ("target '%s' found in '%s' at index %td\n", target, search,
                (ptr - search));
    }

    target = "";
    ptr = strstr (search, target);
    if (NULL == ptr)
    {
        printf ("target '%s' not found in '%s'\n", target, search);
    }
    else
    {
        printf ("target '%s' found in '%s' at index %td\n", target, search,
                (ptr - search));
    }

    return EXIT_SUCCESS;
}

Yuck. I should have made a function… This produces:

search = 'ABC' (0x58a8ed9ba008)
target 'A' found in 'ABC' at index 0
target 'C' found in 'ABC' at index 2
target 'X' not found in 'ABC'
target '' found in 'ABC' at index 0

I realize my example is … not optimal. strstr() returns a pointer into memory, and my example subtracts the pointer to the start of the search string from the pointer where the target string was found. If they are the same, the math is 0 for the first character. But if the function actually returns a 0 (NULL), that means it was not found. I should have considered that before writing this example ;-)

But I digress…

Which came first?

A quick (and non fact-checked) web search about when strstr was added to C reveals:

The strstr function was added to the C standard library as part of the ANSI C standard (C89/C90), which was approved in 1989 and published in 1990.

Well, at least it worked the same way BASIC programmers would have been used to. But I do wonder if there was an earlier implementation of BASIC that introduced INSTR, or similar.

Another quick (and non fact-checked) web search says DEC BASIC-PLUS for the PDP-11 had a similar function and was introduced in 1970.

If I can figure out how to use the PDP-11 online emulator, maybe I can figure this out:

https://www.pcjs.org/software/dec/pdp11/tapes/basic

But here is the manual, which specifically documents this behavior:

The next-to-last entry:

“If only the substring is null, and if int-exp is equal to zero, INSTR returns a value of 1.”

https://www.dmv.net/dec/pdf/bp2v27rm.pdf

Though, that manual has a copyright of 1987 and 1991, so… maybe not quite where it started.

I found one from 1975:

https://www.bitsavers.org/pdf/dec/pdp11/rsts_e/V06/DEC-11-ORBPB-A-D_BASIC-PLUS_LangMan_Jul75.pdf

“If B$ is a null string (B$ = “”), the INSTR function returns the value 1. The null string is a proper substring of any string and is treated conventionally as the first element of A$ in null string search operations. In addition, if both A$ and B$ are null strings, the INSTR function returns the value of 1”

https://www.bitsavers.org/pdf/dec/pdp11/rsts_e/V06/DEC-11-ORBPB-A-D_BASIC-PLUS_LangMan_Jul75.pdf

The 1970 manual does not show INSTR in the index:

http://bitsavers.trailing-edge.com/pdf/dec/pdp11/lang/basic/basic_pts/DEC-11-AJPB-D_PDP-11_BASIC_Programming_Manual_Dec70.pdf

But Google AI tells me it was in a 1971 edition:

http://ftpmirror.your.org/pub/misc/bitsavers/pdf/dec/pdp11/rsts/PL-11-71-01-01-A-D_RSTS-11UsersGuide_May71.pdf

In that 1971 manual, I found this:

http://ftpmirror.your.org/pub/misc/bitsavers/pdf/dec/pdp11/rsts/PL-11-71-01-01-A-D_RSTS-11UsersGuide_May71.pdf

Unfortunately, the 1971 manual does not specifically address what happens with a null/empty string. But we can say for sure that the behavior was documented in the 1975 manual, and since there is no mention of it being a change from earlier releases, it was likely there in 1971.

I think we have a winner. As described, it is clear the intent matches William Astle’s comment about null being a proper substring of any string. Microsoft implemented it the way it

Conclusion

I guess all I have to say is that I’d have preferred for INSTR to return 0 if an empty string was passed in.

Unless someone can tell me the benefit of having it return 1, that is. I am a big fan of changing my own personal views when I learn new information. Just ask me about politics sometime. . . #independent

2 thoughts on “BASIC INSTR revisited with special guest C strstr”

  1. William Astle

    I would think searching for an empty substring doesn’t come up very often in typical code compared to searching for non-empty substrings and the naïve approach to comparing strings naturally gives the result we see with INSTR and strstr(). You can possibly make a case for it raising an error return of some kind, but I think “not found” would violate correctness given that a sequence of zero characters can never not be found.

    1. Allen Huffman Post author

      “Go outside and find nothing.”

      I am unsure what the answer to that would be ;-)

      I very much like that the 1975 manual documented this. Had that been in our BASIC manual, I would have just shrugged and moved on. Unlike some other things that are actual bugs, I’d have accepted it as “okay, you can’t put INKEY$ in INSTR because of this” and moved over.

      I dislike very much that my SchedulePress plug in is posting things immediately when I drag them into a date that is not immediately.
