C strings and pointers and arrays…

In a previous post about using sizeof() on string literals, there was an interesting comment by S. Enevoldsen:

To better remember this realize that arrays are not pointers, and string literals are arrays (that can decay to pointers).

const char arrayVersion[] = "1.0.42-beta";
const char* pointerString = "1.0.42-beta";
printf ("sizeof(arrayVersion) = %d\n", sizeof(arrayVersion));
printf ("sizeof(pointerString) = %d\n", sizeof(pointerString));

Outputs

sizeof(arrayVersion) = 12
sizeof(pointerString) = 4

– S. Enevoldsen

If I knew this, I have long forgotten it. Over the years at my “day jobs” I have gotten used to making string pointers like this:

const char *versionStringPtr = "1.0.42-beta";

I generally add the “Ptr” at the end to remind me (or other programmers) that it is a pointer to a string. In my mind, I knew I could have done “char *string” or “char string[]” and gotten the same use from normal code, but I do not recall if I knew they were treated differently.

What do you expect the output of this to be?

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS
#include <string.h> // for strlen()

int main()
{
    const char *stringPtr = "hello";
    
    printf ("sizeof(stringPtr) = %ld\n", sizeof(stringPtr));
    printf ("strlen(stringPtr) = %ld\n", strlen(stringPtr));

    printf ("\n");

    const char string[] = "hello";

    printf ("sizeof(string) = %ld\n", sizeof(string));
    printf ("strlen(string) = %ld\n", strlen(string));

    return EXIT_SUCCESS;
}

Output would show … what?

sizeof(stringPtr) = ???
strlen(stringPtr) = ???

sizeof(string) = ???
strlen(string) = ???

To be continued…

8 thoughts on “C strings and pointers and arrays…”

  1. Sean Patrick Conner

    I would expect the following:

    sizeof(stringPtr) = 8; /* or 4 or 2, depending upon the pointer size */
    strlen(stringPtr) = 5;

    sizeof(string) = 6; /* because of the NUL byte at the end */
    strlen(string) = 5;

    Also, when you want to print a size_t value, use %zu.

    1. Allen Huffman Post author

      Wild. I use cplusplus.com for reference and it has a whole block of % things I am completely unaware of. Sadly, my programming on embedded systems usually doesn’t support C99 and other new things. I’ll try to add this to my memory. I only recently learned about %p (the PC compiler handles it, but the one I use at work does not).

      1. Sean Patrick Conner

        %zu was added in C99, but %p was added in C89. C99 also added <inttypes.h> for formatting a bunch of different integer types (annoying to use, but at least it exists now).

        1. Allen Huffman Post author

          My first “real job” in the mid 1990s used a compiler that was ANSI, but before stdint.h, and we had our own types.h that defined them. A lot changed between that time and my return to programming in 2012. Being in an embedded world means I am often using weird compilers and tools (IAR, Renesas, CCS, etc.) that are “C in name only,” it seems.

    2. Allen Huffman Post author

      You see more clearly than I. I was immediately puzzled by the output and had to think about it. Had I chosen a much longer string, it would have been more obvious. (Actually, it should have been obvious anyway, but my brain is old and lazy.)

      1. Sean Patrick Conner

        I learned pretty quickly how fast C arrays decay into pointers, and how a misdeclaration (like extern int *array; in a header, and int array[100] in a C file) can be deadly. I also learned that a bare char is neither signed nor unsigned, but its own thing (as it’s implementation dependent if char is signed or unsigned by default).

        1. Allen Huffman Post author

          Char was, in K&R, a signed value, which never made sense to me. Once uint8_t came to exist, that seemed to take care of it. But I still see chars used for non-character binary data everywhere.

          My deal was seeing how buffer[80] could have a known size, but once it was passed into a function by address, you lost that ability. It’s no wonder buffer overruns remain a problem even in 2024.

          I got in a weird habit of always passing in stuff like…

          &buffer[0]

          I picked that up at a job, and thought it was weird, but it makes it abundantly clear that you are passing in the location of the start of some amount of data.

          And I think it’s weird you can pass in “buffer” or “&buffer” and they both work…

