char versus C versus C#

Updates:

  • 2020-07-16 – Added a few additional notes.

I am mostly a simple C programmer, but I do touch a bit of C# at my day job.

If you don’t think about what is going on behind the scenes, languages like Java and C# are quite fun to work with. If I were pulling bytes out of a buffer in C, I’d have to write all the code manually. For example:

Buffer: [ A | B | C | D | D | E | E | E | E ]

Let’s say I wanted to pull three one-byte values out of the buffer (A, B and C), followed by a two-byte value (DD) and a four-byte value (EEEE). There are many ways to do this, but one lazy way (which breaks if the data was written on a system with a different endianness than the one reading it) might be:

#include <stdint.h>
#include <string.h> /* for memcpy() */

uint8_t a, b, c;
uint16_t d;
uint32_t e;

a = Buffer[0];
b = Buffer[1];
c = Buffer[2];
memcpy (&d, &Buffer[3], 2); /* two-byte value DD */
memcpy (&e, &Buffer[5], 4); /* four-byte value EEEE */

There is much to critique about this example, but this post is not about being “safe” or “portable.” It is just an example.

In C#, I assume there are many ways to achieve this, but the one I was exposed to used the BitConverter class, which can pull bytes from a buffer (an array of bytes) and load them into a variable. I think it would look something like this:

Byte a, b, c;
UInt16 d;
UInt32 e;

a = Buffer[0];
b = Buffer[1];
c = Buffer[2];
d = BitConverter.ToUInt16(Buffer, 3);
e = BitConverter.ToUInt32(Buffer, 5);

…or something like that. I found this code in something new I am working on. It makes sense, but I wondered why some bytes were being copied directly (a, b and c) and others went through BitConverter. Does BitConverter not have a byte copy?

I checked the BitConverter page and saw there was no ToUInt8 (or ToByte) method, but there was a ToChar method. In C, a plain “char” is usually signed, representing -128 to 127 (strictly speaking, whether it is signed is up to the compiler). If we wanted a byte, we’d really want an “unsigned char” (0 to 255), and I did not see a ToUChar method either. Was that why the author did not use it?

Here’s where I learned something new…

The description of ToChar says it “Returns a Unicode character converted from two bytes”.

Two bytes? Unicode can represent far more characters than a single 8-bit byte can hold, so it looks like a C# char is not the same as a C char. Indeed, checking the Char page confirms it is a 16-bit value.
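
Here is a quick way to see it for yourself (a little snippet of my own, not something from the code I found): print the sizes of the types, and watch ToChar grab two bytes at once.

using System;

class CharCheck
{
    static void Main()
    {
        // sizeof() on the built-in value types is allowed in normal (safe) C#:
        Console.WriteLine(sizeof(char));   // 2 -- a C# char is a 16-bit UTF-16 code unit
        Console.WriteLine(sizeof(byte));   // 1 -- byte (0 to 255) is the "unsigned char" of C#
        Console.WriteLine(sizeof(sbyte));  // 1 -- sbyte (-128 to 127) is the signed one

        byte[] buffer = { 0x41, 0x42, 0x43 };
        char c = BitConverter.ToChar(buffer, 0);
        Console.WriteLine((int)c);         // 16961 (0x4241) on a little-endian machine, not 0x41
    }
}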

I’m glad I read the fine manual before trying to “fix” this code like this:

Char a, b, c;
UInt16 d;
UInt32 e;

// The ToChar calls will not work as intended -- each one reads TWO bytes!
a = BitConverter.ToChar(Buffer, 0); // not the same as Buffer[0]
b = BitConverter.ToChar(Buffer, 1); // not the same as Buffer[1]
c = BitConverter.ToChar(Buffer, 2); // not the same as Buffer[2]
d = BitConverter.ToUInt16(Buffer, 3);
e = BitConverter.ToUInt32(Buffer, 5);

For a C programmer experimenting in C# code, that might look fine, but had I just tried it without reading the manual first, I’d have been puzzled why it was not copying the values I expected from the buffer.
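
As an aside, if I wanted every field to go through one mechanism instead of mixing array indexing and BitConverter, a BinaryReader over a MemoryStream looks like one way to do it. This is only a sketch of my own (not what the code I found uses), and like BitConverter on typical little-endian hardware it reads multi-byte values as little-endian:

using System;
using System.IO;

class ReadFields
{
    static void Main()
    {
        // Same layout as the example buffer: A, B, C, DD, EEEE
        byte[] buffer = { 1, 2, 3, 0x04, 0x00, 0x05, 0x00, 0x00, 0x00 };

        using (var reader = new BinaryReader(new MemoryStream(buffer)))
        {
            byte a = reader.ReadByte();      // one byte each for a, b and c
            byte b = reader.ReadByte();
            byte c = reader.ReadByte();
            ushort d = reader.ReadUInt16();  // two bytes (DD), little-endian
            uint e = reader.ReadUInt32();    // four bytes (EEEE), little-endian

            Console.WriteLine($"{a} {b} {c} {d} {e}"); // 1 2 3 4 5
        }
    }
}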

Making assumptions about languages, even about something as simple as a “char” data type, can cause problems.

Thank you, manual.
