Updates:
- 2020-07-16 – Added a few additional notes.
I am mostly a simple C programmer, but I do touch a bit of C# at my day job.
If you don’t think about what is going on behind the scenes, languages like Java and C# are quite fun to work with. In C, if I were pulling bytes out of a buffer, I’d have to write all the code manually. For example:
Buffer: [ A | B | C | D | D | E | E | E | E ]
Let’s say I wanted to pull three one-byte values out of the buffer (A, B and C), followed by a two-byte value (DD) and a four-byte value (EEEE). There are many ways to do this, but one lazy way (which breaks if the data was written by a system whose endianness differs from that of the system reading it) might be:
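    #include <stdint.h>
    #include <string.h> /* for memcpy() */

    /* Buffer is assumed to be an array of at least 9 bytes. */
    uint8_t a, b, c;
    uint16_t d;
    uint32_t e;

    a = Buffer[0];
    b = Buffer[1];
    c = Buffer[2];
    memcpy (&d, &Buffer[3], 2); /* two-byte value, host byte order */
    memcpy (&e, &Buffer[5], 4); /* four-byte value, host byte order */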
There is much to critique about this example, but this post is not about being “safe” or “portable.” It is just an example.
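For the curious, here is a minimal sketch of one way around the endianness problem, assuming the data in the buffer was written least significant byte first (little-endian). The helper names read_u16_le and read_u32_le are made up for this illustration. Each value is assembled one byte at a time with shifts, so the result does not depend on the byte order of the machine running the code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 16-bit value stored little-endian
 * (least significant byte first) starting at buf[offset]. */
uint16_t read_u16_le(const uint8_t *buf, size_t offset)
{
    return (uint16_t)(buf[offset] | ((uint16_t)buf[offset + 1] << 8));
}

/* Hypothetical helper: read a 32-bit value stored little-endian
 * starting at buf[offset]. */
uint32_t read_u32_le(const uint8_t *buf, size_t offset)
{
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}
```

With these, the two memcpy calls would become something like d = read_u16_le(Buffer, 3); and e = read_u32_le(Buffer, 5);. If the data was written big-endian instead, the shifts would need to be reversed.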
In C#, I assume there are many ways to achieve this, but the one I was exposed to used a BitConverter class that can pull bytes from a buffer (array of bytes) and load them in to a variable. I think it would look something like this:
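    byte a, b, c; // C# has no UInt8; its 8-bit unsigned type is byte
    UInt16 d;
    UInt32 e;

    a = Buffer[0];
    b = Buffer[1];
    c = Buffer[2];
    d = BitConverter.ToUInt16(Buffer, 3); // two bytes starting at index 3
    e = BitConverter.ToUInt32(Buffer, 5); // four bytes starting at index 5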
…or something like that. I found this code in something new I am working on. It makes sense, but I wondered why some bytes were being copied directly (a, b and c) and others went through BitConverter. Does BitConverter not have a byte copy?
I checked the BitConverter page and saw there was no ToUInt8 method, but there was a ToChar method. In C, whether a plain “char” is signed is up to the implementation; where it is signed and 8 bits wide, it typically holds -128 to 127. If we wanted a byte, we’d really want an “unsigned char” (0-255), and I did not see a ToUChar method. Was that why the author did not use it?
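As a side note on the C half of that comparison, the real limits of the C character types on a given compiler can be checked with the standard macros in <limits.h>. This is only a small sketch; the function names plain_char_is_signed and print_char_ranges are made up for this example:

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is signed on this compiler, 0 if it is unsigned.
 * The C standard leaves this choice to the implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Prints the ranges of the three C character types on this system. */
void print_char_ranges(void)
{
    printf("char:          %d to %d\n", CHAR_MIN, CHAR_MAX);
    printf("signed char:   %d to %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: 0 to %u\n", (unsigned)UCHAR_MAX);
}
```

Calling print_char_ranges() from main() shows what a particular compiler and target actually use, which is safer than assuming.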
Here’s where I learned something new…
The description of ToChar says it “Returns a Unicode character converted from two bytes”.
Two bytes? Unicode can represent far more characters than 7-bit ASCII, so it looks like a C# char is not the same as a C char. Indeed, checking the Char page confirms it is a 16-bit value (a UTF-16 code unit).
I’m glad I read the fine manual before trying to “fix” this code like this:
    Char a, b, c;
    UInt16 d;
    UInt32 e;

    // The ToChar will not work as intended!
    a = BitConverter.ToChar(Buffer, 0); // Buffer[0];
    b = BitConverter.ToChar(Buffer, 1); // Buffer[1];
    c = BitConverter.ToChar(Buffer, 2); // Buffer[2];
    d = BitConverter.ToUInt16(Buffer, 3);
    e = BitConverter.ToUInt32(Buffer, 5);
For a C programmer experimenting in C# code, that might look fine, but had I just tried it without reading the manual first, I’d have been puzzled as to why it was not copying the values I expected from the buffer. Each ToChar call would have read two bytes from the buffer rather than one.
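To put that in C terms, here is a rough sketch of what a two-byte read at a byte offset does. The helper two_bytes_at is made up for this illustration, and it assumes a little-endian machine (BitConverter follows the byte order of the machine it runs on):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine the two bytes at buf[offset] and
 * buf[offset + 1] into one 16-bit value, the way a little-endian
 * machine would interpret them. */
uint16_t two_bytes_at(const uint8_t *buf, size_t offset)
{
    return (uint16_t)(buf[offset] | ((uint16_t)buf[offset + 1] << 8));
}
```

If the buffer started with the bytes 0x41, 0x42 and 0x43, then buf[0] is 0x41, but two_bytes_at(buf, 0) is 0x4241. A read at offset 1 gives 0x4342, so each result also overlaps the byte that the next read starts on.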
Making assumptions about languages, even about something as simple as a “char” data type, can cause problems.
Thank you, manual.