Checksums and zeros and XMODEM and randomness.

A year or two ago, I ran across some C code at my day job that finally got me to do an experiment…

When I was first using a modem to dial in to BBSes, it was strictly a text-only interface. No pictures. No downloads. Just messages. (Heck, a physical bulletin board at least would let you put pictures on it! Maybe whoever came up with the term BBS was just forward thinking?)

The first program I ever had that sent a program over the modem was DFT (direct file transfer). It was magic.

Later, I got one that used a protocol known as XMODEM. It seemed like warp speed compared to DFT!

XMODEM would send a series of bytes, followed by a checksum of those bytes. The other end would calculate a checksum over the received bytes and compare. If they matched, the transfer went on to the next series of bytes… If they did not, the sender would resend those bytes.
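
As a rough illustration of that receive-and-compare step, here is a minimal C sketch. It assumes the classic XMODEM layout of 128 data bytes followed by a one-byte checksum, answered with ACK (0x06) or NAK (0x15). The function names are made up for this example, and the rest of the protocol (block numbers, timeouts) is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define XMODEM_DATA_SIZE 128 /* data bytes per block in classic XMODEM */
#define XMODEM_ACK 0x06      /* "block was good, send the next one" */
#define XMODEM_NAK 0x15      /* "block was bad, send it again" */

/* Add up the data bytes; uint8_t keeps only the low 8 bits of the sum. */
static uint8_t xmodem_sum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Hypothetical helper: pick the receiver's reply for one received block. */
static uint8_t xmodem_reply(const uint8_t *data, uint8_t sent_sum)
{
    if (xmodem_sum(data, XMODEM_DATA_SIZE) == sent_sum) {
        return XMODEM_ACK;
    }
    return XMODEM_NAK;
}
```

The receiver would call xmodem_reply() once per block and send the result back. A NAK is what makes the sender transmit the same block again.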

Very simple. And, believe it or not, checksums are still being used by modern programmers today, even though newer methods have been created (such as CRC).

Checking the sum…

A checksum is simply the value you get when you add up all the bytes of some data. Checksum values are normally integers of a fixed size, so they are limited to a fixed range. For example, an 8-bit checksum (using one byte) can hold a value of 0 to 255. A 16-bit checksum (2 bytes) can hold a value of 0 to 65535. Since the sum of the bytes can easily exceed that range, especially with an 8-bit checksum, the value just rolls over.

For example, if the current checksum calculated value is 250 for an 8-bit checksum, and the next byte being counted is a 10, the checksum would be 250+10, but that exceeds what a byte can hold. The value just rolls over, like this:

250 + 10: 251, 252, 253, 254, 255, 0, 1, 2, 3, 4

Thus, the checksum after adding that 10 is now 4.
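
The same rollover can be written as the remainder after dividing by 256. A small C sketch (the function name is just for illustration):

```c
/* Add one byte to a running 8-bit checksum.
 * The remainder after dividing by 256 is the "rolled over" value. */
static unsigned int add_with_rollover(unsigned int checksum, unsigned int next_byte)
{
    return (checksum + next_byte) % 256u;
}
```

With that, add_with_rollover(250, 10) gives 4, matching the count above, and add_with_rollover(255, 1) gives 0.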

Here is a simple 8-bit checksum routine for strings in Color BASIC:

0 REM CHKSUM8.BAS
10 INPUT "STRING";A$
20 GOSUB 100
30 PRINT "CHECKSUM IS";CK
40 GOTO 10

100 REM 8-BIT CHECKSUM ON A$
110 CK=0
120 FOR A=1 TO LEN(A$)
130 CK=CK+ASC(MID$(A$,A,1))
140 IF CK>255 THEN CK=CK-256
150 NEXT
160 RETURN

Line 140 is what handles the rollover. If we had a checksum of 250 and the next byte was a 10, it would be 260. That line would detect it and subtract 256, making it 4. (It subtracts 256 rather than 255 because the value wraps around to 0, not 1.)
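
For comparison, here is roughly the same routine in C. This is only a sketch: checksum8 is a name chosen for this example, and it leans on the fact that a uint8_t wraps around at 256 on its own, so there is no separate rollover line.

```c
#include <stdint.h>
#include <string.h>

/* 8-bit checksum of a C string: add every character, keep the low 8 bits. */
static uint8_t checksum8(const char *text)
{
    uint8_t sum = 0;
    size_t len = strlen(text);
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + (unsigned char)text[i]);
    }
    return sum;
}
```

For "HELLO" the bytes add up to 72+69+76+76+79 = 372, and 372 - 256 = 116, so checksum8("HELLO") returns 116.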

The goal of a checksum is to verify data and make sure it hasn’t been corrupted. You send the data and checksum. The receiver passes the data through a checksum routine, then compares what it calculated with the checksum that was sent with the message. If they do not match, the data has something wrong with it. If they do match, the data is less likely to have something wrong with it.

Double checking the sum.

One of the problems with just adding (summing) up the data bytes is that two swapped bytes would still create the same checksum. For example “HELLO” would have the same checksum as “HLLEO”. Same bytes. Same values added. Same checksum.

A good 8-bit checksum.

However, if one byte got changed, the checksum would catch that.

A bad 8-bit checksum.

It would be quite a coincidence if two data bytes got swapped during transfer, but I still wouldn’t use a checksum on anything where lives were at stake if it processed a bad message because the checksum didn’t catch it ;-)

Another problem is that if the value rolls over, a long message and a short message could produce the same checksum. In the case of an 8-bit checksum, and data bytes that range from 0-255, you could have a 255 byte followed by a 1 byte and that would roll over to 0. A checksum of no data would also be 0. Not good.
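
Both weaknesses are easy to reproduce. The sketch below uses an illustrative checksum8 function (not code from any particular project) that works on raw bytes, so it can also handle an empty message.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit checksum over a byte buffer; uint8_t wraps around at 256. */
static uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

static const uint8_t hello[]    = { 'H', 'E', 'L', 'L', 'O' };
static const uint8_t swapped[]  = { 'H', 'L', 'L', 'E', 'O' }; /* two bytes swapped */
static const uint8_t changed[]  = { 'J', 'E', 'L', 'L', 'O' }; /* one byte changed */
static const uint8_t rollover[] = { 255, 1 };                  /* sums to 256, wraps to 0 */
```

checksum8(hello, 5) and checksum8(swapped, 5) return the same value, so the swap goes unnoticed, while checksum8(changed, 5) differs. checksum8(rollover, 2) returns 0, the same as a call with a length of 0.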

Checking the sum: Extreme edition

A 16-bit or 32-bit checksum would just be a larger number, reducing how often it could roll over.

For a 16-bit value, ranging from 0-65535, you could hold up to 257 bytes of value 255 before it would roll over:

255 * 257 = 65535

But if the data were 258 bytes of value 255, it would roll over:

255 * 258 = 65790 -> rollover to 254.

Thus, a 258-byte message of all 255s would have the same checksum as a 1-byte message of a 254.

To update the Color BASIC program for 16-bit checksum, change line 140 to be:

140 IF CK>65535 THEN CK=CK-65536
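
A C version of the 16-bit variant might look like the sketch below. The name checksum16 is an assumption for this example, and uint16_t does the wrapping at 65536 for us.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit checksum over a byte buffer; uint16_t wraps around at 65536. */
static uint16_t checksum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}
```

Fed 257 bytes of 255 it returns 65535. With 258 of them it wraps to 254, because 65790 - 65536 = 254.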

Conclusion

Obviously, an 8-bit checksum is rather useless, but if a checksum is all you can do, at least use a 16-bit checksum. If you were using the checksum for data packets larger than 257 bytes, maybe a 32-bit checksum would be better.

Or just use a CRC. They are much better and catch things like bytes being out of order.

But I have no idea how I’d write one in BASIC.
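
For what it is worth, a bit-at-a-time CRC is only a few lines in C. The sketch below is one common 16-bit variant, usually called CRC-16/XMODEM (polynomial 0x1021, starting value 0). It is offered as an illustration, not as the code from any particular XMODEM program, and crc16_xmodem is just a name picked for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021 and an initial value of 0. */
static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        /* Mix the next byte into the top 8 bits of the CRC. */
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                /* Top bit set: shift it out and apply the polynomial. */
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

The usual check string "123456789" gives 0x31C3 with these parameters. Unlike the plain sum, "HELLO" and "HLLEO" come out with different CRCs, because each byte's effect depends on where it sits in the message.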

One more thing…

I almost forgot what prompted me to write this. I found some code that would flag an error if the checksum value was 0. When I first saw that, I thought “but 0 can be a valid checksum!”

For example, if there were enough data bytes to make the value roll over past 65535 to exactly 0, that would be a valid checksum. To keep large data that happens to add up to 0 from being flagged bad, I added a small check to the 16-bit checksum validation code:

if ((checksum == 0) && (datasize < 258)) // Don't bother doing this.
{
    // checksum appears invalid.
}
else if (checksum != dataChecksum)
{
    // checksum did not match.
}
else
{
    // guess it must be okay, then! Maybe...
}

But, what about a buffer full of 00s? The checksum would also be zero, which would be valid.

Conclusion: Don’t error check for a 0 checksum.
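
Put another way, the validation only needs the comparison. Here is a minimal sketch using a plain 16-bit sum, with checksum16 and checksum_ok as made-up names for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit checksum over a byte buffer; uint16_t wraps around at 65536. */
static uint16_t checksum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* Returns 1 when the data matches the checksum that came with it.
 * There is deliberately no special case for a checksum of zero. */
static int checksum_ok(const uint8_t *data, size_t len, uint16_t sent_checksum)
{
    return checksum16(data, len) == sent_checksum;
}
```

A buffer of all zeros with a sent checksum of 0 passes, and so does a 258-byte buffer holding 257 bytes of 255 plus a single 1, which adds up to 65536 and wraps to 0.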

Better yet, use something better than a checksum…

Until next time…

14 thoughts on “Checksums and zeros and XMODEM and randomness.”

  1. William Astle

    You have one error in your code for calculating checksums: when an 8 bit value overflows, you need to subtract 256, not 255. This is exactly modular arithmetic mod 256. In that system, 255 + 1 gives 0, not 1. The same applies for 16 bits but that’s mod 65536 instead. So for the rollover, you subtract the power of 2, not the maximum value.

      1. Darren A

        The AND operator in Color Basic actually performs a bitwise operation on 16-bit signed integer operands. This means you could implement your 8-bit roll over as:
        130 CK=CK+ASC(MID$(A$,A,1)) AND 255

        ..then remove line 140.

    1. Allen Huffman Post author

      I’m sure there will be plenty of typos and mistakes this month – 30 posts written in advance and scheduled. General rule is, if there’s not a CoCo screen shot, it’s bound to be untested code typed off the top of my head :)

    1. Allen Huffman Post author

      GOO? Interesting… One can hope XMODEM became a standard not because it was a standard, but because it was an improvement over all the methods that existed before it was released.

  2. Sebastian Tepper

    5 ‘CALCULATES 2-BYTE CHECKSUMS
    6 ‘DETECTS INTERCHANGED CHARACTERS
    10 INPUT “ENTER THE STRING TO CHECK:”;A$
    20 A(0)=&HC3:A(1)=&HA5:A(2)=&H96 ‘INIT SCRAMBLE NUMBERS
    20 I=0:S=0 ‘INIT SCRAMBLE INDEX AND SUM OF TOTALS
    30 FOR J=1 TO LEN(A$) ‘LOOP TO READ CHARACTERS
    40 X=ASC(MID$(A$,J,1)) ‘GET NEXT CHARACTER
    50 Y=NOT X AND A(I) OR X AND NOT A(I) ‘XOR CHARACTER WITH SCRAMBLER
    60 IF I65535 THEN S=S-65536 ’16-BIT MODULO
    90 NEXT J
    100 C$=HEX$(S)
    110 IF LEN(C$)<4 THEN C$=”0″+C$ ‘PAD ZEROS AS NEEDED
    120 PRINT “CHECKSUM IS: “;C$
    130 END

  3. Sebastian Tepper

    Argh. Lines 70 and 80 disappeared. Line 60 was indeed line 80, and a less-than sign was removed.
    There is no way to post code in this forum.
    Sorry, I give up. Bye.

      1. Allen Huffman Post author

        This is a test:

        10 IF A > 10 THEN 20
        20 IF B < 10 THEN 10
        30 PRINT &HFF

        There is a code tag built in to WordPress, which I think may be designed for this purpose. less-than code greater-than, then less-than-slash code greater-than around code, like an HTML code. There is also PRE too but I don’t know what the difference is.

        Is this what we needed to be using all this time? It’s in the UI editor I see on my end.

      2. Allen Huffman Post author

        This is using PRE

        10 IF A > 100 THEN PRINT
        20 IF B < 100 THEN PRINT
        

        This is using CODE:


        10 IF A > 100 THEN PRINT
        20 IF B < 100 THEN PRINT

        Let’s see what the difference is.

  4. Sebastian Tepper


    10 'CALCULATES 2-BYTE CHECKSUMS
    15 'DETECTS INTERCHANGED CHARACTERS
    20 INPUT "ENTER THE STRING TO CHECK:";A$
    25 A(0)=&HC3:A(1)=&HA5:A(2)=&H96 'INIT SCRAMBLE NUMBERS
    30 I=0:S=0 'INIT SCRAMBLE INDEX AND SUM OF TOTALS
    35 FOR J=1 TO LEN(A$) 'LOOP TO READ CHARACTERS
    40 X=ASC(MID$(A$,J,1)) 'GET NEXT CHARACTER
    45 Y=NOT X AND A(I) OR X AND NOT A(I) 'XOR CHARACTER WITH SCRAMBLER
    50 S=S+Y 'ADD NEW VALUE TO THE SUM OF TOTALS
    55 IF I<2 THEN I=I+1 ELSE I=0 'INCREMENT SCRAMBLER INDEX
    60 IF S>65535 THEN S=S-65536 '16-BIT MODULO
    65 NEXT J
    70 C$=HEX$(S)
    75 IF LEN(C$)<4 THEN C$="0"+C$:GOTO 75 'PAD ZEROS AS NEEDED
    80 PRINT "CHECKSUM IS: ";C$
    85 END
