What's Wrong With My CRC?

Check out the (non) CRC implementation below. What's wrong with it?

I'm working on a connectivity library for IoT devices. A critical part of every communication protocol is the data integrity check: you send a collection of bytes to another machine, and that machine has to know that there were no errors on the way. The IP stack, for example, already has its own integrity checks. The problem is that a TCP socket is basically a stream, which means you don't really know where a buffer begins and ends. By adding our own integrity check we can verify that we are looking at complete buffers.

The basic concept is to have some correlation between the first and the last bytes. For example, if the first two bytes are the length of the buffer and the last two bytes are 0x0102, we can keep scanning the stream until the first two bytes point at 0x0102. The problem is that whatever number we choose can also appear inside the buffer as part of the data. What if we have 10,000 bytes of 0x0102? We will find 0x0102, assume it is the length, jump forward and find 0x0102 again. A magic number is probably one of the worst ways to go. The length is a characteristic of the buffer; we need another characteristic of the buffer at the end as well. The preferred method is to use all the data in the buffer to generate an integrity check value. The common options are a checksum, XOR, CRC, and a hash. The question is how much CPU and RAM to spend versus the quality of error detection.
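
To make the framing idea concrete, here is a rough receive-side sketch. The field layout and the compute_check() stand-in are illustrative assumptions of mine, not part of any published protocol:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical frame layout: [len lo][len hi][payload ...][check lo][check hi].
 * compute_check() is a stand-in for whichever integrity function is chosen
 * below (checksum, XOR, CRC); here it is just a placeholder sum. */
static uint16_t compute_check(const uint8_t *data, size_t len)
{
   uint16_t sum = 0;
   while (len--)
      sum = (uint16_t)(sum + *data++);
   return sum;
}

/* Returns 1 if the bytes at p start with a complete, consistent frame. */
static int frame_is_valid(const uint8_t *p, size_t available)
{
   if (available < 4)
      return 0;
   uint16_t len = (uint16_t)(p[0] | (p[1] << 8));
   if ((size_t)len + 4 > available)
      return 0;                                /* frame not fully received yet */
   uint16_t stored = (uint16_t)(p[2 + len] | (p[3 + len] << 8));
   return stored == compute_check(p + 2, len); /* trailer must match payload */
}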

A checksum was originally a simple addition of all bytes (a sum...). The problem is that 0x80+0x10 is the same as 0x10+0x80. This means that replacing one byte might be detected, but there is a high probability that one or more other bytes will compensate and produce the same result. Here is another example: 0x10+0x20+0x30 has the same checksum as 0x60+0x00+0x00, which is clearly a different buffer.
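
A tiny illustration of the compensation problem from the example above, assuming a plain 8-bit additive checksum (checksum8() is just an illustrative name):

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Plain additive checksum: sum of all bytes, truncated to 8 bits. */
static uint8_t checksum8(const uint8_t *buf, size_t len)
{
   uint8_t sum = 0;
   while (len--)
      sum = (uint8_t)(sum + *buf++);
   return sum;
}

int main(void)
{
   uint8_t a[] = { 0x10, 0x20, 0x30 };
   uint8_t b[] = { 0x60, 0x00, 0x00 };   /* clearly a different buffer... */
   printf("%02X %02X\n", (unsigned)checksum8(a, 3), (unsigned)checksum8(b, 3));   /* ...same sum: 60 60 */
   return 0;
}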

Using XOR is very simple and cheap for the CPU. The problem is that 0x80^0x80 gives the same result as 0x81^0x81. This means that a single error in the buffer will be detected, but there is a good chance that a second error will hide the first. XOR is only safe as long as each bit position is flipped at most once across all bytes (or, more precisely, an odd number of times).
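
And an illustration of the XOR weakness: flipping the same bit in two different bytes cancels out, so the corrupted buffer produces the same value (xor8() is an illustrative name):

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* XOR of all bytes in the buffer. */
static uint8_t xor8(const uint8_t *buf, size_t len)
{
   uint8_t x = 0;
   while (len--)
      x ^= *buf++;
   return x;
}

int main(void)
{
   uint8_t good[] = { 0x80, 0x55, 0x80 };
   uint8_t bad[]  = { 0x81, 0x55, 0x81 };   /* bit 0 flipped in two bytes */
   printf("%02X %02X\n", (unsigned)xor8(good, 3), (unsigned)xor8(bad, 3));   /* same value: 55 55 */
   return 0;
}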

CRC (Cyclic Redundancy Check) is probably the best way to go. There is a nice algorithm which I will detail below. It is more expensive in CPU and RAM but much more reliable: every error propagates and keeps producing completely different numbers, so it is very hard to compensate for one error with another. If you have 0x12 instead of 0x10, there is no single byte at some other location that can change to cover for it. CRC16 means a 16-bit value and CRC32 means a 32-bit value, so with CRC16 a corrupted buffer has roughly a 1 in 64K chance of still passing the check. If a changed byte produces a different CRC16, it would take several other bytes changing together to compensate, because one byte cannot repair a 16-bit value.

Advanced hash functions usually consume too much RAM and CPU, and eventually the size of the output is what matters: if you have a 16-bit integrity value, then even the best algorithm will not save you from that 1 in 64K ratio.

I googled "crc16 algorithm c" and found several results. The first one that demonstrates the concept is here: http://www.nongnu.org/avr-libc/user-manual/group__util__crc.html

uint16_t crc16_update(uint16_t crc, uint8_t a)
{
   int i;
   crc ^= a;
   for (i = 0; i < 8; ++i)
   {
      if (crc & 1) crc = (crc >> 1) ^ 0xA001;
      else crc = (crc >> 1);
   }
   return crc;
}
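
To check a whole buffer you fold every byte through this function. A short usage sketch (the 0xFFFF seed is a common convention; both sides simply have to agree on it):

#include <stdint.h>
#include <stddef.h>

uint16_t crc16_update(uint16_t crc, uint8_t a);   /* the function quoted above */

/* Feed every byte of a buffer through crc16_update(). */
uint16_t crc16_buffer(const uint8_t *buf, size_t len)
{
   uint16_t crc = 0xFFFF;   /* agreed seed; 0 is another common choice */
   while (len--)
      crc = crc16_update(crc, *buf++);
   return crc;
}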

The avr-libc example is good because it shows you the cycles: for every bit we shift, and based on the bit that falls out we either XOR with a magic number (the polynomial) or not. This loop consumes CPU, so we can instead precompute the result for every possible byte value in a lookup table, as seen here: http://stackoverflow.com/questions/22432066/how-to-use-table-based-crc-16-code

unsigned int crctable[256] =
{
   0x0000, 0x1189 ........
};

while (len--)
{
   crc = (crc << 8) ^ crctable[((crc >> 8) ^ *c++)];
}
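
For reference, here is a sketch of how such a table can be generated and used for the same reflected polynomial (0xA001) as the crc16_update() function quoted above. The Stack Overflow example appears to use a different CRC-16 variant, so its table values and update line are not identical, but the structure is the same idea: one lookup per byte instead of eight shifts.

#include <stdint.h>
#include <stddef.h>

static uint16_t crc16_table[256];

/* Build the 256-entry table by running the bit-by-bit loop once per byte value. */
static void crc16_build_table(void)
{
   for (int i = 0; i < 256; ++i)
   {
      uint16_t crc = (uint16_t)i;
      for (int b = 0; b < 8; ++b)
         crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001) : (uint16_t)(crc >> 1);
      crc16_table[i] = crc;
   }
}

/* Table-driven update, equivalent to calling crc16_update() on each byte. */
static uint16_t crc16_table_driven(uint16_t crc, const uint8_t *c, size_t len)
{
   while (len--)
      crc = (uint16_t)((crc >> 8) ^ crc16_table[(crc ^ *c++) & 0xFF]);
   return crc;
}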

I tried to find something that I could use on an Arduino device without wasting CPU and without using a great portion of the device's memory. Here is what I came up with (translated from C# without compiling; I'll explain why below):

myCrc ^= *buffer++;
myCrc = (myCrc) ^ (myCrc << 5) ^ (myCrc << 8);
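
Wrapped into a complete (and equally uncompiled) C function it would look roughly like this; the uint16_t accumulator and the zero seed mirror the C# version further down:

#include <stdint.h>
#include <stddef.h>

/* The two lines above wrapped into a full function (still untested). */
uint16_t my_crc(const uint8_t *buffer, size_t len)
{
   uint16_t myCrc = 0;
   while (len--)
   {
      myCrc ^= *buffer++;
      myCrc = (uint16_t)(myCrc ^ (myCrc << 5) ^ (myCrc << 8));
   }
   return myCrc;
}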

How is it possible that I don't know about such an implementation and that it doesn't come up as the first hit in a Google search? What am I missing here?

The reason I used C# was for the tests (and because that was the project I had open when I wanted to try this solution).

Here is the function:

      // Computes the reference CRC (polynomial 0x1021, MSB first) and myCrc side by side.
      private static ushort CRC16(byte[] buffer, out ushort crc, out ushort myCrc)
      {
         myCrc = 0;
         crc = 0;
         int pos = 0;
         while (pos < buffer.Length)
         {
            crc ^= (ushort)(buffer[pos] << 8);
            for (int i = 0; i < 8; ++i)
            {
               if ((crc & 0x8000) != 0) crc = (ushort)((crc << 1) ^ 0x1021);
               else crc = (ushort)(crc << 1);
            }

            myCrc ^= (ushort)(buffer[pos]);
            myCrc = (ushort)((myCrc) ^ (myCrc << 5) ^ (myCrc << 8));
            //myCrc ^= *buffer++;
            //myCrc = (myCrc) ^ (myCrc << 5) ^ (myCrc << 8);

            pos++;
         }
         return (crc);
      }

Here is the tester code:

      // Feeds many 4-byte buffers through CRC16() and counts how often each
      // 16-bit output value occurs, then collects the distinct counts (the "spread").
      private static void BenchmarkCRC16()
      {
         ushort[] crcTable = new ushort[0x10000];
         ushort[] myCrcTable = new ushort[0x10000];
         for (int a = 0; a < 0x17000; a+=13)
         {
            if (0 == (a % 256)) Console.WriteLine("a = 0x" + a.ToString("X"));
            ushort ua = (ushort)a;
            for (int i = 0; i < 0x10000; i++)
            {
               ushort u = (ushort)i;

               ushort crc16 = 0;
               ushort myCrc = 0;
               CRC16(new byte[] { (byte)(u & 0xFF), (byte)((u >> 8) & 0xFF), (byte)(ua & 0xFF), (byte)((ua >> 8) & 0xFF) }, out crc16, out myCrc);
               crcTable[crc16]++;
               myCrcTable[myCrc]++;
            }
         }
         List<int>   crcUtilization = new List<int>();
         List<int> myCrcUtilization = new List<int>();
         for (int i = 0; i < 0x10000; i++)
         {
            bool ok = false;
            for (int t = 0; t < crcUtilization.Count; t++) { if (crcUtilization[t] == crcTable[i]) { ok = true; break; } }
            if (!ok) crcUtilization.Add(crcTable[i]);
            ok = false;
            for (int t = 0; t < myCrcUtilization.Count; t++) { if (myCrcUtilization[t] == myCrcTable[i]) { ok = true; break; } }
            if (!ok) myCrcUtilization.Add(myCrcTable[i]);
         }
// BREAKPOINT HERE and check out crcUtilization and myCrcUtilization (or print)
      }

      private static void CRCTest(byte[] buffer)
      {
         ushort crc16 = 0;
         ushort myCrc = 0;
         CRC16(buffer, out crc16, out myCrc);
      }

The tests show that myCrc and crc16 both produce evenly spread values, so every combination of bytes in the original buffer gets a unique number within the 64K limit. For example, if you comment out the line "myCrc = (ushort)((myCrc) ^ (myCrc << 5) ^ (myCrc << 8));" in the function (leaving only the XOR), you will see a few values hit over and over and many values never used at all (zero).

If such an implementation is good, then it is clearly also suitable for AVX vectorization, as opposed to a per-bit for loop or a lookup table.

I am open to any suggestions or thoughts.


4 comments

Asaf Shelly:

Hi Jim, so how do we define a test that verifies the complexity of the CRC? Currently every input buffer has a unique CRC number. What about myCrc when the lower byte is zero and the buffer contains zeros? myCrc will not change until the buffer has a non-zero byte. How do we measure the effectiveness of a CRC?

jimdempseyatthecove:

The method you use to compute/check the CRC determines its effectiveness at detecting different types of errors. The more compute-intensive methods (bit-shifting XOR or multiply) are more robust at detecting multiple errors. This gets back to my additional note: "what is more important is that errors are not necessarily random". I am not an expert on data-transmission error detection, so it may be advisable to look at the research papers behind each algorithm, then pick one that meets your criteria.

Depending on your data (e.g. if your messages are 7-bit ASCII), the prime-multiply-by-odd-number method could be reduced to an 8-bit multiply (message[i]<<1+1) times your 16-, 24- or 32-bit accumulator.

Many of the Arduino microprocessors include hardware CRC support. For example, the 8-bit Atmel XMEGA can produce CRC-16 (CRC-CCITT) and CRC-32 (IEEE 802.3), and it can produce the CRC on continuous data going through a DMA channel. In other words, after setup it adds zero instruction cycles of overhead. So for performance and a minimum memory footprint, you might want to get closer to the metal.

Asaf Shelly:

Hi Jim, thanks. Would this implementation (in C#) be good? I got an uneven spread.

// before loop:
int mulCrc = 427067;  // any prime
// in loop:
ushort temp = (ushort)(buffer[pos]);
temp = (ushort)(temp << 2 + 1);
mulCrc = (ushort)(mulCrc * temp);
// post loop:
myCrc = (ushort)(mulCrc >> 4);

I probably used signed / unsigned wrong in the multiplication.

My fundamental question is: Why would anyone use a for loop on every bit in the transmission when you can use simple CPU op-codes instead? There has to be something I am overlooking...

jimdempseyatthecove:

The CRC hashing function can be anything you want provided that:

a) Sender and receiver agree as to what that function is.
b) If the function is published as meeting one of the common implementations, then it must actually meet that specification.
c) It provides the best protection for the number of bits and performance you seek.

Not that Wikipedia is to be considered the authority on CRCs, but it lists seven different 16-bit CRC functions.

If the "myCrc" listed above meets your criteria, then go with it. Another technique (a little more compute-intensive than myCrc) is to seed an accumulator with a prime number whose bit width is larger than the result you seek. For example, use a 32-bit or 24-bit initial value (24-bit may work well on Arduino). Then, for each byte in the buffer, promote it to unsigned 16-bit, left shift by 2, and add 1, producing an odd intermediary number. Next multiply the two odd values (one 24-bit, the other 16-bit), producing a new 24-bit result (which is also odd). When the buffer is complete, the final 16-bit CRC is the middle 16 bits of the 24-bit value.
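
In C that might look something like the following untested sketch; the specific seed prime (2^19-1, a Mersenne prime) and taking bits 4..19 as the "middle" 16 bits are just one possible reading of the description:

#include <stdint.h>
#include <stddef.h>

/* Untested sketch of the multiply-based hash described above. */
static uint16_t mul_hash16(const uint8_t *buf, size_t len)
{
   uint32_t acc = 524287u;                       /* odd prime seed, wider than 16 bits */
   while (len--)
   {
      uint32_t v = ((uint32_t)*buf++ << 2) + 1;  /* promote byte, make it odd */
      acc = (acc * v) & 0xFFFFFFu;               /* keep 24 bits; product stays odd */
   }
   return (uint16_t)(acc >> 4);                  /* "middle" 16 of the 24 bits */
}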

I suggest running a verification test to determine "the spread".

Additional note:

While a uniform spread is important, what is more important is that errors are not necessarily random. One algorithm may be better than others in detecting the typical error. Example: think of a synchronous communication between two systems where the clocks on the two systems are not precisely the same. In this case you may experience bit framing errors after so many bits (and you are not using a resynchronization technique such as MFM or modified MFM). In this situation, the error is expected to occur in the last few bytes.

