I'm having a hard time figuring out which implementation of the 32-bit variation of the Fletcher checksum algorithm is correct. Wikipedia provides the following optimized implementation:
uint32_t fletcher32( uint16_t const *data, size_t words ) {
uint32_t sum1 = 0xffff, sum2 = 0xffff;
size_t tlen;
while (words) {
tlen = words >= 359 ? 359 : words;
words -= tlen;
do {
sum2 += sum1 += *data++;
} while (--tlen);
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
}
/* Second reduction step to reduce sums to 16 bits */
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
return sum2 << 16 | sum1;
}
In addition, I've adapted the non-optimized 16-bit example from the Wikipedia article to compute a 32-bit checksum:
uint32_t naive_fletcher32(uint16_t *data, int words) {
uint32_t sum1 = 0;
uint32_t sum2 = 0;
int index;
for( index = 0; index < words; ++index ) {
sum1 = (sum1 + data[index]) % 0xffff;
sum2 = (sum2 + sum1) % 0xffff;
}
return (sum2 << 16) | sum1;
}
Both these implementations yield the same results, e.g. 0x56502d2a
for the string abcdef
. To verify that this is indeed correct, I tried to find other implementations of the algorithm:
- An online checksum/hash generator
- C++ implementation in the srecord project
- There's also a JavaScript implementation
All of these seem to agree that the checksum for abcdef
is 0x8180255
instead of the value given by the implementation on Wikipedia. I've narrowed this down to how the data buffer the implementation operates on. All the above non-wikipedia implementation operate one byte at a time, whereas the Wikipedia implementation computes the checksum using 16-bit words. If I modify the above "naive" Wikipedia implementation to operate per-byte instead, it reads like this:
uint32_t naive_fletcher32_per_byte(uint8_t *data, int words) {
uint32_t sum1 = 0;
uint32_t sum2 = 0;
int index;
for( index = 0; index < words; ++index ) {
sum1 = (sum1 + data[index]) % 0xffff;
sum2 = (sum2 + sum1) % 0xffff;
}
return (sum2 << 16) | sum1;
}
The only thing that changes is the signature, really. So this modified naive implementation and the above mentioned implementations (except Wikipedia) agree that the checksum of abcdef
is indeed 0x8180255
.
My problem now is: which one is correct?
naive_fletcher
's the% 0xffff
in the loop are not necessary. You can do that after the loop. – Ludmilla% 0xffff in the loop are not necessary
as long as there is no overflow. – Vann