Understanding Cyclic Redundancy Code algorithm for beginners

Asked 3/6, 2020 at 8:13 Answered 15/6, 2020 at 21:9

at section 5.5 of the PNG Specification, it discusses this concept in the PNG file format called "CRC" or "Cyclic Redundancy Code". I've never heard of it before, so I'm trying to understand it.

The CRC polynomial employed is

x³² + x²⁶ + x²³ + x²² + x¹⁶ + x¹² + x¹¹ + x¹⁰ + x⁸ + x⁷ + x⁵ + x⁴ + x² + x + 1

In PNG, the 32-bit CRC is initialized to all 1's, and then the data from each byte is processed from the least significant bit (1) to the most significant bit (128). After all the data bytes are processed, the CRC is inverted (its ones complement is taken). This value is transmitted (stored in the datastream) MSB first. For the purpose of separating into bytes and ordering, the least significant bit of the 32-bit CRC is defined to be the coefficient of the x31 term.

So let me tell you what I understand and what I don't understand about this.

I've heard of polynomials, but in this context I'm a bit confused of how they are implemented here.

In this case, what is "x" supposed to represent? The current bit in the 32-bit looP? Which brings us to the next part:

so it says to make an empty 32-bit number (or rather, all set to 1s, so 32 1s), then it says it's "processed from the least significant bit (1) to the most significant bit (128)", but the question is, the "least...most..significant bit" of what?

Of the other data in the chunk?

How does that work, if the chunk is set in bytes, and this is only 32 bits? What if there are more than 32 bits in the chunk data (which there definitely are?)

Does it mean "least..most..significant bit" of the "polynomial"?

But what does the polynomial represent exactly? What is x^32?

What is x represented by?

Any help with the above questions, and perhaps a simple example with the example IDATA chunk (AKA calculating the CRC chunk for it with basic explanations) would be great:

0 0 2 3 IDAT 0 1 0 1 0 1 0 1 0 1 0 C

where the last byte "C" should be replaced with that 32-bit CRC thing it was talking about.

Can someone provide me with a practical example?

Champion answered 3/6, 2020 at 8:13 Comment(11)

" I don't know where they got that big bit number in the for loop, and I don't know where to insert the byte array" - It's one of the standard numbers for a 32 bit CRC. Go to the wiki article, and scroll down to the 32 bit CRC's, you'll see it there How CRCs are chosen is complicated, and I don't know why specific CRCs are chosen out of a group of CRCs that have the same error detection capability. – Transferase 4/6, 2020 at 8:35

@Transferase hi thanks, I noticed on the wiki it gives a few bit examples, using the same polynomial from other example questions. The question is I want to understand how to arrive at that 32-bit array, from the polynomial. I'm still very confused what "x" represents, it mentioned in the other answer that nothing is "plugged into" it, but then what is it, and how does that polynomial give the 4 different bit results? And the other point is, once I have the 32-bit string, but then what do I do to it with the other input, do I loop over the other byte array then do something, but what? – Hamburger 4/6, 2020 at 8:52

@Transferase also I searched for a javascript solution, all I was able to find was this medium.com/the-guardian-mobile-innovation-lab/… but they use a built in function for the CRC.. most of the others were either using canvas, or some other library, or server-side code – Hamburger 4/6, 2020 at 8:54

@Transferase the thing is I'm not interested in multiple formats nor compression, I'm only interested in a very simple PNG generator (mainly to run inside google apps script), and there are hardly any extremely light weight javascirpt PNG libraries (pngJS is too overkill) to achieve what I want – Hamburger 7/6, 2020 at 4:40

OK, but isn't zlib compression header and CRC (which I get the impression is separate from the PNG CRC) needed even if the compression is turned off? You'll need something for that. – Transferase 7/6, 2020 at 7:3

@Transferase have no idea, all I'm saying is I only need the absolute most basic PNG, no "extra" compression, so theoretically whatever minimum compression is necessary, but I thought that was part of the questionn.. – Hamburger 7/6, 2020 at 7:15

I recommend Ben Eater's video on these: youtube.com/watch?v=izG7qT0EpBw, he did a really good job of explaining them – Hoop 14/6, 2020 at 0:9

@YetAnotherUser ok but before iget it explained i need a working functionn / formula for javascript to input any byte array, thn i can have it x planed – Hamburger 14/6, 2020 at 21:37

Because the asker was unable to find a vanilla JavaScript solution, and needs one (according to his comments above), here is a link to mine: greg-tumolo.github.io/CRC – Calyptrogen 21/6, 2020 at 1:22

@Calyptrogen wow amazing, thats exactl what I was looking for! so I just want to udnerstand it a bit, does the number "0xedb88320" have anything to do with that polynomial x^32 etc. thing? – Hamburger 21/6, 2020 at 6:18

Yes; see my final comment on my answer. – Calyptrogen 21/6, 2020 at 17:18

Beware: If you use (00000000)_2 and (00000001)_2 as the binary representations of the 0s and 1s in your example IDAT chunk, you will compute the CRC incorrectly. The ASCII values of '0' and '1' are 48 = (00110000)_2 and 49 = (00110001)_2; similarly, the ASCII values of 'I', 'D', 'A', and 'T' are 73 = (01001001)_2, 68 = (01000100)_2, 65 = (01000001)_2, and 84 = (01010100)_2. So, assuming you meant the values 0 and 1 rather than the characters ‘0’ and ‘1’, what you must compute the CRC of is (01001001 01000100 01000001 01010100 00000000 00000001 00000000 00000001 00000000 00000001 00000000 00000001 00000000 00000001 00000000)_2.

Inconsequentially to the CRC but consequentially to the validity of the chunk, the length field (i.e., the first 4 bytes) of the chunk should contain the length in bytes of the data only, which is 11, which is the ASCII value of a vertical tab (VT), which is a nonprinting character but can be represented in strings by the hexadecimal escape sequence \x0B (in which (B)_16 = 11). Similarly, the first 3 bytes must contain the character for which the ASCII value is 0 (rather than 48), which is null (NUL), which can be represented in strings by the hexadecimal escape sequence \x00. So, the length field must contain something like "\x00\x00\x00\x0B".

Calyptrogen answered 15/6, 2020 at 21:9 Comment(15)

In JavaScript, \0 is another escape sequence for NUL; and, \v is another escape sequence for VT. – Calyptrogen 15/6, 2020 at 21:25

hi thanks, Im just having a bit f trouble understanding this. so is CRC32 like a functio, that takes an input (array of the byte values you mentioned) and returns a response? – Hamburger 16/6, 2020 at 20:56

Yes. For a PNG chunk: the input is the 4-byte type (like “IDAT”) followed by the length*-byte data; and, the output is a 4-byte = 32-bit unsigned integer (hence, CRC-32). *Note: The 4-byte length is not included. – Calyptrogen 16/6, 2020 at 21:46

aha interesting, so another part I was confused about is is the input for the CRC function then always exactly 4 bytes? if not then how can it give a 4-byte output based on something that is greater than 4-bytes? Don't some chunks of the PNG file take up more than 4-bytes, like if there are 10mb worth of pixels? Or is every 4-bytes of that data split up into a separate CRC thing? – Hamburger 16/6, 2020 at 21:48

No. For your example chunk, the input would be 4 + 11 = 15 bytes. The bytes are cycled through the to-be output (register) as they are processed (hence, Cyclic RC). The minimum chunk size is 12 bytes—a 4-byte length containing 0, followed by a 4-byte type, followed by the 4-byte CRC of the type. – Calyptrogen 16/6, 2020 at 22:14

oh ok now Im starting to understand it, and the cyclic part.. so is that what "x" represents in the polynomial examples? x means whatever loop cycle I am on? so for the 12 byte example, the first time x would be 0, then second time, 1, third time, 2, fourth time, 3? If not , how does the polynomial thing work exactly? in other words what is the function body, in essence, what does it do? – Hamburger 16/6, 2020 at 22:18

No. Consider w3.org/TR/2003/REC-PNG-20031110/#D-CRCAppendix. The polynomial in x does not appear explicitly; it appears implicitly as the hexadecimal value edb88320, which, when converted to binary, is the sequence of 32 bits that represent the coefficients of the terms of the polynomial. – Calyptrogen 16/6, 2020 at 22:53

wait so every x term refers to the entire number edb88320? so x^32 is the same as Math.pow(0xedb88320, 32)? If not what is x exacly, and what does ^32 mean (and the other powers-of)? – Hamburger 21/6, 2020 at 18:51

No; every x term refers to a 1 in the binary representation of 0xedb88320, which is 11101101 10111000 10000011 00100000. The number on the left (least-significant) end, 1, is the (coefficient of the) constant (x^0) term. The next number from the left, 1, is the coefficient of the x term. The next number, 1, is the coefficient of the x^2 term. The next number, 0, is the coefficient of the x^3 term (which is absent because 0*x^3 = 0). And so on. There is an implied 1 to the right of the right (most-significant) end that is the coefficient of the x^32 term. – Calyptrogen 28/6, 2020 at 5:42

so what does ^ mean, is it the equivilent of Math.pow(1, 32)? – Hamburger 29/6, 2020 at 0:12

yo just lookin back on it now, so is it always the set number 0xedb88320, or is it different sometimes? if different sometimes then how does one find out what it is? – Hamburger 5/7, 2022 at 3:23

It’s always that number for that implementation. – Calyptrogen 5/7, 2022 at 15:53

But how exactly does one find out what number it is for, say, this implementation? – Hamburger 5/7, 2022 at 23:28

By reading the implementation: w3.org/TR/2003/REC-PNG-20031110/#D-CRCAppendix – Calyptrogen 9/7, 2022 at 19:31

thanks but that is way too complicated for my tiny mind to understand, is there a tutorial anywhere? – Hamburger 10/7, 2022 at 2:17

I would recommend reading Ross Williams' classic "A Painless Guide to CRC Error Detection Algorithms". Therein you will find in-depth explanations and examples.

The polynomial is simply a different way to interpret a string of bits. When you have n bits in a register, they are most commonly interpreted either as just that, a list of n independent bits, or they are interpreted as an integer, where you multiply each bit by two raised to the powers 0 to n-1 and add them up. The polynomial representation is where you instead interpret each bit as the coefficient of a polynomial. Since a bit can only be a 0 or a 1, the resulting polynomials never actually show the 0 or 1. Instead the xⁿ term is either there or not. So the four bits 1011 can be interpreted to be 1 x³ + 0 x² + 1 x¹ + 1 x⁰ = x³ + x + 1. Note that I made the choice that the most significant bit was the coefficient of the x³ term. That is an arbitrary choice, where I could have chosen the other direction.

As for what x is, it is simply a placeholder for the coefficient and the power of x. You never set x to some value, nor determine anything about x. What it does is allow you to operate on those bit strings as polynomials. When doing operations on these polynomials, you treat them just like the polynomials you had in algebra class, except that the coefficients are constrained to the field GF(2), where the coefficients can only be 0 or 1. Multiplication becomes the and operation, and addition becomes the exclusive-or operation. So 1 plus 1 is 0. You get a new and different way to add, multiply, and divide strings of bits. That different way is key to many error detection and correction schemes.

It is interesting, but ultimately irrelevant, that if you set x to 2 in the polynomial representation of a string of bits (with the proper ordering choice), you get the integer interpretation of that string of bits.

Desmarais answered 4/6, 2020 at 1:3 Comment(0)

The spec includes a link to example code:

https://www.w3.org/TR/2003/REC-PNG-20031110/#D-CRCAppendix

The spec has errors or is confusing.

That should be "data from each byte is processed from the least significant bit(0) to the most significant bit bit(7).

The CRC is a 33 term polynomial, where each term has a one bit coefficient, 0 or 1, with the 0 coefficients ignored when describing the polynomial.

Think of the CRC as being held in a 32 bit register. The sequence is to xor a byte of data into the right most byte of the CRC register, bits 7 through 0 (which technically correspond to the polynomial coefficients of x^24 to x^31). Then the CRC is "cycled" to the right for 8 bits (via table lookup). Once all data bytes have gone through this cycle, based on the comment from Mark Adler, it's the CRC is appended to data most significant byte first, (CRC>>24)&0xff, (CRC>>16)&0xff, (CRC>>8)&0xff, (CRC)&0xff.

The wiki article may help. For the example in the computation section, the dividend would be an array of data bytes with the bits of each byte reversed, the bits of the 33 bit polynomial would be non-reversed (0x104C11DB7). After doing the computation, the bits of the remainder would be reversed and appended to the data bytes.

https://en.wikipedia.org/wiki/Cyclic_redundancy_check

Mark Adler's answer includes a link to a good tutorial for a CRC. His answer also explains the x's used in a polynomial. It's just like a polynomial in algebra, except the coefficients can only be 0 or 1, and addition (or subtraction) is done using XOR.

what is x

From the wiki example:

data     = 11010011101100 = x^13 + x^12 + x^10 + x^7 + x^6 + x^5 + x^3 + x^2
divisor  =           1011 = x^3 + x + 1

Three 0 bits are appended to the data, effectively multiplying it by x^3:

dividend = 11010011101100000 = x^16 + x^15 + x^13 + x^10 + x^9 + x^8 + x^6 + x^5

Then the crc = dividend % divisor, with coefficients restricted to 0 or 1.

(x^16 + x^15 + x^13 + x^10 + x^9 + x^8 + x^6 + x^5) % (x^3 + x + 1) = x^2
11010011101100000 % 1011 = 100

Transferase answered 3/6, 2020 at 14:28 Comment(14)

No, the CRC is not always appended to the data in the order that would give a fixed residue. In fact, in this particular case, PNG appends the CRC in the opposite byte order. The CRC on the type and data fields is calculated, and then simply compared to the CRC in the chunk after being reassembled from its big-endian ordering in the data. – Desmarais 4/6, 2020 at 1:18

@MarkAdler - thanks - I updated my answer. The link states MSB first and it isn't clear if MSB meant most significant byte of the reversed polynomial, or MSB of the register holding the CRC. – Transferase 4/6, 2020 at 2:46

My understanding is yes, since it was used in HDLC. – Desmarais 4/6, 2020 at 3:56

You asked if the CRC had been implemented in hardware. It is unlikely that PNG has been implemented in hardware. – Desmarais 4/6, 2020 at 20:29

Yes, this CRC has been implemented in hardware. PNG has not. – Desmarais 4/6, 2020 at 20:33

@MarkAdler - This CRC would need more hardware than what is used for a typical CRC, which is normally implemented as a LFSR for both calculation and output (with what would be data bits set to 0). For this CRC, after calculating the CRC, the LFSR would have to be mapped to another register to shift the CRC bits out. CRC checking would need a mapped compare. I was just wondering what was behind the choice of a non-typical CRC implementation (reversing the bytes of the CRC). – Transferase 4/6, 2020 at 20:37

The definition of the CRC does not define the byte ordering in a byte stream. It only defines the bit ordering in the 32 (or whatever n is) bits. The format definition, in this case PNG, defines the byte ordering. The reason is that "network byte ordering" was chosen as the ordering for all integers in the PNG format, which is big endian. – Desmarais 4/6, 2020 at 20:42

hi thanks, although I am still a bit lost, you said the X terms could either be 0 or 1, but how do you determine which ones are either 0 or 1? and how does it relate to the byte string taken as an input? you mentioned somethign about each bit in each of the bytes in the string being reversed, but I'm a bit confused how that all fits into a 32 bit polynomial, if all of the bytes will add up to more than 32 total bits? – Hamburger 7/6, 2020 at 4:45

@Transferase OK but I'm still confused, what is "x"? if x is 1 why doesn't it just say "1"? but then again, if it is 1, then how does 1^4 make sense, one to any power is just 4? – Hamburger 7/6, 2020 at 7:16

@bluejayke - I added more to my answer. – Transferase 7/6, 2020 at 8:59

@Transferase ah thanks, I'm starting to understand a bit now... so "data", is that the initial input byte, that gets processed by the CRC (in the case of a PNG, the necessary parts of the chunk, in bytes?) also, what is the divisor, and how was it acquired? is it always 1011 for all data inputs? and also, i ngeneral, is x^something +x^somethingElse a standard way of representing byte info? – Hamburger 7/6, 2020 at 9:43

@bluejayke - in this case, the divisor is 0x04C11DB7 (not 0xB), but since it's a right shirting CRC, the bits are reversed (left/right) for the entire operation so the divisor is bit reversed to 0xedb88320. – Transferase 7/6, 2020 at 14:55

@Transferase hi thanks, I'm just trying to understand how that conclusion was reached, I'm just a beginner, I know nothing about CRC32 besides for what is mentioned here and in the articles, but I haven't seen anywhere how that conclusion was reached, but bbasically, another question is mainly what exact formula is used, given an input of byte string data, to get the CRC? that's the main question, simply how to use it, given any byte string of data (in this case the PNG chunk), what exact formula is used to get the necessary CRC32 ? – Hamburger 8/6, 2020 at 4:35

@bluejayke - it's not a formula, but instead an algorithm, emulating long hand division, except in binary, as shown in the wiki article. The choice for the divisor is arbitrary, usually a common one known to detect some number of bit errors in a message up to some number of bits. – Transferase 8/6, 2020 at 7:5