Canonical huffman encoding algo
Asked Answered
J

2

5

Hello I am trying to implement Canonical huffman encoding but i dont understand wiki and google guides, I need explain more abstractly...

I tried this: 1. Get list of regular huffman encoding length's codes. like this:

A - code: 110, length: 3.
B - code: 111, length: 3.
C - code: 10, length 2.
D - code: 01, length 2.
E - code: 00, length 2.
  1. I sorting the table by symbol and length like this:
C - code: 10, length 2.
D - code: 01, length 2.
E - code: 00, length 2.
A - code: 110, length: 3.
B - code: 111, length: 3.

now i dont know how to proceed...

tnx a lot

Jock answered 1/1, 2016 at 21:55 Comment(0)
P
16

Throw out the codes you get from the Huffman algorithm. You don't need those. Just keep the lengths.

Now assign the codes based on the lengths and the symbols. Sort by length, from shortest to longest, and within each length, sort the symbols in ascending order. (How you do that exactly doesn't matter, so long as every symbol is strictly less than or greater than any other symbol, and the encoder and decoder agree on how to do it.)

So we do the ordering:

C - 2
D - 2
E - 2
A - 3
B - 3

Two's come before three's, and within the 2's, C, D, E are in order, and within the 3's, A, B are in order.

Now we assign the code in integer order within each length, adding a zero bit at the end each time we go up a length:

C - 2 - 00
D - 2 - 01
E - 2 - 10
A - 3 - 110 <- after incrementing to 11, a zero was added to make 110
B - 3 - 111

That is a canonical code.

You could do it other ways if you like and still be canonical, e.g. counting backwards from 11, so long as the encoder and decoder agree on the approach. The whole point is to only have to transmit the lengths for each symbol from the encoder to the decoder, so as to not have to transmit the codes themselves which take more space.

Phlebitis answered 2/1, 2016 at 18:11 Comment(3)
Hello. Thanks for this answer. I was looking for the same thing. What I don't get is the way you assign code in 'integer order'. I mean, isn't 00 = 0, 01=1, 10=2 , 110 = 6, 111 = 7 ? Why did we jump from 2 to 6? Or is it not in binary order, and integer order means something else?Polson
@BogdanDaniel That is explained in the gray box in the answer. When you go to the next length, you add zero bits to the end. The integer order is within each length.Phlebitis
Thank you. I get it now. I missed the 'each length' bit.Polson
T
-1

You should sort symbols by there frequency, so most often would be on top and least often would be on bottom. (Overall frequency - 1):

A (0.5)
B (0.2)
C (0.15)
D (0.15)

Then mark one symbol with 0 and other with 1, summ there frequencies and insert into proper position in list and again mark two least with 0 and 1:

A (0.5)      A (0.5)
B (0.2)      C&D (0.3) 0
C (0.15) 0   B (0.2)   1
D (0.15) 1

And again...

A (0.5)      A (0.5)        A (0.5)      0
B (0.2)      C&D (0.3) 0    B&C&D (0.5)  1
C (0.15) 0   B (0.2)   1
D (0.15) 1

Until you obtain last pair. The path, marked by 0 and 1 from tail to symbol would be corresponding Huffman code:

A 0
B 11
C 100
D 101
Tax answered 1/1, 2016 at 22:14 Comment(2)
Also, the OP is asking specifically about canonical Huffman codes.Harmonious
Look up "canonical Huffman code". You're totally not getting it.Phlebitis

© 2022 - 2024 — McMap. All rights reserved.