Confused about Huffman Trees
Asked Answered
W

3

7

A quick tutorial on generating a huffman tree

Confused about Huffman Trees. Near the end of that link above, it shows the tree with 2 elements left, and then the completed tree. I'm confused about the way that it is branched. Is there a specific way a huffman tree needs to be branched?

For example, 57:* with its right child 35:* is branched off to the right. Could it have been 35 branched to the left with 22 branched to the right? Also, why wasn't 22:* paired up with 15:4 - it just paired with 20:5 to create a new tree.

From initial obersvations it seems the tree does not need to be balanced or have any specific order other than that the frequencies of a leaf add up to the value of the parent node. Could two people creating a huffman tree with the same data end up with different encoding values?

Winfrid answered 8/6, 2010 at 1:10 Comment(0)
N
5

The key to Huffman trees is this:

Sort this list by frequency and make the two-lowest elements into leaves

If you have more than two elements that have the lowest frequency (e.g. 3,4,4...), any two will do (3 and either of 4s - not two 4s). Also, it is not important which of these lowest elements is assigned 0 and which is 1. These two facts allow different yet valid Huffman encodings to arise from the same data.

The Huffman tree is supposed to be balanced by frequencies, not by the number of nodes. Thus the following is balanced:

(100 (50 (25 (12 (12 1)))))

and this is not:

(((100 50) 25) ((12 12) 1)))

Specifically in your question, 15 is paired with 20 and not 22 because 15 and 20 are the two lowest remaining values (both lower than 22). Either branching (left or right) would have been fine, as long as it's consistent (always smaller-left, or always smaller-right, within the same algorithm, so that the encoding can be reconstructed at the other end).

Narrowminded answered 8/6, 2010 at 1:22 Comment(4)
Note to the poster: Observe that these decisions don't change how well your Huffman encoding compresses data. No matter how you arrange the leaves, all the values will be at the same depth in the tree each time, meaning that the length of the codes will always be sorted by the frequency of the value.Modestomodesty
@mquander: Couldn't have said better myself.Narrowminded
@mquander: There are some cases where an arbitrary choice ("any two will do") puts a symbol at a different depth in tree A than in tree B. These arbitrary decisions create slightly different trees. But you are right that even these more significant-looking differences still don't change how well they compress data -- the proof is to long to fit in this comment :-).Retardation
Re: being consistent with the branching, the linked tutorial has lowest-left for all layers except for the last one, which is annoying..Dotty
A
6

The posts so far are wrong and misleading: the choice of leaves with equal weights does matter and they do change how well they compress data.

Here's a counter example that demonstrates how different choices lead to different compression rates: ABBBCCCDDDDEEEEEEEE

A:1, B:3, C:3, D:4, E:8. First step: take A and B to form a node with weight 4. Second step:

If you take the newly created node in the first step with C, then you get (19 (11 (7 (4 (1-A) (3-B)) (3-C)) (4-D)) (8-E)) which gives 37-bits compressed data.

If, on the other hand, you take D, which also has the weight 4, instead of the newly created node, you get (19 (11 (4 (1-A) (3-B)) (7 (3-C) (4-D))) (8-E)) which gives 41-bits compressed data.

Asexual answered 11/12, 2012 at 11:17 Comment(2)
So how can you combat this. I have my compressor reconstructing a different table than the decompressor. How can I discriminate between nodes.Monitor
Unless your Huffman algorithm is deterministic you'll have to integrate the built Huffman tree into the file.Dotty
N
5

The key to Huffman trees is this:

Sort this list by frequency and make the two-lowest elements into leaves

If you have more than two elements that have the lowest frequency (e.g. 3,4,4...), any two will do (3 and either of 4s - not two 4s). Also, it is not important which of these lowest elements is assigned 0 and which is 1. These two facts allow different yet valid Huffman encodings to arise from the same data.

The Huffman tree is supposed to be balanced by frequencies, not by the number of nodes. Thus the following is balanced:

(100 (50 (25 (12 (12 1)))))

and this is not:

(((100 50) 25) ((12 12) 1)))

Specifically in your question, 15 is paired with 20 and not 22 because 15 and 20 are the two lowest remaining values (both lower than 22). Either branching (left or right) would have been fine, as long as it's consistent (always smaller-left, or always smaller-right, within the same algorithm, so that the encoding can be reconstructed at the other end).

Narrowminded answered 8/6, 2010 at 1:22 Comment(4)
Note to the poster: Observe that these decisions don't change how well your Huffman encoding compresses data. No matter how you arrange the leaves, all the values will be at the same depth in the tree each time, meaning that the length of the codes will always be sorted by the frequency of the value.Modestomodesty
@mquander: Couldn't have said better myself.Narrowminded
@mquander: There are some cases where an arbitrary choice ("any two will do") puts a symbol at a different depth in tree A than in tree B. These arbitrary decisions create slightly different trees. But you are right that even these more significant-looking differences still don't change how well they compress data -- the proof is to long to fit in this comment :-).Retardation
Re: being consistent with the branching, the linked tutorial has lowest-left for all layers except for the last one, which is annoying..Dotty
G
2

Everything is explained on the page. 22:* wasn't paired up with 15:4 because in each step, two nodes with the lowest elements are combined. This creates an unique order.

Huffman codes can be different (if you have multiple values with the same frequency or exchange 0 and 1 representation of left/right), but huffman lengths can't be.

Branching left/right is just a matter of how to draw the tree or represent it graphical, so this doesn't matter.

Gwyn answered 8/6, 2010 at 1:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.