Huffman code for a single character?

Asked 15/3, 2014 at 21:17 Answered 21/2, 2022 at 6:37

Let's say I have a massive string consisting of just a single character, say x, and I need to use Huffman encoding. A Huffman tree is a full binary tree. So how does one create a Huffman code for just a single character, when two leaves aren't needed at all?

Motherly answered 15/3, 2014 at 21:17 Comment(0)

jbr's answer is fine; this is just a longer version of it.

Huffman is meant to produce a minimal-length sequence of bits that contains all the information in the original sequence of symbols, assuming that the decoder already knows the set of symbols. If there's only one symbol, the input data contains no information except its length.

In Huffman-based data formats, length is usually encoded separately, not as part of the Huffman-encoded bit sequence itself. The decoder of a single-symbol Huffman code therefore has all the information it needs to reconstruct the input without reading anything from the Huffman-encoded bit sequence. It is logical, then, that the Huffman encoder's output should be 0 bits long.
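As a rough illustration of the zero-bit case, here is a minimal sketch that assumes a made-up container where the symbol and the length travel in a header next to the payload. The function names and the tuple layout are hypothetical, not taken from any real format.

```python
def encode_single_symbol(text):
    """Return (symbol, length, payload_bits) for an input with one distinct symbol."""
    if len(set(text)) != 1:
        raise ValueError("expected exactly one distinct symbol")
    # The (assumed) header carries the symbol and the count.
    # The Huffman payload itself is empty: zero bits.
    return text[0], len(text), ""


def decode_single_symbol(symbol, length, payload_bits):
    # Nothing is read from payload_bits; the header alone determines the output.
    return symbol * length
```

For example, `decode_single_symbol(*encode_single_symbol("xxxx"))` gives back the original string even though the payload is the empty string.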

If you don't have a length encoded separately, then you must have a symbol to represent End Of Sequence so the decoder knows when to stop reading. Then your Huffman tree will have at least two leaves (your symbol and End Of Sequence) and you won't run into this special case.
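The End Of Sequence variant can be sketched as follows. This is only one way to do it: the `EOS` marker, the function names, and the use of nested tuples for internal nodes are assumptions made for the example.

```python
import heapq
from collections import Counter
from itertools import count

EOS = "<EOS>"  # hypothetical end-of-sequence marker; assumed never to occur as an input character


def build_codes(text):
    """Build a Huffman code table for text plus one EOS symbol."""
    freq = Counter(text)
    freq[EOS] = 1  # EOS occurs exactly once, so there are always at least two symbols for non-empty text
    tiebreak = count()  # unique counter so heap entries never compare the nodes themselves
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text):
    codes = build_codes(text)
    return codes, "".join(codes[ch] for ch in text) + codes[EOS]


def decode(codes, bits):
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            if inverse[current] == EOS:
                break  # stop reading at the end-of-sequence code
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

With an input such as "xxxx", the table has two one-bit codes (one for x, one for `EOS`), so the output is one bit per character plus one bit for the terminator.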

Ducks answered 15/3, 2014 at 21:46 Comment(0)

If you only have one symbol, then you only need 1 bit per symbol. So you really don't have to do anything except count the number of bits and translate each into your symbol.

Sydelle answered 15/3, 2014 at 21:27 Comment(2)

I understand that, but my code was failing for a test case with a single character. Is it safe to assume the Huffman algorithm is NOT meant for single characters? – Motherly 15/3, 2014 at 21:29

I'd say so. There is really no point, as the shortest possible encoding is simply the symbol plus the string length. – Sydelle 15/3, 2014 at 21:32

You could simply add an edge case to your code. For example, check whether there is only one character in your hash table; in that case the tree consists of just the root, without any leaves below it. You can then assign a code to this root node in your encoding function, such as 0. Your decoding function should handle this edge case too.
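A minimal sketch of that edge case, assuming the tree is represented as either a bare symbol (a leaf) or a `(left, right)` tuple; the representation and the function names are illustrative only:

```python
def assign_codes(node, prefix="", codes=None):
    """Walk the tree and return {symbol: bit string}."""
    if codes is None:
        codes = {}
    if isinstance(node, tuple):  # internal node: (left, right)
        assign_codes(node[0], prefix + "0", codes)
        assign_codes(node[1], prefix + "1", codes)
    else:
        # Edge case: if the root is itself a leaf, prefix is still empty,
        # so fall back to the one-bit code "0".
        codes[node] = prefix or "0"
    return codes


def decode_with_tree(root, bits):
    if not isinstance(root, tuple):
        # Single-symbol tree: every bit stands for one occurrence of the symbol.
        return root * len(bits)
    out, node = [], root
    for bit in bits:
        node = node[int(bit)]  # 0 -> left child, 1 -> right child
        if not isinstance(node, tuple):
            out.append(node)
            node = root
    return "".join(out)
```

Here `assign_codes("x")` yields a one-bit code for x, and `decode_with_tree("x", "0000")` reads four bits back as four copies of x.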

Coats answered 21/2, 2022 at 6:37 Comment(0)
