I created a Python script to compress text by using the Huffman algorithm. Say I have the following string:
string = 'The quick brown fox jumps over the lazy dog'
Running my algorithm returns the following 'bits':
result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'
Comparing the number of bits in the result with the number of bits in the input string, the algorithm seems to work:
>>> print len(result), len(string) * 8
194 344
But now comes the question: how do I write this to a file while still being able to decode it? You can only write whole bytes to a file, not individual bits. If I write each '0' or '1' character of the codes as its own byte, there is no compression at all!
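One possible approach, as a minimal sketch rather than a definitive implementation: group the bit string into chunks of 8, turn each chunk into one byte, and pad the last chunk with zeros. The helper names `pack_bits` and `unpack_bits` are hypothetical, and storing the padding count in the first byte is just one assumed convention for telling the reader where the real data ends.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    Assumed layout: the first byte holds the number of padding bits
    (0-7) appended to fill the last byte; the rest is the packed data.
    """
    padding = (8 - len(bits) % 8) % 8
    padded = bits + '0' * padding
    out = bytearray([padding])
    for i in range(0, len(padded), 8):
        # Interpret each group of 8 characters as one base-2 number.
        out.append(int(padded[i:i + 8], 2))
    return bytes(out)


def unpack_bits(data):
    """Inverse of pack_bits: recover the original bit string."""
    data = bytearray(data)
    padding = data[0]
    bits = ''.join('{:08b}'.format(byte) for byte in data[1:])
    # Drop the zero bits that were added only to fill the last byte.
    return bits[:len(bits) - padding]
```

The returned bytes can be written to a file opened in binary mode (`open(path, 'wb')`). Under this layout, the 194-bit result above would take 25 data bytes plus the one header byte, compared with 43 bytes for the original string. The code table would also need to be stored somewhere for decoding, which this sketch does not cover.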
I am new to computer science, and the online resources just don't cut it for me. All help is much appreciated!
Edit: note that my codes look something like this (for a different input string, 'xxxxxxxyzz'):
{'y': '00', 'x': '1', 'z': '10'}
The way I create the resulting string is by concatenating these codes in order of the input string:
result = '1111111001010'
How do I get back to the original string from this result? Or am I getting this completely wrong? Thank you!
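For reference, decoding by reading bits left to right only works when no code is a prefix of another. In the table above, '1' (for 'x') is a prefix of '10' (for 'z'), so a bit string built from it is ambiguous. The sketch below assumes a hypothetical prefix-free table for the same input; the `decode` function and the table are illustrative, not output of the script described here.

```python
def decode(bits, codes):
    """Decode a bit string with a prefix-free table {symbol: code}."""
    lookup = {code: symbol for symbol, code in codes.items()}
    decoded = []
    current = ''
    for bit in bits:
        current += bit
        # In a prefix-free code, the first match is the only possible one.
        if current in lookup:
            decoded.append(lookup[current])
            current = ''
    if current:
        raise ValueError('trailing bits do not form a complete code')
    return ''.join(decoded)


# Hypothetical prefix-free table (an assumption, not the table from the post).
codes = {'x': '1', 'z': '01', 'y': '00'}
encoded = ''.join(codes[ch] for ch in 'xxxxxxxyzz')
original = decode(encoded, codes)
```

With this table the encoded string is still 13 bits long, the same length as the result above, but every bit sequence maps back to exactly one symbol.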