How to write Huffman coding to a file using Python? - McMap

About

How to write Huffman coding to a file using Python?

Asked 19/7, 2018 at 14:46 Answered 19/7, 2018 at 14:52

Solved python encoding compression huffman-code

R

1

6

I created a Python script to compress text by using the Huffman algorithm. Say I have the following string:

string = 'The quick brown fox jumps over the lazy dog'

Running my algorithm returns the following 'bits':

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'

By comparing the amount of bits of the result with the input string, the algorithm seems to work:

>>> print len(result), len(string) * 8
194 344

But now comes the question: how do I write this to a file, while still being able to decode it. You can only write to a file per byte, not per bit. By writing the 'codes' as bytes, there is no compression at all!

I am new at computer science, and the online resources just don't cut it for me. All help is much appreciated!

Edit: note that I had my codes something like this (in case of another input string 'xxxxxxxyzz'):

{'y': '00', 'x': '1', 'z': '10'}

The way I create the resulting string is by concatenating these codes in order of the input string:

result = '1111111001010'

How to get back to the original string from this result? Or am I getting this completely wrong? Thank you!

Rossetti answered 19/7, 2018 at 14:46 Comment(7)

This might be useful to you: #21221416 – Susuable 19/7, 2018 at 14:48

Of course a string like that can be converted, but is that really what you want to do? The temporary result would still be big that way. – Durman 19/7, 2018 at 15:0

@harold storing the file as 25 bytes instead of 43 is a ~40% improvement, that seems worthwhile. What temporary result are you referring to? – Thereupon 19/7, 2018 at 15:4

@Thereupon the string of "bits", before converting every group of 8 characters of it to a byte. – Durman 19/7, 2018 at 15:7

@harold gotcha. so you're just suggesting OP do this in a way that doesn't create that string all at once? – Thereupon 19/7, 2018 at 15:11

@Thereupon yes, piece by piece, while he's appending the individual codes - then the string being held in memory never gets really big. Actually it can be done with integer arithmetic too instead of strings, then there is no conversion at all any more – Durman 19/7, 2018 at 15:17

Hey guys, thanks for your answers. I edited my post to clarify my question a bit. – Rossetti 19/7, 2018 at 16:3

M

9

First you need to convert your input string to bytes:

def _to_Bytes(data):
  b = bytearray()
  for i in range(0, len(data), 8):
    b.append(int(data[i:i+8], 2))
  return bytes(b)

Then, open a file to write in binary mode:

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'
with open('test.bin', 'wb') as f:
  f.write(_to_Bytes(result))

Now, writing the original string to a file, a comparison of bytes can take place:

import os
with open('test_compare.txt', 'a') as f:
  f.write('The quick brown fox jumps over the lazy dog')

_o = os.path.getsize('test_compare.txt')
_c = os.path.getsize('test.bin')
print(f'Original file: {_o} bytes')
print(f'Compressed file: {_c} bytes')
print('Compressed file to about {}% of original'.format(round((((_o-_c)/_o)*100), 0)))

Output:

Original file: 43 bytes
Compressed file: 25 bytes
Compressed file to about 42.0% of original

To get back to the original, you can write a function that determines the possible ordering of characters:

d = {'y': '00', 'x': '1', 'z': '10'}
result = '1111111001010'
from typing import Generator
def reverse_encoding(content:str, _lookup) -> Generator[str, None, None]:
  while content:
    _options = [i for i in _lookup if content.startswith(i) and (any(content[len(i):].startswith(b) for b in _lookup) or not content[len(i):])]
    if not _options:
      raise Exception("Decoding error")
    yield _lookup[_options[0]]
    content = content[len(_options[0]):]

print(''.join(reverse_encoding(result, {b:a for a, b in d.items()})))

Output:

'xxxxxxxyzz'

Mita answered 19/7, 2018 at 14:52 Comment(3)

Thank you for your answer! I edited my post to clarify my question concerning the decoding of the file. – Rossetti 19/7, 2018 at 16:2

Thank you, this is what I was looking for! – Rossetti 19/7, 2018 at 16:24

@PimKlaassen Glad to help – Mita 19/7, 2018 at 16:25

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.