I'm experimenting with the lzma module in Python 2.7.6 to see whether I can create XZ-compressed files for a future project. The code I used for the experiment was:
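import lzma as xz
# Read the whole input file into memory
in_file = open('/home/ki2ne/Desktop/song.wav', 'rb')
input_data = in_file.read()
in_file.close()
# Compress the data and write it to the output file
compressed_data = xz.compress(input_data)
out_file = open('/home/ki2ne/Desktop/song.wav.xz', 'wb')
out_file.write(compressed_data)
out_file.close()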
and I noticed that the checksums (both MD5 and SHA256) of the resulting file differ from those of the file produced by the plain xz command-line tool. Both files decompress fine, and the checksums of the two decompressed versions are identical. Would this be a problem?
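To illustrate how two valid .xz files can differ byte for byte while holding the same data, here is a minimal sketch. It assumes the `lzma` API from Python 3.3+ (or the backport exposing the same interface), and it deliberately picks two different integrity checks (`CHECK_CRC64` and `CHECK_CRC32`) to force a difference. That choice is only an illustration and not necessarily the cause of the mismatch described above.

```python
import hashlib

try:
    import lzma  # standard library on Python 3.3+
except ImportError:
    from backports import lzma  # backport with the same API on Python 2

# Sample payload; any bytes would do
data = b"some random phrases\n" * 1000

# Same input, same container format, different integrity check settings
xz_crc64 = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)
xz_crc32 = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32)


def sha256_hex(blob):
    """Return the SHA256 hex digest of a bytes object."""
    return hashlib.sha256(blob).hexdigest()


# The compressed containers differ, so their checksums differ
containers_match = sha256_hex(xz_crc64) == sha256_hex(xz_crc32)

# The decompressed payloads are identical to the original input
payloads_match = (
    sha256_hex(lzma.decompress(xz_crc64))
    == sha256_hex(lzma.decompress(xz_crc32))
    == sha256_hex(data)
)

print("compressed checksums match: %s" % containers_match)
print("decompressed checksums match: %s" % payloads_match)
```

If the decompressed checksums match, the data itself round-trips correctly. A mismatch between the compressed files then only shows that the two encoders used different settings.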
UPDATE: I fixed this by installing the backport of the Python 3.3 lzma module from peterjc's Git repository, and the checksums are now identical. Not sure if it matters, but I made sure the LZMA Python module from my distribution's repository wasn't installed, to avoid possible name conflicts.
Here's my test code to confirm this:
# I have created two identical text files (test1.txt and test2.txt) with some random phrases
from subprocess import call
from hashlib import sha256
from backports import lzma as xz
f2 = open("test2.txt", 'rb')
f2_buf = buffer(f2.read())
# The xz command-line tool replaces test1.txt with test1.txt.xz
call(["xz", "test1.txt"])
# Compress the contents of test2.txt in memory with the backported module
f2_xzbuf = buffer(xz.compress(f2_buf))
f1 = open("test1.txt.xz", 'rb')
f1_xzbuf = buffer(f1.read())
f1.close(); f2.close()
# Hash both compressed results and compare the digests
f1sum = sha256(); f2sum = sha256()
f1sum.update(f1_xzbuf); f2sum.update(f2_xzbuf)
if f1sum.hexdigest() == f2sum.hexdigest():
    print "Checksums OK"
else:
    print "Checksum Error"
I've also verified this with the regular sha256sum tool, after writing the compressed data to a file.
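For comparison with sha256sum, the same check can be done from Python once the data is on disk. This is a rough sketch using only `hashlib`; the helper name `sha256_of_file` and the chunk size are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks (hypothetical helper, not from any library)."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps reading until read() returns b''
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex digest should equal the first field printed by sha256sum for the same file. Reading in chunks avoids loading a large file into memory at once.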
out_file at some point. – Hamhung