Faster alternative to Python's zipfile module?

Is there a noticeably faster alternative to the zipfile module in Python 2.7.4 (with ZIP_DEFLATED) for zipping a large number of files into a single zip file? I had a look at czipfile https://pypi.python.org/pypi/czipfile/1.0.0, but that appears to be focused on faster decryption, not compression.

I routinely have to pack a large number of image files (~12,000 files, a mix of .exr and .tiff) into a single zip file for shipment. Each file is between ~1 MB and 6 MB in size, and the whole set is ~9 GB. Zipping takes ~90 minutes (running on Windows 7 64-bit).

If anyone can recommend a different Python module (or alternatively a C/C++ library or even a standalone tool) that can compress a large number of files into a single .zip file in less time than the zipfile module, that would be greatly appreciated. Even a ~5-10% speedup would be very helpful.

Akkerman answered 22/4, 2013 at 23:53 Comment(6)
Worst case, you can always call the shell and execute something like 7zip from Python. - Bonfire
Do the image files you're zipping already use compressed versions of their respective file formats? If so, you're likely wasting significant processing time trying to compress them again with little gain by using ZIP_DEFLATED instead of ZIP_STORED. Also, the docs for both Python 2 & 3 say that the zipfile module only supports the decryption of encrypted files in ZIP archives, not creating them, so how exactly are you doing this? - Mig
@Bonfire thanks for the suggestion, I'll give it a try and test the performance. The files need to be .zip files, otherwise I would experiment more with other packing formats. - Akkerman
@Akkerman 7zip can compress and extract zip, 7z, bzip2, gzip and many other formats. It has tunable parameters for compression level and method, so you can choose a compromise between compression time, compression ratio and decompression time. - Bonfire
@Mig you're right, the question should not have used the term "encryption" as that is not supported by the zipfile module. I should have focused specifically on fast compression in this question (although the option of also encrypting the .zip would be useful). The image files (both .exr and .tiff) should all be compressed, so the ZIP_STORED option will likely make a difference. I'll try it out and see what speed (and .zip size) difference that option makes, thanks for the suggestion. - Akkerman
If you care about decompressing, see also: #4998410, #61930945, #37141786 - Coreycorf
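
For reference, the ZIP_STORED suggestion from the comments above could look roughly like the sketch below. It is only an illustration, not code from the thread: the function name `zip_directory` and the `ALREADY_COMPRESSED` extension set are hypothetical, and whether your .exr and .tiff files really are compressed internally is an assumption you would need to check. Files with a listed extension are stored as-is, and everything else is deflated.

```python
import os
import zipfile

# Assumption: files with these extensions already hold compressed data,
# so deflating them again would cost time for little size gain.
ALREADY_COMPRESSED = {".exr", ".tiff", ".tif"}


def zip_directory(src_dir, zip_path, stored_exts=ALREADY_COMPRESSED):
    """Zip every file under src_dir and return the number of files added.

    Keep zip_path outside src_dir so the archive does not try to add itself.
    """
    count = 0
    # allowZip64=True permits archives over 4 GiB (Python 2.7 defaults to False).
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                ext = os.path.splitext(name)[1].lower()
                # Pick the method per file: store known-compressed formats.
                if ext in stored_exts:
                    method = zipfile.ZIP_STORED
                else:
                    method = zipfile.ZIP_DEFLATED
                zf.write(full, os.path.relpath(full, src_dir), compress_type=method)
                count += 1
    return count
```

Choosing the method per file keeps the size benefit for anything that does compress well (logs, text sidecar files) while skipping the deflate work on the image data.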

As Patashu mentions, outsourcing to 7-zip might be the best idea.

Here's some sample code to get you started:

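import os
import subprocess

path_7zip = r"C:\Program Files\7-Zip\7z.exe"  # adjust to your install location
path_working = r"C:\temp"                      # directory holding the files to zip
outfile_name = "compressed.zip"
os.chdir(path_working)

# "a" adds files to an archive, "-tzip" selects the .zip format, and
# "-pSECRET" encrypts with the password SECRET (drop it for no encryption).
# 7-Zip expands the wildcards itself, so no shell is needed.
ret = subprocess.check_output([path_7zip, "a", "-tzip", outfile_name, "*.txt", "*.py", "-pSECRET"])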

As martineau mentioned, you might experiment with compression methods. This page gives some examples of how to change the command-line parameters.
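
As a rough sketch of what tuning those parameters might look like, the helper below builds the 7-Zip argument list with the `-mx` compression-level switch. The helper name `build_7zip_command` and the file patterns are hypothetical, and the level values are the ones I believe 7-Zip documents for the zip format (0 stores without compression, 1 is fastest, 9 is slowest and smallest), so check them against your 7-Zip version.

```python
import subprocess

# Levels that, as far as I know, 7-Zip documents for -mx with the zip format.
ZIP_LEVELS = (0, 1, 3, 5, 7, 9)


def build_7zip_command(path_7zip, outfile_name, patterns, level=1):
    """Return the argument list for adding files to a .zip with 7-Zip."""
    if level not in ZIP_LEVELS:
        raise ValueError("level must be one of %r" % (ZIP_LEVELS,))
    # "a" = add to archive, "-tzip" = zip format, "-mx=N" = compression level.
    return [path_7zip, "a", "-tzip", "-mx={}".format(level), outfile_name] + list(patterns)


cmd = build_7zip_command(
    r"C:\Program Files\7-Zip\7z.exe",  # same install path as in the answer
    "compressed.zip",
    ["*.exr", "*.tiff"],               # hypothetical patterns for the image files
    level=1,
)
# subprocess.check_output(cmd)  # uncomment to run; requires 7-Zip at that path
```

Timing a small subset at levels 0, 1 and 5 should show quickly whether any compression is worth the extra time for your files.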

Recognizor answered 23/4, 2013 at 1:47 Comment(4)
Thanks for the help everyone. Outsourcing to 7-zip (on a 1GB subset of the .exr files) sped up compression by nearly 52% compared to zipfile, which is really great. I'm still running some tests on the uncompressed version that @Mig suggested compared with the tweaked settings of 7zip, but it looks like the 7zip solution will definitely be an improvement. - Akkerman
@michaelhubbard.ca: Using 7-Zip is a good choice, it's an excellent utility for this sort of thing. Note there's a command-line only version of it called 7za.exe that's only 574 KB and is probably all you need. Also, please add a comment somewhere and let us know what difference just storing vs compressing makes. - Mig
@michaelhubbard.ca: I forgot to mention that 7-Zip also has very good encryption options, since you said you were interested in that option. - Mig
@Mig thanks for the follow up and extra info. The stored (uncompressed) time for the zipfile was also a significant improvement (~38% faster than the zipfile compressed version) and the resulting archive was only ~3% larger than the compressed version. Thanks again for the help. - Akkerman
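
For anyone who wants to repeat the stored-versus-compressed comparison reported above on their own files, a minimal timing sketch might look like this. The function names `time_zip` and `compare_methods` are hypothetical and this is not the script used for the numbers in this thread. It writes the same files twice with zipfile and reports elapsed seconds and archive size for each method.

```python
import os
import time
import zipfile


def time_zip(files, zip_path, method):
    """Write files into zip_path with the given method; return (seconds, bytes)."""
    start = time.time()
    with zipfile.ZipFile(zip_path, "w", method, allowZip64=True) as zf:
        for path in files:
            # Only the base name is kept, so this assumes unique file names.
            zf.write(path, os.path.basename(path))
    return time.time() - start, os.path.getsize(zip_path)


def compare_methods(files, out_dir):
    """Return {"stored": (seconds, bytes), "deflated": (seconds, bytes)}."""
    results = {}
    for label, method in (("stored", zipfile.ZIP_STORED),
                          ("deflated", zipfile.ZIP_DEFLATED)):
        zip_path = os.path.join(out_dir, label + ".zip")
        results[label] = time_zip(files, zip_path, method)
    return results
```

Running it on a representative subset (say 1 GB of the images) is usually enough to decide, since disk caching can skew the timing of whichever method runs second on a very small sample.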
