Zip / 7zip Compression Differences

I have a number of zip files that I need to distribute to users, around 130 of them. Each zip file contains a number of similar text, html, xml, and jpg files. Together, the zip files come to 146 MB; unzipped, their contents total 551 MB.

I want to distribute all these files together to users in as small a format as possible. I looked into two different ways of doing it, each using two different compression schemes, zip and 7zip (which I understand is either LZMA or a variant thereof):

  1. Compress all the zip files into a compressed file and send that file (single.zip/7z)
  2. Compress the unzipped contents of the zip files into a compressed file and send that file (combined.zip/7z)

For example, say that I have 3 zip files, A.zip, B.zip and C.zip, each of which contains one text file, one html file, and one XML file. With method 1, a single compressed file would be created containing A.zip, B.zip and C.zip. With method 2, a single compressed file would be created containing A.txt, A.html, A.xml, B.txt, B.html, B.xml, C.txt, C.html, and C.xml.

My assumption was that under either compression scheme, the file generated by method 2 would be smaller than, or at least the same size as, the file generated by method 1, as you might be able to exploit efficiencies by considering all the files together. At the very least, method 2 would avoid the overhead of multiple zip files.

The surprising results (the sizes of files generated by the 7zip tool) were as follows:

  1. single.zip - 142 MB
  2. single.7z - 124 MB
  3. combined.zip - 149 MB
  4. combined.7z - 38 MB

I'm not surprised that the 7zip format produced smaller files than the zip format (result 2/4 vs result 1/3), as it generally compresses better than zip. What was surprising was that for the zip format, compressing all 130 zip files together resulted in a smaller output file than compressing all their uncompressed contents (result 3 vs result 1).

Why is it more efficient to zip several zip files together, than to zip their unzipped contents together?

The only thing I can think of is that during compression, the 7zip format builds a dictionary across all the file contents, so it can exploit similarities between files, while the zip format builds the dictionary per-file. Is that true? And even that still doesn't explain why result 3 was 7 MB larger than result 1.

Thanks for your help.

Casuist answered 24/2, 2014 at 15:56 Comment(0)
  • Both .zip and .7z are lossless compression formats. .7z is newer and is likely to give you a better compression ratio, but it's not as widely supported as .zip, and I think it's somewhat more computationally expensive to compress/decompress.

  • How much better depends on the types of files you are compressing, but according to the Wikipedia article on 7-Zip:

    In 2011, TopTenReviews found that the 7z compression was at least 17% better than ZIP, and 7-Zip's own site has since 2002 reported that while compression ratio results are very dependent upon the data used for the tests, "Usually, 7-Zip compresses to 7z format 30–70% better than to zip format, and 7-Zip compresses to zip format 2–10% better than most other zip-compatible programs."
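
To get a feel for how much the ratio depends on the data, here is a minimal sketch using only Python's standard library. It is an illustration under assumptions, not a benchmark of 7-Zip itself: `zlib` stands in for the Deflate method zip normally uses, `lzma` for the LZMA family used by 7z, and the sample inputs and the helper name `compressed_ratios` are made up for this example.

```python
import lzma
import os
import zlib

def compressed_ratios(raw: bytes) -> dict:
    """Return compressed size / original size for Deflate and LZMA."""
    return {
        "deflate": len(zlib.compress(raw, 9)) / len(raw),
        "lzma": len(lzma.compress(raw)) / len(raw),
    }

# Highly redundant text, loosely resembling a SQL dump (made-up sample).
text = b"INSERT INTO users (id, name) VALUES (1, 'example');\n" * 20_000
# Random bytes stand in for already-compressed payloads such as JPEG data.
noise = os.urandom(len(text))

for label, raw in (("redundant text", text), ("random bytes", noise)):
    r = compressed_ratios(raw)
    print(f"{label}: deflate {r['deflate']:.4f}, lzma {r['lzma']:.4f}")
```

The redundant text shrinks to a tiny fraction of its size with either method, while the random bytes do not shrink at all, so any "X% better" figure only holds for data similar to what was tested.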

Boscage answered 27/8, 2020 at 4:24 Comment(1)
cool I didn't realize that 7zip zips zips better than other zip programs - Kiki

Why is it more efficient to zip several zip files together, than to zip their unzipped contents together?

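Your assumption is correct: the 7z format uses solid compression, which zip does not. Zip compresses each file independently, whereas solid compression works much like your dictionary idea: the files are compressed together as one 'block', so parts common to different files only need to be stored once, which reduces the size.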
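
The effect can be reproduced with a rough sketch in Python. The standard library has no 7z writer, so this assumes a `.tar.xz` archive as a stand-in for a solid 7z archive (one LZMA2 stream over all files), and compares it with an ordinary zip that deflates each entry on its own. The file names and helper functions are hypothetical. The shared part of each file is random bytes on purpose: it cannot be compressed within a single file, so any saving must come from redundancy between files.

```python
import io
import random
import tarfile
import zipfile

def make_files(count=10, shared_size=20_000):
    """Build files that share one large block and differ only in a short tail."""
    rng = random.Random(0)  # fixed seed so the run is repeatable
    shared = bytes(rng.getrandbits(8) for _ in range(shared_size))
    return {f"doc{i}.bin": shared + f"unique part {i}".encode() for i in range(count)}

def zip_size(files):
    """Size of a zip archive: every entry is deflated independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

def solid_size(files):
    """Size of a tar.xz archive: all files go through a single LZMA2 stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

files = make_files()
print("zip (per-entry Deflate):", zip_size(files), "bytes")
print("tar.xz (solid LZMA2):   ", solid_size(files), "bytes")
```

With real text or HTML the gap would be smaller, since each file also compresses on its own, but the direction is the same: the solid archive stores the shared content roughly once, while the zip stores it once per file.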

Esotropia answered 4/10, 2021 at 20:20 Comment(2)
I had 3.7 GB worth of SQL files, compressed using 7zip it came down to 21 MB. I was utterly shocked. I'd recommend 7z over zip any day. - Chayachayote
Well SQL is extremely redundant. If you have something like video files or images that are usually already compressed it will not be nearly as efficient. - Maize
