Is it possible to unzip a compressed file with multiple threads? [closed]

2

27

I unzip using this command:

unzip -q "file.zip" -d path

How can I unzip faster by utilizing multiple cores and threads?

Thanks

Phantom answered 16/2, 2018 at 8:57 Comment(8)
Could you please explain a bit what you mean by multi-core here?Lucillalucille
Why is multi-core an issue here? Do you mean you have big zip files and want to use multithreading to speed up the process?Calipee
Interesting read: quora.com/…Droshky
I would like to be able to tell unzip to use 4 CPUs, for example. Is that possible?Phantom
Following the argument in the article I've linked, even if it were possible, you wouldn't gain much extra speed.Droshky
hek2mgl, it does not solve my problem; I would like to use the same tool.Phantom
You can use pigz (zlib.net/pigz), a multithreaded implementation of gzip, both when compressing and decompressing. Since gzip works on a single file, to compress a directory (possibly with subdirectories) you first have to make a tar archive.Cerulean
A Python alternative (dead project, but with 11 forks), based on the pigz docs: github.com/vinlyx/mgzipPostpone
12

In short: no, the standard unzip tool has no option for using multiple cores.

Decompression is normally less CPU-intensive than compression (where multiple cores are often used).

You wouldn't gain much anyway, as the read/write operations are usually the bottleneck during decompression.

Wilkes answered 16/2, 2018 at 9:20 Comment(6)
OK, thanks for the explanation :)Phantom
NVMe SSDs are super fast, but unzipping a 6 GB file still takes ages. Pretty sure the CPU is the bottleneck. What gives?Autogenous
Using github.com/lukehutch/quickunzip, unzipping Xcode10.2.zip takes 2m26s; using Archive Utility on Mac, it takes 3m55s.Autogenous
It is the CPU bottlenecking it: one core will always be maxed out on newer NVMe-based drives, which indicates the bottleneck.Duodenitis
Try ripunzip: medium.com/@adetaylor/does-parallel-unzipping-work-cdaf89124c88 (I built a static version for local testing: github.com/hemnstill/StandaloneTools/releases/tag/…). It's 2x to 4x faster than unzip.Concord
As long as your NVMe drive is fast enough, decompressing large zip files can benefit from multiple CPU cores. Lots of cores, in fact. See the benchmarks at youtu.be/c2tzTMN6-qU?t=636. There's plenty of scaling with core count on that chart. Threadripper generally has lower clock rates than its normal desktop brethren, but beats them easily.Oney
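
The parallel tools mentioned in these comments can work because each entry in a ZIP archive is compressed independently, so different entries can be decompressed at the same time. Below is a minimal Python sketch of that idea using only the standard library. It is not how quickunzip or ripunzip are implemented; the names `parallel_unzip` and `_extract_batch` and the default of 4 workers are made up for illustration, and it assumes a trusted archive with plain relative entry names.

```python
import os
import zipfile
from concurrent.futures import ThreadPoolExecutor


def _extract_batch(zip_path, names, dest):
    # Each worker opens its own handle so threads never share a file position.
    with zipfile.ZipFile(zip_path) as zf:
        for name in names:
            zf.extract(name, dest)
    return len(names)


def parallel_unzip(zip_path, dest, workers=4):
    """Extract zip_path into dest using `workers` threads; returns file count."""
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()
    files = [i.filename for i in infos if not i.is_dir()]

    # Create every directory up front, in one thread, so the workers never
    # race to create the same folder. Assumption: a trusted archive whose
    # entry names are plain relative paths (no "..", no absolute paths).
    os.makedirs(dest, exist_ok=True)
    for info in infos:
        folder = info.filename if info.is_dir() else os.path.dirname(info.filename)
        if folder:
            os.makedirs(os.path.join(dest, folder), exist_ok=True)

    # Round-robin split: batch k gets files k, k + workers, k + 2*workers, ...
    batches = [files[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda b: _extract_batch(zip_path, b, dest), batches)
        return sum(counts)
```

Usage would look like `parallel_unzip("file.zip", "path", workers=4)`. Threads are used here on the assumption that CPython's zlib bindings release the GIL while inflating; whether this beats plain unzip depends on the archive. An archive made of one huge entry gains nothing, since a single entry is still decompressed by one thread.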
0

Try pigz, which takes advantage of multiple cores and unzips in multiple threads.

Slip answered 16/2, 2018 at 9:1 Comment(3)
As per the man page, pigz doesn't use multithreading for decompression.Calipee
Still better than unzip. pigz uses a single thread (the main thread) for decompression, but will create three other threads for reading, writing, and check calculation, which can speed up decompression under some circumstances.Abstemious
A Python alternative (dead project, but with 11 forks), based on the pigz docs: github.com/vinlyx/mgzipPostpone
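
To illustrate the arrangement described in the pigz comment above (one decompression thread plus helper threads for I/O), here is a rough Python sketch for a single-member .gz file. It is not pigz's code. The function name `pipelined_gunzip`, the 1 MiB chunk size, and the queue depth are arbitrary choices for the example. There is no separate check thread because zlib verifies the gzip CRC itself in this mode, and error handling is left out.

```python
import queue
import threading
import zlib

CHUNK = 1 << 20  # 1 MiB reads; an arbitrary size chosen for this sketch


def pipelined_gunzip(src_path, dst_path):
    """Decompress a .gz file, with reading and writing on helper threads."""
    compressed = queue.Queue(maxsize=8)  # reader -> decompressor
    plain = queue.Queue(maxsize=8)       # decompressor -> writer

    def reader():
        with open(src_path, "rb") as f:
            while chunk := f.read(CHUNK):
                compressed.put(chunk)
        compressed.put(None)  # end-of-input marker

    def writer():
        with open(dst_path, "wb") as f:
            while (chunk := plain.get()) is not None:
                f.write(chunk)

    # Daemon threads so a failure on the main thread cannot hang the process.
    threads = [threading.Thread(target=t, daemon=True) for t in (reader, writer)]
    for t in threads:
        t.start()

    # Decompression itself stays on this single thread.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip framing
    total = 0
    while (chunk := compressed.get()) is not None:
        data = inflater.decompress(chunk)
        total += len(data)
        plain.put(data)
    tail = inflater.flush()
    total += len(tail)
    plain.put(tail)
    plain.put(None)  # tell the writer to finish
    for t in threads:
        t.join()
    return total  # number of decompressed bytes
```

The bounded queues keep memory use flat: if the disk is slower than the decompressor, `put` blocks until the writer catches up. Any gain comes only from overlapping I/O with decompression, which matches the "under some circumstances" caveat in the comment.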
