How to unzip a piped zip file (from "wget -qO-")?
N

7

56

Any ideas on how to unzip a piped zip file like this:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip

I want to unzip the file into a directory, as we would with a normal file:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | unzip -d ~/Desktop
Nicolas answered 20/8, 2011 at 14:54 Comment(3)
While the question is valid, if you are using Git to work with WordPress, there is now a Git mirror of each plugin. Ignore this comment if that's not your case :) Otherwise, save yourself the trouble of figuring out how to automate your installation from such a URL and use Git submodules or Composer with github.com/wp-plugins. - Jonquil
zip requires random access to work. It cannot read incrementally from a pipe, which is why the zsh-based answer creates a temporary file rather than trying to work as a pipe. - Couturier
Usually you only want to write a successful response to stdout. See also: write http error body to stderr. - Commie
V
7
wget -q -O tmp.zip http://downloads.wordpress.org/plugin/akismet.2.5.3.zip && unzip tmp.zip && rm tmp.zip
Vidrine answered 22/8, 2011 at 3:22 Comment(4)
Using && is better, since the next command only starts if the previous one finished successfully. Thanks. - Nicolas
This does not extract the zip in a piped manner. Your proposal uses more disk space and adds wear to the disk (which matters on an SSD if the files are big). It is also more efficient to run the download and the extraction in parallel. - Saunderson
Also, -qO- -O tmp.zip is redundant: you pass -O - and then -O tmp.zip, which is pointless here. - Sine
The question specifically asks for unzipping from a pipe. This answer uses a temporary file instead, which may not work on read-only filesystems or in other specific use cases. - Choreodrama
P
66

The ZIP file format includes a directory (index) at the end of the archive. This directory records where within the archive each file is located, and thus allows quick random access without reading the entire archive.
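
As a rough illustration of where that index sits, the following Python sketch builds a small archive in memory and locates its end-of-central-directory record by searching backwards from the end. The member names and the helper locate_central_directory are made up for the example, and the sketch assumes a simple archive (no ZIP64, no archive comment); it is not how any particular unzip tool is implemented.

```python
import io
import struct
import zipfile

# Fixed 22-byte part of the end-of-central-directory (EOCD) record.
EOCD = struct.Struct("<4sHHHHIIH")


def locate_central_directory(data: bytes):
    """Return (eocd_pos, cd_offset, cd_size, entry_count) for a simple ZIP."""
    eocd_pos = data.rfind(b"PK\x05\x06")  # the record is found from the END
    if eocd_pos < 0:
        raise ValueError("no end-of-central-directory record found")
    (_sig, _disk, _cd_disk, _entries_here, entry_count,
     cd_size, cd_offset, _comment_len) = EOCD.unpack_from(data, eocd_pos)
    return eocd_pos, cd_offset, cd_size, entry_count


# Hypothetical two-member archive, built in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("akismet/readme.txt", "hello\n")
    zf.writestr("akismet/akismet.php", "<?php\n")
data = buf.getvalue()

eocd_pos, cd_offset, cd_size, entry_count = locate_central_directory(data)
print(f"archive is {len(data)} bytes; index starts at byte {cd_offset}")
print(f"index lists {entry_count} members and ends at byte {cd_offset + cd_size}")
```

A reader that depends on this record has to see the last bytes of the archive before it can find any member, which is exactly what a pipe does not allow.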

This would appear to pose a problem when attempting to read a ZIP archive through a pipe, in that the index is not accessed until the very end and so individual members cannot be correctly extracted until after the file has been entirely read and is no longer available. As such it appears unsurprising that most ZIP decompressors simply fail when the archive is supplied through a pipe.

The directory at the end of the archive is not the only location where file meta information is stored in the archive. In addition, individual entries also include this information in a local file header, for redundancy purposes.
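
To make the role of the local file headers concrete, here is a minimal Python sketch that walks an archive strictly front to back, using only those headers and never seeking. It is an illustration under stated assumptions, not a description of how bsdtar works: the function name stream_members and the sample archive are invented, only the "stored" and "deflate" methods are handled, member names are assumed to be ASCII/UTF-8, the stream's read(n) is assumed to return n bytes unless the input ends, and entries whose sizes are deferred to a trailing data descriptor are rejected.

```python
import io
import struct
import zipfile
import zlib

# 30-byte fixed part of a local file header.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def stream_members(stream):
    """Yield (name, data) for each member, reading strictly forward."""
    while True:
        raw = stream.read(LOCAL_HEADER.size)
        if len(raw) < LOCAL_HEADER.size or raw[:4] != b"PK\x03\x04":
            return  # reached the central directory or the end of input
        (_sig, _ver, flags, method, _time, _date, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(raw)
        if flags & 0x08:
            raise ValueError("sizes are in a data descriptor; not handled here")
        name = stream.read(name_len).decode("utf-8")
        stream.read(extra_len)           # skip the extra field
        payload = stream.read(csize)     # compressed bytes of this member
        if method == 8:                  # deflate: raw stream, no zlib header
            payload = zlib.decompress(payload, -15)
        elif method != 0:                # 0 means stored (uncompressed)
            raise ValueError(f"unsupported compression method {method}")
        yield name, payload


# Hypothetical two-member archive, built in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("akismet/readme.txt", "hello\n")
    zf.writestr("akismet/akismet.php", "<?php\n")

members = list(stream_members(io.BytesIO(buf.getvalue())))
for name, data in members:
    print(name, len(data))
```

Because the central directory is never consulted, a reader like this cannot tell whether an entry was later deleted or superseded in the archive.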

Although not every ZIP decompressor will use local file headers when the index is unavailable, the tar and cpio front ends to libarchive (a.k.a. bsdtar and bsdcpio) can and will do so when reading through a pipe, meaning that the following is possible:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | bsdtar -xvf- -C ~/Desktop
Psychoanalysis answered 16/4, 2014 at 11:35 Comment(3)
I have a .zip file here that contains files with executable permissions. When I download it and pipe it into bsdtar, the exec bits get thrown away. When I download it to disk and then extract it with bsdtar or unzip, the exec bits are honoured. - Decrial
What is the rationale behind including a directory (index) at the end of the archive? Where can I read about that? - Lcm
@Lcm Look up the history of the ZIP file type. It's because when creating a zip file, you may not know until the end where all the files have come from. Going back to insert a header at the start of a file you've already written is a challenge I suspect Phil Katz may have preferred to avoid. - Uppercase
B
25

BusyBox's unzip can take stdin and extract all the files.

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | busybox unzip -

The dash after unzip is to use stdin as input.

You can even do:

cat file.zip | busybox unzip -

But that is just a roundabout way of running unzip file.zip.

If your distro uses BusyBox by default (e.g. Alpine), just run unzip -.

Bermejo answered 11/10, 2018 at 12:7 Comment(4)
BusyBox 1.22.0 on Debian fails with "Archive: - unzip: lseek: Illegal seek". What version of BusyBox did you use? - Sweetbread
v1.27.2 on Ubuntu 18.10 - Bermejo
This didn't work for me on Alpine 3.10 (via Docker). (Not ragging on you; I think it's a useful answer, and comments about working/non-working versions are also helpful.) - Nevarez
unzip in some versions of BusyBox (e.g. 1.27.2) doesn't support Zip64, so it works only for member files smaller than 4 GiB. - Elite
P
17

Just use zcat:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | zcat >> myfile.txt
  • This will only extract the first file. After the first file is extracted, you will see the message "gzip: stdin has more than one entry--rest ignored".
Pullet answered 14/9, 2017 at 9:58 Comment(4)
This is an O <-- Remember - Sibbie
This was what I was looking for. Some files I curl now and again are just single files zipped (don't know why, they're not particularly large) and I don't have control over them being in this format. Using zcat was the solution for me here! - Hepsibah
Annoying gotcha: this only works with GNU zcat/gzip, NOT BSD gzip. - Tiedeman
zcat works perfectly. - Bibliographer
G
15

While the following will not work in bash, it will work in zsh. Since many zsh users may end up here, it may still be useful:

% unzip =( wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip )
Archive:  /tmp/zshLCod6x
   creating: akismet/
  inflating: akismet/admin.php       
  inflating: akismet/akismet.css     
  inflating: akismet/akismet.gif     
  inflating: akismet/akismet.js      
  inflating: akismet/akismet.php     
  inflating: akismet/legacy.php      
  inflating: akismet/readme.txt      
  inflating: akismet/widget.php      
% 

As you can see, the temporary downloaded zip file is deleted straight away:

% ls /tmp/zshLCod6x
ls: cannot access '/tmp/zshLCod6x': No such file or directory
% 
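
For readers without zsh, the same idea (spool the piped bytes to a temporary file, extract, and let the file disappear afterwards) can be sketched in Python. This is only an illustration of the technique; the function name extract_from_stream and the sample archive are hypothetical, and like the zsh form it still needs the whole archive on disk before extraction starts.

```python
import io
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_from_stream(stream, dest):
    """Spool a non-seekable stream to a temporary file, then extract into dest."""
    with tempfile.TemporaryFile() as tmp:   # removed automatically on close
        shutil.copyfileobj(stream, tmp)     # read the pipe to the very end
        tmp.seek(0)
        with zipfile.ZipFile(tmp) as zf:    # the central directory is now reachable
            zf.extractall(dest)
            return zf.namelist()


# Hypothetical one-member archive standing in for the downloaded bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("akismet/readme.txt", "hello\n")

with tempfile.TemporaryDirectory() as dest:
    names = extract_from_stream(io.BytesIO(buf.getvalue()), dest)
    extracted_text = (Path(dest) / "akismet" / "readme.txt").read_text()

print(names, repr(extracted_text))
```

In a real pipeline the stream argument would be sys.stdin.buffer, so the script could sit at the end of the wget -qO- command in place of unzip.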
Gingersnap answered 14/11, 2013 at 22:14 Comment(4)
Note that this still downloads the full file before running unzip, which is not what the original question asks for. - Saunderson
True. Unfortunately, the zip file format puts its "central directory" at the end of the file, and the unzipping algorithm first reads that directory before processing the files. Hence, a true piping solution that correctly unzips isn't really a possibility. (This is also a problem for web applications that want to process large uploaded zip files: it cannot be done in a streaming fashion.) - Gingersnap
While it is true that there is an index at the end of the file, containing "authoritative" information on which files have been deleted from the archive (without the need to regenerate it at each deletion), I can successfully extract a simple ZIP in a pipelined way with bsdtar, because there are indeed headers preceding each file. bsdtar would probably give bad results if the archive has been modified ("phantom" files would appear, since it is not known until the end of the archive which entries are the latest version). - Saunderson
Very neat. I had never seen that form of process substitution in zsh before: zsh.sourceforge.io/Intro/intro_7.html - Tiedeman
R
5

I'd take a look at funzip (http://www.info-zip.org/mans/funzip.html). Its man page describes it as a

...filter for extracting from a ZIP archive in a pipe

Sorry, I don't have an example, but it appears to ship with the Linux unzip utility.

Renin answered 20/8, 2011 at 14:59 Comment(1)
It only dumps the FIRST FILE. From the man page: funzip without a file argument acts as a filter; that is, it assumes that a ZIP archive (or a gzip'd(1) file) is being piped into standard input, and it extracts the first member from the archive to stdout. - Saunderson
E
2

Reposting my answer:

I wrote a Python (2.x) script to do streaming extraction of ZIP archives; you can get it from here: https://raw.githubusercontent.com/pts/unzip_scan/master/unzip_scan.py. Usage: cat file.zip | sh unzip_scan.py -

Elite answered 28/1, 2021 at 14:0 Comment(0)
