Is it possible to download just part of a ZIP archive (e.g. one file)? [closed]
E

6

19

Is there a way by which I can download only a part of a .rar or .zip file without downloading the whole file?

There is a ZIP file containing files A, B, C, and D. I only need A. Can I somehow tweak the download to fetch only A or, if possible, extract the file on the server itself and get only A?

Excrescent answered 17/12, 2011 at 6:54 Comment(2)
Despite the silly title, I think it's a pretty good question. Yes, it's "possible". However, the amount of work required is not trivial... for the end-user it's "not feasible" (unless someone has already created such a tool).Mycenaean
It depends a lot on your transfer protocol - you'll obviously need to use a protocol that can transfer ranges of files, rather than only complete files. For example, if your transfer protocol is NFS, then you might find that the standard archive tools are transparently doing exactly this.Dunfermline
S
12

The trick is to do what Sergio suggests without doing it manually. This is easy if you mount the ZIP file via an HTTP-backed virtual filesystem and then use the standard unzip command on it. This way the unzip utility's I/O calls are translated to HTTP range GETs, which means only the chunks of the ZIP file that you want get transferred over the network.

Here's an example for Linux using HTTPFS, a very lightweight virtual filesystem (it uses FUSE). There are similar tools for Windows.

Get/build httpfs:

$ wget https://sourceforge.net/projects/httpfs/files/httpfs/1.06.07.02/httpfs_1.06.07.10.tar.bz2
$ tar -xjf httpfs_1.06.07.10.tar.bz2
$ rm httpfs
$ ./make_httpfs

Mount a remote ZIP file and extract one file from it:

$ mkdir mount_pt
$ sudo ./httpfs http://server.com/zipfile.zip mount_pt
$ sudo ls mount_pt
zipfile.zip
$ sudo unzip -p mount_pt/zipfile.zip the_file_I_want.txt > the_file_I_want.txt
$ sudo umount mount_pt

Of course, you can also use tools other than the command-line unzip on the mounted file. (I need sudo because FUSE seems to be set up that way on my machine; you shouldn't need it.)
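
If mounting a filesystem is not an option, the same idea (translating I/O calls into HTTP range GETs) can be sketched directly in Python. This is a minimal sketch, not part of HTTPFS: `RangeReader` and `http_fetch` are hypothetical names, and it assumes the server reports `Content-Length` on a HEAD request and honours `Range` headers.

```python
import io
import urllib.request
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by a fetch(start, end) callable."""

    def __init__(self, size, fetch):
        super().__init__()
        self._size = size    # total length of the remote file in bytes
        self._fetch = fetch  # returns the bytes of the inclusive range start..end
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        if new_pos < 0:
            raise OSError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer):
        # Never ask for bytes past the end of the remote file.
        count = min(len(buffer), self._size - self._pos)
        if count <= 0:
            return 0
        data = self._fetch(self._pos, self._pos + count - 1)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


def http_fetch(url):
    """Return (size, fetch) for a URL; assumes the server supports Range."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise OSError("server ignored the Range header")
            return resp.read()

    return size, fetch


# Usage (URL and file name taken from the example above):
# size, fetch = http_fetch("http://server.com/zipfile.zip")
# with zipfile.ZipFile(RangeReader(size, fetch)) as zf:
#     print(zf.namelist())                      # lists the archive contents
#     data = zf.read("the_file_I_want.txt")     # fetches only what it needs
```

Every read becomes one HTTP request, so this is chatty on large compressed members; wrapping the reader in `io.BufferedReader` would probably reduce the number of requests at the cost of fetching somewhat more data.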

Shrubby answered 10/3, 2013 at 11:27 Comment(3)
Why do you use sudo?Flatus
Is there a simpler solution? I've tried this but I get annoying errors with fuse mount point. Also how to list the content of the zip in order to first know the exact name of the file we are targeting?Pliable
httpfs changed the filenames on sourceforge. Replace the 1st 2 commands with this 1 command: wget https://sourceforge.net/projects/httpfs/files/httpfs/1.06.07.02/httpfs_1.06.07.10.tar.bz2Onceover
K
9

In a way, yes, you can.

The ZIP file format defines a "central directory". Basically, this is a table that lists the files in the archive and the offset of each one.

So, using an HTTP Range request, you could download just the tail of the file (the central directory sits at the end of a ZIP file, followed only by the short end-of-central-directory record) and try to locate the central directory in it. If you succeed, you know the file list and offsets, so you can fetch those chunks separately and decompress them yourself.
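
Locating the central directory in a downloaded tail can be sketched as follows. This is a rough illustration only: the function names are made up, and it ignores ZIP64 archives, multi-disk archives, and the (unlikely) case where the archive comment happens to contain the record signature.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header


def find_central_directory(tail):
    """Return (cd_offset, cd_size, entry_count) from the last bytes of a ZIP."""
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + 22 > len(tail):
        raise ValueError("no end-of-central-directory record in this chunk")
    (_sig, _disk, _cd_disk, _n_disk, n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from("<4sHHHHIIH", tail, pos)
    return cd_offset, cd_size, n_total


def parse_central_directory(cd):
    """Map each file name to (local_header_offset, compressed_size, method)."""
    entries = {}
    pos = 0
    while cd[pos:pos + 4] == CDH_SIG:
        (_sig, _made, _need, flags, method, _time, _date, _crc,
         comp_size, _size, name_len, extra_len, comment_len,
         _disk, _iattr, _eattr, local_offset) = struct.unpack_from(
            "<4sHHHHHHIIIHHHHHII", cd, pos)
        raw_name = cd[pos + 46:pos + 46 + name_len]
        # Bit 11 of the flags marks UTF-8 names; otherwise CP437 is the default.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries[name] = (local_offset, comp_size, method)
        pos += 46 + name_len + extra_len + comment_len
    return entries
```

Note that `cd_offset` is counted from the start of the whole archive, not from the start of the tail. If the tail does not reach back that far, a second range request for bytes `cd_offset` to `cd_offset + cd_size - 1` is needed.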

This approach is quite error-prone and is not guaranteed to work. But so is hacking in general :-)

Another possible approach would be to build a custom server for that (see pst's answer for more details).

Kanishakanji answered 17/12, 2011 at 7:12 Comment(3)
I wonder if there is a library that can map HTTP content range requests as some sort of perverse stream IO ... :) (Actually, it would be possible [fsvo], as described, for a number of languages that accept stream inputs. Not something I'd want to touch though.)Mycenaean
This is not hacking but the way to do the task right. Actually, HTTP here becomes just a way to access ZIP stream, and any ZIP component that works with streams can be used to extract just one file from the remote stream.Grovel
@EugeneMayevski'EldoSCorp Yes, you're probably right, I didn't look at it this way :-)Kanishakanji
O
3

There are several ways for a normal person to download an individual file from a compressed ZIP file; unfortunately, they aren't common knowledge. There are some open-source tools and online web services, including:

Onceover answered 5/9, 2013 at 12:46 Comment(1)
I wonder, if partial-zip worked for you. To me it seems like nice promise, which did not deliver anything to me.Chefoo
M
0

I think Sergio Tulentsev's idea is brilliant.

However, if there is control over the server -- e.g., custom code can be deployed -- then it is a rather trivial operation (in the scheme of things :) to map/handle a request, extract the relevant portion of the ZIP archive, and send the data back in the HTTP stream.

The request might look like:

http://foo.bar/myfile.zip_a.jpeg

This would mean: extract, and return, "a.jpeg" from "myfile.zip".

(I intentionally chose this silly format so that browsers would likely choose "myfile.zip_a.jpeg" as the name in the download dialog when it appears.)

Of course, how this is implemented depends on the server/language/framework and there may already be existing solutions that support a similar operation (but I know not).
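
As one possible illustration, here is a small sketch of such a handler using only Python's standard library. It is not an existing solution: `ARCHIVE_DIR`, `split_request` and `ZipMemberHandler` are hypothetical names, the URL format is the one proposed above, and URL-encoded paths, caching and large members are not handled.

```python
import zipfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

ARCHIVE_DIR = Path("archives")  # hypothetical directory holding the ZIP files


def split_request(path):
    """Split '/myfile.zip_a.jpeg' into ('myfile.zip', 'a.jpeg'), else None."""
    name = path.lstrip("/")
    archive, sep, member = name.partition(".zip_")
    if not sep or not archive or not member:
        return None
    return archive + ".zip", member


class ZipMemberHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = split_request(self.path)
        if parts is None:
            self.send_error(404)
            return
        archive, member = parts
        # Keep only the final path component so requests cannot escape ARCHIVE_DIR.
        archive_path = ARCHIVE_DIR / Path(archive).name
        try:
            with zipfile.ZipFile(archive_path) as zf:
                data = zf.read(member)
        except (OSError, KeyError, zipfile.BadZipFile):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


# To try it locally (port chosen arbitrarily):
# HTTPServer(("127.0.0.1", 8000), ZipMemberHandler).serve_forever()
```

With a file `archives/myfile.zip` in place, a request for `http://127.0.0.1:8000/myfile.zip_a.jpeg` should return just the bytes of `a.jpeg`.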

Mycenaean answered 17/12, 2011 at 7:37 Comment(0)
Y
0

You can arrange for your file to appear at the end of the ZIP file.

Download the last 100,000 bytes:

$ curl -r -100000 https://www.keepassx.org/releases/2.0.2/KeePassX-2.0.2.zip -o tail.zip
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  84739      0  0:00:01  0:00:01 --:--:-- 84817

Check which files we got:

$ unzip -t tail.zip
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
    testing: KeePassX-2.0.2/share/translations/keepassx_uk.qm   OK
    testing: KeePassX-2.0.2/share/translations/keepassx_zh_CN.qm   OK
    testing: KeePassX-2.0.2/share/translations/keepassx_zh_TW.qm   OK
    testing: KeePassX-2.0.2/zlib1.dll   OK
At least one error was detected in tail.zip.

Then extract the last file:

$ unzip tail.zip KeePassX-2.0.2/zlib1.dll
Archive:  tail.zip
error [tail.zip]:  missing 7751495 bytes in zipfile
  (attempting to process anyway)
  inflating: KeePassX-2.0.2/zlib1.dll
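
If you build the archive yourself, putting a file at the end mostly comes down to adding it last. A minimal sketch with Python's `zipfile` module, which writes entries in the order they are added (`write_with_last` is a made-up helper name):

```python
import zipfile


def write_with_last(zip_path, members, last_name):
    """Write members (name -> bytes) so that last_name is the final entry."""
    ordered = [name for name in members if name != last_name] + [last_name]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in ordered:
            zf.writestr(name, members[name])


# Usage: "A" ends up directly before the central directory, so a tail
# download that is large enough contains it completely.
# write_with_last("archive.zip", {"A": b"...", "B": b"...", "C": b"..."}, "A")
```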
Yellow answered 11/7, 2016 at 20:5 Comment(0)
M
0

Based on the good input above, I have written a PowerShell snippet to show how it could work:

# demo code downloading a single DLL file from an online ZIP archive
# and extracting the DLL into memory to finally load it into the current process.

cls
Remove-Variable * -ea 0

# definition for the ZIP archive, the file to be extracted and the checksum:
$url = 'https://github.com/sshnet/SSH.NET/releases/download/2020.0.1/SSH.NET-2020.0.1-bin.zip'
$sub = 'net40/Renci.SshNet.dll'
$md5 = '5B1AF51340F333CD8A49376B13AFCF9C'

# prepare HTTP client:
Add-Type -AssemblyName System.Net.Http
$handler = [System.Net.Http.HttpClientHandler]::new()
$client  = [System.Net.Http.HttpClient]::new($handler)

# get the length of the ZIP archive:
$req = [System.Net.HttpWebRequest]::Create($url)
$req.Method = 'HEAD'
$length = $req.GetResponse().ContentLength
$zip = [byte[]]::new($length)

# get the last 10k:
# how to get the correct length of the central ZIP directory here?
$start = $length-10kb
$end   = $length-1
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$last10kb = $result.content.ReadAsByteArrayAsync().Result
$last10kb.CopyTo($zip, $start)

# get the block containing the DLL file:
# how to get the exact file-offset from the ZIP directory?
$start = $length-3537kb
$end   = $length-3201kb
$client.DefaultRequestHeaders.Clear()
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$block = $result.content.ReadAsByteArrayAsync().Result
$block.CopyTo($zip, $start)

# extract the DLL file from archive:
Add-Type -AssemblyName System.IO.Compression
$stream = [System.IO.Memorystream]::new()
$stream.Write($zip,0,$zip.Length)
$archive = [System.IO.Compression.ZipArchive]::new($stream)
$entry = $archive.GetEntry($sub)
# copy the whole entry; a single Read() call may return fewer bytes than requested:
$buffer = [System.IO.MemoryStream]::new()
$entry.Open().CopyTo($buffer)
$bytes = $buffer.ToArray()

# check MD5:
$prov = [Security.Cryptography.MD5CryptoServiceProvider]::new().ComputeHash($bytes)
$hash = [string]::Concat($prov.foreach{$_.ToString("x2")})
if ($hash -ne $md5) {write-host 'dll has wrong checksum.' -f y ;break}

# load the DLL:
[void][System.Reflection.Assembly]::Load($bytes)

# use the single demo-call from the DLL:
$test = [Renci.SshNet.NoneAuthenticationMethod]::new('test')
'done.'
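
On the open question in the comments above about the exact file offset: each central directory entry stores the offset of the file's local header and its compressed size, so the block to fetch starts at that offset and ends, at the latest, where the next entry (or the central directory) begins. Once that block is downloaded, the file can be decompressed without rebuilding the whole archive in memory. A minimal Python sketch, assuming only the "stored" and "deflate" methods and no encryption (`extract_entry` is a hypothetical name):

```python
import struct
import zlib

LFH_SIG = b"PK\x03\x04"  # local file header signature


def extract_entry(chunk, compressed_size, method):
    """Decompress one entry from bytes that start at its local file header.

    compressed_size and method are the values from the central directory.
    """
    if chunk[:4] != LFH_SIG:
        raise ValueError("chunk does not start at a local file header")
    # The name and extra-field lengths sit at offsets 26 and 28 of the header;
    # the local extra field can differ in length from the central one.
    name_len, extra_len = struct.unpack_from("<HH", chunk, 26)
    start = 30 + name_len + extra_len
    raw = chunk[start:start + compressed_size]
    if len(raw) < compressed_size:
        raise ValueError("chunk is too short for this entry")
    if method == 0:   # stored, no compression
        return raw
    if method == 8:   # deflate, stored as a raw stream without zlib header
        return zlib.decompress(raw, -15)
    raise ValueError(f"unsupported compression method {method}")
```

The result can then be checked against a known checksum, as the script above does with MD5.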
Magnification answered 10/4, 2021 at 7:13 Comment(0)
