Ruby: Download zip file and extract

I have a Ruby script that downloads a remote ZIP file from a server using Ruby's open command. When I inspect the downloaded content, it looks something like this:

PK\x03\x04\x14\x00\b\x00\b\x00\x9B\x84PG\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x10\x00foobar.txtUX\f\x00\x86\v!V\x85\v!V\xF6\x01\x14\x00K\xCB\xCFOJ,RH\x03S\\\x00PK\a\b\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00PK\x01\x02\x15\x03\x14\x00\b\x00\b\x00\x9B\x84PG\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00\n\x00\f\x00\x00\x00\x00\x00\x00\x00\x00@\xA4\x81\x00\x00\x00\x00foobar.txtUX\b\x00\x86\v!V\x85\v!VPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00D\x00\x00\x00T\x00\x00\x00\x00\x00

I tried using the rubyzip gem (https://github.com/rubyzip/rubyzip) and its Zip::ZipInputStream class like this:

stream = open("http://localhost:3000/foobar.zip").read # this outputs the zip content from above
zip = Zip::ZipInputStream.new stream

Unfortunately, this throws an error:

 Failure/Error: zip = Zip::ZipInputStream.new stream
 ArgumentError:
   string contains null byte
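The null-byte error comes from the argument type rather than from the ZIP data: when Zip::ZipInputStream.new (rubyzip 0.9.x) or Zip::InputStream (rubyzip 1.x) receives a String, rubyzip treats it as a file name and opens that path, so the raw ZIP bytes, which contain NUL bytes, are rejected before any ZIP parsing happens. The stdlib-only sketch below reproduces just that step with File.open; ZIP_BYTES is a made-up stand-in for the downloaded body and open_as_path is a hypothetical helper.

```ruby
# Made-up stand-in for the downloaded body: the ZIP local-file-header
# signature ("PK\x03\x04") followed by bytes that include NULs (\x00),
# like the dump above.
ZIP_BYTES = "PK\x03\x04\x14\x00\x08\x00".b

# Hypothetical helper: does what rubyzip does with a String argument,
# i.e. treats it as a file name and opens that path.
def open_as_path(bytes)
  File.open(bytes, 'rb', &:read)
  nil # not reached for ZIP_BYTES
rescue ArgumentError => e
  e.message # Ruby rejects paths that contain a NUL byte
end

puts open_as_path(ZIP_BYTES)
```

The exact wording of the message may differ between Ruby versions, but it is always an ArgumentError about a null byte in the path.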

My questions are:

  1. Is it possible, in general, to download a ZIP file and extract its content in-memory?
  2. Is Rubyzip the right library for it?
  3. If so, how can I extract the content?
Helbonna asked 16/10, 2015 at 14:43

I found the solution myself, and then also on Stack Overflow :D (How to iterate through an in-memory zip file in Ruby):

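require 'httparty'
require 'stringio'
require 'zip'

input = HTTParty.get("http://example.com/somedata.zip").body
Zip::InputStream.open(StringIO.new(input)) do |io|
  while (entry = io.get_next_entry) # nil once every entry has been read
    puts entry.name
    parse_zip_content io.read # parse_zip_content: your own handler for the entry's bytes
  end
end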
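  1. Download your ZIP file. I'm using HTTParty for this, but you could also use Ruby's open-uri (require 'open-uri', then URI.open; on Ruby 3.0+ plain open no longer opens URLs).
  2. Wrap the downloaded bytes in a StringIO with StringIO.new(input). Zip::InputStream.open treats a String argument as a file name, so passing the raw bytes directly fails with the null-byte error from the question.
  3. Iterate over every entry in the ZIP archive with io.get_next_entry. It returns a Zip::Entry, or nil once there are no entries left, which ends the loop.
  4. io.read returns the content of the current entry, and entry.name returns its file name.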
Helbonna answered 16/10, 2015 at 15:48
I tried the same code above, but it doesn't work for me. I see this error: rubyzip-0.9.9/lib/zip/zip_input_stream.rb:52:in `initialize': can't convert StringIO into String (TypeError) - Cartan
I'm using rubyzip 1.1.7, so maybe that's the problem? Did you copy and paste my code from above? At which line (in your code) does the error happen? - Helbonna
Yeah, it turns out that the remote zip I was accessing was corrupted. All good now. Thanks. - Cartan

Like I commented in https://mcmap.net/q/759521/-how-to-iterate-through-an-in-memory-zip-file-in-ruby, we can just use Zip::File.open_buffer:

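require 'open-uri'
require 'zip'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs.
content = URI.open('http://localhost:3000/foobar.zip')

Zip::File.open_buffer(content) do |zip|
  zip.each do |entry|
    puts entry.name
    # Do whatever you want with the content files, e.g. entry.get_input_stream.read
  end
end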
Ettie answered 9/4, 2017 at 5:04

© 2022 - 2024 — McMap. All rights reserved.