How to iterate through an in-memory zip file in Ruby

F

8

15

I am writing unit tests, and one of them returns a zip file. I want to check the contents of this zip file, grab some values from it, and pass those values to the next tests.

I'm using Rack Test, so I know the content of my zip file is in last_response.body. I have looked through the RubyZip documentation, but it always seems to expect a file. Since I'm running a unit test, I would prefer to do everything in memory, if possible, so as not to pollute any folder with test zip files.

Fieldstone answered 5/12, 2012 at 19:13 Comment(0)

S

8

See @bronson’s answer for a more up to date version of this answer using the newer RubyZip API.

The Rubyzip docs you linked to look a bit old. The latest release (0.9.9) can handle IO objects, so you can use a StringIO (with a little tweaking).

Even though the API will accept an IO, it still seems to assume it’s a file and tries to call path on it, so first monkey patch StringIO to add a path method (it doesn’t need to actually do anything):

require 'stringio'
class StringIO
  def path
  end
end

Then you can do something like:

require 'zip/zip'
Zip::ZipInputStream.open_buffer(StringIO.new(last_response.body)) do |io|
  while (entry = io.get_next_entry)
    # deal with your zip contents here, e.g.
    puts "Contents of #{entry.name}: '#{io.read}'"
  end
end

and everything will be done in memory.

Sheilahshekel answered 5/12, 2012 at 21:23 Comment(0)

M

20

Matt's answer is exactly right. Here it is updated to the new API:

Zip::InputStream.open(StringIO.new(input)) do |io|
  while entry = io.get_next_entry
    if entry.name == 'doc.kml'
      parse_kml(io.read)
    else
      raise "unknown entry in kmz file: #{entry.name}"
    end
  end
end

And there's no need to monkeypatch StringIO anymore. Progress!

Masterwork answered 9/12, 2013 at 16:44 Comment(2)

what does input need to be? File? path? – Huysmans 25/1, 2021 at 20:56

A string of binary data, i.e. the contents of the zip file as read from disk or fetched from a URL. – Ammann 9/6, 2022 at 14:7

C

8

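require 'zip'

# content is the zip archive as a binary String (or an IO-like object).
decompressed_data = String.new # start empty so there is something to append to
Zip::File.open_buffer(content) do |zip|
  zip.each do |entry|
    next unless entry.file? # skip directory entries
    decompressed_data << entry.get_input_stream.read
  end
end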

Cristal answered 31/7, 2015 at 17:43 Comment(1)

some explanation can go a long way to encourage understanding rather than copy-paste-move on – Fibre 31/7, 2015 at 22:56

A

4

With RubyZip version 1.2.1 (and possibly some earlier versions), we just need to use the open_buffer method of the Zip::File class.

From RubyZip documentation:

Like #open, but reads zip archive contents from a String or open IO stream, and outputs data to a buffer. (This can be used to extract data from a downloaded zip archive without first saving it to disk.)

Example:

Zip::File.open_buffer(last_response.body) do |zip|
  zip.each do |entry|
    puts entry.name
    # Do whatever you want with the content files.
  end
end

Astrology answered 9/4, 2017 at 5:0 Comment(2)

Is this working for you? When I do this, I get the error detailed here – Ghazi 17/4, 2017 at 20:49

Works for me, recommend this – Stratfordonavon 14/3, 2020 at 8:55

A

1

You could use Tempfile to dump the zip file into a temporary file. Tempfile creates the file in the operating system's temporary directory, and Ruby deletes it automatically when the Tempfile object is garbage-collected or the interpreter exits.

Adjutant answered 5/12, 2012 at 19:49 Comment(1)

On POSIX systems, the temporary file is already "deleted" when you get it, so there's no clean-up required. It's the closest thing you can get to a naked filehandle to a transient file object. – Bunni 5/12, 2012 at 20:38
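A minimal sketch of the temporary-file approach, using only the standard library. The body string is a placeholder rather than a real zip archive, and the variable names are made up for illustration. The sketch uses the block form of Tempfile.create, which removes the file as soon as the block ends, so nothing is left behind in the test directory.

```ruby
require 'tempfile'

# Placeholder for last_response.body; any binary string works for this sketch.
body = 'PK placeholder bytes'.b

saved_path = nil
round_trip = nil
Tempfile.create(['response', '.zip']) do |file|
  file.binmode
  file.write(body)
  file.flush
  saved_path = file.path
  # A path-based API (for example Zip::File.open) could be given saved_path here.
  round_trip = File.binread(saved_path)
end

puts File.exist?(saved_path) # false: the block form removes the file on exit
```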

W

1

Inspired by Matt's answer, I have a slightly modified solution for those who have to use the rubyzip 0.9.x gem. Mine doesn't reopen the StringIO class.

require 'stringio'
require 'zip/zip' # rubyzip 0.9.x entry point

sio = StringIO.new(response.body)
# rubyzip 0.9.x calls #path on its input, so give this one object a no-op #path
sio.define_singleton_method(:path) {}
Zip::ZipInputStream.open_buffer(sio) do |io|
  while (entry = io.get_next_entry)
    puts "Contents of #{entry.name}: '#{io.read}'"
  end
end

Weidman answered 19/11, 2015 at 17:4 Comment(0)
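The difference from reopening the class can be seen with the standard library alone. In this small sketch (the variable names are made up), define_singleton_method adds path to a single StringIO object and leaves every other instance untouched:

```ruby
require 'stringio'

patched = StringIO.new('zip bytes would go here')
# No-op #path on this one object only; it returns nil.
patched.define_singleton_method(:path) {}

untouched = StringIO.new('another buffer')

puts patched.respond_to?(:path)
puts untouched.respond_to?(:path)
```

Keeping the patch local to one object avoids changing StringIO's behaviour for the rest of the test suite.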

A

1

This worked for me. In my case I have only one file so I used a fixed path, but you can use entry.name to build your path.

input = HTTParty.get(link).body # link: URL of the zip file
Zip::File.open_buffer(input) do |zip_file|
  zip_file.each do |entry|
    entry.extract(path) # path: fixed destination, since the archive holds one file
  end
end

Apanage answered 2/5, 2017 at 23:23 Comment(0)

A

0

Just an update on this one, due to changes in rubyzip:

Zip::InputStream.open(StringIO.new(zip_file)) do |io|
  while (entry = io.get_next_entry)
    # deal with your zip contents here, e.g.
    puts "Contents of #{entry.name}: '#{io.read}'"
  end
end

Allotropy answered 27/5, 2014 at 0:27 Comment(0)
