What is the fastest way to extract one file from a zip file that contains a lot of files?
V

4

11

I tried the java.util.zip package, but it is too slow.

Then I found the LZMA SDK and 7z jbinding, but both are lacking something.

The LZMA SDK provides no documentation or how-to tutorial, and no Javadoc, which is very frustrating.

7z jbinding does not provide a simple way to extract only one file; it only provides a way to extract the entire contents of the zip file. Moreover, it does not provide a way to specify where to place the unzipped file.

Any ideas?

Vandiver answered 30/3, 2011 at 8:56 Comment(0)
I
16

What does your code with java.util.zip look like, and how big a zip file are you dealing with?

I'm able to extract a 4MB entry out of a 200MB zip file with 1,800 entries in roughly a second with this:

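import java.io.*;
import java.util.zip.*;

// Scan the archive sequentially and copy the first entry named "your.file".
// try-with-resources closes both streams, even if an exception is thrown.
try (ZipInputStream zin = new ZipInputStream(
        new BufferedInputStream(new FileInputStream("your.zip")))) {
    ZipEntry ze;
    while ((ze = zin.getNextEntry()) != null) {
        if (ze.getName().equals("your.file")) {
            // Open the output only once the entry has been found,
            // so no empty file is left behind when it is missing.
            try (OutputStream out = new FileOutputStream("your.file")) {
                byte[] buffer = new byte[8192];
                int len;
                while ((len = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
            }
            break;
        }
    }
}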
Infighting answered 30/3, 2011 at 9:11 Comment(6)
We are building a web app for external users, and we are talking about 20 requests per second. We measured that unzipping a file with java.util.zip takes between 0.5 s and 2 s. That's slow. - Vandiver
I assume you're talking about 20 requests per second that each need a single file out of a zip. Why not just completely unzip those ahead of time and serve them directly from the file system? - Infighting
Your code would be even faster if you had a BufferedInputStream between the FileInputStream and the ZipInputStream. - Dup
Also, shouldn't you do a zin.close() at the end? - Torsibility
Isn't zin.closeEntry() required? - Basham
Reading the whole ZIP file to extract only one entry is not a good approach. Using the getEntry() method of ZipFile, or using a FileSystem, is much better. - Bautista
U
13

I have not benchmarked the speed, but with Java 7 or greater I extract a file as follows.
I would imagine that it's faster than the ZipFile API, though I have not measured it:

A short example extracting META-INF/MANIFEST.MF from a zip file test.zip:

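import java.io.File;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// file to extract from zip file
String file = "MANIFEST.MF";
// location to extract the file to
File outputLocation = new File("D:/temp/", file);
// path to the zip file
Path zipFile = Paths.get("D:/temp/test.zip");

// load zip file as filesystem; the (ClassLoader) cast selects the
// newFileSystem(Path, ClassLoader) overload, which exists since Java 7
// (the single-argument newFileSystem(Path) was only added in Java 13)
try (FileSystem fileSystem = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
    // copy file from zip file to output location
    // (throws FileAlreadyExistsException if the target already exists)
    Path source = fileSystem.getPath("META-INF/" + file);
    Files.copy(source, outputLocation.toPath());
}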
Ulterior answered 8/10, 2014 at 12:37 Comment(1)
Works, and lightning fast ... this should be the accepted answer (assuming Java 7 or higher). - Volpe
A
5

Use a ZipFile rather than a ZipInputStream.

Although the ZipFile documentation does not say so (it's in the docs for JarFile), it should use random-access file operations to read the file. Since a ZIP file contains a directory at a known location (the central directory at the end of the archive), a lot less I/O has to happen to find a particular file.

Some caveats: to the best of my knowledge, the Sun implementation uses a memory-mapped file. This means that your virtual address space has to be large enough to hold the file as well as everything else in your JVM, which may be a problem for a 32-bit server. On the other hand, it may be smart enough to avoid memory-mapping on 32-bit, or to memory-map just the directory; I haven't tried.

Also, if you're using multiple files, be sure to use a try/finally to ensure that each file is closed after use.
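
A minimal sketch of the ZipFile approach described above, assuming the entry name is known in advance. The class name SingleEntryExtractor and the method extractEntry are hypothetical, chosen only for illustration; ZipFile.getEntry, ZipFile.getInputStream and Files.copy are standard JDK calls. On Java 7 or later, try-with-resources can take the place of the try/finally:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of any library.
final class SingleEntryExtractor {

    /** Copies one named entry out of the archive to the given target path. */
    static void extractEntry(Path zipPath, String entryName, Path target) throws IOException {
        // ZipFile looks the entry up by name instead of reading every entry in turn.
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new NoSuchFileException(entryName + " not found in " + zipPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Overwrites the target if it already exists.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: SingleEntryExtractor <zip> <entry> <target>");
            return;
        }
        extractEntry(Paths.get(args[0]), args[1], Paths.get(args[2]));
    }
}
```

Because the target path is a parameter, this also covers the question's requirement of choosing where the extracted file is placed.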

Accipiter answered 31/3, 2011 at 14:32 Comment(0)
B
0

The code snippet below assumes you know both the path of the target zip file and the path of the target entry inside it.

There is no need to iterate through the entries: ZipFile provides a getEntry method to retrieve an entry directly, and a getInputStream method to obtain an InputStream over its contents.

In this example, a protobuf binary file of about 340 KB is read from a zip file in ~11 ms. A similar approach can be used to read any other file type.


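    /* Relevant imports */
    import com.google.protobuf.Message;
    import com.google.protobuf.Parser;
    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public final class ZipFileUtils {

        ...

        /** Parses a single zip entry as a protobuf message without scanning the other entries. */
        public static <T extends Message> T readMessageFromZip(
                final Path zipPath,
                final Path entryPath,
                final Parser<T> messageParser) throws IOException {
            try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
                // Zip entry names always use '/', even on Windows.
                ZipEntry zipEntry = zipFile.getEntry(entryPath.toString().replace('\\', '/'));
                if (zipEntry == null) {
                    throw new IOException("Entry not found in zip: " + entryPath);
                }
                // Closing the ZipFile also closes the stream opened here.
                return messageParser.parseFrom(zipFile.getInputStream(zipEntry));
            }
        }
    }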
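
Since the same approach applies to other file types, here is a minimal sketch that reads an arbitrary entry into a byte array, with no protobuf dependency. The names ZipEntryReader and readEntryBytes are hypothetical; the manual copy loop is only there to stay compatible with Java 7 (on Java 9 or later, InputStream.readAllBytes should do the same job):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
final class ZipEntryReader {

    /** Returns the uncompressed bytes of one entry, or throws if it is absent. */
    static byte[] readEntryBytes(Path zipPath, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                throw new IOException("Entry not found in zip: " + entryName);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int len;
                // Copy the decompressed stream into memory chunk by chunk.
                while ((len = in.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
                return out.toByteArray();
            }
        }
    }
}
```

For a text entry, the returned bytes can be decoded with new String(bytes, StandardCharsets.UTF_8). This holds the whole entry in memory, so it is best suited to small entries.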

Boxberry answered 25/8, 2023 at 20:30 Comment(0)
