Preferred way to use Java ZipOutputStream and BufferedOutputStream

In Java, does it matter whether I instantiate the ZipOutputStream first or the BufferedOutputStream first? For example:

FileOutputStream dest = new FileOutputStream(file);
ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(dest));

// use zip output stream to write to

Or:

FileOutputStream dest = new FileOutputStream(file);
BufferedOutputStream out = new BufferedOutputStream(new ZipOutputStream(dest));

// use buffered stream to write to

In my unscientific timings I can't tell much of a difference. I also can't find anything in the Java API documentation saying whether one of these ways is necessary or preferred. Any advice? It seems like compressing the output first and then buffering it for writes would be more efficient.

Rakehell answered 22/1, 2013 at 15:43
Theoretically, compressing then buffering is going to be faster. However, ZipOutputStream (like GZIPOutputStream) has an internal buffer, so it doesn't write individual bytes out to the underlying stream. Depending on the underlying stream type (e.g. file vs. socket) and the relative sizes of the buffers, you may or may not see any difference. – Mesomorphic
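The comment's point about the internal buffer can be checked with a minimal sketch. ZipOutputStream (like GZIPOutputStream) extends DeflaterOutputStream, which passes compressed data downstream in chunks of its own buffer (512 bytes by default). The class below is a hypothetical test harness: CountingSink only counts how many write calls reach it, once with the ZipOutputStream writing straight to the sink and once with a BufferedOutputStream in between.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class WriteCallCounter {
    /** Hypothetical sink that discards all bytes but counts how often write() is called on it. */
    static final class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Zips random data and returns how many write calls reached the innermost sink. */
    static int countSinkWrites(boolean bufferBelowZip) {
        CountingSink sink = new CountingSink();
        OutputStream target = bufferBelowZip ? new BufferedOutputStream(sink) : sink;
        byte[] data = new byte[200_000];
        new Random(42).nextBytes(data); // random bytes barely compress, so the output stays large
        try (ZipOutputStream zip = new ZipOutputStream(target)) {
            zip.putNextEntry(new ZipEntry("data.bin"));
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("Zip -> sink:           " + countSinkWrites(false));
        System.out.println("Zip -> Buffered -> sink: " + countSinkWrites(true));
    }
}
```

With the inner BufferedOutputStream the sink should see far fewer calls, because the 512-byte chunks are merged into larger buffer-sized writes; the exact counts depend on the JDK version, so treat them as illustrative only.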

You should always wrap the BufferedOutputStream with the ZipOutputStream, never the other way around. See the code below:

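import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// inside a method declared "throws IOException"
FileOutputStream fos = new FileOutputStream("hello-world.zip");
BufferedOutputStream bos = new BufferedOutputStream(fos);
ZipOutputStream zos = new ZipOutputStream(bos);

try {
    for (int i = 0; i < 10; i++) {
        // not available on BufferedOutputStream
        zos.putNextEntry(new ZipEntry("hello-world." + i + ".txt"));
        zos.write("Hello World!".getBytes(StandardCharsets.UTF_8));
        // not available on BufferedOutputStream
        zos.closeEntry();
    }
}
finally {
    // closing the outermost stream also closes bos and fos
    zos.close();
}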

As the comments say, the putNextEntry() and closeEntry() methods are not available on BufferedOutputStream. If putNextEntry() has not been called, any write() on the ZipOutputStream throws java.util.zip.ZipException: no current ZIP entry.
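A minimal in-memory sketch of that failure (the class and method names are hypothetical): it writes to a ZipOutputStream without opening an entry and returns the message of the resulting ZipException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

class NoEntryDemo {
    /** Writes without calling putNextEntry() first; returns the ZipException message, or null. */
    static String writeWithoutEntry() {
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            zip.write("Hello World!".getBytes(StandardCharsets.UTF_8)); // no entry is open yet
            return null; // not reached: write() throws
        } catch (ZipException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeWithoutEntry());
    }
}
```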

For the sake of completeness, note that in the hello-world.zip example the finally clause only calls close() on the ZipOutputStream. This is enough because, by convention, all built-in Java output stream wrappers propagate close() to the stream they wrap.
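The close propagation can be observed with a small sketch. TrackingSink is a hypothetical stand-in for the FileOutputStream that records whether close() reached it, and the sketch uses try-with-resources (Java 7+), which closes the ZipOutputStream just as the finally clause does.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ClosePropagationDemo {
    /** Hypothetical innermost stream that remembers whether close() reached it. */
    static final class TrackingSink extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Closes only the outermost ZipOutputStream and reports whether the innermost stream was closed too. */
    static boolean closingZipClosesInnermost() {
        TrackingSink sink = new TrackingSink();
        // Only the outermost stream is declared as a resource.
        try (ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(sink))) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("Hello World!".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The ZIP bytes were flushed through the BufferedOutputStream and the sink was closed.
        return sink.closed && sink.size() > 0;
    }

    public static void main(String[] args) {
        System.out.println(closingZipClosesInnermost());
    }
}
```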

EDIT

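I just tested it the other way around. It turns out that wrapping a ZipOutputStream in a BufferedOutputStream and then only calling write() on it (without creating or closing entries) does not throw a ZipException. Instead, the resulting ZIP file is empty, without any entries inside it. (This matches Java 7, whose FilterOutputStream.close() silently ignored exceptions thrown by flush(); since Java 8 the ZipException propagates when the BufferedOutputStream is closed, or earlier if its buffer fills up.)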
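A minimal in-memory sketch of the reversed wrapping (the class name is hypothetical). The short write stays in the BufferedOutputStream's buffer, and only close() pushes it into ZipOutputStream.write(), where no entry is open. On a current JDK the sketch is expected to report the ZipException from close().

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

class WrongOrderDemo {
    /** Returns the message of the ZipException raised when closing, or null if close() succeeded. */
    static String closeWrongOrder() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(new ZipOutputStream(sink));
        try {
            // Small write: it stays in the BufferedOutputStream's buffer, nothing reaches the ZIP yet.
            out.write("Hello World!".getBytes(StandardCharsets.UTF_8));
            // close() flushes the buffer into ZipOutputStream.write(), which has no open entry.
            out.close();
            return null;
        } catch (ZipException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(closeWrongOrder());
    }
}
```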

Germicide answered 19/6, 2013 at 11:38
In that case, does buffering make any sense at all? I'm not arguing, just curious whether anyone has checked so far. – Arius
As you can see in the first part of MrSmith42's answer, an inner BufferedOutputStream can be beneficial, because it buffers the already compressed output before it is written to disk. You will use a bit more memory (the compressed bytes are kept in the buffer before being flushed to disk), but it is more efficient, as disk I/O is done in larger chunks of bytes (the buffer size the BufferedOutputStream was initialized with). – Germicide
Which buffer size for the BufferedOutputStream inside a ZipOutputStream performs best for you is something you will have to measure yourself. – Germicide
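Measuring this yourself could start from a rough sketch like the one below, which times zipping the same random data into a temporary file through BufferedOutputStream buffers of several sizes. The class name, data size and buffer sizes are arbitrary assumptions, and a single System.nanoTime() measurement after one warm-up run is only a first approximation; a proper benchmark harness such as JMH gives more trustworthy numbers.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class BufferSizeTiming {
    /** Zips data into a temp file through a BufferedOutputStream of the given size; returns elapsed nanoseconds. */
    static long timeZip(byte[] data, int bufferSize) {
        try {
            Path file = Files.createTempFile("buffer-size-", ".zip");
            try {
                long start = System.nanoTime();
                try (ZipOutputStream zip = new ZipOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(file), bufferSize))) {
                    zip.putNextEntry(new ZipEntry("data.bin"));
                    zip.write(data);
                    zip.closeEntry();
                }
                return System.nanoTime() - start;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[8 * 1024 * 1024];
        new Random(1).nextBytes(data);
        for (int size : new int[] {512, 8 * 1024, 64 * 1024, 256 * 1024}) {
            timeZip(data, size); // warm-up run, result discarded
            System.out.printf("buffer %7d bytes: %d ms%n", size, timeZip(data, size) / 1_000_000);
        }
    }
}
```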

You should:

ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));

because you want to buffer the writes to disk: writing a few big blocks of data is much more efficient than writing a lot of little ones.
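A minimal sketch of this ordering as a reusable helper, assuming the entries are small strings held in a Map (the names RecommendedOrder, zipStrings and roundTripEntryCount are hypothetical). The BufferedOutputStream sits directly on the file stream, so the compressed bytes reach the disk in buffer-sized blocks; ZipFile is used afterwards only to check that the archive can be read back.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class RecommendedOrder {
    /** Writes each map entry as a ZIP entry: ZipOutputStream outermost, BufferedOutputStream next to the file. */
    static void zipStrings(Path target, Map<String, String> files) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            for (Map.Entry<String, String> e : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
    }

    /** Writes the map to a temp ZIP, reopens it with ZipFile and returns the number of entries. */
    static int roundTripEntryCount(Map<String, String> files) {
        try {
            Path tmp = Files.createTempFile("recommended-", ".zip");
            try {
                zipStrings(tmp, files);
                try (ZipFile zf = new ZipFile(tmp.toFile())) {
                    return zf.size();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "Hello");
        files.put("b.txt", "World");
        System.out.println(roundTripEntryCount(files));
    }
}
```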


This

new BufferedOutputStream(new ZipOutputStream(dest));

would buffer before zip compression. But that all happens in memory and does not need buffering, because many little memory accesses take about the same time as a few big ones: in memory, the time needed is generally proportional to the number of bytes read or written.

As mentioned in the comments, with this ordering the methods of ZipOutputStream that are not part of BufferedOutputStream, e.g. putNextEntry and closeEntry, would not be available either.
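If you really wanted the reversed order, you would have to keep a separate reference to the ZipOutputStream just to call putNextEntry() and closeEntry(), and remember to flush the outer BufferedOutputStream before every closeEntry(). The hypothetical sketch below forgets that flush once, works in memory, and reads the archive back to show where the data ends up.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ReversedOrderPitfall {
    /** Writes two entries through an outer BufferedOutputStream, forgetting flush() before the first closeEntry(). */
    static Map<String, String> writeAndReadBack() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ZipOutputStream zip = new ZipOutputStream(sink);          // kept only for putNextEntry/closeEntry
            BufferedOutputStream out = new BufferedOutputStream(zip); // used for the data writes
            zip.putNextEntry(new ZipEntry("a.txt"));
            out.write('A');   // stays in out's buffer
            zip.closeEntry(); // bug: out was not flushed, so a.txt is closed while still empty
            zip.putNextEntry(new ZipEntry("b.txt"));
            out.write('B');
            out.flush();      // now "AB" reaches the ZIP, all of it inside b.txt
            zip.closeEntry();
            out.close();      // also closes zip and sink
            return readEntries(sink.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads every entry of an in-memory ZIP into a name -> UTF-8 content map. */
    static Map<String, String> readEntries(byte[] zipBytes) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    content.write(buf, 0, n);
                }
                result.put(entry.getName(), new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack());
    }
}
```

Here a.txt comes back empty and b.txt contains "AB", because the byte meant for the first entry was still sitting in the BufferedOutputStream when that entry was closed.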

Firstborn answered 22/1, 2013 at 15:45
I'm sure my answer is correct, but feel free to try it both ways and compare the performance (or debug them). – Firstborn
My point was that there is no point in comparing the performance of the two. Wrapping the ZipOutputStream in a BufferedOutputStream is meaningless altogether, as it does not expose the putNextEntry and closeEntry methods. – Germicide
Down-voting, as the answer does not mention that the methods of ZipOutputStream are not available on the outer stream when wrapping the wrong way. – Tallie

© 2022 - 2024 — McMap. All rights reserved.