How to create a multipart zip file and read it back?

How would I properly zip bytes to a ByteArrayOutputStream and then read that using a ByteArrayInputStream? I have the following method:

private byte[] getZippedBytes(final String fileName, final byte[] input) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ZipOutputStream zipOut = new ZipOutputStream(bos);
    ZipEntry entry = new ZipEntry(fileName);
    entry.setSize(input.length);
    zipOut.putNextEntry(entry);
    zipOut.write(input, 0, input.length);
    zipOut.closeEntry();
    zipOut.close();

    //Turn right around and unzip what we just zipped
    ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(bos.toByteArray()));

    while((entry = zipIn.getNextEntry()) != null) {
        assert entry.getSize() >= 0;
    }

    return bos.toByteArray();
}

When I execute this code, the assertion in the while loop fails because entry.getSize() returns -1. I don't understand why the extracted entry doesn't match the entry that was zipped.

Triviality answered 20/12, 2016 at 16:29 Comment(3)
Why? You already have the bytes. Why would you want to zip and unzip them just to get back what you already have? - Kentigerma
This is just a sample as a proof of concept. In my actual scenario I'm creating a mock multipart file with the zipped file's bytes so I can test that another class is correctly unzipping the content. - Triviality
What is the size of bos.toByteArray()? - Siddons

Why is the size -1?

Calling getNextEntry on a ZipInputStream just positions the read cursor at the start of the entry to read.

Here ZipOutputStream writes the DEFLATED entry before it knows the CRC and the compressed size, so it stores the size (along with that other metadata) in a data descriptor after the actual entry data. Therefore the size is not available yet when the cursor is positioned at the start of the entry.

This information becomes available only after you have read the whole entry data or simply moved on to the next entry.

For example, going to the next entry:

// position at the start of the first entry
entry = zipIn.getNextEntry();
ZipEntry firstEntry = entry;    
// size is not yet available
System.out.println("before " + firstEntry.getSize()); // prints -1

// position at the start of the second entry
entry = zipIn.getNextEntry();
// size is now available
System.out.println("after " + firstEntry.getSize()); // prints the size

or reading the whole entry data:

// position at the start of the first entry
entry = zipIn.getNextEntry();
// size is not yet available
System.out.println("before " + entry.getSize()); // prints -1

// read the whole entry data
while(zipIn.read() != -1);

// size is now available
System.out.println("after " + entry.getSize()); // prints the size
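To see where the size actually lives, here is a minimal sketch (the class and method names are hypothetical) that zips one entry the same way as the question does and then reads two fields straight out of the first local file header, using the offsets from the ZIP format specification: the general purpose flags at byte 6 and the uncompressed size at byte 22. With OpenJDK's ZipOutputStream the data descriptor bit (bit 3) is expected to be set and the size field to be 0, which is why ZipInputStream cannot know the size until it has read past the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LocalHeaderInspector {
    // Field offsets inside a ZIP local file header (PKWARE APPNOTE.TXT)
    static final int FLAG_OFFSET = 6;            // general purpose bit flag, 2 bytes
    static final int SIZE_OFFSET = 22;           // uncompressed size, 4 bytes
    static final int DATA_DESCRIPTOR_BIT = 0x08; // bit 3: CRC and sizes follow the data

    // Same steps as the question's method: only the uncompressed size is set up front
    static byte[] zipOneEntry(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setSize(data.length);
            zipOut.putNextEntry(entry);
            zipOut.write(data);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // The first local file header starts at offset 0 of the archive; fields are little-endian
    static boolean hasDataDescriptor(byte[] zip) {
        int flags = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getShort(FLAG_OFFSET) & 0xFFFF;
        return (flags & DATA_DESCRIPTOR_BIT) != 0;
    }

    static long sizeInLocalHeader(byte[] zip) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getInt(SIZE_OFFSET) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipOneEntry("test.txt", "hello, zip".getBytes(StandardCharsets.UTF_8));
        System.out.println("data descriptor flag: " + hasDataDescriptor(zip));
        System.out.println("size in local header: " + sizeInLocalHeader(zip));
    }
}
```

The real size is written later, in the data descriptor that follows the compressed bytes, and ZipInputStream copies it into the ZipEntry only once it reaches that point.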

Your misunderstanding is quite common and there are a number of bug reports regarding this problem (which are closed as "Not an Issue"), like JDK-4079029, JDK-4113731, JDK-6491622.

As also mentioned in the bug reports, you could use ZipFile instead of ZipInputStream, which gives access to the size information before reading the entry data (ZipFile reads the central directory at the end of the archive, where the sizes are recorded); however, to create a ZipFile you need a File (see the constructors) rather than a byte array.

For example:

File file = new File("test.zip");
try (ZipFile zipFile = new ZipFile(file)) { // closes the file when done
    Enumeration<? extends ZipEntry> entries = zipFile.entries();
    while (entries.hasMoreElements()) {
        ZipEntry zipEntry = entries.nextElement();
        System.out.println(zipEntry.getSize()); // prints the size
    }
}
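If the zip only exists as a byte array (as in the question), one possible workaround is to spill it to a temporary file first. The following is a minimal sketch under that assumption; the class ZipFileSizes and the method entrySizes are hypothetical names, and only standard java.nio.file and java.util.zip calls are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipFileSizes {
    // Hypothetical helper: writes the in-memory zip to a temp file so ZipFile can read its central directory
    static Map<String, Long> entrySizes(byte[] zipBytes) throws IOException {
        Path tmp = Files.createTempFile("entries", ".zip");
        try {
            Files.write(tmp, zipBytes);
            Map<String, Long> sizes = new LinkedHashMap<>();
            try (ZipFile zipFile = new ZipFile(tmp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zipFile.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry e = entries.nextElement();
                    // Sizes come from the central directory, so no entry data is read here
                    sizes.put(e.getName(), e.getSize());
                }
            }
            return sizes;
        } finally {
            // The ZipFile is already closed at this point, so the temp file can be removed
            Files.deleteIfExists(tmp);
        }
    }
}
```

The extra disk round trip is the price for getting the sizes without draining each entry through ZipInputStream.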

How to get the data from the input stream?

If you want to check if the unzipped data is equal to the original input data, you could read from the input stream like so:

byte[] output = new byte[input.length];
entry = zipIn.getNextEntry();
// a single read() may return fewer bytes than requested, so use readFully
new DataInputStream(zipIn).readFully(output);

System.out.println("Are they equal? " + Arrays.equals(input, output));

// and if we want the size
zipIn.getNextEntry(); // or zipIn.read(), which returns -1 and completes the entry
System.out.println("and the size is " + entry.getSize());

Now output should have the same content as input.
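For the use case from the comments (feeding zipped bytes to another class and checking that it unzips them correctly), a self-contained round trip could look like the sketch below. The names ZipRoundTrip, zip and unzip are hypothetical; the key point is that each entry is read until read() returns -1 before getSize() is consulted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipRoundTrip {
    // Zips a single entry in memory
    static byte[] zip(String fileName, byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(fileName));
            zipOut.write(input);
            zipOut.closeEntry();
        } // closing the stream writes the central directory
        return bos.toByteArray();
    }

    // Unzips every entry, reading each one to its end before looking at its size
    static Map<String, byte[]> unzip(byte[] zipBytes) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buf = new byte[4096];
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zipIn.read(buf)) != -1) {
                    content.write(buf, 0, n);
                }
                // read() has returned -1, so getSize() now holds the real size
                if (entry.getSize() != content.size()) {
                    throw new IOException("size mismatch for " + entry.getName());
                }
                result.put(entry.getName(), content.toByteArray());
            }
        }
        return result;
    }
}
```

Comparing the unzipped array with the original using Arrays.equals then gives the same check as above, without assuming that the entry's size is known in advance.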

Cerebellum answered 31/12, 2016 at 0:58 Comment(2)
Apparently ZipInputStream#closeEntry() has the same effect as ZipInputStream#getNextEntry() as far as ZipEntry#getSize() is concerned. In any case, once either method is invoked, the preceding entry's data can no longer be read. - Jimjimdandy
@RavindraHV if you think about it, it's quite logical: according to the Javadoc, closeEntry() "Closes the current ZIP entry and positions the stream for reading the next entry." To me (based on my limited knowledge of the ZIP layout) this means that the rest of the entry must be read into a black hole to be able to "close" it. They probably leveraged a common facility in getNextEntry() and closeEntry(), and that common facility in turn sets the size of the previous entry. - Canary

How to zip byte[] and unzip it back?

I routinely use the following methods to deflate/inflate (zip/unzip) small byte[] arrays (i.e. when they fit in memory). They are based on the example given in the Deflater Javadoc and use the Deflater class to compress data and the Inflater class to uncompress it back. Note that the output is ZLIB-compressed data, not a ZIP archive with entries:

public static byte[] compress(byte[] source, int level) {
    Deflater compresser = new Deflater(level);
    try {
        compresser.setInput(source);
        compresser.finish();
        byte[] buf = new byte[1024];
        ByteArrayOutputStream bos = new ByteArrayOutputStream(1024);
        // loop until the deflater has emitted all compressed data
        while (!compresser.finished()) {
            int n = compresser.deflate(buf);
            bos.write(buf, 0, n);
        }
        return bos.toByteArray(); // You could as well return "bos" directly
    } finally {
        compresser.end(); // release native zlib resources even on failure
    }
}

public static byte[] uncompress(byte[] source) {
    Inflater decompresser = new Inflater();
    try {
        decompresser.setInput(source);
        byte[] buf = new byte[1024];
        ByteArrayOutputStream bos = new ByteArrayOutputStream(1024);
        while (!decompresser.finished()) {
            int n = decompresser.inflate(buf);
            if (n == 0 && (decompresser.needsInput() || decompresser.needsDictionary())) {
                return null; // truncated input or a preset dictionary is required
            }
            bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    } catch (DataFormatException e) {
        return null; // corrupt input
    } finally {
        decompresser.end(); // release native zlib resources
    }
}

There is no need for a ByteArrayInputStream, but if you really want to work with streams you could wrap one in an InflaterInputStream (using the Inflater directly is easier, though).
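For comparison, here is a minimal sketch of the stream-based variant built on DeflaterOutputStream and InflaterInputStream from java.util.zip (the class name StreamCompression is hypothetical). With their single-argument constructors these streams create and end their own Deflater/Inflater on close, and they use the same ZLIB format as the methods above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamCompression {
    static byte[] compress(byte[] source) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // close() finishes the deflate stream and releases the internal Deflater
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(source);
        }
        return bos.toByteArray();
    }

    static byte[] uncompress(byte[] source) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // the ByteArrayInputStream is wrapped by an InflaterInputStream, as suggested above
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(source))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }
}
```

Unlike the Inflater-based uncompress, a truncated or corrupt input surfaces here as an IOException rather than a null return.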

Endplay answered 1/1, 2017 at 23:27 Comment(5)
For those who want to downvote again without commenting on how to improve the answer (which is not illegal), the question is "How to zip bytes to ByteArrayOutputStream and back", not "How to use ZipFile" to achieve compression. - Endplay
The title should probably be edited (I'll find something better later), but the issue at hand is reading the ZipEntry details as you're unzipping. Your answer doesn't address that. - Dermal
@SotiriosDelimanolis thank you for the feedback. To me it sounded like the title is OK but ZipEntry was the wrong tool to achieve it, hence my answer. But thanks again for giving me a chance to explain myself :) - Endplay
According to Update #4 of this SO post, Deflater (and Inflater) have their own problems. - Canary
@D.Kovács thanks for the link. The workaround given in the OpenJDK bug is to call Deflater/Inflater.end(), which is done in the above code. - Endplay
