Java: Error creating a GZIPInputStream: Not in GZIP format

I am trying to use the following Java code to compress and uncompress a String. However, the line that creates a new GZIPInputStream object out of a new ByteArrayInputStream object throws a "java.util.zip.ZipException: Not in GZIP format" exception. Does anyone know how to solve this?

        String orig = ".............";

        // compress it
        ByteArrayOutputStream baostream = new ByteArrayOutputStream();
        OutputStream outStream = new GZIPOutputStream(baostream);
        outStream.write(orig.getBytes());
        outStream.close();
        String compressedStr = baostream.toString();

        // uncompress it
        InputStream inStream = new GZIPInputStream(new ByteArrayInputStream(compressedStr.getBytes()));
        ByteArrayOutputStream baoStream2 = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int len;
        while((len = inStream.read(buffer))>0)
            baoStream2.write(buffer, 0, len);
        String uncompressedStr = baoStream2.toString();
Intima answered 22/1, 2013 at 19:50

Mixing String and byte[] like this never works reliably; at best it happens to work on the same OS with the same default encoding. Not every byte[] can be converted to a String, and converting back can give different bytes.

The compressed bytes need not represent valid text at all, so keep them as a byte[].
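To see why the exception complains about the GZIP format specifically, here is a minimal sketch, assuming the platform default charset is UTF-8 (the class and helper names are hypothetical). Every GZIP stream starts with the magic bytes 0x1f 0x8b. The byte 0x8b is not valid UTF-8 on its own, so decoding it into a String replaces it with U+FFFD, and encoding that back produces three different bytes. GZIPInputStream then no longer finds the magic number in the header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTripDemo {
    // Hypothetical helper: bytes -> String -> bytes, as the question's code does
    // (assuming the platform default charset is UTF-8).
    static byte[] roundTripThroughUtf8(byte[] bytes) {
        // Malformed UTF-8 input is silently replaced with U+FFFD here.
        String asText = new String(bytes, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first two bytes of every GZIP stream (the magic number).
        byte[] gzipMagic = {(byte) 0x1f, (byte) 0x8b};
        byte[] back = roundTripThroughUtf8(gzipMagic);

        System.out.println(Arrays.toString(gzipMagic)); // [31, -117]
        System.out.println(Arrays.toString(back));      // [31, -17, -65, -67]
    }
}
```

The replacement character U+FFFD encodes as EF BF BD in UTF-8, which is where the three new bytes come from.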

Explicitly set the encoding when calling getBytes on the original text and when building the new String from the decompressed bytes.

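    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class GzipRoundTrip {
        public static void main(String[] args) throws IOException {
            String orig = ".............";

            // Compress it: String -> bytes (explicit UTF-8) -> GZIP bytes
            ByteArrayOutputStream baostream = new ByteArrayOutputStream();
            try (OutputStream outStream = new GZIPOutputStream(baostream)) {
                outStream.write(orig.getBytes(StandardCharsets.UTF_8));
            } // closing finishes the GZIP stream and writes its trailer
            byte[] compressedBytes = baostream.toByteArray(); // toString not always possible

            // Uncompress it: GZIP bytes -> bytes -> String (same charset)
            ByteArrayOutputStream baoStream2 = new ByteArrayOutputStream();
            try (InputStream inStream = new GZIPInputStream(
                    new ByteArrayInputStream(compressedBytes))) {
                byte[] buffer = new byte[8192];
                int len;
                while ((len = inStream.read(buffer)) != -1) {
                    baoStream2.write(buffer, 0, len);
                }
            }
            String uncompressedStr = new String(baoStream2.toByteArray(), StandardCharsets.UTF_8);

            System.out.println("orig: " + orig);
            System.out.println("unc:  " + uncompressedStr);
        }
    }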
Hawn answered 22/1, 2013 at 20:06

Joop seems to have the solution above, but I feel I must add this: compression in general, and GZIP in particular, produces a binary stream. You MUST NOT try to construct a String from this stream; it WILL break.

If you need to turn it into a plain-text representation, look into Base64 encoding, hex encoding, or heck, even simple binary encoding.
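A minimal sketch of the Base64 route, assuming Java 8 or later for java.util.Base64 (the class and method names are hypothetical). The GZIP bytes are Base64-encoded into an ASCII-only String, and decoded back to the exact same bytes before decompression.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64 {
    // Compress text, then wrap the binary GZIP output in Base64 so it is safe to keep in a String.
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the steps: Base64 text -> GZIP bytes -> original text.
    static String decompressFromBase64(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String orig = "Hello, GZIP! Gr\u00fc\u00dfe";
        String stored = compressToBase64(orig);
        System.out.println("base64: " + stored);
        System.out.println("back:   " + decompressFromBase64(stored));
    }
}
```

Base64 output is about a third larger than the raw bytes; that is the cost of a representation that survives any text channel.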

In short, String objects are for things that humans read. Byte arrays (and many other things) are for things machines read.

Maker answered 22/1, 2013 at 20:23

You converted baostream to a String using your platform's default encoding, probably UTF-8. You should be using baostream.toByteArray() to work with the binary data directly, not strings.

If you insist on a string, use an 8-bit encoding, e.g. baostream.toString("ISO-8859-1"), and read it back with the same charset (compressedStr.getBytes("ISO-8859-1")).
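A minimal sketch of that approach (the class and method names are hypothetical). ISO-8859-1 maps each byte value 0 to 255 to exactly one char, U+0000 to U+00FF, so the compressed bytes survive the trip into a String and back, provided the same charset is used on both sides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLatin1 {
    // Compress UTF-8 text, then store the GZIP bytes in a String via ISO-8859-1.
    // Each byte becomes exactly one char, so this step loses nothing (unlike UTF-8).
    static String compressToLatin1String(String text) throws IOException {
        ByteArrayOutputStream baostream = new ByteArrayOutputStream();
        try (OutputStream outStream = new GZIPOutputStream(baostream)) {
            outStream.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return baostream.toString("ISO-8859-1");
    }

    // Read the String back with the SAME charset to recover the exact GZIP bytes.
    static String decompressFromLatin1String(String compressedStr) throws IOException {
        byte[] compressed = compressedStr.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream baoStream2 = new ByteArrayOutputStream();
        try (InputStream inStream = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = inStream.read(buffer)) != -1) {
                baoStream2.write(buffer, 0, len);
            }
        }
        return new String(baoStream2.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String orig = ".............";
        String compressedStr = compressToLatin1String(orig);
        System.out.println("unc: " + decompressFromLatin1String(compressedStr));
    }
}
```

This works, but the String it produces is full of control characters, so it is still fragile if it ever passes through something that expects printable text.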

Aeromarine answered 22/1, 2013 at 19:55
Even with the character encoding specified on both ends, storing the bytes directly in strings can get dicey. A better way to use a String would probably be to Base64-encode the binary data. Apache Commons Codec provides a really nice class for Base64 encoding and decoding. - Pimentel
Also, UTF-8 is definitely an 8-bit encoding. - Pimentel
The colloquial meaning of "8-bit encoding" is that every character is encoded in exactly 8 bits, which is most definitely not the case with UTF-8 for code points above ASCII. But yes, abuse of an encoding leads indirectly to abuse of terminology here, and your suggestion of Base64 encoding is much better. - Aeromarine
