Extract .gz files in Java
I'm trying to unzip some .gz files in Java. After some research I wrote this method:

    public static void gunzipIt(String name){

        byte[] buffer = new byte[1024];

        try{

            GZIPInputStream gzis = new GZIPInputStream(new FileInputStream("/var/www/html/grepobot/API/"+ name + ".txt.gz"));
            FileOutputStream out = new FileOutputStream("/var/www/html/grepobot/API/"+ name + ".txt");

            int len;
            while ((len = gzis.read(buffer)) > 0) {
                out.write(buffer, 0, len);
            }

            gzis.close();
            out.close();

            System.out.println("Extracted " + name);

        } catch(IOException ex){
            ex.printStackTrace();
        }
    }

When I try to execute it, I get this error: java.util.zip.ZipException: Not in GZIP format

How can I solve it? Thanks in advance for your help.

Satin answered 6/12, 2016 at 16:8 Comment(2)
Use the command file /var/www/html/grepobot/API/someName.txt.gz to verify that the file is gzipped data? - Gitt
Of course it is. If I execute gunzip /var/www/html/grepobot/API/someName.txt.gz it works. At the moment I execute a command with Process. - Satin

Test a sample, correct, gzipped file to see whether the problem lies in your code or not.
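One way to run such a test without leaving Java is a small round trip: write a known-good gzip file with GZIPOutputStream, then read it back with the same kind of loop as in the question. This is only a sketch; the class name GzipRoundTrip, its method names and the temporary files are made up for illustration and are not part of the original code. If this works but your real file still fails, the file is the more likely culprit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, used only to illustrate the round-trip test.
class GzipRoundTrip {

    // Writes the given text into a gzip file using Java's own compressor.
    static void writeSample(Path gz, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Decompresses gz into target; try-with-resources closes both streams,
    // even when an exception is thrown.
    static void gunzip(Path gz, Path target) throws IOException {
        byte[] buffer = new byte[1024];
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz));
             OutputStream out = Files.newOutputStream(target)) {
            int len;
            // read() returns -1 at end of stream
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real paths from the question.
        Path gz = Files.createTempFile("sample", ".txt.gz");
        Path txt = Files.createTempFile("sample", ".txt");
        writeSample(gz, "hello gzip\n");
        gunzip(gz, txt);
        System.out.print(new String(Files.readAllBytes(txt), StandardCharsets.UTF_8));
    }
}
```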

There are many possible ways to build a (g)zip file. Your file may have been built differently from what Java's built-in support expects, and the fact that one decompressor understands a compression variant is no guarantee that Java will also recognize that variant. Please verify the exact file type with the file command and/or other decompression utilities that can tell you which options were used when compressing it. You may also have a look at the file itself with a tool such as hexdump. This is the output of the following command:

$ hexdump -C lgpl-2.1.txt.gz | head

00000000  1f 8b 08 08 ed 4f a9 4b  00 03 6c 67 70 6c 2d 32  |.....O.K..lgpl-2|
00000010  2e 31 2e 74 78 74 00 a5  5d 6d 73 1b 37 92 fe 8e  |.1.txt..]ms.7...|
00000020  ba 1f 81 d3 97 48 55 34  13 7b 77 73 97 78 2b 55  |.....HU4.{ws.x+U|
00000030  b4 44 d9 bc 95 25 2d 29  c5 eb ba ba aa 1b 92 20  |.D...%-)....... |
00000040  39 f1 70 86 99 17 29 bc  5f 7f fd 74 37 30 98 21  |9.p...)._..t70.!|
00000050  29 7b ef 52 9b da 58 c2  00 8d 46 bf 3c fd 02 d8  |){.R..X...F.<...|
00000060  da fe 3f ef 6f 1f ed cd  78 36 1b 4f ed fb f1 ed  |..?.o...x6.O....|
00000070  78 3a ba b1 f7 8f ef 6e  26 97 96 fe 1d df ce c6  |x:.....n&.......|
00000080  e6 e0 13 f9 e7 57 57 56  69 91 db 37 c3 d7 03 7b  |.....WWVi..7...{|
00000090  ed e6 65 93 94 7b fb fa  a7 9f 7e 32 c6 5e 16 bb  |..e..{....~2.^..|

In this case, I used standard gzip on this license text. The first two bytes, 1f 8b, are the magic number that identifies gzipped files (although they do not specify variants). If your file does not start with 1f 8b, Java will complain, regardless of the remaining contents.
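The same check can be done from Java before handing the file to GZIPInputStream. The following is a minimal sketch, assuming you only want to know whether the first two bytes are 1f 8b; the class GzipMagicCheck and the method looksLikeGzip are hypothetical names, not an existing API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, used only to illustrate the magic-byte check.
class GzipMagicCheck {

    // Returns true if the file starts with the gzip magic bytes 1f 8b.
    // An empty or one-byte file yields false, because read() returns -1 at end of stream.
    static boolean looksLikeGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            int first = in.read();
            int second = in.read();
            return first == 0x1f && second == 0x8b;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipMagicCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + (looksLikeGzip(file)
                ? " starts with 1f 8b"
                : " does not start with 1f 8b"));
    }
}
```

If this reports that the file does not start with 1f 8b, the file on disk is not what the code expects (for example, a different file than the one tested on the command line), and changing the read loop will not help.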

If the problem is due to the file, it is possible that other decompression libraries available in Java may deal with the format correctly. For example, see Apache Commons Compress.

Dobbins answered 11/12, 2020 at 21:40 Comment(0)
import com.horsefly.utils.GZIP;
import org.apache.commons.io.FileUtils;
....
String content = new String(new GZIP().decompresGzipToBytes(FileUtils.readFileToByteArray(fileName)), "UTF-8");

In case someone needs it.

Medicare answered 12/8, 2020 at 18:16 Comment(0)
