How do I get the uncompressed size of a GZIPInputStream?

Asked 6/9, 2011 at 8:50 Answered 18/2, 2019 at 5:24

I have a GZIPInputStream that I constructed from a ByteArrayInputStream. I want to know the original (uncompressed) length of the gzip data. I could read to the end of the GZIPInputStream and count the bytes, but that costs a lot of time and wastes CPU. I would like to know the size before reading it.

Is there a method similar to ZipEntry.getSize() for GZIPInputStream:

public long getSize ()
Since: API Level 1
Gets the uncompressed size of this ZipEntry.

Circular answered 6/9, 2011 at 8:50 Comment(2)

Note that GZIP only saves the size modulo 2^32 (i.e. it only stores the lower 32 bits of the size, in a field named ISIZE). If your data is potentially bigger than 4 GB, then that information won't help you. – Heterosexuality 6/9, 2011 at 8:55
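As a worked example of what "modulo 2^32" means for ISIZE, with n the true uncompressed length in bytes:

```latex
\mathrm{ISIZE} = n \bmod 2^{32}, \qquad
n = 5 \cdot 2^{30} = 5368709120 \;\Rightarrow\; \mathrm{ISIZE} = 5368709120 - 2^{32} = 1073741824 = 2^{30}
```

So a 5 GiB file reports an ISIZE of exactly 1 GiB, which is indistinguishable from a genuine 1 GiB file.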

To continue in that vein, there are two other reasons that the last four bytes are not a reliable measure of the uncompressed data, even for small files. The only reliable way is to decompress the stream and count the bytes. – Mesarch 6/4, 2018 at 19:35

Is there a similar method like ZipEntry.getSize() for GZIPInputStream

No. It's not in the Javadoc => it doesn't exist.

What do you need the length for?

Deuteragonist answered 6/9, 2011 at 9:34 Comment(4)

I tend to agree with this. Even the GZip docs state it can't find the uncompressed size for all files - gnu.org/software/gzip/manual/gzip.html#Invoking-gzip. You could use --list to get the uncompressed size, but that probably 'wastes' the same CPU as you would reading with Java. – Grasmere 6/9, 2011 at 9:41

After thinking about it again, it seems useless for me. – Circular 6/9, 2011 at 9:54

I am working on an ebook (gzip format). Every chapter is a gzip file, and I would like to know the total length of the book to compute the reading percentage. – Circular 6/9, 2011 at 10:4

@David Guo doing that computation on the gzipped lengths would probably be accurate enough. – Deuteragonist 6/9, 2011 at 22:25
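A minimal sketch of that suggestion, assuming the compressed (.gz) size of every chapter is known up front (for example from File.length()). The class BookProgress and its percent method are hypothetical names; the compressed sizes simply stand in for the unknown uncompressed sizes, so the result is only approximate if chapters compress at different ratios.

```java
public final class BookProgress {
    /**
     * Approximate reading progress in percent (0..100), weighting each chapter
     * by its compressed size instead of its (unknown) uncompressed size.
     *
     * @param compressedSizes   size in bytes of each gzipped chapter
     * @param chapter           index of the chapter currently being read
     * @param fractionInChapter how far into that chapter the reader is, 0.0..1.0
     */
    public static double percent(long[] compressedSizes, int chapter, double fractionInChapter) {
        long total = 0;
        long before = 0;
        for (int i = 0; i < compressedSizes.length; i++) {
            total += compressedSizes[i];
            if (i < chapter) {
                before += compressedSizes[i]; // chapters already finished
            }
        }
        if (total == 0) {
            return 0.0;
        }
        double done = before + fractionInChapter * compressedSizes[chapter];
        return 100.0 * done / total;
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 3000, 1000};
        // halfway through the middle chapter: (1000 + 1500) / 5000
        System.out.println(percent(sizes, 1, 0.5)); // 50.0
    }
}
```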

It is possible to determine the uncompressed size by reading the last four bytes of the gzipped file.

I found this solution here:

http://www.abeel.be/content/determine-uncompressed-size-gzip-file

That link also has some example code (corrected here to use long instead of int, so that sizes between 2 GB and 4 GB do not make an int wrap around):

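import java.io.RandomAccessFile;

RandomAccessFile raf = new RandomAccessFile(file, "r");
raf.seek(raf.length() - 4);
// read() returns an int in 0..255; assigning it to a byte does not compile,
// and a byte >= 0x80 would sign-extend when widened to long
int b4 = raf.read(); // least significant byte (ISIZE is little-endian)
int b3 = raf.read();
int b2 = raf.read();
int b1 = raf.read(); // most significant byte
long val = ((long) b1 << 24) | ((long) b2 << 16) | ((long) b3 << 8) | b4;
raf.close();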

val is the uncompressed length in bytes, modulo 2^32. Beware: you cannot determine the correct uncompressed size when the uncompressed file is 4 GB or larger!
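Since the question starts from a ByteArrayInputStream, the same trailer read can be done directly on the in-memory array, without a RandomAccessFile. This is a sketch assuming you still have the byte[] behind the stream (a GZIPInputStream alone does not expose it) and that the data is a single gzip member; GzipTrailer, isize and gzip are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.GZIPOutputStream;

public final class GzipTrailer {
    /**
     * Reads ISIZE from the last 4 bytes of an in-memory, single-member gzip buffer.
     * The result is the uncompressed length modulo 2^32.
     */
    public static long isize(byte[] gz) {
        if (gz.length < 18) { // 10-byte header + 8-byte trailer is the bare minimum
            throw new IllegalArgumentException("too short to be gzip data");
        }
        int raw = ByteBuffer.wrap(gz, gz.length - 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN) // gzip stores ISIZE little-endian
                .getInt();
        return Integer.toUnsignedLong(raw); // avoid a negative result for sizes >= 2 GB
    }

    /** Helper for trying it out: gzips a byte array in memory. */
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] gz = gzip(new byte[100_000]);
        System.out.println(isize(gz)); // 100000
    }
}
```

This costs O(1) regardless of the data size, but it inherits every caveat in this thread: it is only the size modulo 2^32, and only of the last member.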

Uveitis answered 23/9, 2011 at 12:31 Comment(3)

As per the original GZIP format specification: "A gzip file consists of a series of "members" (compressed data sets). The format of each member is specified in the following section. The members simply appear one after another in the file, with no additional information before, between, or after them." Therefore if your gzip file contains more than one "member", you are reading only the size of last "member" in those four bytes. – Nichollenicholls 25/7, 2018 at 11:49

If you know you only have one "member", then I guess this would be an acceptable answer. – Mass 27/1, 2020 at 7:3

@OlegMuravskiy Theoretically, assuming a standard gz file using the deflate algorithm, you could scan the stream of deflate blocks until you find a final block, read the size for that member, and then repeat until the end of the file. You'd still have to read nearly the entire file, but it would still be a lot faster than actually performing the decompression. – Sialkot 5/6 at 18:20
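To see the multi-member pitfall concretely, here is a sketch that concatenates two gzip members (as `cat a.gz b.gz` would) and compares the trailer value with a full decompression. It relies on java.util.zip.GZIPInputStream reading concatenated members, which it does in current JDKs; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class MultiMember {
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** ISIZE of the LAST member only: little-endian 32-bit value in the final 4 bytes. */
    static long trailerIsize(byte[] gz) {
        int n = gz.length;
        return (gz[n - 4] & 0xFFL)
                | (gz[n - 3] & 0xFFL) << 8
                | (gz[n - 2] & 0xFFL) << 16
                | (gz[n - 1] & 0xFFL) << 24;
    }

    /** Exact length: decompress every member and count the bytes. */
    static long decompressedLength(byte[] gz) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            long total = 0;
            int r;
            while ((r = in.read(buf)) != -1) {
                total += r;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Two members back to back, each compressing a block of zero bytes. */
    static byte[] twoMembers(int firstSize, int secondSize) {
        byte[] a = gzip(new byte[firstSize]);
        byte[] b = gzip(new byte[secondSize]);
        ByteArrayOutputStream both = new ByteArrayOutputStream();
        both.write(a, 0, a.length);
        both.write(b, 0, b.length);
        return both.toByteArray();
    }

    public static void main(String[] args) {
        byte[] gz = twoMembers(5000, 300);
        System.out.println(trailerIsize(gz));       // 300: last member only
        System.out.println(decompressedLength(gz)); // 5300: all members
    }
}
```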

Based on @Alexander's answer:

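import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

long fileSize;
try (RandomAccessFile raf = new RandomAccessFile(inputFilePath + ".gz", "r")) {
    raf.seek(raf.length() - 4);
    byte[] bytes = new byte[4];
    raf.readFully(bytes); // read(bytes) may legally return fewer than 4 bytes
    // ISIZE is little-endian; getInt() yields a signed int
    fileSize = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
}
if (fileSize < 0)
  fileSize += (1L << 32); // reinterpret as an unsigned 32-bit value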

Hornstone answered 2/10, 2012 at 4:31 Comment(1)

It works; however, I find that the length returned is smaller than the final uncompressed size by almost exactly the .length() of the compressed file. – Memphian 19/10, 2012 at 2:36


There is no reliable way to get the length other than decompressing the whole thing. See "Uncompressed file size using zlib's gzip file access function".

Mesarch answered 2/10, 2012 at 4:38 Comment(0)

If you can guess the compression ratio (a reasonable expectation if the data is similar to other data you've already processed), then you can work out the size of arbitrarily large files, with some error. Again, this assumes a file containing a single gzip stream. The following code takes as the true size the first candidate (the stored 32-bit value plus a multiple of 2^32) that exceeds 90% of the size estimated from the ratio:

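import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

double estCompRatio = 6.1;
long compLength;
long uncLength;
try (RandomAccessFile raf = new RandomAccessFile(inputFilePath + ".gz", "r")) {
    compLength = raf.length();
    raf.seek(compLength - 4); // ISIZE: last 4 bytes, little-endian
    byte[] bytes = new byte[4];
    raf.readFully(bytes);
    // mask so the 32-bit value is treated as unsigned
    uncLength = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
}
// add multiples of 2^32 until the value is plausible for the estimated ratio
while (uncLength < (compLength * estCompRatio * 0.9)) {
  uncLength += (1L << 32);
}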

[setting estCompRatio to 0 is equivalent to @Alexander's answer]
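The ratio-based correction is easiest to check as a pure function, separate from the file I/O. This is a sketch under the same assumptions as the answer (single gzip member, a ratio estimate that is roughly right); SizeGuess and resolve are hypothetical names, and the 0.9 factor is the answer's own heuristic.

```java
public final class SizeGuess {
    /**
     * Widens a 32-bit ISIZE by adding multiples of 2^32 until it reaches
     * 90% of the size predicted by compLength * estCompRatio.
     */
    public static long resolve(long isize, long compLength, double estCompRatio) {
        long unc = isize & 0xFFFFFFFFL; // treat the trailer value as unsigned
        double threshold = compLength * estCompRatio * 0.9;
        while (unc < threshold) {
            unc += 1L << 32;
        }
        return unc;
    }

    public static void main(String[] args) {
        // A 5 GiB payload stores ISIZE = 1 GiB; with ~900 MB compressed and ratio 6.1
        // the threshold is about 4.94e9, so one wrap-around is added back.
        System.out.println(resolve(1L << 30, 900_000_000L, 6.1)); // 5368709120
    }
}
```

With estCompRatio = 0 the threshold is 0, the loop never runs, and the result is just the unsigned ISIZE.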

Behring answered 29/10, 2012 at 4:23 Comment(0)

A more compact version of the calculation based on the 4 tail bytes (avoids using a byte buffer, calls Integer.reverseBytes to reverse the byte order of read bytes).

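import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

private static long getUncompressedSize(Path inputPath) throws IOException
{
    long size = -1;
    try (RandomAccessFile fp = new RandomAccessFile(inputPath.toFile(), "r")) {
        fp.seek(fp.length() - Integer.BYTES);
        int n = fp.readInt(); // readInt() is big-endian, but ISIZE is little-endian...
        size = Integer.toUnsignedLong(Integer.reverseBytes(n)); // ...so swap the bytes
    }
    return size;
}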

Colossians answered 5/4, 2018 at 21:45 Comment(0)

Alternatively, get the FileChannel from the underlying FileInputStream. It tells you both the size of the compressed file and the current read position in it, which is enough to report progress. Example:

@Override
public void produce(final DataConsumer consumer, final boolean skipData) throws IOException {
    try (FileInputStream fis = new FileInputStream(tarFile)) {
        FileChannel channel = fis.getChannel();
        final Eta<Long> eta = new Eta<>(channel.size());
        try (InputStream is = tarFile.getName().toLowerCase().endsWith("gz")
            ? new GZIPInputStream(fis) : fis) {
            try (TarArchiveInputStream tais = (TarArchiveInputStream) new ArchiveStreamFactory()
                .createArchiveInputStream("tar", new BufferedInputStream(is))) {

                TarArchiveEntry tae;
                boolean done = false;
                while (!done && (tae = tais.getNextTarEntry()) != null) {
                    if (tae.getName().startsWith("docs/") && tae.getName().endsWith(".html")) {
                        String data = null;
                        if (!skipData) {
                            data = new String(tais.readNBytes((int) tae.getSize()), StandardCharsets.UTF_8);
                        }
                        done = !consumer.consume(data);
                    }

                    String progress = eta.toStringPeriodical(channel.position());
                    if (progress != null) {
                        System.out.println(progress);
                    }
                }
                System.out.println("tar bytes read: " + tais.getBytesRead());
            } catch (ArchiveException ex) {
                throw new IOException(ex);
            }
        }
    }
}
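The same progress idea works for any compressed source, including the ByteArrayInputStream from the question, by counting the compressed bytes that GZIPInputStream pulls from the underlying stream. This is a sketch, not a library API: CountingInputStream, drainWithProgress and gzip are hypothetical names, and the progress is measured on the compressed side, never on the unknown uncompressed size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.DoubleConsumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Counts the compressed bytes a GZIPInputStream reads through it. */
public final class CountingInputStream extends FilterInputStream {
    private long count;

    public CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len); // FilterInputStream.read(byte[]) also ends up here
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    public long getCount() {
        return count;
    }

    /** Decompresses gz, reporting the fraction of compressed input consumed; returns the final fraction. */
    public static double drainWithProgress(byte[] gz, DoubleConsumer onProgress) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(gz));
        try (InputStream in = new GZIPInputStream(counter)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                onProgress.accept((double) counter.getCount() / gz.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (double) counter.getCount() / gz.length;
    }

    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] gz = gzip(new byte[1_000_000]);
        double done = drainWithProgress(gz, p -> System.out.printf("%.0f%%%n", p * 100));
        System.out.println(done); // 1.0
    }
}
```

Like channel.position(), the count runs slightly ahead of what has actually been decompressed, because GZIPInputStream reads its input in chunks; for a progress bar that is usually good enough.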

Noblenobleman answered 18/2, 2019 at 5:24 Comment(0)


No, unfortunately, if you want to get the uncompressed size, you have to read the entire stream and increment a counter, as you mention in your question. Why do you need to know the size? Could an estimate of the size work for your purposes?
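For reference, the read-and-count approach is short. This sketch decompresses into a scratch buffer and keeps only the count, so memory use stays constant; it is exact for any size and for multi-member files, but it still costs a full decompression. GzipCount, uncompressedLength and gzip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class GzipCount {
    /** Exact uncompressed length: decompress everything and count the bytes. */
    public static long uncompressedLength(InputStream compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(compressed, 64 * 1024)) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // decompressed bytes are discarded; only the count is kept
            }
            return total;
        }
    }

    /** Convenience overload for gzip data already held in a byte[]. */
    public static long uncompressedLength(byte[] gz) {
        try {
            return uncompressedLength(new ByteArrayInputStream(gz));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(uncompressedLength(gzip(new byte[123_456]))); // 123456
    }
}
```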

Antecede answered 6/9, 2011 at 13:29 Comment(0)
