GZIPInputStream is prematurely closed when reading from s3

new BufferedReader(new InputStreamReader(
       new GZIPInputStream(s3Service.getObject(bucket, objectKey).getDataInputStream())))

creates a Reader that returns null from readLine() after ~100 lines if the file is larger than several MB. This is not reproducible on gzip files smaller than 1 MB. Does anybody know how to handle this?

Watkin asked 7/7, 2015 at 17:42

From the documentation of BufferedReader#readLine():

Returns:

A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached

I would say it is pretty clear what this means: the end of the file/stream has been encountered and no more data is available.

A notable quirk of the gzip format: multiple gzip files can simply be appended to one another to create a larger file containing several gzipped objects. It seems that GZIPInputStream only reads the first of those here.

That would also explain why it works for small files: those contain only one gzipped object, so the whole file is read.
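To make the concatenation quirk concrete, here is a minimal, self-contained sketch that uses only java.util.zip. It builds a file from two gzip members in memory and decompresses it again. The names ConcatenatedGzipDemo, gzip, concat and gunzip are made up for illustration. As far as I know, on current JDKs GZIPInputStream does continue into the next member when the input is an in-memory array like this, so the sketch only shows that such a file is valid gzip; it does not reproduce the early end of stream seen with the S3 stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipDemo {

    /** Compresses one string into a complete, standalone gzip member. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Puts two gzip members back to back, as an "append mode" writer would. */
    public static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /** Decompresses until read() reports end of stream and returns the text. */
    public static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] twoMembers = concat(gzip("first\n"), gzip("second\n"));
        System.out.print(gunzip(twoMembers));
    }
}
```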

Note: If the GZIPInputStream can detect non-destructively that one gzip member has ended (that is, without consuming bytes that belong to the next one), you could just open another GZIPInputStream on the same InputStream and read multiple objects.
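Because GZIPInputStream reads ahead into its own buffer, a second GZIPInputStream opened on the same InputStream may not start exactly at the next member. A more conservative sketch, assuming the compressed object fits in memory, is to download the raw bytes completely and only then decompress them. This is a different technique from the one in the note above, not an implementation of it. BufferedGzipLines, readFully, readLines and gzipMembers are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BufferedGzipLines {

    /** Reads the raw (still compressed) stream completely into memory. */
    public static byte[] readFully(InputStream raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = raw.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Buffers the compressed bytes first, then decompresses and splits into lines. */
    public static List<String> readLines(InputStream raw) throws IOException {
        byte[] compressed = readFully(raw);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Demo helper (hypothetical): builds a file with one gzip member per line. */
    public static byte[] gzipMembers(String... lines) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String line : lines) {
            ByteArrayOutputStream member = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(member)) {
                gz.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
            file.write(member.toByteArray()); // append the finished member
        }
        return file.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = gzipMembers("a", "b", "c");
        System.out.println(readLines(new ByteArrayInputStream(file)));
    }
}
```

With the code from the question, the call would look like readLines(s3Service.getObject(bucket, objectKey).getDataInputStream()). Whether this removes the early null depends on the actual cause, which has not been verified here.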

Jackboot answered 7/7, 2015 at 17:52 Comment(5)
Hi WorldSEnder. It looks to me like the issue is not clear to you. Please read carefully: "after ~100 lines if file is greater than several MB". It is not the end of the file. There are about 7000 lines. Thanks! - Watkin
@Denys, you are reading from s3Service.getObject(bucket, objectKey), so how do you know it is about 7000 lines? By the way, lines are unimportant; we are talking about binary data. Can you check whether you read exactly s3Service.getObject(bucket, objectKey).getObjectMetadata().getContentLength() bytes? - Jackboot
I know this because I extracted that gzip manually. At the moment I am trying the latest version of jets3t. - Watkin
I found the root cause. The gzip file was generated in append mode, which is probably not well supported by GZIPOutputStream. Thanks to everyone who tried to help! Unfortunately, for some reason I can't post this as an answer to my question. - Watkin
@Denys, I added a sentence to the answer. I guess that is what you meant to say? - Jackboot
