Using GZipStream to compress empty input results in an invalid gz file in C#

I am using the C# GZipStream class to compress some input data. The problem arises when that input is empty: GZipStream then creates a 0-byte file. When I try to unzip the resulting .gz file with 7zip, it reports that the format is invalid. With non-empty input, everything works fine. How can I create a valid .gz file that decompresses to a 0-byte file?

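// Requires: using System.IO; using System.IO.Compression;
var file = new FileStream("foo.txt.gz", FileMode.Create, FileAccess.ReadWrite);
var gzip = new GZipStream(file, CompressionMode.Compress);
var writer = new StreamWriter(gzip);

// 'input' is a collection of strings; it may be empty
foreach (string line in input) {
    writer.Write(line);
}

// Close from the outermost wrapper inward so buffered data is flushed in order
writer.Close();
gzip.Close();
file.Close();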

In the code above, if my 'input' array is empty, I end up writing a file called foo.txt.gz with 0 bytes, and 7zip says the file is invalid. But if I have a non-empty array, I get a valid file. Please tell me how I can modify my code to resolve the issue such that I get a valid .gz file even when the input is empty. Thanks!


EDIT: This may be a bug in .NET. If you notice the same issue and agree that it is a bug, please vote on: https://connect.microsoft.com/VisualStudio/feedback/details/888912/gzipstream-creates-invalid-gz-files-when-input-is-empty

Izard answered 3/6, 2014 at 21:4 Comment(1)
This Connect bug report is gone, and I can't find any other references to this issue. - Abrupt

Unfortunately, this looks like a bug with the implementation of GZipStream in the .NET library.

According to MSDN (http://msdn.microsoft.com/en-ca/library/as1ff51s.aspx), the output should "appear as a valid, empty compressed file". But when I tested your code and some variations of it, I also got a completely empty file.

As a comparison, if I create an empty gzip file using Cygwin (echo -n | gzip -9 > empty.gz), I get a 20 byte file.

I suppose you could work around it by detecting when your input is empty and manually writing out an empty GZIP file. You could either refer to the GZIP file documentation (Wikipedia would be a good place to start) to create the file manually, or hard-code the 20 bytes required for an empty file in your program (with this solution, the internal timestamp and some other flags might be wrong, but that might not affect you in practice).
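The hard-coded variant of this workaround can be sketched as follows. This is a minimal illustration, not a tested production routine: the class name EmptyGzip and its members are hypothetical, and the byte values are one possible encoding of an empty gzip member under the RFC 1952 layout (zeroed timestamp, "unknown" OS byte), so they will not match the Cygwin output byte for byte.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of .NET): writes a fixed 20-byte gzip file
// that decompresses to zero bytes.
static class EmptyGzip
{
    public static readonly byte[] Bytes =
    {
        0x1F, 0x8B,             // gzip magic number
        0x08,                   // compression method: deflate
        0x00,                   // flags: none set
        0x00, 0x00, 0x00, 0x00, // modification time: 0 (not available)
        0x00,                   // extra flags
        0xFF,                   // operating system: unknown
        0x03, 0x00,             // deflate data: one final, empty fixed-Huffman block
        0x00, 0x00, 0x00, 0x00, // CRC-32 of the (empty) uncompressed data
        0x00, 0x00, 0x00, 0x00  // uncompressed size: 0
    };

    // Writes the fixed empty gzip member to the given path.
    public static void WriteTo(string path)
    {
        File.WriteAllBytes(path, Bytes);
    }

    // Round-trip check: decompresses the file and returns the payload length.
    public static long DecompressedLength(string path)
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.Length;
        }
    }
}
```

In the asker's code, one would call something like EmptyGzip.WriteTo("foo.txt.gz") instead of opening a GZipStream whenever the input turns out to be empty.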

Alternatively, use a 3rd-party compression library like SharpZipLib (http://icsharpcode.github.io/SharpZipLib/) or DotNetZip (http://dotnetzip.codeplex.com/) that implements GZIP and use their implementation instead of GZipStream.

Whatever answered 3/6, 2014 at 21:33 Comment(1)
I added a bug report at Microsoft; vote on it: connect.microsoft.com/VisualStudio/feedback/details/888912/… - Izard

I know this is an old question, but if you find that your input is empty, you can do a zero-length write before disposing of the GZipStream. It will then write the 20 bytes to the output stream as expected, producing a valid .gz file.

You can use the following snippet:

gs.Write(Array.Empty<byte>(), 0, 0);
Intercolumniation answered 20/5, 2022 at 10:46 Comment(1)
@MarkAdler the snippet I added actually fixes this problem. If you just create a GZipStream and immediately save it, you'll end up with a 0-byte .gz file, which is invalid. If you do any write, the GZipStream will create the GZip header structure. So with the 0-byte write I showed, the saved .gz file will contain a basic gz stream that decompresses to a 0-byte file, as expected, without any side effects for cases where you write other content to the stream. - Intercolumniation
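For context, here is a rough sketch of how the zero-length write might be folded into the code from the question. The class and method names (GzipLines, Write) are hypothetical, and whether the zero-length write actually triggers the header may depend on the .NET runtime in use, so it is worth checking the output file on your target platform.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not taken from the original posts.
static class GzipLines
{
    public static void Write(string path, IEnumerable<string> input)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        {
            // Zero-length write, as suggested in the answer: intended to make
            // GZipStream emit its header and trailer even if nothing else is written.
            gzip.Write(Array.Empty<byte>(), 0, 0);

            // Disposing the writer flushes it and also disposes the gzip stream;
            // the second dispose by the outer using block is harmless.
            using (var writer = new StreamWriter(gzip))
            {
                foreach (string line in input)
                {
                    writer.Write(line);
                }
            }
        }
    }
}
```

Calling GzipLines.Write("foo.txt.gz", new string[0]) exercises the empty-input case. Because the extra write carries no data, non-empty input should be compressed as before.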
