When searching for how to inflate gzip-compressed data on iOS, the following method appears in a number of results:
- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;
    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;
    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    BOOL done = NO;
    int status;
    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;
    while (!done)
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;
        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);
        if (status == Z_STREAM_END) done = YES;
        else if (status != Z_OK) break;
    }
    if (inflateEnd (&strm) != Z_OK) return nil;
    // Set real length.
    if (done)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}
But I've come across some examples of data (compressed on a Linux machine with Python's gzip module) that this method fails to inflate on iOS. Here's what's happening:
In the last iteration of the while loop, inflate() returns Z_BUF_ERROR and the loop exits. However, inflateEnd(), which is called after the loop, returns Z_OK. Because inflate() never returned Z_STREAM_END, the code then assumes that the inflation failed and returns nil.
According to the zlib FAQ (http://www.zlib.net/zlib_faq.html#faq05), Z_BUF_ERROR is not a fatal error, and my tests with a limited set of examples suggest that the data is successfully inflated whenever inflateEnd() returns Z_OK, even though the last call to inflate() did not return Z_OK. It seems as if inflateEnd() finished inflating the last chunk of data.
I don't know much about compression and how gzip works, so I'm hesitant to make changes to this code without fully understanding what it does. I'm hoping someone with more knowledge about the topic can shed some light on this potential logic flaw in the code above, and suggest a way to fix it.
Another method that Google turns up, and that seems to suffer from the same problem, can be found here: https://github.com/nicklockwood/GZIP/blob/master/GZIP/NSData%2BGZIP.m
Edit:
So, it is a bug! Now, how do we fix it? Below is my attempt. Code review, anyone?
- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;
    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;
    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    int status;
    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;
    do
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;
        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);
        switch (status) {
            case Z_NEED_DICT:
                status = Z_DATA_ERROR; /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
            case Z_STREAM_ERROR:
                (void)inflateEnd(&strm);
                return nil;
        }
    } while (status != Z_STREAM_END);
    (void)inflateEnd (&strm);
    // Set real length.
    if (status == Z_STREAM_END)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}
Edit 2:
Here's a sample Xcode project that illustrates the issue I'm running into. The deflate happens on the server side, and the data is Base64-encoded and then URL-encoded before being transported via HTTP. I've embedded the URL-encoded Base64 string in ViewController.m. The URL-decode and Base64-decode methods, as well as your gzipInflate methods, are in NSDataExtension.m:
https://dl.dropboxusercontent.com/u/38893107/gzip/GZIPTEST.zip
Here's the binary file as deflated by Python's gzip library:
https://dl.dropboxusercontent.com/u/38893107/gzip/binary.zip
This is the URL-encoded Base64 string that gets transported over HTTP: https://dl.dropboxusercontent.com/u/38893107/gzip/urlEncodedBase64.txt
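Since the data passes through URL decoding and Base64 decoding before it reaches gzipInflate, it may help to sanity-check the decoded bytes first. The sketch below is plain C with hypothetical helper names. It relies only on the gzip layout from RFC 1952: the stream starts with the bytes 0x1f 0x8b, the third byte is 8 for deflate, and the last four bytes (ISIZE) hold the uncompressed length modulo 2^32 in little-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the buffer is long enough for a gzip
 * header (10 bytes) plus trailer (8 bytes) and starts with the gzip magic
 * bytes and the deflate method code. */
int looks_like_gzip(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    return n >= 18 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8;
}

/* Hypothetical helper: reads ISIZE, the little-endian uncompressed length
 * (modulo 2^32) stored in the last four bytes of a single-member gzip file. */
uint32_t gzip_isize(const unsigned char *p, size_t n)
{
    assert(p != NULL && n >= 4);
    const unsigned char *t = p + n - 4;
    return (uint32_t)t[0]
         | ((uint32_t)t[1] << 8)
         | ((uint32_t)t[2] << 16)
         | ((uint32_t)t[3] << 24);
}
```

If the length of the original data on the server is known, comparing it with `gzip_isize` on the decoded bytes is a cheap hint as to whether the stream arrived intact. A mismatch would suggest the bytes were truncated or altered in transport rather than mishandled by the inflate loop.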