HttpWebRequest & Native GZip Compression

When requesting pages with GZip compression, I get a lot of the following errors:

System.IO.InvalidDataException: The CRC in GZip footer does not match the CRC calculated from the decompressed data

I am using the native GZipStream to decompress and am looking into addressing this. With that in mind, is there a workaround for this, or another (free?) GZip library that handles this issue properly?

I am verifying that the webResponse ContentEncoding is GZIP.

Update 5/11: a simplified snippet

//Caller
public void SOSampleGet(string url) 
{
    // Initialize the WebRequest.
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = WebRequestMethods.Http.Get;
    webRequest.KeepAlive = true;
    webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
    webRequest.Referer = WebUtil.GetDomain(url); // WebUtil: the asker's own helper, not shown

    // Dispose the response as well as the stream so the connection is released.
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    using (Stream stream = GetStreamForResponse(webResponse, READTIMEOUT_CONST))
    {
        //use stream
    }
}

//Method
private static Stream GetStreamForResponse(HttpWebResponse webResponse, int readTimeOut)
{
    // Set the timeout on the raw network stream so it applies to every
    // encoding, not only to uncompressed responses.
    Stream responseStream = webResponse.GetResponseStream();
    responseStream.ReadTimeout = readTimeOut;

    switch ((webResponse.ContentEncoding ?? string.Empty).ToUpperInvariant())
    {
        case "GZIP":
            return new GZipStream(responseStream, CompressionMode.Decompress);
        case "DEFLATE":
            return new DeflateStream(responseStream, CompressionMode.Decompress);
        default:
            return responseStream;
    }
}
Whence answered 8/5, 2009 at 13:55 Comment(2)
Is it for a specific site, or is this happening with responses everywhere? If it's only one site, the problem could lie on the other side. - Echeverria
Note also that "deflate", according to the HTTP spec, is really "zlib" (which wraps deflate) and not raw deflate at all (the name is a misnomer). Because of this confusion, though, some servers send raw deflate and others send zlib, and clients need to support both (by heuristic guess) just in case. Yuck. - Veda
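To make the "heuristic guess" from the comment above concrete, here is a minimal C# sketch, assuming the whole body has already been read into a byte array. The class HttpDeflate and its methods are hypothetical names. It checks whether the first two bytes form a valid zlib (RFC 1950) header and, if so, skips them before handing the rest to DeflateStream, which only understands raw deflate. On .NET 6 and later, System.IO.Compression.ZLibStream can read the zlib-wrapped form directly.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class HttpDeflate
{
    // Heuristic: does the body start with a zlib (RFC 1950) header?
    // CMF low nibble = 8 (deflate), CMF high nibble <= 7 (window size),
    // and the 16-bit value CMF*256 + FLG must be a multiple of 31.
    public static bool LooksLikeZlib(byte[] body)
    {
        if (body.Length < 2) return false;
        int cmf = body[0], flg = body[1];
        return (cmf & 0x0F) == 8 && (cmf >> 4) <= 7 && ((cmf << 8) | flg) % 31 == 0;
    }

    // Inflates an HTTP "deflate" body whether the server sent zlib-wrapped data
    // or raw deflate. For zlib, the 2-byte header is skipped; the trailing
    // Adler-32 checksum is left unread and therefore not verified.
    // Assumes the FDICT (preset dictionary) flag is not set.
    public static byte[] Inflate(byte[] body)
    {
        int offset = LooksLikeZlib(body) ? 2 : 0;
        using (var input = new MemoryStream(body, offset, body.Length - offset))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Skipping the header rather than sniffing the whole stream keeps the guess cheap: a raw deflate stream produced by a typical encoder cannot start with a low nibble of 8, so false positives should be rare, but this remains a heuristic.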

What about the HttpWebRequest AutomaticDecompression property, available since .NET 2.0? Simply add:

webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

It also adds gzip, deflate to the Accept-Encoding header for you.
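A minimal sketch of the question's request rewritten to rely on AutomaticDecompression instead of a manual GZipStream. The class AutoDecompressRequest and its methods are hypothetical names. With the property set, the stream returned by GetResponseStream() is already decompressed, so neither a hand-written Accept-Encoding header nor a GZipStream wrapper is needed.

```csharp
using System;
using System.IO;
using System.Net;

public static class AutoDecompressRequest
{
    // Builds a GET request that lets HttpWebRequest handle gzip/deflate itself.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Get;
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Sends the request and returns the already-decompressed body as text.
    public static string Fetch(string url)
    {
        var request = Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```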

See http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.automaticdecompression.aspx

Joule answered 15/10, 2011 at 1:38 Comment(2)
How would you do this using HttpClient? - Indeterminism
@MartinKearn, you do HttpClientHandler handler = new HttpClientHandler(); handler.AutomaticDecompression = System.Net.DecompressionMethods.GZip | DecompressionMethods.Deflate; _client = new HttpClient(handler); See #20991101. I believe it requires .NET 4.5. - Joule
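The HttpClient version from the comment above, as a small sketch with hypothetical names (DecompressingClient, CreateHandler, FetchAsync). The handler decompresses gzip/deflate bodies and advertises both encodings, so the string returned by GetStringAsync is already plain text.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class DecompressingClient
{
    // Handler that transparently decompresses gzip and deflate responses.
    public static HttpClientHandler CreateHandler() =>
        new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

    // Create the client once and reuse it; a new client per request can exhaust sockets.
    public static HttpClient CreateClient() => new HttpClient(CreateHandler());

    // Returns the decompressed body; throws HttpRequestException on a non-success status.
    public static Task<string> FetchAsync(HttpClient client, string url) =>
        client.GetStringAsync(url);
}
```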

For .NET Core, things are a little more involved. A GZipStream is needed because there isn't (as of writing) an AutomaticDecompression property. See my answer here: https://mcmap.net/q/303522/-does-net-39-s-httpwebresponse-uncompress-automatically-gziped-and-deflated-responses

Code from answer:

var req = WebRequest.CreateHttp(uri);

/*
 * Headers
 */
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";

/*
 * Execute
 */
try
{
    using (var resp = await req.GetResponseAsync())
    using (var str = resp.GetResponseStream())
    using (var gsr = new GZipStream(str, CompressionMode.Decompress))
    using (var sr = new StreamReader(gsr))
    {
        // s holds the decompressed body.
        string s = await sr.ReadToEndAsync();
    }
}
catch (WebException ex)
{
    // ex.Response is null when no HTTP response arrived (e.g. DNS failure or timeout).
    if (ex.Response is HttpWebResponse response)
    {
        using (response)
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            string respStr = sr.ReadToEnd();
            int statusCode = (int)response.StatusCode;

            string errorMsg = $"Request ({uri}) failed ({statusCode}) with error: {respStr}";
        }
    }
}
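The snippet above wraps every response in a GZipStream, which throws InvalidDataException on a body that is not actually gzip-encoded (for example, a server that ignored Accept-Encoding). A minimal sketch, with hypothetical names (ResponseDecoding, WrapForEncoding, ReadBody), of choosing the decompressor from the Content-Encoding header instead. The "deflate" branch assumes the server sent raw deflate rather than zlib.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;

public static class ResponseDecoding
{
    // Picks the decompressor named by Content-Encoding instead of assuming gzip.
    // Missing or unrecognized encodings are returned unchanged.
    public static Stream WrapForEncoding(Stream body, string contentEncoding)
    {
        switch ((contentEncoding ?? string.Empty).Trim().ToLowerInvariant())
        {
            case "gzip":
                return new GZipStream(body, CompressionMode.Decompress);
            case "deflate":
                // Assumes raw deflate; zlib-wrapped bodies need their 2-byte header skipped.
                return new DeflateStream(body, CompressionMode.Decompress);
            default:
                return body;
        }
    }

    // Reads a WebResponse body as text, decompressing only when the server says so.
    public static string ReadBody(WebResponse response)
    {
        string encoding = response.Headers[HttpResponseHeader.ContentEncoding];
        using (var stream = WrapForEncoding(response.GetResponseStream(), encoding))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```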
Delorisdelorme answered 12/6, 2017 at 21:8 Comment(0)

Are you flushing and closing the stream? Try wrapping your GZipStream in a using statement.

Tedder answered 8/5, 2009 at 14:58 Comment(1)
It's wrapped in a try/catch/finally that calls Dispose() on the stream in the finally block. - Whence
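Using blocks also make stream ownership explicit. A small sketch with hypothetical names (GzipReading, ReadAll): by default, disposing a GZipStream also disposes the stream it wraps, while passing leaveOpen: true keeps the inner stream usable. Reading all the way to the end matters because the gzip footer (CRC-32 and length) is only checked once the decompressor reaches it.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipReading
{
    // Decompresses the whole payload inside using blocks, so the footer is
    // reached (and verified) before anything is disposed.
    public static byte[] ReadAll(Stream compressed, bool leaveOpen)
    {
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```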

I found some sample code that shows the entire request/response for GZip encoded pages. It uses GZipStream.

http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx

Cupid answered 8/5, 2009 at 15:31 Comment(2)
Link is broken, but I looked it up through archive.org and the basic method works great :) - Schulz
For those not familiar with archive.org, like me, the new link is: web.archive.org/web/20200214173529/http://www.know24.net/blog/… - Flocculent

See my comment above; this is usually a symptom of a corrupted file. If the site is your own, replace the file you are trying to access.
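To check the corrupted-data explanation locally, here is a sketch with hypothetical names (CrcDemo, Gzip, Gunzip, CorruptCrc). It gzips a string, flips one bit of the CRC-32 field in the 8-byte footer (RFC 1952 places CRC-32 followed by ISIZE at the end of a member), and decompresses both copies. The untouched copy round-trips; the tampered copy is expected to fail with InvalidDataException, the exception type from the question, although the exact message text varies between .NET versions.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CrcDemo
{
    public static byte[] Gzip(string text)
    {
        var input = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // leaveOpen so the MemoryStream survives while the gzip footer is written.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

    public static string Gunzip(byte[] data)
    {
        using (var gzip = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Flips the lowest bit of the first CRC-32 byte (8 bytes from the end).
    public static byte[] CorruptCrc(byte[] data)
    {
        var copy = (byte[])data.Clone();
        copy[copy.Length - 8] ^= 0x01;
        return copy;
    }
}
```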

Echeverria answered 8/5, 2009 at 14:54 Comment(1)
Not my site. It seems to be particular to a few of the sites I am requesting from, however. - Whence

The native GZipStream can read a compressed GZIP (RFC 1952) stream, but it can't handle the ZIP file format.

From http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html:

The disadvantage of using the GZipStream class over a 3rd party product is that it has limited capabilities. One of the limitations is that you cannot give a name to the file that you place in the archive. When GZipStream compresses the file into a ZIP archive, it takes the sequence of bytes from that file and uses compression algorithms that create a smaller sequence of bytes. The new sequence of bytes is put into the new ZIP file. When you open the ZIP file you will open the archived file itself; most popular ZIP extractors (WinZip, WinRar, etc.) will show you the content of the ZIP as a file that has the same [name] as the archive itself.


EDIT: The above note is incorrect. GZipStream does not produce a ZIP file. It is not a "Single file ZIP stream". It is a GZIP Stream. They are different things. There's no guarantee that tools that handle ZIP archives will handle a .gz file.
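Because the two formats differ, they can be told apart by their leading bytes. A minimal sketch with hypothetical names (FormatSniffer, LooksLikeGzip, LooksLikeZip): a GZIP member starts with 0x1F 0x8B (RFC 1952), while a ZIP archive starts with the local file header signature "PK\x03\x04" (or "PK\x05\x06" when the archive is empty). GZipStream only handles the first; for the second, System.IO.Compression.ZipArchive (.NET 4.5+) or SharpZipLib can be used.

```csharp
using System;

public static class FormatSniffer
{
    // GZIP members (RFC 1952) start with ID1 = 0x1F, ID2 = 0x8B.
    public static bool LooksLikeGzip(byte[] data) =>
        data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;

    // ZIP archives start with "PK\x03\x04" (local file header) or,
    // when empty, "PK\x05\x06" (end of central directory).
    public static bool LooksLikeZip(byte[] data) =>
        data.Length >= 4 && data[0] == 0x50 && data[1] == 0x4B
        && ((data[2] == 0x03 && data[3] == 0x04) || (data[2] == 0x05 && data[3] == 0x06));
}
```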


For an implementation that can read ZIP archives, as opposed to single-file GZIP streams, try #ziplib (SharpZipLib, formerly NZipLib).

Caddie answered 8/5, 2009 at 14:0 Comment(2)
I don't believe the original poster is talking about dealing with compressed/archived files. Rather, the use case is requesting a web page while sending an Accept-Encoding: header to the server, indicating that the client supports gzip. That header allows the server to compress the content before sending it to the client, saving bandwidth. Modern web browsers can do this, and many servers are configured to respond accordingly. - Guillema
Are you checking whether the server actually replies with a gzip stream, for example with Wireshark? Wireshark can decode and verify the reply, even if it's gzipped. - Caddie
