GZipStream decompression performance is poor

I have a .NET 2.0 WinForms app that connects to a backend WAS server. I am using GZipStream to decode data coming back from an HttpWebRequest call made to the server. The data returned is CSV, which Apache compresses with gzip. The entire server stack is Hibernate-->EJB-->Spring-->Apache.

For small responses, the performance is fine (<50ms). When I get a response >150KB, it takes more than 60 seconds to decompress. The majority of the time seems to be spent in the GZipStream constructor.

This is the code showing where I get the response stream from the HttpWebResponse call:

using (Stream stream = this.Response.GetResponseStream())
{
    if (this.CompressData && this.Response.ContentEncoding == "gzip")
    {
        // Decompress the response
        byte[] b = Decompress(stream);
        this.ResponseBody = encoding.GetString(b);
    }
    else
    {
        // Just read the stream as a string
        using (StreamReader sr = new StreamReader(stream))
        {
            this.ResponseBody = sr.ReadToEnd();
        }
    }
}

Edit 1

Based on the suggestion in the answer below, I modified the Decompress method to the following, but I do not see any performance benefit from loading the response stream into a MemoryStream before instantiating the GZipStream.

private static byte[] Decompress(Stream stream)
{
    using (MemoryStream ms = new MemoryStream())
    {
        byte[] buffer = new byte[4096];
        int read;

        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }

        ms.Seek(0, SeekOrigin.Begin);

        using (GZipStream gzipStream = new GZipStream(ms, CompressionMode.Decompress, false))
        using (MemoryStream output = new MemoryStream())
        {
            while ((read = gzipStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }

            return output.ToArray();
        }
    }
}

Based on the code above, can anyone see any issues? This seems quite basic to me, but it's driving me nuts.

Edit 2

I profiled the application using ANTS Profiler, and during the 60s of decompression, the CPU is near zero and the memory usage does not change.

Edit 3

The actual slowdown appears to be during the read of the stream returned by

this.Response.GetResponseStream()

The entire 60 seconds is spent loading the response stream into the MemoryStream. Once it's there, the call to GZipStream is quick.
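
A minimal timing sketch like the one below (reusing the Decompress method from Edit 1; "response" and the other names are illustrative only) separates the network read from the decompression:

// Timing sketch: measure the network read and the decompression separately.
// "response" stands in for this.Response; Stopwatch is System.Diagnostics.Stopwatch.
Stopwatch downloadTimer = Stopwatch.StartNew();
byte[] compressed;
using (Stream network = response.GetResponseStream())
using (MemoryStream raw = new MemoryStream())
{
    byte[] buffer = new byte[4096];
    int read;
    while ((read = network.Read(buffer, 0, buffer.Length)) > 0)
    {
        raw.Write(buffer, 0, read);
    }
    compressed = raw.ToArray();
}
downloadTimer.Stop();

Stopwatch decompressTimer = Stopwatch.StartNew();
byte[] decompressed;
using (MemoryStream ms = new MemoryStream(compressed))
{
    decompressed = Decompress(ms);   // the Decompress method from Edit 1
}
decompressTimer.Stop();

Console.WriteLine("Download:   " + downloadTimer.ElapsedMilliseconds + " ms");
Console.WriteLine("Decompress: " + decompressTimer.ElapsedMilliseconds + " ms");
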
Edit 4

I found that using HttpWebRequest.AutomaticDecompression exhibits the same performance issue, so I'm closing this question.
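
For completeness, switching to automatic decompression only takes a couple of lines on the request (the URL below is a placeholder):

// Sketch: let the framework handle gzip/deflate decompression (available since .NET 2.0).
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/data.csv");
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string body = reader.ReadToEnd();   // already decompressed by the framework
}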

Crispi answered 5/10, 2009 at 19:51 Comment(3)
Voting to close because decompression is not the actual issue.Crispi
When you say adding the memory stream did not improve performance, are you in fact measuring the time it takes to unzip separately from the time it takes to write the entire response to the memory stream? My suspicion, given that CPU is near zero, is that the bottleneck is not the unzipping but how fast you can download the response.Erotica
Did you solve this problem?Madera

Try loading the data into a MemoryStream first and then decompressing from the MemoryStream...

Odum answered 5/10, 2009 at 19:55 Comment(5)
I tried this - see the modified question. Thank you for the suggestion.Crispi
I see. Is the time still spent in the constructor of the GZip stream, or now somewhere else?Odum
This is (as far as I can tell) spent in the constructor of the GZip stream.Crispi
Accessing the same URI with a browser (Firefox, IE, whatever) works fine without delay?Odum
Yes - I can access it using a curl script and it returns without delay. And, the curl script is using compression (--compressed argument to curl).Crispi

DotNetZip has a GZipStream class that can be used as a drop-in replacement for System.IO.Compression.GZipStream.

DotNetZip is free.

NB: If you are only doing GZipStream, then you need the Ionic.Zlib.dll, not the Ionic.Zip.dll.
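
A minimal sketch of the swap, assuming DotNetZip's Ionic.Zlib.GZipStream takes the same stream-plus-CompressionMode constructor as the framework class (the helper name is made up for illustration):

// Sketch of swapping in DotNetZip's GZipStream for System.IO.Compression.GZipStream.
private static byte[] DecompressWithDotNetZip(Stream stream)
{
    using (Ionic.Zlib.GZipStream gzip =
        new Ionic.Zlib.GZipStream(stream, Ionic.Zlib.CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
        return output.ToArray();
    }
}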

Filch answered 5/10, 2009 at 20:4 Comment(4)
I tried using the DotNetZip/Zlib library but found the same performance issue.Crispi
If that's the case then it seems like it's not the DeflateStream. Maybe you have a memory issue. Maybe you should test more iterations - it's difficult to draw conclusions on performance based on a single iteration, a single trial.Filch
I don't follow what you mean by "test more iterations". This is one of many requests to the same server. The majority of the requests only get less than ~10 KB of data back. This is the only "large" request, and it's only ~150 KB.Crispi
Like I said in the other comment, I don't think it is a code problem. Check your server logs to see what is happening right when you fire that code. Is Antivirus locking that file for a brief moment? Gotta be something.Penknife

I'll drop my three cents on the subject, just to let C# users know that 7-Zip seems to expose its API in plain C#. I think you all know the 7-Zip tool quite well, and at least for me, regardless of how well or poorly its API is designed, knowing it is available is a big help when it comes to better performance handling compressed files and streams.

ref: http://www.splinter.com.au/compressing-using-the-7zip-lzma-algorithm-in/
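
For context, the linked post drives the C# LZMA SDK that ships with 7-Zip. A rough sketch of decoding a raw LZMA stream with it might look like the following; the 5-byte properties header, the 8-byte length prefix, and the Decoder API shown here are assumptions based on the common SDK layout, so verify them against the SDK version you download:

// Rough sketch using the 7-Zip LZMA SDK's C# decoder (SevenZip.Compression.LZMA.Decoder).
// Assumes a seekable input in the common raw-LZMA layout: 5 property bytes, then an
// 8-byte little-endian uncompressed length, then the payload. Error handling omitted.
private static byte[] DecompressLzma(Stream input)
{
    SevenZip.Compression.LZMA.Decoder decoder = new SevenZip.Compression.LZMA.Decoder();

    byte[] properties = new byte[5];
    input.Read(properties, 0, 5);
    decoder.SetDecoderProperties(properties);

    byte[] lengthBytes = new byte[8];
    input.Read(lengthBytes, 0, 8);
    long outSize = BitConverter.ToInt64(lengthBytes, 0);

    using (MemoryStream output = new MemoryStream())
    {
        decoder.Code(input, output, input.Length - input.Position, outSize, null);
        return output.ToArray();
    }
}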

Balbriggan answered 23/8, 2012 at 9:47 Comment(0)

Sorry not to answer your question directly, but have you looked at SharpZipLib yet? I found it much easier to use than GZipStream. If you have trouble solving your current problem, perhaps it would perform the task better.

http://www.icsharpcode.net/OpenSource/SharpZipLib/
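
A sketch of the equivalent decompression using SharpZipLib's GZipInputStream (the helper name is made up for illustration):

// Sketch using SharpZipLib's GZipInputStream in place of System.IO.Compression.GZipStream.
private static byte[] DecompressWithSharpZipLib(Stream stream)
{
    using (ICSharpCode.SharpZipLib.GZip.GZipInputStream gzip =
        new ICSharpCode.SharpZipLib.GZip.GZipInputStream(stream))
    using (MemoryStream output = new MemoryStream())
    {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
        return output.ToArray();
    }
}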

Penknife answered 5/10, 2009 at 20:1 Comment(3)
I have tried SharpZipLib and it exhibits the same poor performance as both System.IO.Compression.GZipStream and DotNetZip. I am going to step through the SharpZipLib source to see if anything jumps out at me.Crispi
Interesting... I have a large XML file, about 70 megs uncompressed, that decompresses in about 15 seconds on my system. I'm starting to wonder if it is really related to your code. Could you take a look at the antivirus on that system? Perhaps it is hanging things up. We've had major problems with Etrust from IBM holding files for much longer than it should. I can provide a code sample if you like, but again I think it's not code related.Penknife
I'm trying to think of what else could be your bottleneck. You could try running a memory tester on that system. Maybe it has some faulty RAM? I'm just brainstorming for ya. Just seems odd.Penknife
