This is a really interesting question. Compression is highly CPU intensive, relying on lots of searching and comparisons. So it's very appropriate to want to parallelize it, when you've got multiple CPUs with unimpeded memory access.
There is a class called ParallelDeflateOutputStream within the DotNetZip library that does what you are describing. The class is documented here.
It can be used only for compression - no decompression. Also, it is strictly an output stream - you cannot read from it in order to compress. Given these constraints, it is basically a DeflateStream for output that internally uses multiple threads.
The way it works: It breaks up the incoming stream into chunks, then drops off each chunk into a separate worker thread to be compressed individually. Then it merges all those compressed streams back into one ordered stream at the end.
Suppose the "chunk" size maintained by the stream is N bytes. As the caller invokes Write(), the data is buffered into a bucket or chunk. Inside the Stream.Write() method, when the first bucket is full, the stream calls ThreadPool.QueueUserWorkItem, handing the bucket to the work item. Subsequent writes into the stream begin filling the next bucket, and when that one is full, Stream.Write() calls QUWI again. Each worker thread compresses its bucket, using a "Flush Type" of Sync (see the DEFLATE spec), and then marks its compressed blob as ready for output. These various outputs are then re-ordered (because chunk n does not necessarily finish compressing before chunk n+1) and written to the captive output stream. As each bucket is written out, it is marked empty, ready to be re-filled by the next Stream.Write(). Each chunk must be compressed with the flush type of Sync so that the chunks can be re-combined via simple concatenation, and the combined bytestream is a legal DEFLATE stream. The final chunk needs Flush type = Finish.
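The dispatch-and-reorder pattern described above can be sketched roughly like this. This is my own simplified illustration, not DotNetZip code: to keep it self-contained, CompressChunk deflates each bucket as an independent, finalized DEFLATE stream, whereas the real class uses a Sync flush on all but the last chunk so the pieces concatenate into one legal DEFLATE stream. The sketch is only meant to show the fan-out to workers and the in-order reassembly:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

static class ParallelChunkSketch
{
    // Stand-in for the per-bucket worker. The real stream deflates with a
    // Sync flush; here each chunk is a complete, independent DEFLATE stream,
    // just to keep the sketch runnable with the standard library.
    static byte[] CompressChunk(byte[] chunk)
    {
        using var ms = new MemoryStream();
        using (var def = new DeflateStream(ms, CompressionMode.Compress))
            def.Write(chunk, 0, chunk.Length);
        return ms.ToArray();
    }

    // Break the input into N-byte buckets, compress each on a worker,
    // then return the results in their original order.
    public static byte[][] CompressInParallel(byte[] input, int bucketSize)
    {
        var tasks = new List<Task<byte[]>>();
        for (int off = 0; off < input.Length; off += bucketSize)
        {
            byte[] bucket = input.Skip(off).Take(bucketSize).ToArray();
            tasks.Add(Task.Run(() => CompressChunk(bucket)));  // QUWI analogue
        }
        // Indexed results give the re-ordering step: chunk n is emitted
        // before chunk n+1 regardless of which worker finishes first.
        Task.WaitAll(tasks.ToArray());
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

The bounded-bucket behavior of the real class (re-filling emptied buckets instead of allocating without limit) is omitted here for brevity.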
The design of this stream means that callers don't need to write with multiple threads. Callers just create the stream as normal, like the vanilla DeflateStream used for output, and write into it. The stream object uses multiple threads internally, but your code doesn't interact with them directly. The code for a "user" of the ParallelDeflateOutputStream looks like this:
using (FileStream raw = new FileStream(CompressedFile, FileMode.Create))
{
    using (FileStream input = File.OpenRead(FileToCompress))
    {
        using (var compressor = new Ionic.Zlib.ParallelDeflateOutputStream(raw))
        {
            // could tweak params of parallel deflater here
            int n;
            var buffer = new byte[8192];
            while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                compressor.Write(buffer, 0, n);
            }
        }
    }
}
It was designed for use within the DotNetZip ZipFile class, but it is quite usable as a standalone compressing output stream. The resulting stream can be de-DEFLATED (inflated?) with any inflater. The result is fully compliant with the spec.
The stream is tweakable. You can set the size of the buffers it uses and the level of parallelism. It doesn't create buckets without bound, because for large streams (GB scale and so on) that would cause out-of-memory conditions. So there's a fixed limit on the number of buckets, and therefore on the degree of parallelism, that can be supported.
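If I recall the API correctly, the knobs are properties on the stream; the names below (BufferSize, MaxBufferPairs) are from my reading of the DotNetZip documentation, so treat this fragment as a sketch and check the docs for your version:

```csharp
using (var compressor = new Ionic.Zlib.ParallelDeflateOutputStream(raw))
{
    compressor.BufferSize = 256 * 1024;  // bytes per bucket (the chunk size N)
    compressor.MaxBufferPairs = 8;       // caps buckets in flight, bounding memory use
    // ... write as usual ...
}
```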
On my dual-core machine, this stream class nearly doubled the compression speed of large (100MB and larger) files, compared to the standard DeflateStream. I don't have any larger multi-core machines, so I couldn't test it further. The tradeoff is that the parallel implementation uses more CPU and more memory, and also compresses slightly less efficiently (about 1% worse for large files) because of the sync framing I described above. The performance advantage will vary with the I/O throughput of your output stream, and with whether the storage can keep up with the parallel compressor threads.
Caveat: it is a DEFLATE stream, not GZIP. For the differences, read RFC 1951 (DEFLATE) and RFC 1952 (GZIP).
But if you really NEED gzip, the source for this stream is available, so you can view it and maybe get some ideas for yourself. GZIP is really just a wrapper on top of DEFLATE, with some additional metadata (a header, a CRC-32 checksum of the uncompressed data, and so on - see RFC 1952). It seems to me that it would not be very difficult to build a ParallelGZipOutputStream, but it may not be trivial, either.
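To make the "wrapper" point concrete, here's a minimal sketch (my own illustration, not DotNetZip code) that produces a legal .gz file by hand: the RFC 1952 10-byte header, a raw DEFLATE body, and the CRC-32 + length trailer:

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Minimal illustration of the RFC 1952 framing: GZIP = header + DEFLATE + trailer.
static class MiniGzip
{
    static readonly uint[] Table = BuildTable();

    static uint[] BuildTable()
    {
        var t = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            t[i] = c;
        }
        return t;
    }

    // Standard reflected CRC-32, as required by the GZIP trailer.
    static uint Crc32(byte[] data)
    {
        uint c = 0xFFFFFFFFu;
        foreach (byte b in data)
            c = Table[(c ^ b) & 0xFF] ^ (c >> 8);
        return c ^ 0xFFFFFFFFu;
    }

    public static byte[] Compress(byte[] input)
    {
        using var ms = new MemoryStream();
        // 10-byte header: magic (1F 8B), CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS=255
        ms.Write(new byte[] { 0x1F, 0x8B, 8, 0, 0, 0, 0, 0, 0, 255 }, 0, 10);
        using (var def = new DeflateStream(ms, CompressionMode.Compress, leaveOpen: true))
            def.Write(input, 0, input.Length);
        // 8-byte trailer (little-endian): CRC-32 of the plaintext, then ISIZE mod 2^32
        ms.Write(BitConverter.GetBytes(Crc32(input)), 0, 4);
        ms.Write(BitConverter.GetBytes((uint)input.Length), 0, 4);
        return ms.ToArray();
    }
}
```

A parallel GZIP stream would keep this same header and trailer, but feed the body from sync-flushed chunks compressed on worker threads, combining the per-chunk CRCs into one running CRC-32.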
The trickiest part for me was getting the semantics of Flush() and Close() to work properly.
EDIT
Just for fun, I built a ParallelGZipOutputStream, that basically does what I described above, for GZip. It uses .NET 4.0's Tasks in lieu of QUWI to handle the parallel compression. I tested it just now on a 100mb text file generated via a Markov Chain engine. I compared the results of that class against some other options. Here's what it looks like:
uncompressed: 104857600
running 2 cycles, 6 Flavors
System.IO.Compression.GZipStream: .NET 2.0 builtin
compressed: 47550941
ratio : 54.65%
Elapsed : 19.22s
ICSharpCode.SharpZipLib.GZip.GZipOutputStream: 0.86.0.518
compressed: 37894303
ratio : 63.86%
Elapsed : 36.43s
Ionic.Zlib.GZipStream: DotNetZip v1.9.1.5, CompLevel=Default
compressed: 37896198
ratio : 63.86%
Elapsed : 39.12s
Ionic.Zlib.GZipStream: DotNetZip v1.9.1.5, CompLevel=BestSpeed
compressed: 47204891
ratio : 54.98%
Elapsed : 15.19s
Ionic.Exploration.ParallelGZipOutputStream: DotNetZip v1.9.1.5, CompLevel=Default
compressed: 39524723
ratio : 62.31%
Elapsed : 20.98s
Ionic.Exploration.ParallelGZipOutputStream: DotNetZip v1.9.1.5, CompLevel=BestSpeed
compressed: 47937903
ratio : 54.28%
Elapsed : 9.42s
Conclusions:
The GZipStream that's builtin to .NET is pretty fast. It's also not very efficient, and it is not tunable.
The "BestSpeed" on the vanilla (non-parallelized) GZipStream in DotNetZip is about 20% faster than the .NET builtin stream, and gives about the same compression.
Using multiple Tasks for compression cut about 45% off the time required on my dual-core laptop (3GB RAM), comparing the vanilla DotNetZip GZipStream to the parallel one. I suppose the time savings would be greater on machines with more cores.
There is a cost to parallel GZIP - the framing increases the size of the compressed file by about 4%. This won't change with the number of cores used.
The resulting .gz file can be decompressed by any GZIP tool.