ZipFile and ZipArchive classes from System.IO.Compression and async I/O

Asked 10/9, 2016 at 0:20 Answered 27/10, 2023 at 15:52

Solved c# .net asynchronous io system.io.compression

.NET 4.5 added new classes for working with zip archives. Now you can do something like this:

using (ZipArchive archive = ZipFile.OpenRead(zipFilePath))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Extract it to the file
        entry.ExtractToFile(entry.Name);

        // or do whatever you want
        using (Stream stream = entry.Open())
        {
            ...
        }
    }
}

Obviously, if you work with large archives, it may take seconds or even minutes to read the files from the archive. So if you were writing a GUI app (WinForms or WPF), you would probably run such code in a separate thread; otherwise you would block the UI thread and make your users very upset.

However, all I/O operations in this code are executed in blocking mode, which is considered "not cool" in 2016. So there are two questions:

Is it possible to get async I/O with System.IO.Compression classes (or maybe with some other third-party .NET library)?
Does it even make sense to do that? I mean, compression/extraction algorithms are very CPU-intensive anyway, so even if we switch from ~~CPU-bound~~ blocking I/O to async I/O, the performance gain may be relatively small (in percentage terms, of course, not absolute values).

UPDATE:

To reply to the answer from Peter Duniho: yes, you're right. For some reason I didn't think about this option:

using (Stream zipStream = entry.Open())
using (FileStream fileStream = new FileStream(...))
{
    await zipStream.CopyToAsync(fileStream);
}

which definitely works. Thanks!

By the way

await Task.Run(() => entry.ExtractToFile(entry.Name));

will still ~~be CPU-bound blocking I/O operation, just in separate thread~~ consume a thread from the thread pool during I/O operations.

However, as far as I can see, the .NET developers still use blocking I/O for some archive operations (for example, the code that enumerates the entries in the archive: ZipArchive.cs on dotnet@github). I also found an open issue about the lack of an asynchronous API for ZipFile.

I guess at this time we have partial async support but it is far from complete.

Inveracity answered 10/9, 2016 at 0:20 Comment(4)

"will still be CPU-bound blocking I/O operation, just in separate thread" -- no, not likely. As long as the CPU can decompress the data faster than the disk can store it, it's still I/O bound (though, SSD storage makes it harder to guess ahead of time which is faster). This is also why I wrote that it all depends on what you mean by "async I/O". All I/O is inherently asynchronous; sometimes this is surfaced to the API (e.g. IOCP, which many .NET I/O methods are based on), sometimes not. But I/O operations don't consume CPU, regardless of how you initiate them. – Proudfoot 10/9, 2016 at 2:18

Yes, you're right, I/O operations don't consume CPU. By async I/O I mean an API function that does not block the thread that requested the I/O operation. – Inveracity 10/9, 2016 at 2:38

It would obviously be better if an I/O operation could avoid blocking a thread. But the thread pool deals reasonably well with its threads being temporarily blocked during an I/O operation. The main thing is to not block a thread that would otherwise be doing something useful. If it's really that important, a method like ExtractToFile() is easy enough to implement as a true, non-thread-consuming async I/O method (as you've already shown in your update above). – Proudfoot 10/9, 2016 at 2:41

Vote for .NET feature request github.com/dotnet/runtime/issues/1541 "Add async ZipFile APIs" – Andraandrade 3/4 at 9:35

Is it possible to get async I/O with System.IO.Compression classes (or maybe with some other third-party .NET library)?

Depending on what you actually mean by "async I/O", you can do it with the built-in .NET types. For example:

using (ZipArchive archive = await Task.Run(() => ZipFile.OpenRead(zipFilePath)))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Extract it to the file
        await Task.Run(() => entry.ExtractToFile(entry.Name));

        // or do whatever you want
        using (Stream stream = entry.Open())
        {
            // use XXXAsync() methods on Stream object
            ...
        }
    }
}

Wrap these in XXXAsync() extension methods if you like.
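
As a rough sketch of that wrapping, an extension method could look like the one below. The class name `ZipArchiveEntryTaskExtensions` and the method name `ExtractToFileInBackgroundAsync` are made up for illustration. Note that this only moves the blocking `ExtractToFile` call onto a thread-pool thread; the I/O itself is still synchronous.

```csharp
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper class and method name, chosen only for this example.
public static class ZipArchiveEntryTaskExtensions
{
    // Runs the blocking ExtractToFile call on a thread-pool thread
    // and returns a Task the caller can await.
    public static Task ExtractToFileInBackgroundAsync(
        this ZipArchiveEntry entry, string destinationFileName)
    {
        return Task.Run(() => entry.ExtractToFile(destinationFileName));
    }
}
```

Inside the loop above, the call would then read `await entry.ExtractToFileInBackgroundAsync(entry.Name);`.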

Does it even make sense to do that? I mean, compression/extraction algorithms are very CPU-intensive anyway, so even if we switch from CPU-bound I/O to async I/O, the performance gain may be relatively small (in percentage terms, of course, not absolute values).

At least three reasons to do it:

CPUs are very fast. In many cases, I/O is still the bottleneck so asynchronously waiting on I/O is useful.
Multi-core CPUs are the norm. So having one core working on decompression while another does other work is useful.
Asynchronous operations are not entirely, and in some cases not at all, about performance. Asynchronously processing your archives allows a user interface to remain responsive, which is useful.

Proudfoot answered 10/9, 2016 at 0:30 Comment(0)

-1

Following the discussion on the issue that is still open with Microsoft: use a destination stream with CopyToAsync to get an actual async method instead of just a background task.

// Must be declared inside a static class to work as an extension method.
private static async Task ExtractToFileAsync(this ZipArchiveEntry source, string destinationFileName, bool overwrite, CancellationToken cancellationToken = default)
{
    const int bufferSize = 128 * 1024;

    if (source is null)
        throw new ArgumentNullException(nameof(source));

    if (destinationFileName is null)
        throw new ArgumentNullException(nameof(destinationFileName));

    // CreateNew throws if the file already exists; Create overwrites it.
    var mode = overwrite ? FileMode.Create : FileMode.CreateNew;

    // useAsync: true opens the file for asynchronous I/O.
    using (Stream destination = new FileStream(destinationFileName, mode, FileAccess.Write, FileShare.None, bufferSize, useAsync: true))
    {
        using (var stream = source.Open())
        {
            // Decompress from the entry and write to the file without blocking the calling thread.
            await stream.CopyToAsync(destination, bufferSize, cancellationToken);
        }
    }

    // Carry the entry's timestamp over to the extracted file.
    File.SetLastWriteTime(destinationFileName, source.LastWriteTime.DateTime);
}
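
Building on the same idea, a whole archive could be extracted entry by entry roughly as follows. This is only a sketch under assumptions: `ZipAsyncSketch` and `ExtractAllAsync` are hypothetical names, the 80 KB buffer size is arbitrary, and opening the archive and reading its entry list still happen synchronously.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class and method name, chosen only for this example.
public static class ZipAsyncSketch
{
    public static async Task ExtractAllAsync(
        string zipFilePath, string destinationDirectory,
        CancellationToken cancellationToken = default)
    {
        const int bufferSize = 80 * 1024; // arbitrary choice

        // Normalize the target root so entry paths can be checked against it.
        string root = Path.GetFullPath(destinationDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
            root += Path.DirectorySeparatorChar;

        // Opening the archive is still a synchronous operation here.
        using (ZipArchive archive = ZipFile.OpenRead(zipFilePath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string targetPath = Path.GetFullPath(Path.Combine(root, entry.FullName));

                // Refuse entries that would land outside the destination directory.
                if (!targetPath.StartsWith(root, StringComparison.Ordinal))
                    throw new IOException("Entry is outside the destination directory: " + entry.FullName);

                // Entries with an empty Name represent directories.
                if (entry.Name.Length == 0)
                {
                    Directory.CreateDirectory(targetPath);
                    continue;
                }

                Directory.CreateDirectory(Path.GetDirectoryName(targetPath));

                using (Stream source = entry.Open())
                using (var destination = new FileStream(targetPath, FileMode.Create,
                    FileAccess.Write, FileShare.None, bufferSize, useAsync: true))
                {
                    // The file writes are asynchronous; decompression runs as data is read.
                    await source.CopyToAsync(destination, bufferSize, cancellationToken);
                }
            }
        }
    }
}
```

From an async method this would be called as `await ZipAsyncSketch.ExtractAllAsync(zipFilePath, targetDirectory);`.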

Deadbeat answered 27/10, 2023 at 15:52 Comment(0)
