Extracting files from a Zip archive programmatically using C# and System.IO.Packaging

I have a bunch of ZIP files that are in desperate need of some hierarchical reorganization and extraction. What I can do, currently, is create the directory structure and move the zip files to the proper location. The mystic cheese that I am missing is the part that extracts the files from the ZIP archive.

I have seen the MSDN articles on the ZipArchive class and understand them reasonably well. I have also seen the VBScript ways to extract. This is not a complex class, so extracting stuff should be pretty simple. In fact, it works "mostly". I have included my current code below for reference.

using (ZipPackage package = (ZipPackage)Package.Open(@"..\..\test.zip", FileMode.Open, FileAccess.Read))
{
    PackagePartCollection packageParts = package.GetParts();
    // GetParts() enumerates PackagePart objects, not PackageRelationship
    foreach (PackagePart part in packageParts)
    {
        // Do stuff, but execution never gets here because packageParts is empty.
    }
}

The problem seems to be somewhere in GetParts (or GetAnything, for that matter). The package, while open, appears to be empty. Digging deeper, the debugger shows that the private member _zipArchive actually does have parts, with the right names and everything. Why won't the GetParts function retrieve them? I've tried casting the opened package to a ZipArchive and that didn't help. Grrr.

Xylidine answered 3/2, 2009 at 16:8 Comment(1)
FYI, I have posted a request on MS Connect to add support for generic ZIP archive. You can vote too at connect.microsoft.com/VisualStudio/feedback/…Mra
T
47

If you are manipulating ZIP files, you may want to look into a 3rd-party library to help you.

One example is DotNetZip, which has been recently updated; the current version is v1.8. Here's an example that creates a zip:

using (ZipFile zip = new ZipFile())
{
  zip.AddFile("c:\\photos\\personal\\7440-N49th.png");
  zip.AddFile("c:\\Desktop\\2005_Annual_Report.pdf");
  zip.AddFile("ReadMe.txt");

  zip.Save("Archive.zip");
}

Here's an example to update an existing zip; you don't need to extract the files to do it:

using (ZipFile zip = ZipFile.Read("ExistingArchive.zip"))
{
  // 1. remove an entry, given the name
  zip.RemoveEntry("README.txt");

  // 2. Update an existing entry, with content from the filesystem
  zip.UpdateItem("Portfolio.doc");

  // 3. modify the filename of an existing entry 
  // (rename it and move it to a sub directory)
  ZipEntry e = zip["Table1.jpg"];
  e.FileName = "images/Figure1.jpg";

  // 4. insert or modify the comment on the zip archive
  zip.Comment = "This zip archive was updated " + System.DateTime.Now.ToString("G");

  // 5. finally, save the modified archive
  zip.Save();
}

Here's an example that extracts entries:

using (ZipFile zip = ZipFile.Read("ExistingZipFile.zip"))
{
  foreach (ZipEntry e in zip)
  {
    e.Extract(TargetDirectory, true);  // true => overwrite existing files
  }
}

DotNetZip supports multi-byte characters in filenames, Zip encryption, AES encryption, streams, Unicode, and self-extracting archives. It also supports ZIP64, for file lengths greater than 0xFFFFFFFF or for archives with more than 65535 entries.

It is free and open source.

Get it at CodePlex, or as a direct download from windows.net. (CodePlex has since been discontinued and archived.)

Tergum answered 10/2, 2009 at 6:9 Comment(5)
Cheeso, I agree with you, but I am not able to build the code I downloaded from CodePlex. Please tell me how to build it. If I build the main solution, it throws a lot of errors. I don't know how to build it.Berner
why are you building it? There's a binary. Download the DLL.Tergum
Why recommend a 3rd party library, shouldn't the System.IO.Packaging namespace suffice? Or is your last paragraph detailing what the built-in .NET framework Zip functionality does not include?Gapin
Ah, I see - the built-in packaging/zip utility is meant to work only with the "Open Packaging Convention" as Luke pointed out in another answer. Thanks.Gapin
This is probably the best way to do it in 2020: learn.microsoft.com/en-us/dotnet/standard/io/…Conspicuous
G
46

From MSDN,

In this sample, the Package class is used (as opposed to the ZipPackage.) Having worked with both, I've only seen flakiness happen when there's corruption in the zip file. Not necessarily corruption that throws the Windows extractor or Winzip, but something that the Packaging components have trouble handling.

Hope this helps, maybe it can provide you an alternative to debugging the issue.

using System;
using System.IO;
using System.IO.Packaging;
using System.Text;

class ExtractPackagedImages
{
    static void Main(string[] paths)
    {
        foreach (string path in paths)
        {
            using (Package package = Package.Open(
                path, FileMode.Open, FileAccess.Read))
            {
                DirectoryInfo dir = Directory.CreateDirectory(path + " Images");
                foreach (PackagePart part in package.GetParts())
                {
                    if (part.ContentType.ToLowerInvariant().StartsWith("image/"))
                    {
                        string target = Path.Combine(
                            dir.FullName, CreateFilenameFromUri(part.Uri));
                        using (Stream source = part.GetStream(
                            FileMode.Open, FileAccess.Read))
                        using (Stream destination = File.OpenWrite(target))
                        {
                            byte[] buffer = new byte[0x1000];
                            int read;
                            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                            {
                                destination.Write(buffer, 0, read);
                            }
                        }
                        Console.WriteLine("Extracted {0}", target);
                    }
                }
            }
        }
        Console.WriteLine("Done");
    }

    private static string CreateFilenameFromUri(Uri uri)
    {
        char[] invalidChars = Path.GetInvalidFileNameChars();
        StringBuilder sb = new StringBuilder(uri.OriginalString.Length);
        foreach (char c in uri.OriginalString)
        {
            sb.Append(Array.IndexOf(invalidChars, c) < 0 ? c : '_');
        }
        return sb.ToString();
    }
}
Gildus answered 3/2, 2009 at 17:21 Comment(5)
Looking at that code, I just threw up on my shoes. PackagePartCollection? PartRelationship? PackagePart? Part URIs? ToLowerInvariant? All I wanted was a ZIP FILE...Tergum
Yeah, that would be the part the OpenPackage developers seemed to forget. Working with OpenPackage is much more about working with the virtual components, as opposed to the physical representation.Gildus
This is the only answer that answers the real question of how do I use X to do Y. It has code and everything, and it doesn't go off on a tangent and show how to use Z to do Y. And it has the least votes? Come on, people.Hives
According to the documentation, both Package.Open and package.GetParts default to the ZipPackage implementations, which require the "Open Packaging Conventions" standard mentioned by Luke, joshuam, and sharptooth. In other words, this is great if you are messing with office documents, but useless for most user zipped files.Korwun
learn.microsoft.com/en-us/dotnet/standard/io/…Conspicuous
T
31

From "ZipPackage Class" (MSDN):

While Packages are stored as Zip files* through the ZipPackage class, all Zip files are not ZipPackages. A ZipPackage has special requirements such as URI-compliant file (part) names and a "[Content_Types].xml" file that defines the MIME types for all the files contained in the Package. The ZipPackage class cannot be used to open arbitrary Zip files that do not conform to the Open Packaging Conventions standard.

For further details see Section 9.2 "Mapping to a ZIP Archive" of the ECMA International "Open Packaging Conventions" standard, http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%20Part%202%20(DOCX).zip (342Kb) or http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%20Part%202%20(PDF).zip (1.3Mb)

*You can simply add ".zip" to the extension of any ZipPackage-based file (.docx, .xlsx, .pptx, etc.) to open it in your favorite Zip utility.
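
A quick way to tell whether an archive has any chance of opening as a ZipPackage is to look for the "[Content_Types].xml" entry mentioned in the quote. The following is only a minimal sketch and is not part of the MSDN text. It assumes .NET 4.5 or later, where System.IO.Compression.ZipFile is available (on .NET Framework this needs a reference to System.IO.Compression.FileSystem). The names OpcCheck and HasContentTypesEntry are made up for illustration.

```csharp
using System;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of System.IO.Packaging.
static class OpcCheck
{
    // Returns true if the archive contains a root-level [Content_Types].xml entry.
    // This is a necessary condition for an OPC package, not a full conformance check.
    public static bool HasContentTypesEntry(string zipPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            return archive.Entries.Any(entry =>
                string.Equals(entry.FullName, "[Content_Types].xml",
                              StringComparison.OrdinalIgnoreCase));
        }
    }
}
```

If this returns false for an archive, an empty result from GetParts() is the expected outcome rather than a sign of corruption.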

Tungsten answered 3/2, 2009 at 17:34 Comment(0)
S
13

I was having the exact same problem! To get the GetParts() method to return something, I had to add the [Content_Types].xml file to the root of the archive with a "Default" node for every file extension included. Once I added this (just using Windows Explorer), my code was able to read and extract the archived contents.

More information on the [Content_Types].xml file can be found here:

http://msdn.microsoft.com/en-us/magazine/cc163372.aspx - There is an example file below Figure 13 of the article.
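
As a rough illustration of what such a file contains, here is a minimal sketch that generates one with a "Default" node per file extension. I added mine by hand, so this is not the code I used. The class name ContentTypesSketch and the extension-to-MIME-type mappings are assumptions for illustration. The namespace URI is the one used by Open Packaging Conventions content-type files.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper that builds the XML for a [Content_Types].xml file.
static class ContentTypesSketch
{
    static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    // extensionToMime maps an extension without the dot (e.g. "txt")
    // to a MIME type (e.g. "text/plain").
    public static XDocument Build(IDictionary<string, string> extensionToMime)
    {
        var types = new XElement(Ns + "Types");
        foreach (KeyValuePair<string, string> pair in extensionToMime)
        {
            // One <Default> node per file extension found in the archive
            types.Add(new XElement(Ns + "Default",
                new XAttribute("Extension", pair.Key),
                new XAttribute("ContentType", pair.Value)));
        }
        return new XDocument(new XDeclaration("1.0", "utf-8", "yes"), types);
    }
}
```

Calling Build with, say, "txt" mapped to "text/plain" and then Save("[Content_Types].xml") on the result gives a file you can drop into the root of the archive. Entries whose names are not URI-compliant may still be skipped by the packaging classes.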

var zipFilePath = "c:\\myfile.zip";
var tempFolderPath = "c:\\unzipped";

// ZipPackage.Open resolves to the static Package.Open, so call it directly
using (Package package = Package.Open(zipFilePath, FileMode.Open, FileAccess.Read))
{
    foreach (PackagePart part in package.GetParts())
    {
        // Part URIs start with '/', so strip it before combining with the target folder
        var target = Path.GetFullPath(Path.Combine(tempFolderPath, part.Uri.OriginalString.TrimStart('/')));
        var targetDir = Path.GetDirectoryName(target);

        // CreateDirectory does nothing if the directory already exists
        Directory.CreateDirectory(targetDir);

        using (Stream source = part.GetStream(FileMode.Open, FileAccess.Read))
        using (FileStream targetFile = File.Create(target)) // truncates any existing file
        {
            source.CopyTo(targetFile);
        }
    }
}

Note: this code uses the Stream.CopyTo method, which was introduced in .NET 4.0.

Sherris answered 9/4, 2012 at 12:1 Comment(1)
Thank you for answering question in the way it was asked!Fuhrman
B
6

I agree with Cheeso. System.IO.Packaging is awkward when handling generic zip files, seeing as it was designed for Office Open XML documents. I'd suggest using DotNetZip or SharpZipLib.

Bookstall answered 10/2, 2009 at 6:51 Comment(0)
T
2

(This is basically a rephrasing of this answer)

It turns out that System.IO.Packaging.ZipPackage doesn't support arbitrary PKZIP archives, which is why no "parts" are returned when you open a "generic" ZIP file. The class only supports a specific flavor of ZIP file (see the comments at the bottom of the MSDN description), used among other things for Windows Azure service packages up to SDK 1.6. That is also why a service package becomes invalid if you unpack it and then repack it with, say, the Info-ZIP packer.
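
For generic ZIP files, newer framework versions offer a built-in alternative to ZipPackage. The sketch below is an assumption on my part rather than something from the linked answer. It requires .NET 4.5 or later, where System.IO.Compression.ZipFile and ZipArchive exist (on .NET Framework, reference System.IO.Compression.FileSystem). The class name GenericZipExtractor is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for extracting an ordinary (non-OPC) ZIP archive.
static class GenericZipExtractor
{
    public static void ExtractAll(string zipPath, string targetDirectory)
    {
        // Normalize the target so the containment check below is reliable
        string root = Path.GetFullPath(targetDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString()))
            root += Path.DirectorySeparatorChar;

        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string destination = Path.GetFullPath(Path.Combine(root, entry.FullName));

                // Refuse entries such as "../evil.txt" that would land outside the target
                if (!destination.StartsWith(root, StringComparison.Ordinal))
                    throw new IOException("Entry escapes target directory: " + entry.FullName);

                // Directory entries have an empty Name (FullName ends with '/')
                if (entry.Name.Length == 0)
                {
                    Directory.CreateDirectory(destination);
                    continue;
                }

                Directory.CreateDirectory(Path.GetDirectoryName(destination));
                entry.ExtractToFile(destination, true); // true => overwrite existing files
            }
        }
    }
}
```

If you don't need per-entry control, ZipFile.ExtractToDirectory(zipPath, targetDirectory) does the whole job in one call.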

Terreverte answered 21/8, 2012 at 8:34 Comment(0)
