How to validate whether a multi-part compressed (e.g. ZIP) archive has all its parts in C#?

I want to validate multi-part compressed archives such as ZIP. When any part of the archive is missing, extraction fails with an error, but I want to detect this before extracting. To complicate things, different software uses different naming schemes for the parts.

I also looked at a related DotNetZip question.

The first screenshot below is from 7-Zip.

The second screenshot is from DotNetZip in C#.

I also want to test whether the archive is corrupted, the way 7-Zip's Test command does. The screenshot below shows what I need.

Please help me with these issues.

Shears answered 31/1, 2020 at 14:50 Comment(1)
The ZIP specification has gone through a number of versions that added features, and not all older tools support the later features. The specification also allows new files to be added to an existing zip. One possibility is that your version of ZIP does not recognize added files: new files are added at the end of the ZIP file, and I've seen cases where some tools do not recognize them. The solution would be to create a new zip file when adding files, rather than adding the files to an existing zip. - Disapprobation

From your comments, I understand that your issue is identifying the files (getting the list of parts that belong together). You can get the list of files like this:

List<string> files = System.IO.Directory.EnumerateFiles(@"D:\Zip\ForExtract\multipart\",
            "500mbInputData.*", SearchOption.TopDirectoryOnly).OrderBy(x => x).ToList();

or for your second case

List<string> files = System.IO.Directory.EnumerateFiles(@"D:\Zip\ForExtract\multipart\", 
            "500mbInputData.zip.*", SearchOption.TopDirectoryOnly).OrderBy(x => x).ToList();

Then pass the file list to your CombinationStream; the rest of the code is the same as in Manoj Choudhari's answer. You could also pass the path and the file name with a wildcard as parameters, so I'd suggest adding the following parameters to the function:

public static bool IsZipValid(string basePath, string fileNameWithWildcard)
{
    try
    {
        List<string> files = System.IO.Directory.EnumerateFiles(
                    basePath, fileNameWithWildcard, 
                    SearchOption.TopDirectoryOnly).OrderBy(x => x).ToList();

        using (var zipFile = // ... rest is as Manoj wrote

and use it like:

if (IsZipValid(@"D:\Zip\ForExtract\multipart\", "500mbInputData.*")) { // ... }

or

if (IsZipValid(@"D:\Zip\ForExtract\multipart\", "500mbInputData.zip.*")) { // ... }
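Once you have the file list, you can also check it for gaps before opening anything. Below is a minimal sketch; FindMissingParts is a hypothetical helper (not part of any library), and it assumes the parts end in a number, as in 7-Zip's name.zip.001, name.zip.002 or WinZip's name.z01, name.z02. Numbering alone cannot tell you that the last part is missing, so you still need to open the archive as in IsZipValid.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PartSequence
{
    // Hypothetical helper: collects the trailing part numbers from names such as
    // "name.zip.001" (7-Zip style) or "name.z01" (WinZip style) and returns every
    // number between 1 and the highest one found that has no matching file.
    public static List<int> FindMissingParts(IEnumerable<string> fileNames)
    {
        var numbers = new HashSet<int>();
        foreach (var name in fileNames)
        {
            // ".001" -> 1, ".z07" -> 7; names without a numeric suffix (e.g. "name.zip") are skipped
            var m = Regex.Match(Path.GetFileName(name), @"\.z?(\d+)$", RegexOptions.IgnoreCase);
            if (m.Success)
                numbers.Add(int.Parse(m.Groups[1].Value));
        }

        var missing = new List<int>();
        if (numbers.Count == 0)
            return missing;

        // Gaps inside the sequence are detectable; a missing LAST part is not
        int highest = numbers.Max();
        for (int i = 1; i <= highest; i++)
            if (!numbers.Contains(i))
                missing.Add(i);
        return missing;
    }
}
```

For example, for a.zip.001, a.zip.002 and a.zip.004 it returns a list containing 3.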

To find out which kinds of files are in the base path, you could write a helper function like this:

List<string> getZipFormat(string path)
{
    bool filesFound(string basePath, string pattern) => System.IO.Directory.EnumerateFiles(
            basePath, pattern, SearchOption.TopDirectoryOnly).Any();

    var isTar = filesFound(path, "*.tar.???");
    var isZip = filesFound(path, "*.z??");
    var is7Zip = filesFound(path, "*.7z.???");

    var result = new List<string>();
    if (isTar) result.Add("TAR");
    if (isZip) result.Add("ZIP");
    if (is7Zip) result.Add("7ZIP");
    return result;
}

Modify it to suit your needs. It returns a list containing "TAR", "ZIP" and/or "7ZIP", depending on which patterns match files in the base directory.

Usage (example checking several archive formats):

    var isValid = true;
    var basePath = @"D:\Zip\ForExtract\multipart\";
    foreach (var fmt in getZipFormat(basePath))
    {
        switch (fmt)
        {
            // Note: IsZipValid uses ZipArchive, which only understands the ZIP format,
            // so the TAR and 7ZIP cases need a reader for those formats instead
            case "TAR":
                isValid &= IsZipValid(basePath, "500mbInputData.tar.*");
                break;
            case "ZIP":
                isValid &= IsZipValid(basePath, "500mbInputData.zip.*");
                break;
            case "7ZIP":
                isValid &= IsZipValid(basePath, "500mbInputData.7z.*");
                break;
            default:
                break;
        }
    }
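Matching file names only tells you which naming scheme a tool used. Another option is to look at the first bytes of the first part, which identify the format no matter how the files are named. A minimal sketch, assuming you already know which file is the first part; ArchiveSniffer, DetectFormat and DetectFormatOfFile are hypothetical names, and only the signatures listed in the comment are recognized:

```csharp
using System.IO;
using System.Text;

public static class ArchiveSniffer
{
    // Hypothetical helper: identifies the archive format from the first bytes of the
    // FIRST part, independent of how the parts are named. Signatures checked:
    //   ZIP  : 50 4B 03 04 (local file header) or
    //          50 4B 07 08 (marker at the start of a split/spanned ZIP)
    //   7ZIP : 37 7A BC AF 27 1C
    //   RAR  : 52 61 72 21 1A 07 ("Rar!" + 1A 07, shared by RAR 4 and RAR 5)
    //   TAR  : "ustar" at offset 257 (POSIX/GNU tar; old V7 tar has no magic)
    public static string DetectFormat(byte[] head, int length)
    {
        bool StartsWith(params byte[] sig)
        {
            if (length < sig.Length) return false;
            for (int i = 0; i < sig.Length; i++)
                if (head[i] != sig[i]) return false;
            return true;
        }

        if (StartsWith(0x50, 0x4B, 0x03, 0x04) || StartsWith(0x50, 0x4B, 0x07, 0x08)) return "ZIP";
        if (StartsWith(0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C)) return "7ZIP";
        if (StartsWith(0x52, 0x61, 0x72, 0x21, 0x1A, 0x07)) return "RAR";
        if (length >= 262 && Encoding.ASCII.GetString(head, 257, 5) == "ustar") return "TAR";
        return "UNKNOWN";
    }

    public static string DetectFormatOfFile(string firstPartPath)
    {
        var head = new byte[512];
        int total = 0;
        using (var fs = new FileStream(firstPartPath, FileMode.Open, FileAccess.Read))
        {
            int n;
            // Read() may return fewer bytes than requested, so loop until full or EOF
            while (total < head.Length && (n = fs.Read(head, total, head.Length - total)) > 0)
                total += n;
        }
        return DetectFormat(head, total);
    }
}
```

Pass it the first part, e.g. the .001 file or, for WinZip-style archives, the .z01 file.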

Note: In my experiments, the files could remain open even after the code had finished with them, so they were still locked the next time the code ran. I'd therefore strongly suggest closing them explicitly, like this:

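    var fStreams = files.Select(x =>
            new FileStream(x, FileMode.Open, FileAccess.Read) as System.IO.Stream).ToList();
    try
    {
        using (var cStream = new CombinationStream(fStreams))
        using (var zipFile = new ZipArchive(cStream, ZipArchiveMode.Read))
        {
            // Do whatever you want...
        }
    }
    finally
    {
        // ... but ensure you close the files. Use a loop here: a LINQ call such as
        // fStreams.Select(s => { s.Close(); return s; }) is lazy and never runs
        // unless its result is enumerated.
        foreach (var s in fStreams) s.Close();
    }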
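If you'd rather not depend on the CombinationStream package, a small read-only stream that concatenates the parts can be written with the base class library alone, and it can own the part files so that disposing it closes them all. This is a minimal sketch, not the package's implementation: ConcatenatedReadStream is a hypothetical name, it assumes the paths are passed in the correct order, and it finds the current part with a linear scan, which is fine for a handful of parts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical BCL-only stand-in for the CombinationStream package: exposes the part
// files, in the order given, as one read-only seekable stream and closes them on Dispose.
public sealed class ConcatenatedReadStream : Stream
{
    private readonly List<Stream> _parts = new List<Stream>();
    private readonly List<long> _starts = new List<long>();  // offset of each part in the whole
    private readonly List<long> _lengths = new List<long>(); // length of each part
    private readonly long _length;
    private long _position;

    public ConcatenatedReadStream(IEnumerable<string> partPaths)
    {
        try
        {
            foreach (var path in partPaths)
            {
                // Read-only access; a missing part throws FileNotFoundException here
                var part = new FileStream(path, FileMode.Open, FileAccess.Read);
                _parts.Add(part);
                _starts.Add(_length);
                _lengths.Add(part.Length);
                _length += part.Length;
            }
        }
        catch
        {
            foreach (var p in _parts) p.Dispose(); // don't leave already-opened parts locked
            throw;
        }
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _length;
    public override long Position
    {
        get => _position;
        set => Seek(value, SeekOrigin.Begin);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (count > 0 && _position < _length)
        {
            // Find the part that contains the current position (skips empty parts)
            int i = 0;
            while (_position >= _starts[i] + _lengths[i]) i++;

            long rel = _position - _starts[i];
            var part = _parts[i];
            part.Position = rel;
            int n = part.Read(buffer, offset, (int)Math.Min(count, _lengths[i] - rel));
            if (n == 0) break; // file shrank on disk; stop instead of looping forever
            total += n;
            offset += n;
            count -= n;
            _position += n;
        }
        return total;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _position + offset,
            SeekOrigin.End => _length + offset,
            _ => throw new ArgumentOutOfRangeException(nameof(origin)),
        };
        if (target < 0) throw new IOException("Attempted to seek before the beginning of the stream.");
        _position = target;
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
            foreach (var p in _parts) p.Dispose(); // closes every part file
        base.Dispose(disposing);
    }
}
```

Usage: using (var zip = new ZipArchive(new ConcatenatedReadStream(files), ZipArchiveMode.Read)) { ... }. Because ZipArchive disposes its stream (unless leaveOpen is true), every part file is closed when the using block ends.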
Donnydonnybrook answered 11/2, 2020 at 8:29 Comment(2)
We support multiple formats (zip, tar, 7z and rar), so I can't use ".*" for the extension, but this is really helpful for me. - Shears
@HirenJasani: Glad I could help. You can use ? as a wildcard as well, e.g. "*.zip.???" and "*.z??". Scan the directory for those patterns, and if you find matching files in the base path, check their integrity using the code above. I have written a helper function and updated the answer. - Donnydonnybrook

I am not sure whether you will be able to get the exact error shown in your screenshot, but I have some code that may help you find out whether the multi-part file is readable.

I have used the CombinationStream NuGet package.

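The ZipArchive constructor throws an ArgumentException if the stream cannot be read (for example, because it is already closed) and an InvalidDataException if its contents cannot be interpreted as a ZIP archive, which is what typically happens when a part is missing.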

Below is the code:

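public static bool IsZipValid()
{
    string basePath = @"C:\multi-part-zip\";
    List<string> files = new List<string> {
                            basePath + "somefile.zip.001",
                            basePath + "somefile.zip.002",
                            basePath + "somefile.zip.003",
                            basePath + "somefile.zip.004",
                            basePath + "somefile.zip.005",
                            basePath + "somefile.zip.006",
                            basePath + "somefile.zip.007",
                            basePath + "somefile.zip.008"
                        };

    var streams = new List<Stream>();
    try
    {
        // Open every part read-only; a missing part throws FileNotFoundException here
        foreach (var file in files)
            streams.Add(new FileStream(file, FileMode.Open, FileAccess.Read));

        // CombinationStream presents the parts as one stream; ZipArchive throws
        // InvalidDataException if that stream is not a readable ZIP archive
        using (var zipFile = new ZipArchive(new CombinationStream(streams), ZipArchiveMode.Read))
        {
            // Do whatever you want
        }
    }
    catch (FileNotFoundException)
    {
        return false;
    }
    catch (InvalidDataException)
    {
        return false;
    }
    finally
    {
        // Close the part files explicitly so they are not left locked
        foreach (var s in streams) s.Dispose();
    }

    return true;
}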

I am not sure whether this is what you are looking for, or whether you need more detail about the error, but I hope it helps you solve your issue.
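To also test for corruption the way 7-Zip's Test command does, opening the archive is not enough: ZipArchive only reads the archive's directory structures up front, so damaged entry data goes unnoticed until the entry is decompressed. Below is a minimal sketch that decompresses every entry and compares its size and CRC-32 with the values recorded in the archive. ZipTester and TestArchive are hypothetical names, and the sketch assumes a .NET version that exposes ZipArchiveEntry.Crc32 (recent .NET does; older frameworks may not).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ZipTester
{
    // Hypothetical helper in the spirit of 7-Zip's "Test": decompresses every entry and
    // compares the result with the size and CRC-32 stored in the archive.
    // Returns a list of problems; an empty list means the archive looks intact.
    public static List<string> TestArchive(Stream archiveStream)
    {
        var problems = new List<string>();
        var buffer = new byte[81920];
        try
        {
            using (var zip = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: true))
            {
                foreach (var entry in zip.Entries)
                {
                    try
                    {
                        uint crc = 0xFFFFFFFFu;
                        long total = 0;
                        using (var data = entry.Open())
                        {
                            int n;
                            while ((n = data.Read(buffer, 0, buffer.Length)) > 0)
                            {
                                crc = UpdateCrc32(crc, buffer, n);
                                total += n;
                            }
                        }
                        crc ^= 0xFFFFFFFFu;

                        if (total != entry.Length)
                            problems.Add(entry.FullName + ": size mismatch");
                        else if (crc != entry.Crc32)
                            problems.Add(entry.FullName + ": CRC error");
                    }
                    catch (Exception ex) when (ex is InvalidDataException || ex is IOException)
                    {
                        // Corrupt or truncated compressed data for this entry
                        problems.Add(entry.FullName + ": " + ex.Message);
                    }
                }
            }
        }
        catch (Exception ex) when (ex is InvalidDataException || ex is IOException)
        {
            // Archive structure unreadable, e.g. because a part is missing
            problems.Add("archive: " + ex.Message);
        }
        return problems;
    }

    // Standard ZIP CRC-32 (reflected polynomial 0xEDB88320), bit by bit for brevity
    private static uint UpdateCrc32(uint crc, byte[] data, int count)
    {
        for (int i = 0; i < count; i++)
        {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc;
    }
}
```

You can pass it the combined stream of all parts. An empty list means every entry decompressed and matched its stored CRC; otherwise each message names the damaged entry.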

Beechnut answered 6/2, 2020 at 19:57 Comment(4)
It works, but only if the naming pattern is fixed. Different software uses different patterns, and CombinationStream needs the list of file names, so I can't use it as is. - Shears
In my opinion, you can match the patterns using a regex and still use the same code. In any case, if you do not know the file names of the split zip file, you will not be able to validate its integrity. - Beechnut
Your using statement isn't complete: it ends with new FileStream(x, FileMode. Where is the closing bracket? Are any parameters missing? - Donnydonnybrook
@Donnydonnybrook Thanks for letting me know. I have updated the code. - Beechnut
