How to quickly check if a zip file is corrupted?
E

5

21

Does anyone have any ideas for how to programmatically and quickly check if a zip file is corrupted, for example based on file size? Ideally, the best way to check if a zip is corrupted is to do a CRC check, but this can take a long time, especially if there are a lot of large zip files. I would be happy just to be able to do a quick file size or header check.

Thanks in advance.

Ewaewald answered 17/10, 2010 at 17:58 Comment(2)
I'm currently using C# for my task, but the language does not matter.Ewaewald
This question on Unix & Linux Stack Exchange could also be relevant: unix.stackexchange.com/questions/197127/…Pernik
T
8

Section 4.3.7 of this page says that the compressed size is 4 bytes starting from byte 18. You could try reading that and comparing it with the size of the file.

However, I think it's pretty much useless for checking if the zip file is corrupted for two reasons:

  1. Some zip files contain more bytes than just the zip part. For example, self-extracting archives have an executable part yet they're still valid zip.
  2. The file can be corrupted without changing its size.

So, I suggest calculating the CRC for a guaranteed method of checking for corruption.
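For illustration, here is a minimal Python sketch of reading that header field. It assumes the file begins with a local file header laid out as in section 4.3.7 of the ZIP specification; the function name `read_first_local_header` and the returned dictionary keys are made up for this example. It mostly shows why the field is a weak corruption check: the value describes only the first entry, and it can be zero when the writer deferred the sizes to a data descriptor (general purpose flag bit 3).

```python
import struct

# Fixed 30-byte part of a local file header (ZIP spec section 4.3.7), little-endian:
# signature, version, flags, method, time, date, crc32, compressed size,
# uncompressed size, file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def read_first_local_header(path):
    """Return selected fields of the first local file header, or None if there is none."""
    with open(path, "rb") as f:
        raw = f.read(LOCAL_HEADER.size)
    if len(raw) < LOCAL_HEADER.size:
        return None
    (sig, _version, flags, _method, _time, _date,
     crc32, compressed_size, uncompressed_size,
     _name_len, _extra_len) = LOCAL_HEADER.unpack(raw)
    if sig != b"PK\x03\x04":
        # For example a self-extracting archive, an empty archive, or not a zip at all.
        return None
    return {
        "crc32": crc32,
        "compressed_size": compressed_size,      # the 4 bytes starting at byte 18
        "uncompressed_size": uncompressed_size,
        "sizes_deferred": bool(flags & 0x08),    # bit 3: real sizes follow the data
    }
```

Comparing `compressed_size` with the size of the whole file will only ever match by accident, since the file also contains headers, the central directory and possibly other entries.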

Tambour answered 17/10, 2010 at 18:11 Comment(3)
Also, many zip creation tools will write the header before they know the length of the file, so these bytes remain zero (to support streaming, presumably).Springhouse
What @Springhouse said is true, but also - the compressed size starting from byte 18 is the compressed size of a single entry in the zip file. It is not the compressed size of the zip file.Bidet
Also, this may be obvious, but worth stating: "calculating the CRC" works to verify the file, only if the original CRC is known.Bidet
D
27

Use zip -T to test whether the file is corrupted. The output for a corrupted file looks like this:

 zip -T filename.zip
        zip warning: missing end signature--probably not a zip file (did you
        zip warning: remember to use binary mode when you transferred it?)
        zip warning: (if you are trying to read a damaged archive try -F)

zip error: Zip file structure invalid (filename.zip)
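The "missing end signature" warning refers to the end-of-central-directory record that closes every zip file. If a much cheaper check than a full test is acceptable, looking for that signature near the end of the file catches truncated downloads and files that are not zips at all. Below is a rough Python sketch; `has_end_signature` is a hypothetical helper name, and the standard library's `zipfile.is_zipfile()` performs a similar test. It says nothing about whether the compressed data itself is intact, and the four signature bytes could in principle appear in the tail by chance.

```python
import os

EOCD_SIGNATURE = b"PK\x05\x06"
# The end-of-central-directory record is 22 bytes long and may be followed
# by an archive comment of at most 65535 bytes.
MAX_EOCD_SPAN = 22 + 65535


def has_end_signature(path):
    """Return True if the end-of-central-directory signature appears near the end of the file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Only the tail of the file is read, so this stays fast for huge archives.
        f.seek(max(0, size - MAX_EOCD_SPAN))
        tail = f.read()
    return tail.rfind(EOCD_SIGNATURE) != -1
```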
Deplane answered 22/8, 2017 at 7:27 Comment(1)
Very handy. Can also be used to distinguish between e.g. doc and docx files where the file extension isn't reliable.Extractor
D
7

This might be a late answer, but if you are on the Windows command line and have 7-Zip installed, just add it to your system PATH and run this:

7z t file.zip
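If no external tool is available, a comparable full test can be done from code. Here is a short Python sketch using the standard library's `zipfile.ZipFile.testzip()`, which reads every entry and checks its CRC-32; the wrapper name `verify_archive` and the returned messages are made up for this example. Like any full test, it has to decompress all the data, so it is not fast on large archives.

```python
import zipfile
import zlib


def verify_archive(path):
    """Return None if every entry passes, otherwise a short description of the first problem."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first bad entry, or None.
            bad = zf.testzip()
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        # Unreadable central directory, broken compressed stream, or truncated data.
        return f"not a valid zip: {exc}"
    return None if bad is None else f"corrupt entry: {bad}"
```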

Dia answered 23/1, 2022 at 16:47 Comment(1)
If it's not on your PATH, use C:\Progra~1\7-Zip\7z.exeAnnihilator
B
6

DotNetZip, a free open source library for handling zip files in .NET languages, supports a CheckZip() method that does what you want. There are various levels of assurance available at your option. The basic level just checks consistency of metadata. The most complete level does a full extraction of the zip file into a bitbucket to verify that the actual compressed data is not corrupted.
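To make the two levels concrete, here is a rough Python sketch of the same idea. It is not DotNetZip's API or implementation; the name `check_zip` and the `full` parameter are hypothetical. The metadata level only confirms that each central directory entry points at a local file header, while the full level also decompresses each entry and discards the bytes.

```python
import zipfile
import zlib


def check_zip(path, full=False):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    try:
        with zipfile.ZipFile(path) as zf, open(path, "rb") as raw:
            for info in zf.infolist():
                # Metadata level: the recorded offset must hold a local header signature.
                raw.seek(info.header_offset)
                if raw.read(4) != b"PK\x03\x04":
                    problems.append(f"{info.filename}: no local header at recorded offset")
                    continue
                if not full:
                    continue
                # Full level: decompress the entry and throw the data away;
                # zipfile checks the CRC-32 once the entry has been read to the end.
                try:
                    with zf.open(info) as entry:
                        while entry.read(1 << 20):
                            pass
                except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                    problems.append(f"{info.filename}: {exc}")
    except zipfile.BadZipFile as exc:
        problems.append(f"central directory unreadable: {exc}")
    return problems
```

The metadata level touches only a few bytes per entry, so it stays quick for large archives, but a flipped byte inside the compressed data is only caught with `full=True`.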

Bidet answered 24/10, 2010 at 16:15 Comment(2)
CodePlex is dead and those pages are now "Archive".Metalline
This might be the same code? github.com/DinoChiesa/DotNetZipMetalline
C
1

To check the whole archive 'for sure' you need to extract all of the data, since the CRC stored in the archive is calculated over the uncompressed data. Even after that you cannot be 100% sure that it is not corrupted: a CRC is a good check, but it does not guarantee that the data was not altered.

Calfskin answered 12/11, 2010 at 17:21 Comment(0)
