Unable to unzip a large zip file (3.3GB) in iOS9 using SSZipArchive

As the title says, I create the zip file on my Django backend server (hosted on Ubuntu 14.04.1 LTS) using the Python zipfile module:

zipfile.ZipFile(dest_path, mode='w', compression=zipfile.ZIP_DEFLATED, 
                allowZip64=True)

I managed to open it in Finder on my Mac, but had no success with the SSZipArchive library. I have tried the latest commit on the master branch as well as tags v1.0.1 and v0.4.0.

Using v0.4.0, I got an error at line 1506 of unzip.c:

    if (unz64local_CheckCurrentFileCoherencyHeader(s,
                                                   &iSizeVar, 
                                                   &offset_local_extrafield,&size_local_extrafield)!=UNZ_OK)
        return UNZ_BADZIPFILE;

and it got stuck unzipping the same file every time, with the same currentFileNumber.

Does anyone have any clues?

P.S. I think SSZipArchive should support Zip64 archives; I have also asked about this on their GitHub repo.

Update [20160129]: I ran zipinfo on the zip file and got the following output:

...
-rw-r--r--  2.0 unx     1992 b- defN 26-Nov-15 14:59 <file_name>
-rw-r--r--  2.0 unx      925 b- defN 26-Nov-15 14:59 <file_name>
-rw-r--r--  2.0 unx     1194 b- defN 26-Nov-15 14:59 <file_name>
-rw-r--r--  2.0 unx       72 b- defN 26-Nov-15 14:52 <file_name>
-rw-r--r--  2.0 unx      289 b- defN 18-Jan-16 11:27 <file_name>
-rw-r--r--  2.0 unx     1541 b- defN 18-Jan-16 11:27 <file_name>
-rw-r--r--  2.0 unx      295 b- defN 18-Jan-16 11:27 <file_name>
-rw-r--r--  2.0 unx 449619181 b- defN 18-Jan-16 11:26 <file_name>
-rw-r--r--  4.5 unx 73128184 bx defN 18-Jan-16 11:26 <file_name>
-rw-r--r--  4.5 unx 69444488 bx defN 18-Jan-16 11:26 <file_name>
-rw-r--r--  4.5 unx   671440 bx defN 18-Jan-16 11:26 <file_name>
-rw-r--r--  4.5 unx 20189549 bx defN 18-Jan-16 11:27 <file_name>
-rw-r--r--  4.5 unx      197 bx defN 18-Jan-16 11:26 <file_name>
-rw-r--r--  4.5 unx  1379396 bx defN 18-Jan-16 11:26 <file_name>
...
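
The same per-entry version fields can also be read from Python. The following is a minimal sketch, not part of the original question: the helper name `entry_versions` is hypothetical, and the tiny in-memory archive only stands in for the real 3.3GB file. As far as I know, the version column in the zipinfo listing is the "version made by" field (`create_version` in `zipfile.ZipInfo`), while minizip-style readers care about "version needed to extract" (`extract_version`), so the sketch reports both. A value of 20 corresponds to 2.0 and 45 to 4.5.

```python
import io
import zipfile


def entry_versions(archive):
    """Return (filename, create_version, extract_version) for every entry.

    `archive` may be a path or a seekable file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.create_version, info.extract_version)
            for info in zf.infolist()
        ]


# Demo on a small in-memory archive built the same way as in the question.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED,
                     allowZip64=True) as zf:
    zf.writestr("small.txt", b"hello")
buf.seek(0)

demo_versions = entry_versions(buf)
for name, created, needed in demo_versions:
    print(name, created / 10, needed / 10)
```

Running this against the real archive and looking for the first entry whose values jump from 20 to 45 should point at the same place where zipinfo switches from 2.0 to 4.5.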
Woolcott answered 27/1, 2016 at 8:55 Comment(5)
It looks like the size of the file is being misreported in the headers of the zip archive. A few other tools relying on minizip have reported this type of behavior with JSON files in particular, though those appear to be quite old issues. Do you happen to know the type of the file causing the issue, and can you remove it using another zip tool to help narrow down the behavior? - Machismo
I found that the file causing the problem is a large mp4 video file, but I have no idea whether this large mp4 video is really the culprit. The 2.0 line with size 449619181 above is the large mp4 file. - Woolcott
Do you happen to have a file comment on the mp4? - Machismo
@ASmallShellScript how can I check whether a file has a file comment? - Woolcott
I think something like this would work, though you can probably seek to the file you want in a nicer way since the archive is massive: zf = zipfile.ZipFile(archive_name), then for info in zf.infolist(): check info.comment - Machismo
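
The comment check suggested in the last comment can be sketched as follows. This is only an illustration: the helper name `entries_with_comments` and the demo file name are made up, and the in-memory archive is a stand-in for the real one. `ZipInfo.comment` is a bytes value and is empty when an entry has no comment.

```python
import io
import zipfile


def entries_with_comments(archive):
    """Map filename -> comment (bytes) for entries that carry a file comment."""
    with zipfile.ZipFile(archive) as zf:
        return {
            info.filename: info.comment
            for info in zf.infolist()
            if info.comment  # an empty bytes value means "no comment"
        }


# Demo archive: one entry with a comment, one without.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    commented = zipfile.ZipInfo("video.mp4")  # hypothetical entry name
    commented.comment = b"note"
    zf.writestr(commented, b"data")
    zf.writestr("plain.txt", b"data")
buf.seek(0)

demo_comments = entries_with_comments(buf)
print(demo_comments)
```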

After a dozen rounds of trial and error, I found that the problem lies with the zip file generated by the Python zipfile package. If I use the zip command provided by the Ubuntu server, with this version:

Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.

This is Zip 3.0 (July 5th 2008), by Info-ZIP.

to archive about 4GB of data, SSZipArchive can extract the zip package successfully.

I tested the zip file with zipinfo and found:

...
-rw-r--r--  3.0 unx     2939 bx     2677 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx    15069 bx     3040 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     3265 bx     3003 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     3048 bx     2766 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     3453 bx     3168 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     1415 tx      534 defN 16-Jan-28 16:33 <file_name>
drwxr-xr-x  3.0 unx        0 bx        0 stor 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     3302 tx      695 defN 16-Jan-28 16:33 <file_name>
drwxr-xr-x  3.0 unx        0 bx        0 stor 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx   130678 bx   127322 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx   133540 bx   130045 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx      136 tx       71 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     1416 tx      541 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     1417 tx      541 defN 16-Jan-28 16:33 <file_name>
-rw-r--r--  3.0 unx     2766 tx      652 defN 16-Jan-28 16:33 <file_name>
5551 files, 3854751563 bytes uncompressed, 3793408959 bytes compressed:  1.6%
Woolcott answered 12/2, 2016 at 8:22 Comment(0)

To summarize what was mentioned above but not emphasized: the problem is caused by the ZIP header version field, which is repeated in each file's header within an archive. The default Linux zip command (Info-ZIP) uses header version 3.0, as shown in the answer above. Python uses header version 2.0 by default, but if a file exceeds the 2GB size threshold, Python switches to header version 4.5 for that file and the subsequent files. This appears to cause problems with SSZipArchive.

My current temporary workaround is to monkey-patch the zipfile module:

import zipfile
zipfile.ZIP64_VERSION = 30  # instead of 45

This creates zip archives readable by SSZipArchive, but I think it also violates the Zip64 standard, which states (for the extract_version field): "For Zip64 format archives, this value should not be less than 45".
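
Since the patch changes a module-level constant for the whole process, one option is to scope it to the code that writes the archive. The following is a rough sketch under that assumption: the context manager name `patched_zip64_version` is hypothetical, and `zipfile.ZIP64_VERSION` is an undocumented constant of the CPython implementation, so this may stop working in a future Python release.

```python
import contextlib
import zipfile


@contextlib.contextmanager
def patched_zip64_version(version=30):
    """Temporarily override zipfile.ZIP64_VERSION, restoring it afterwards."""
    original = zipfile.ZIP64_VERSION
    zipfile.ZIP64_VERSION = version
    try:
        yield
    finally:
        # Always restore, even if writing the archive raises.
        zipfile.ZIP64_VERSION = original


# Demo: the override is visible only inside the `with` block.
with patched_zip64_version():
    inside = zipfile.ZIP64_VERSION
    # Archives written here would use the patched value for Zip64 entries.
after = zipfile.ZIP64_VERSION
print(inside, after)
```

Other code in the same process that writes zip archives outside the `with` block then keeps the standard value.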

Leyva answered 21/5, 2022 at 20:56 Comment(0)
