Python zipfile module erroneously thinks I have a zipfile that spans multiple disks, throws BadZipfile error
P

4

13

I have a 1.4GB zip file and am trying to yield each member in succession. The zipfile module keeps throwing a BadZipfile exception, stating that

"zipfile.BadZipfile: zipfiles that span multiple disks are not supported".

Here is my code:

import zipfile

def iterate_members(zip_file_like_object):
  zflo = zip_file_like_object
  assert zipfile.is_zipfile(zflo) # Here is where the error happens.
  # If I comment out the assert, the same error gets thrown on this next line:
  with zipfile.ZipFile(zflo) as zip:
    members = zip.namelist()
    for member in members:
      yield member

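fn = "filename.zip"
# iterate_members() is a generator, so it only runs once it is iterated over.
with open(fn, 'rb') as f:
  for member in iterate_members(f):
    print(member)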

I'm using Python 2.7.3. I tried on both Windows 8 and Ubuntu with the same result. Any help would be very much appreciated.

Protero answered 15/7, 2013 at 21:27 Comment(5)
Can you post the zip file (or a link to it)? The code that leads up to this error is pretty straightforward; it checks whether the file header declares more than one disk or the disk number of the file to be anything other than zero. - Cotsen
Thanks phihag. Unfortunately I cannot post the file as it contains confidential client data. - Protero
Well, can you generate a zip file with large test data that still shows the problem? - Cotsen
It probably depends more on the software used to create the .zip. I just had this problem with a 500MB .zip from a customer. I unpacked it and repacked it (obviously with a different zip tool than my customer's) and it works. The repacked file is even bigger due to less compression, so size does not seem to be all that matters. - Protein
For me, Python 3.7 can open the file while Python 3.6 can't. - Pajamas
A
15

I get the same error on a similar file, although I am using Python 3.4.

I was able to fix it by editing line 205 of the zipfile.py source code, changing:

if diskno != 0 or disks != 1:
    raise BadZipFile("zipfiles that span multiple disks are not supported")

to:

if diskno != 0 or disks > 1:

Hope this helps
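
If you want to confirm that this is what is going on before patching anything, you can read the locator fields yourself. Below is a minimal sketch, assuming the standard layout of the ZIP64 end-of-central-directory locator (a 20-byte record sitting directly in front of the regular end record). read_zip64_locator is a hypothetical helper name, not part of zipfile, and the demo bytes are synthetic.

```python
import io
import struct

EOCD_SIG = b"PK\x05\x06"      # regular end-of-central-directory record
LOCATOR_SIG = b"PK\x06\x07"   # ZIP64 end-of-central-directory locator
LOCATOR_FMT = "<4sLQL"        # signature, disk number, offset, total disks
LOCATOR_SIZE = struct.calcsize(LOCATOR_FMT)  # 20 bytes


def read_zip64_locator(fileobj):
    """Return (diskno, disks) from the ZIP64 locator, or None if absent.

    Hypothetical helper for diagnosis only. It takes the last end-record
    signature found in the file tail, which is a simplification.
    """
    fileobj.seek(0, 2)
    file_size = fileobj.tell()
    # The end record is 22 bytes plus an optional comment of up to 65535 bytes.
    tail_size = min(file_size, 22 + 65535 + LOCATOR_SIZE)
    fileobj.seek(file_size - tail_size)
    tail = fileobj.read(tail_size)

    eocd_pos = tail.rfind(EOCD_SIG)
    if eocd_pos < LOCATOR_SIZE:
        # No end record found, or no room for a locator in front of it.
        return None

    raw = tail[eocd_pos - LOCATOR_SIZE:eocd_pos]
    sig, diskno, _offset, disks = struct.unpack(LOCATOR_FMT, raw)
    if sig != LOCATOR_SIG:
        return None
    return diskno, disks


# Synthetic demo: a locator reporting 0 total disks, then an empty end record.
fake = struct.pack(LOCATOR_FMT, LOCATOR_SIG, 0, 0, 0) + EOCD_SIG + b"\x00" * 18
print(read_zip64_locator(io.BytesIO(fake)))
```

On a real archive you would pass it the opened file, e.g. read_zip64_locator(open(fn, 'rb')). A result of (0, 0) would mean the locator reports zero disks, which the original != 1 check rejects and the > 1 check accepts.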

Argol answered 2/2, 2015 at 13:54 Comment(2)
The zip tool built into Windows seems buggy and produces an end-of-archive record which looks like zip64, but isn't. Hence _EndRecData64 finds disks == 0. It should have returned endrec instead. But the fix works nicely, thanks. - Protein
Any update on this? This does not seem like an acceptable solution, since you need to manually modify the Python library source code. What if the Python installation runs in the cloud and you don't have access to it? - Emerick
B
3

Quick fix: install zipfile38 using:

pip install zipfile38

Then use it in your code the same way as before:

import zipfile38 as zipfile
# your code goes here
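
If the same script has to run on machines where zipfile38 may not be installed, a guarded import is one option. This is only a sketch: it falls back to the standard library module, which may or may not contain the fix depending on the Python version.

```python
try:
    # Backport from PyPI; only importable if it has been pip-installed.
    import zipfile38 as zipfile
except ImportError:
    # Fall back to the standard library module.
    import zipfile

# Shows which of the two modules ended up being used.
print(zipfile.__name__)
```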
Blurt answered 15/1, 2021 at 15:0 Comment(0)
M
1

This is fixed in newer versions of Python, which apply the fix that @josselin suggests. For older versions of Python, you can avoid installing a third-party package or modifying the source code by monkey-patching. It is not pretty, but it will get the job done:

import zipfile
import struct

# Monkey-patch zipfile._EndRecData64
def _EndRecData64(fpin, offset, endrec):
    """
    Read the ZIP64 end-of-archive records and use that to update endrec
    """
    try:
        fpin.seek(offset - zipfile.sizeEndCentDir64Locator, 2)
    except OSError:
        # If the seek fails, the file is not large enough to contain a ZIP64
        # end-of-archive record, so just return the end record we were given.
        return endrec

    data = fpin.read(zipfile.sizeEndCentDir64Locator)
    if len(data) != zipfile.sizeEndCentDir64Locator:
        return endrec
    sig, diskno, reloff, disks = struct.unpack(
        zipfile.structEndArchive64Locator, data)
    if sig != zipfile.stringEndArchive64Locator:
        return endrec

    if diskno != 0 or disks > 1:
        raise zipfile.BadZipFile(
            "zipfiles that span multiple disks are not supported")

    # Assume no 'zip64 extensible data'
    fpin.seek(
        offset - zipfile.sizeEndCentDir64Locator - zipfile.sizeEndCentDir64, 2)
    data = fpin.read(zipfile.sizeEndCentDir64)
    if len(data) != zipfile.sizeEndCentDir64:
        return endrec
    sig, sz, create_version, read_version, disk_num, disk_dir, \
        dircount, dircount2, dirsize, diroffset = \
        struct.unpack(zipfile.structEndArchive64, data)
    if sig != zipfile.stringEndArchive64:
        return endrec

    # Update the original endrec using data from the ZIP64 record
    endrec[zipfile._ECD_SIGNATURE] = sig
    endrec[zipfile._ECD_DISK_NUMBER] = disk_num
    endrec[zipfile._ECD_DISK_START] = disk_dir
    endrec[zipfile._ECD_ENTRIES_THIS_DISK] = dircount
    endrec[zipfile._ECD_ENTRIES_TOTAL] = dircount2
    endrec[zipfile._ECD_SIZE] = dirsize
    endrec[zipfile._ECD_OFFSET] = diroffset
    return endrec


# Overwrite _EndRecData64 with the fixed version
zipfile._EndRecData64 = _EndRecData64
Marchese answered 29/9, 2023 at 22:13 Comment(0)
F
1

It may be worth checking whether you are running into Deflate64 compression; if so, you can

pip install zipfile-deflate64

and

import zipfile_deflate64 as zipfile

Then run your code again and see whether that fixes things. It may not, but it's quick to check.
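
One way to check is to look at the compression method recorded for each member. A rough sketch, assuming the archive can be opened at all (so it will not help while the multiple-disks error is still being raised) and using method number 9, which the ZIP specification assigns to Deflate64. deflate64_members is a hypothetical helper name, and the demo archive is built in memory.

```python
import io
import zipfile

DEFLATE64 = 9  # compression method number for Deflate64 in the ZIP specification


def deflate64_members(zip_file_like_object):
    """Return the names of members stored with the Deflate64 method."""
    with zipfile.ZipFile(zip_file_like_object) as archive:
        # Listing the directory does not decompress anything, so this works
        # even for methods the standard library cannot extract.
        return [info.filename for info in archive.infolist()
                if info.compress_type == DEFLATE64]


# Demo with an in-memory archive written by the standard library (stored, not Deflate64).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("hello.txt", "hello")
buf.seek(0)
print(deflate64_members(buf))
```

An empty list means no member uses Deflate64, in which case zipfile-deflate64 is unlikely to change anything.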

Frosted answered 27/10, 2023 at 1:4 Comment(0)
