Delete file from zipfile with the ZipFile Module
A

5

48

The only way I came up with for deleting a file from a zipfile was to create a temporary zipfile without the file to be deleted and then rename it to the original filename.

In Python 2.4 the ZipInfo class had an attribute file_offset, so it was possible to create a second zip file and copy the data to it without decompressing and recompressing.

This file_offset is missing in Python 2.6, so is there another option besides creating another zipfile by uncompressing every file and then recompressing it?

Is there maybe a direct way of deleting a file in the zipfile? I searched and didn't find anything.

Algicide answered 4/2, 2009 at 23:0 Comment(1)
I found this thread on the Python bug tracker discussing the difficulties of removing files from a zip file: bugs.python.org/issue6818 Houdon
K
55

The following snippet worked for me (deletes all *.exe files from a Zip archive):

import zipfile

# Copy every member except *.exe files into a new archive.
zin = zipfile.ZipFile('archive.zip', 'r')
zout = zipfile.ZipFile('archive_new.zip', 'w')
for item in zin.infolist():
    buffer = zin.read(item.filename)
    if not item.filename.endswith('.exe'):
        zout.writestr(item, buffer)
zout.close()
zin.close()

If you read everything into memory, you can eliminate the need for a second file. However, this snippet recompresses everything.
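A minimal sketch of the in-memory variant mentioned above, assuming the whole archive fits in RAM. The helper name `remove_members` and its `should_delete` predicate are hypothetical and not part of the `zipfile` module. Like the snippet above, it still recompresses every member it keeps.

```python
import io
import zipfile


def remove_members(zip_path, should_delete):
    """Rewrite zip_path without the members for which should_delete(name) is true."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(zip_path, 'r') as zin, \
            zipfile.ZipFile(buffer, 'w') as zout:
        for item in zin.infolist():
            if should_delete(item.filename):
                continue  # skipped members are never decompressed
            # Passing the ZipInfo keeps the name, timestamp and compression type.
            zout.writestr(item, zin.read(item))
    # Both archives are closed here, so the original can be overwritten.
    with open(zip_path, 'wb') as f:
        f.write(buffer.getvalue())
```

It could be called as `remove_members('archive.zip', lambda name: name.endswith('.exe'))`.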

On closer inspection, ZipInfo.header_offset is the offset of the entry's local header from the start of the file. The name is a bit misleading, because the main Zip header (the central directory) is actually stored at the end of the file. My hex editor confirms this.

So the problem you'll run into is the following: You need to delete the directory entry in the main header as well or it will point to a file that doesn't exist anymore. Leaving the main header intact might work if you keep the local header of the file you're deleting as well, but I'm not sure about that. How did you do it with the old module?

Without modifying the main header I get an error "missing X bytes in zipfile" when I open it. This might help you to find out how to modify the main header.
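To see this layout without a hex editor, here is a small sketch that builds a throwaway archive in memory and inspects the raw bytes. The signatures used are the standard ZIP ones (`PK\x03\x04` for a local file header, `PK\x01\x02` for a central directory entry, `PK\x05\x06` for the end-of-central-directory record); the file names and contents are made up for illustration.

```python
import io
import zipfile

# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('a.txt', b'first')
    zf.writestr('b.txt', b'second')

data = buf.getvalue()
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    offsets = [info.header_offset for info in zf.infolist()]

# Each header_offset points at a local file header.
local_ok = all(data[o:o + 4] == b'PK\x03\x04' for o in offsets)

# The central directory entries and the end-of-central-directory record
# come after all of the member data.
first_central = data.find(b'PK\x01\x02')
end_record = data.rfind(b'PK\x05\x06')
print(offsets, local_ok, first_central, end_record)
```

Removing a member in place would therefore mean dropping its local header and data, shifting or re-pointing the later entries, and rewriting the central directory at the end.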

Kellerman answered 4/2, 2009 at 23:31 Comment(3)
Thanks, but if I am not wrong, zipfile.writestr just recompresses the data. It would be much faster to copy the already compressed files without uncompressing and then compressing them again. Algicide
@RSabt I agree with mdm that unzip-and-rezip is the only viable option so far. By the way, mdm's code helps, but it is better to use os.path.splitext() when you are going to do something more serious. Reservation
Also, you could avoid extracting the executable files: check the name first, and only read the input if it is not an executable. That would save some useless extraction time. Misunderstood
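A short sketch of the os.path.splitext() suggestion. The helper name `has_extension` is hypothetical, and comparing case-insensitively is an assumption on my part (the original snippet matches '.exe' case-sensitively).

```python
import os


def has_extension(member_name, extension):
    """Return True if the member's final extension matches, ignoring case."""
    return os.path.splitext(member_name)[1].lower() == extension.lower()


names = ['bin/SETUP.EXE', 'notes.exe.txt', 'readme', 'tool.exe']
to_delete = [name for name in names if has_extension(name, '.exe')]
```

Running the check on the names from infolist() before calling read() means the unwanted members never have to be decompressed.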
R
10

Not very elegant but this is how I did it:

import subprocess
import zipfile

z = zipfile.ZipFile(zip_filename)

files_to_del = [f for f in z.namelist() if f.endswith('exe')]

cmd=['zip', '-d', zip_filename] + files_to_del
subprocess.check_call(cmd)

# reload the modified archive
z = zipfile.ZipFile(zip_filename)
Repletion answered 17/8, 2017 at 16:54 Comment(2)
This is what I ended up doing. Ugly, but ZipFile just doesn't seem to have a way of deleting or updating/replacing files.Radioman
This solution is platform specific and/or requires zip software to be installed on OS. Moreover, the overhead of a new subprocess is introduced.Longrange
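Given the caveat about the external tool, here is a hedged sketch that at least fails with a clear message when no `zip` command is available. The function name `delete_with_zip_cli` is hypothetical; it assumes an Info-ZIP style `zip` on PATH that supports `-d`, as in the answer above.

```python
import shutil
import subprocess


def delete_with_zip_cli(zip_filename, members):
    """Delete members from zip_filename by calling the external `zip -d` command."""
    zip_exe = shutil.which('zip')
    if zip_exe is None:
        raise RuntimeError("the 'zip' command was not found on PATH")
    if not members:
        return  # nothing to delete, so skip the subprocess call
    subprocess.check_call([zip_exe, '-d', zip_filename] + list(members))
```

The subprocess overhead remains, so this only makes sense when a few calls are made on archives that are expensive to recompress.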
I
9

Based on Elias Zamaria's comment on the question.

Having read through Python-Issue #51067, I want to give an update on it.

As of today, a solution already exists, though it has not been accepted into Python because the author has not signed the Contributor Agreement.

Nevertheless, you can take the code from https://github.com/python/cpython/blob/659eb048cc9cac73c46349eb29845bc5cd630f09/Lib/zipfile.py and save it as a separate file in your project. After that, import it instead of the built-in Python library: import myproject.zipfile as zipfile.

Usage:

with zipfile.ZipFile("archive.zip", "a") as z:
    z.remove("firstfile.txt")

I believe it will be included in a future Python version. For my use case it works like a charm.

Ingaingaberg answered 9/9, 2021 at 9:19 Comment(1)
Seems to be broken for .jar files, sometimes it deletes everything instead of the file you wantedSupplementary
M
6

The routine delete_from_zip_file from ruamel.std.zipfile¹ allows you to delete a file based on its full path within the ZIP, or based on (re) patterns. E.g. you can delete all of the .exe files from test.zip using

from ruamel.std.zipfile import delete_from_zip_file

delete_from_zip_file('test.zip', pattern='.*.exe')  

(please note the dot before the *).

This works similarly to mdm's solution (including the need for recompression), but recreates the ZIP file in memory (using the class InMemZipFile()), overwriting the old file after it is fully read.


¹ Disclaimer: I am the author of that package.

Mantic answered 1/1, 2017 at 10:33 Comment(3)
The delete_from_zip_file routine is very useful for me, but I'm getting this error while trying to remove many files from a big archive (~3 GB in size) with a bunch of folders: "LargeZipFile: Zipfile size would require ZIP64 extensions". I guess ruamel.std.zipfile needs a modification in its __init__.py file (like allowZip64 = True for zipfile.ZipFile(..)), right? Extemporaneous
I have never worked with allowZip64, no idea what it is about.Mantic
Easiest solution for small implicationsSupplementary
B
0

TL;DR:

import zipfile

with zipfile.ZipFile("bad.zip") as bad:
    # Or use "a" instead of "w" if you're appending
    with zipfile.ZipFile("good", "w") as good:
        for zip_info in bad.infolist():
            # I had hundreds of duplications of 'sample_100.csv'
            not_a_bad_file = zip_info.filename != 'sample_33.csv' or zip_info.file_size > 146622
            if not_a_bad_file:
                good.writestr(zip_info, bad.read(zip_info))

Explanation:

I added multiple files with the same name by mistake, and all of them were nearly 0 bytes. The method suggested by @mdm won't work here, because if you pass the filename (str) to the read method, it gives you the last item with that name; at least, it seems that way. The following note from the library documentation in the CPython source makes the fix apparent:

.. note::

      The :meth:`.open`, :meth:`read` and :meth:`extract` methods can take a filename
      or a :class:`ZipInfo` object.  You will appreciate this when trying to read a
      ZIP file that contains members with duplicate names.

By passing zip_info (a ZipInfo object), you can be sure that you will retrieve that exact file.
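A small self-contained demonstration of the difference, using a made-up archive with two members named 'sample.csv'. The behaviour of read() by name (returning the most recently added entry) is what I observe on CPython; treat it as an observation rather than a documented guarantee.

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    with warnings.catch_warnings():
        # zipfile warns about duplicate names; silence that for the demo.
        warnings.simplefilter('ignore', UserWarning)
        zf.writestr('sample.csv', b'first copy')
        zf.writestr('sample.csv', b'second copy')

with zipfile.ZipFile(buf) as zf:
    # A name can only resolve to one of the duplicate entries.
    by_name = zf.read('sample.csv')
    # A ZipInfo object identifies one specific entry.
    by_info = [zf.read(info) for info in zf.infolist()]
```

Here `by_info` holds both payloads in archive order, while `by_name` only ever yields one of them.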

Brocky answered 13/9, 2023 at 18:48 Comment(0)
