Check if a directory exists in a zip file with Python
P

5

18

Initially I was thinking of using os.path.isdir, but I don't think it works for zip files. Is there a way to peek into the zip file and verify that a given directory exists? I would like to avoid using unzip -l "$@" as much as possible, but if that's the only solution then I guess I have no choice.

Piecemeal answered 23/7, 2012 at 17:28 Comment(0)
B
15

Just check whether any name in the archive starts with the directory name followed by "/".

import zipfile

def isdir(z, name):
    # True if any archive member's path starts with "name/".
    return any(x.startswith("%s/" % name.rstrip("/")) for x in z.namelist())

f = zipfile.ZipFile("sample.zip", "r")
print(isdir(f, "a"))
print(isdir(f, "a/b"))
print(isdir(f, "a/X"))

The check uses this line

any(x.startswith("%s/" % name.rstrip("/")) for x in z.namelist())

because an archive may contain no explicit entry for a directory, only file paths that begin with the directory name.

Execution result:

$ mkdir -p a/b/c/d
$ touch a/X
$ zip -r sample.zip a
adding: a/ (stored 0%)
adding: a/X (stored 0%)
adding: a/b/ (stored 0%)
adding: a/b/c/ (stored 0%)
adding: a/b/c/d/ (stored 0%)

$ python z.py
True
True
False
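
The docx case discussed in the comments can be reproduced without any files on disk. Here is a minimal sketch, assuming Python 3: the archive is built in memory, the member names are made up for illustration, and "media" is a hypothetical directory that is not in the archive.

```python
import io
import zipfile


def isdir(z, name):
    # Prefix test: true if any member's path lives under "name/".
    prefix = name.rstrip("/") + "/"
    return any(x.startswith(prefix) for x in z.namelist())


# Build an archive in memory that stores only files, no directory entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<doc/>")
    z.writestr("word/_rels/document.xml.rels", "<rels/>")

with zipfile.ZipFile(buf) as z:
    exact_match = "word/" in z.namelist()  # no explicit "word/" entry
    prefix_match = isdir(z, "word")        # found through the files under it
    missing = isdir(z, "media")            # nothing lives under "media/"
    print(exact_match, prefix_match, missing)  # prints: False True False
```

An exact membership test misses "word/" here, while the prefix test finds it.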
Baccivorous answered 23/7, 2012 at 17:38 Comment(8)
Thanks! Well, this worked with the sample you provided. But I'm trying to do this for docx files. Essentially I'm checking if the zip file contains the directory "word", but it's giving me false responses :( - Piecemeal
Just try to print the list of files in your docx and see what is strange about it: print(zipfile.ZipFile("sample.docx", "r").namelist()) - Baccivorous
I suppose that you have some prefix before word. Please check it. - Baccivorous
word/_rels/document.xml.rels This is a file contained in it; I printed it straight out of z.namelist() - Piecemeal
I fixed the function according to your needs. Please try it. - Baccivorous
I'm trying to find the folder "word"; I don't care about the contents. I noticed it works if I give it a file, but not when I just give it the directory word - Piecemeal
Have you checked my function? I'm sure it must work now. Please check it - Baccivorous
This worked great, thanks! I tweaked it a little and it works perfectly - Piecemeal
M
8

You can check for the directories with ZipFile.namelist().

import zipfile

dir_name = "some/directory/"

z = zipfile.ZipFile("myfile.zip")
# Matches only if the archive stores an explicit entry for the directory.
if dir_name in z.namelist():
    print("Found %s!" % dir_name)
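
As the comments on this answer point out, namelist() may hold only file paths, with no separate entry for the directory. One way around that is to derive every directory that the member paths imply. The helper name all_dirs and the sample member name are assumptions for illustration; this is a sketch, not part of the zipfile API.

```python
import io
import zipfile
from pathlib import PurePosixPath


def all_dirs(z):
    """Return every directory named or implied by the archive's members."""
    dirs = set()
    for name in z.namelist():
        path = PurePosixPath(name)
        if name.endswith("/"):
            dirs.add(str(path))  # explicit directory entry
        # Every ancestor of a member is a directory, even if it is not stored.
        dirs.update(str(p) for p in path.parents if str(p) != ".")
    return dirs


# In-memory archive holding a single file and no directory entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/_rels/document.xml.rels", "<rels/>")

with zipfile.ZipFile(buf) as z:
    found = all_dirs(z)
    print(sorted(found))  # prints: ['word', 'word/_rels']
```

The returned names carry no trailing slash, so the membership test becomes "word" in all_dirs(z).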
Malraux answered 23/7, 2012 at 17:32 Comment(4)
This works for files but not directories :( at least not for me. - Piecemeal
Try printing the namelist() of your .zip file to make sure your directory is formatted correctly. - Malraux
Yeah, I made sure the directory is there. I'm trying to do it for docx files, which are zip files anyway, so that shouldn't matter, right? - Piecemeal
Oh, I found the issue: the list doesn't contain the directory "word" by itself; rather, it contains all the files. - Piecemeal
H
2

For Python (>=3.6):

This is how is_dir() is implemented in the Python source code:

def is_dir(self):
    """Return True if this archive member is a directory."""
    return self.filename[-1] == '/'

It simply checks whether the filename ends with a slash (/). I can't tell whether this works correctly in all circumstances (so IMO it is badly implemented).

For Python (<3.6):

print(zipinfo) shows the file mode, but no corresponding property or field is provided, so I dug into the zipfile module source code to find out how it is implemented (see def __repr__(self): in https://github.com/python/cpython/blob/3.6/Lib/zipfile.py).

Possibly a bad idea, but if you want something simple and easy, this will work in most cases. It may fail because in some cases this field is not printed.

def is_dir(zipinfo):
    return "filemode='d" in repr(zipinfo)

Finally:

My solution is to check the file mode manually and decide whether the referenced file is actually a directory, inspired by https://github.com/python/cpython/blob/3.6/Lib/zipfile.py line 391.

def is_dir(fileinfo):
    # The high 16 bits of external_attr hold the Unix file mode.
    hi = fileinfo.external_attr >> 16
    # 0x4000 is the directory bit (S_IFDIR).
    return (hi & 0x4000) > 0
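
A slightly stricter variant of the same idea, as a sketch: the bit 0x4000 is also set in the mode of some other Unix file types (block devices and sockets), whereas stat.S_ISDIR compares the whole file-type field. The fallbacks to the MS-DOS directory flag (0x10 in the low byte) and to the trailing slash are assumptions about what a given archiver writes, so treat this as best effort rather than a definitive test.

```python
import stat
import zipfile


def is_dir(info):
    """Best-effort directory test for a zipfile.ZipInfo."""
    unix_mode = info.external_attr >> 16   # high 16 bits: Unix mode, if the archiver set it
    dos_attrs = info.external_attr & 0xFF  # low byte: MS-DOS attributes
    return (
        stat.S_ISDIR(unix_mode)            # file type is exactly "directory"
        or bool(dos_attrs & 0x10)          # MS-DOS directory flag
        or info.filename.endswith("/")     # naming convention
    )


# Hand-built ZipInfo objects, so the attributes are known exactly.
dir_info = zipfile.ZipInfo("a/b/")
dir_info.external_attr = (0o040755 << 16) | 0x10

file_info = zipfile.ZipInfo("a/X")
file_info.external_attr = 0o100644 << 16

print(is_dir(dir_info), is_dir(file_info))  # prints: True False
```

Like the version above, this inspects a single entry, so it only helps when the archive stores an entry for the directory at all.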
Hyonhyoscine answered 14/11, 2018 at 3:28 Comment(0)
N
0

You can accomplish this using the standard-library zipfile module.

import zipfile
z = zipfile.ZipFile("file.zip")

if "DirName/" in [member.filename for member in z.infolist()]:
    print("Directory exists in archive")

Tested and functional with Python 3.2.

Nusku answered 23/7, 2012 at 17:46 Comment(2)
Are you trying to use a docx file instead of a zip? Rename the extension to .zip and try it again; it should work. - Nusku
It works fine unzipping, and I can get it to print all the files. But the directory "word" is not in namelist(); it only has individual files, such as word/webSettings.xml, so it's not getting a match. - Piecemeal
S
0

Not listed here, but this can be done as a dictionary lookup.

import zipfile

def zip_contains_dir(zip_handle, path):
    # Ensure trailing slash.
    path = path.rstrip("/") + "/"
    return path in zip_handle.NameToInfo


with zipfile.ZipFile("myfile.zip") as zip_handle:
    print(zip_contains_dir(zip_handle, "some/dir"))
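
NameToInfo does not appear in the documented zipfile API, and the lookup matches only directories that have their own entry in the archive. A possible alternative, sketched here on the assumption that Python 3.8 or later is available, is zipfile.Path, whose exists() also accounts for directories implied by file paths. The in-memory archive and the "media" name are made up for illustration.

```python
import io
import zipfile


def zip_contains_dir(zip_handle, path):
    # zipfile.Path treats a path as a directory when it ends with a slash.
    at = path.rstrip("/") + "/"
    return zipfile.Path(zip_handle, at).exists()


# In-memory archive with a file under "word/" but no "word/" entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zip_handle:
    zip_handle.writestr("word/document.xml", "<doc/>")

with zipfile.ZipFile(buf) as zip_handle:
    has_word = zip_contains_dir(zip_handle, "word")
    has_media = zip_contains_dir(zip_handle, "media")
    print(has_word, has_media)  # prints: True False
```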

Sleuth answered 9/2 at 0:58 Comment(0)
