Archiving symlinks with Python zipfile
B

6

12

I have a script that creates zip files of dirs containing symlinks. I was surprised to find that the zipfiles have zipped the targets of the links as opposed to the links themselves, which is what I wanted and expected. Anyone know how to get zipfile to zip the links?

Byrd answered 3/3, 2016 at 21:0 Comment(0)
A
8

zipfile doesn't appear to support storing symbolic links. The way to store them in a ZIP is actually not part of the format and is only available as a custom extension in some implementations. In particular, Info-ZIP's implementation supports them so you can delegate to it instead. Make sure your decompression software can handle such archives - as I said, this feature is not standardized.

Arrowhead answered 3/3, 2016 at 21:9 Comment(3)
Thanks. I ended up forking off a subprocess to use the command-line zip with --symlinks. It's much slower than the Python zipfile lib, but it does support symlinks. - Byrd
@LarryMartell Info-ZIP has a shared library as well as a standalone executable, which may save you some cycles. I cannot find any documentation for it, though. - Arrowhead
This is the answer that solves my problem. The other answers result in an empty normal directory when I extract the archive with unzip (I've checked that the external_attr of the added zinfo contains stat.S_IFLNK << 16). - Casaleggio
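For anyone taking the subprocess route mentioned in the comments, here is a minimal sketch. It assumes Info-ZIP's `zip` executable is on the PATH; the function names `build_zip_command` and `zip_with_symlinks` are made up for illustration.

```python
import subprocess


def build_zip_command(archive_path, source_dir):
    """Return the argv for Info-ZIP's `zip` that stores symlinks as links."""
    # -r recurses into directories; --symlinks stores the link itself
    # instead of the file it points to.
    return ["zip", "-r", "--symlinks", archive_path, source_dir]


def zip_with_symlinks(archive_path, source_dir, cwd=None):
    """Run `zip` (assumed to be installed); raises CalledProcessError on failure."""
    subprocess.run(build_zip_command(archive_path, source_dir), cwd=cwd, check=True)
```

Calling `zip_with_symlinks('out.zip', 'mydir')` would then archive `mydir` with its links intact, at the cost of spawning an external process.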
N
10

It is possible to have zipfile store symbolic links, instead of the files themselves. For an example, see here. The relevant part of the script is storing the symbolic link attribute within the zipinfo:

zipInfo = zipfile.ZipInfo(archiveRoot)
zipInfo.create_system = 3  # 3 = Unix
# 0xA1ED0000 == (stat.S_IFLNK | 0o755) << 16:
# symlink file type plus rwxr-xr-x permissions
zipInfo.external_attr = 0xA1ED0000
zipOut.writestr(zipInfo, os.readlink(fullPath))
Nones answered 10/1, 2017 at 16:16 Comment(3)
Python 3 doesn't distinguish between longs and ints, so zipInfo.external_attr = 0xA1ED0000 should work (which is slightly more readable?) - Serpasil
I saw a better implementation: zipInfo.external_attr |= 0xA0000000 - Lurdan
The magic constants are all available as well, e.g. stat.S_IFLNK - Marthmartha
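As a quick check of where the magic number in this answer comes from, the following sketch rebuilds it from the `stat` constants. The variable names are only illustrative.

```python
import stat

# File type "symbolic link" (0o120000 == 0xA000) plus rwxr-xr-x permissions.
unix_mode = stat.S_IFLNK | 0o755

# zipfile keeps the Unix mode in the upper 16 bits of external_attr.
symlink_attr = unix_mode << 16

print(hex(symlink_attr))                  # 0xa1ed0000
print(symlink_attr == 2716663808)         # True
print(stat.filemode(symlink_attr >> 16))  # lrwxr-xr-x
```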
G
7

Here is a complete, working Python example that creates a cpuinfo.zip archive containing the symbolic link cpuinfo.txt, which points to /proc/cpuinfo.

#!/usr/bin/python

import stat
import zipfile

def create_zip_with_symlink(output_zip_filename, link_source, link_target):
    zipInfo = zipfile.ZipInfo(link_source)
    zipInfo.create_system = 3  # System that created the archive: 3 = Unix, 0 = Windows
    # Symlink file type plus rwxrwxrwx permissions
    unix_st_mode = stat.S_IFLNK | stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO
    # The zipfile module keeps the 16-bit Unix st_mode (file type, setuid/setgid,
    # user/group/other permissions) in bits 16-31 of external_attr
    zipInfo.external_attr = unix_st_mode << 16
    with zipfile.ZipFile(output_zip_filename, 'w', compression=zipfile.ZIP_DEFLATED) as zipOut:
        zipOut.writestr(zipInfo, link_target)

create_zip_with_symlink('cpuinfo.zip', 'cpuinfo.txt', '/proc/cpuinfo')

You can further issue the following commands (e.g. under Ubuntu) to see how the archive unpacks to a working symbolic link:

unzip cpuinfo.zip
ls -l cpuinfo.txt
cat cpuinfo.txt
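
If `unzip` is not at hand, the archive can also be inspected from Python. This is a rough sketch, assuming the link targets were written as UTF-8 text; `list_symlinks` is a hypothetical helper name, not part of zipfile.

```python
import stat
import zipfile


def list_symlinks(zip_file):
    """Map each symlink entry's name to its stored target.

    `zip_file` may be a path or a binary file-like object.
    """
    links = {}
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            mode = info.external_attr >> 16  # Unix st_mode, if the writer set one
            if stat.S_ISLNK(mode):
                # The entry's data is the link target, not file contents.
                links[info.filename] = zf.read(info).decode("utf-8")
    return links
```

For the archive built above, `list_symlinks('cpuinfo.zip')` should report `cpuinfo.txt` mapped to `/proc/cpuinfo`.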
Germaun answered 20/1, 2021 at 20:49 Comment(0)
B
1

I have defined the following method in a Zip support class:

def add_symlink(self, link, target, permissions=0o777):
    self.log('Adding a symlink: {} => {}'.format(link, target))
    permissions |= 0xA000  # 0xA000 == stat.S_IFLNK (symlink file type)

    zi = zipfile.ZipInfo(link)
    zi.create_system = 3
    zi.external_attr = permissions << 16
    self.zip.writestr(zi, target)
Bim answered 15/3, 2020 at 9:13 Comment(0)
M
0

While not part of the POSIX standard, many zip implementations support storing generic filesystem attributes on entries. The high two bytes of the 4-byte external_attr value represent the Unix file mode.

Essentially you need to replicate ZipInfo.from_file, but without following the link or truncating the mode:

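import os
import time
import zipfile

# path: the symlink on disk; name: the entry name; out_zip: an open zipfile.ZipFile
st = os.lstat(path)  # lstat examines the link itself instead of following it
mtime = time.localtime(st.st_mtime)
info = zipfile.ZipInfo(name, mtime[0:6])
info.file_size = st.st_size
info.external_attr = st.st_mode << 16  # keep the full mode, including the file type bits
out_zip.writestr(info, os.readlink(path))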
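
To see why `os.lstat` matters here, the sketch below compares it with `os.stat` on a throwaway link. It assumes a platform where the current user may create symlinks; `describe_modes` is an illustrative name.

```python
import os
import stat
import tempfile


def describe_modes(link_path):
    """Return (is_link_via_lstat, is_link_via_stat) for a path."""
    return (
        stat.S_ISLNK(os.lstat(link_path).st_mode),  # examines the link itself
        stat.S_ISLNK(os.stat(link_path).st_mode),   # follows the link to its target
    )


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    link = os.path.join(tmp, "link.txt")
    with open(target, "w") as fh:
        fh.write("data")
    os.symlink(target, link)
    result = describe_modes(link)

print(result)  # (True, False)
```

Only the `lstat` mode carries the symlink file type, which is what has to end up in `external_attr`.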
Marthmartha answered 15/9, 2021 at 12:35 Comment(0)
H
0

Here's what I've tried to improve:

  • Tested with Python 3.11.
  • Uses an iterative loop instead of recursion.
  • Preserves the original symlink attributes (e.g. permissions).

import zipfile
import stat
import os

def archive(source, output_path):
    def _convert_attr_to_symlink_type(external_attr):
        # Refer to https://unix.stackexchange.com/a/14727
        # zipfile's external_attr is a 32-bit file attribute field:
        # the first 4 bits determine the file type,
        # the next 3 bits are setuid, setgid and sticky,
        # the next 9 bits are the read/write/execute permissions for user, group and others,
        # the next 8 bits are unused,
        # the last 8 bits are the DOS attributes.

        # Preserve everything except the first 4 bits (i.e filetype bit)
        # MASK: 00001111111111111111111111111111
        preserve_mask =  (1 << 28) - 1
        external_attr &= preserve_mask

        # Overwrite File type as Symbolic Link File type (modify first 4 bits)
        # MASK: 10100000000000000000000000000000
        overwrite_mask = stat.S_IFLNK << 16
        external_attr |= overwrite_mask
        return external_attr

    with zipfile.ZipFile(output_path, mode='w') as zf:
        for root, folders, files in os.walk(source):
            for folder in folders:
                folderpath = os.path.join(root, folder)
                if os.path.islink(folderpath):
                    zip_info = zipfile.ZipInfo.from_file(folderpath)
                    zip_info.filename = zip_info.filename.rstrip('/')
                    zip_info.external_attr = _convert_attr_to_symlink_type(zip_info.external_attr)
                    zf.writestr(zip_info, os.readlink(folderpath))

            for filename in files:
                filepath = os.path.join(root, filename)

                if os.path.islink(filepath):
                    zip_info = zipfile.ZipInfo.from_file(filepath)
                    zip_info.external_attr = _convert_attr_to_symlink_type(zip_info.external_attr)
                    zf.writestr(zip_info, os.readlink(filepath))
                else:
                    zf.write(filepath)


archive('testfolder', 'test.zip')
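
The bit layout described in the comments of `_convert_attr_to_symlink_type` can be checked with a small decoder. This is only a sketch; `split_external_attr` is a hypothetical helper, and it treats the setuid/setgid/sticky bits as part of the permissions.

```python
import stat


def split_external_attr(external_attr):
    """Split a 32-bit external_attr into its Unix and DOS parts."""
    mode = external_attr >> 16  # upper 16 bits: Unix st_mode
    return {
        "file_type": stat.S_IFMT(mode),     # top 4 bits of the mode
        "permissions": stat.S_IMODE(mode),  # setuid/setgid/sticky + rwx bits
        "dos_attr": external_attr & 0xFF,   # lowest 8 bits
    }


fields = split_external_attr(((stat.S_IFLNK | 0o755) << 16) | 0x10)
print(fields["file_type"] == stat.S_IFLNK)  # True
print(oct(fields["permissions"]))           # 0o755
```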
Heptastich answered 16/1 at 1:6 Comment(0)
