How to stream from ZipFile? How to zip "on the fly"?

2

8

I want to zip a stream and stream out the result. I'm doing this in AWS Lambda, which matters because of the limited disk space and other restrictions. I'm going to use the zipped stream to write an AWS S3 object using upload_fileobj() or put(), in case that matters.

As long as the objects are small, I can create the archive as a file:

import zipfile
zf = zipfile.ZipFile("/tmp/byte.zip", "w")
zf.writestr(filename, my_stream.read())
zf.close()

For a large amount of data, I can write to an in-memory object instead of a file:

from io import BytesIO
...
byte = BytesIO()
zf = zipfile.ZipFile(byte, "w")
....

But how can I pass the zipped stream to the output? If I call zf.close(), the stream will be closed; if I don't, the archive will be incomplete.
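For what it's worth, a quick standard-library check suggests the in-memory variant can be finished safely: ZipFile.close() writes the central directory but leaves a file object that was passed in open, so the BytesIO can be rewound and read afterwards. This is only a sketch with a made-up member name and payload, and it still holds the whole archive in memory, so it does not address the size problem.

```python
import io
import zipfile

buf = io.BytesIO()

# Leaving the with-block calls zf.close(), which finalizes the archive.
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("example.txt", b"hello world")  # placeholder name and data

# ZipFile only closes files it opened itself, so buf is still usable.
still_open = not buf.closed

# Re-read the archive from memory to confirm it is complete.
buf.seek(0)
with zipfile.ZipFile(buf) as check:
    names = check.namelist()

# Rewind again before handing buf to a consumer such as an uploader.
buf.seek(0)
```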

Circle answered 4/4, 2019 at 11:23 Comment(0)
4

You might like to try the zipstream version of zipfile. For example, to compress stdin to stdout as a zip file holding the data as a file named TheLogFile using iterators:

#!/usr/bin/python3
import sys, zipstream
with zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED) as z:
    z.write_iter('TheLogFile', sys.stdin.buffer)
    for chunk in z:
        sys.stdout.buffer.write(chunk)
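For comparison, roughly the same chunk-by-chunk behaviour can be sketched with only the standard library, assuming Python 3.6 or later, where zipfile can write to a non-seekable object and ZipFile.open() accepts mode "w". The names ChunkSink and zip_chunks are hypothetical helpers, not part of zipstream or zipfile.

```python
import io
import zipfile


class ChunkSink(io.RawIOBase):
    """Write-only, non-seekable sink that collects whatever zipfile writes."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def writable(self):
        return True

    def write(self, b):
        self._chunks.append(bytes(b))
        return len(b)

    def drain(self):
        """Return the chunks collected so far and forget them."""
        chunks, self._chunks = self._chunks, []
        return chunks


def zip_chunks(name, source, chunk_size=64 * 1024):
    """Yield the bytes of a one-member zip archive built from `source`."""
    sink = ChunkSink()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as zf:
        # force_zip64 lets the member exceed 2 GiB even though its size
        # is unknown when the local header is written.
        with zf.open(name, "w", force_zip64=True) as member:
            for block in iter(lambda: source.read(chunk_size), b""):
                member.write(block)
                yield from sink.drain()
    # Closing the archive writes the central directory; emit it last.
    yield from sink.drain()
```

Used like the example above, this would be `for chunk in zip_chunks('TheLogFile', sys.stdin.buffer): sys.stdout.buffer.write(chunk)`. Only the chunks not yet consumed are held in memory.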
Tsosie answered 4/4, 2019 at 12:39 Comment(3)
The key part here is "holding the data as a file". I do not want to use a file due to environment limitations. What should it look like then? Circle
I wasn't clear. I just meant that the final output is a stream that, were you to save it to a file, would appear to be a zipfile. If you were to unzip it, you would get a file called TheLogFile containing whatever data you read from stdin. The only file is the nominal one that is part of the output stream format. Look at the webpy example at the end of the link, as that seems to be similar to your situation. Tsosie
Got it, thank you. Another question: the indentation looks a bit messy. Does the with zipstream ... block contain only z.write_iter..., or the for chunk... loop too? Circle
13

Instead of using Python's built-in zipfile, you can use stream-zip (full disclosure: written by me).

If you have an iterable of bytes, my_data_iter say, you can get an iterable of a zip file using its stream_zip function:

from datetime import datetime
from stream_zip import stream_zip, ZIP_64

def files():
    modified_at = datetime.now()
    perms = 0o600
    yield 'my-file-1.txt', modified_at, perms, ZIP_64, my_data_iter

my_zip_iter = stream_zip(files())

If you need a file-like object of the zipped bytes, say to pass to boto3's upload_fileobj, you can convert from the iterable with a transformation function, like the one from to-file-like-obj (also written by me).

import boto3
from to_file_like_obj import to_file_like_obj

# Convert iterable to file-like object
my_file_like_obj = to_file_like_obj(my_zip_iter)

# Upload to S3 (likely using a multipart upload)
s3 = boto3.client('s3')
s3.upload_fileobj(my_file_like_obj, 'my-bucket', 'my.zip')
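To illustrate what such a conversion involves, here is a minimal standard-library sketch of a readable object backed by an iterable of bytes. It is not the actual implementation of to-file-like-obj; the class name IterableReader is hypothetical.

```python
import io


class IterableReader(io.RawIOBase):
    """Read-only file-like object that pulls bytes from an iterable on demand."""

    def __init__(self, iterable):
        super().__init__()
        self._it = iter(iterable)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Fetch the next non-empty chunk; returning 0 signals end of stream.
        while not self._leftover:
            try:
                self._leftover = next(self._it)
            except StopIteration:
                return 0
        # Copy as much as fits and keep the remainder for the next call.
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


# BufferedReader makes read(n) return n bytes unless the stream ends first.
reader = io.BufferedReader(IterableReader([b"ab", b"", b"cde", b"f"]))
first = reader.read(4)
second = reader.read(4)
```

Only one chunk from the iterable is held at a time, which is what keeps memory use flat when the consumer reads in fixed-size parts.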
Nupercaine answered 8/1, 2022 at 13:45 Comment(5)
Thanks for writing this library, super helpful and exactly what I was looking for! Eighteenmo
stream-zip is an amazing library. Super powerful and fast. I recommended it in another question at https://mcmap.net/q/1323955/-python-creating-zip-in-stream-that-exceeds-ram for anyone who wants to know more about it. Aguste
What are the main differences with the builtin zipstream? Musketeer
@AlbertHendriks I don't think there is a builtin zipstream? Can you clarify which zipstream you mean? Nupercaine
Never mind, I was misinformed. Musketeer