How to upload a stream to AWS S3 with Python
I want to create a Lambda that gets a zip file (which may contain a list of CSV files) from S3, unzips it, and uploads the contents back to S3. Since Lambda is limited by memory/disk size, I have to stream the data from S3 and back into it. I use Python (boto3); see my code below.

import io
import sys
import zipfile

import boto3

s3 = boto3.resource("s3")

count = 0
obj = s3.Object(bucket_name, key)
buffer = io.BytesIO(obj.get()["Body"].read())
print(buffer)
z = zipfile.ZipFile(buffer)
for x in z.filelist:
    with z.open(x) as foo2:
        print(sys.getsizeof(foo2))
        line_counter = 0
        out_buffer = io.BytesIO()
        for f in foo2:
            out_buffer.write(f)
            line_counter += 1
        print(line_counter)
        print(foo2.name)
        s3.Object(bucket_name, "output/" + foo2.name + "_output").upload_fileobj(out_buffer)
        out_buffer.close()
z.close()

The result is empty files in the bucket. For example, if input.zip contained the files 1.csv and 2.csv, I get two empty CSV files with the corresponding names in the bucket. Also, I'm not sure it actually streams the files rather than downloading the whole zip. Thanks.

Herewith answered 30/1, 2018 at 15:45 Comment(3)
see edited question – Herewith
The boto3 client.get_object() method supports a Range parameter. You can use it to request a range of bytes, e.g. "bytes=1024-2048". – Aglow
@Herewith You can upload a stream to AWS S3 with Python. Please check my answer below. – Policeman

You need to seek back to the beginning of the BytesIO buffer before uploading. After the write loop, the stream position is at the end of the buffer, so upload_fileobj reads zero bytes.

out_buffer = io.BytesIO()
for f in foo2:
    out_buffer.write(f)
    line_counter += 1

out_buffer.seek(0)  # Change stream position to beginning of file

s3.Object(bucket_name, "output/" + foo2.name + "_output").upload_fileobj(out_buffer)
out_buffer.close()
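The effect of the missing seek can be reproduced locally without S3. This is a minimal sketch using only io.BytesIO: after writing, the position sits at the end of the buffer, so a read from the current position (which is what upload_fileobj does) returns nothing until the stream is rewound.

```python
import io

buf = io.BytesIO()
buf.write(b"a,b,c\n1,2,3\n")

# The position is now at the end, so reading yields nothing.
# This is why the uploaded files were empty.
assert buf.read() == b""

buf.seek(0)  # rewind to the start of the buffer
assert buf.read() == b"a,b,c\n1,2,3\n"
```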
Succory answered 30/1, 2018 at 18:57 Comment(0)

You can unzip a file from S3 and extract its contents back to S3:

import zipfile
from io import BytesIO

import boto3

s3Bucket = "s3-bucket"   # Provide S3 bucket name
file_name = "test.zip"   # Provide zip file name

s3 = boto3.resource('s3')
zip_obj = s3.Object(bucket_name=s3Bucket, key=file_name)
buffer = BytesIO(zip_obj.get()["Body"].read())
z = zipfile.ZipFile(buffer)
for file in z.namelist():
    # z.open() returns a readable file-like object that
    # upload_fileobj can stream without extracting to disk.
    s3.meta.client.upload_fileobj(
        z.open(file),
        Bucket=s3Bucket,
        Key=file,
        ExtraArgs={'ServerSideEncryption': 'aws:kms',
                   'SSEKMSKeyId': 'alias/<alias_name>'})

Reference - https://github.com/vhvinod/ftp-to-s3/blob/master/extract-s3-to-s3.py
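The zipfile side of this can be exercised locally. A minimal sketch, with the in-memory zip standing in for the S3 object body: ZipFile.namelist() lists the members, and ZipFile.open() returns the readable file-like object that upload_fileobj would consume.

```python
import io
import zipfile

# Build a small zip in memory (stands in for the downloaded S3 body).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("1.csv", "a,b\n1,2\n")
    zf.writestr("2.csv", "c,d\n3,4\n")
buffer.seek(0)

z = zipfile.ZipFile(buffer)
names = z.namelist()          # member names in archive order
data = z.open("1.csv").read() # file-like object per member
```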

Policeman answered 9/6, 2020 at 15:32 Comment(0)
