How do I unzip a .zip file in google cloud storage?

Asked 28/3, 2018 at 17:46 Answered 15/9, 2023 at 10:32

google-cloud-platform google-cloud-storage gzip unzip

How do I unzip a .zip file in Goolge Cloud Storage Bucket? (If we have some other tool like 'CloudBerry Explorer' for AWS, that will be great.)

Stellite answered 28/3, 2018 at 17:46 Comment(1)

Did you ever find a good solution to this? – Karwan 21/6, 2018 at 13:57

You can use Python, e.g. from a Cloud Function:

    from google.cloud import storage
    from zipfile import ZipFile
    from zipfile import is_zipfile
    import io

    def zipextract(bucketname, zipfilename_with_path):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucketname)

        destination_blob_pathname = zipfilename_with_path
        
        blob = bucket.blob(destination_blob_pathname)
        zipbytes = io.BytesIO(blob.download_as_string())

        if is_zipfile(zipbytes):
            with ZipFile(zipbytes, 'r') as myzip:
                for contentfilename in myzip.namelist():
                    contentfile = myzip.read(contentfilename)
                    blob = bucket.blob(zipfilename_with_path + "/" + contentfilename)
                    blob.upload_from_string(contentfile)

    zipextract("mybucket", "path/file.zip") # if the file is gs://mybucket/path/file.zip

Windowpane answered 26/1, 2019 at 13:7 Comment(4)

Hi Daniel, Code is not working for me, Not able to go inside if condition. – Kaikaia 18/9, 2019 at 14:2

copy pasted this in a colab notebook, just needed to add auth and specify the project and it worked like a charm – Isleen 24/2, 2020 at 21:4

Hey I have been trying with my bucket gs://frames_final/frames_final_cat.zip and the function zipextract("frames_final", "frames_final_cat.zip") but am getting an error that project was not passed to environment. – Unbrace 24/8, 2020 at 13:59

This approach downloads the entire zip file from GCS to memory, so I think it could not scale for big files – Megavolt 11/1 at 16:43

Here is some code I created to run as a Firebase Cloud Function. It is designed to listen to files loaded into a bucket with the content-type 'application/zip' and extract them in place.

    const functions = require('firebase-functions');
    const admin = require("firebase-admin");
    const path = require('path');
    const fs = require('fs');
    const os = require('os');
    const unzip = require('unzipper')

    admin.initializeApp();

    const storage = admin.storage();


    const runtimeOpts = {
      timeoutSeconds: 540,
      memory: '2GB'
    }

    exports.unzip = functions.runWith(runtimeOpts).storage.object().onFinalize((object) => {

        return new Promise((resolve, reject) => {
            //console.log(object)
            if (object.contentType !== 'application/zip') {
              reject();
            } else {
              const bucket = firebase.storage.bucket(object.bucket)
              const remoteFile = bucket.file(object.name)
              const remoteDir = object.name.replace('.zip', '')

              console.log(`Downloading ${remoteFile}`)

              remoteFile.createReadStream()
                .on('error', err => {
                  console.error(err)
                  reject(err);
                })
                .on('response', response => {
                  // Server connected and responded with the specified status and headers.
                  //console.log(response)
                })
                .on('end', () => {
                  // The file is fully downloaded.
                  console.log("Finished downloading.")
                  resolve();
                })
                .pipe(unzip.Parse())
                .on('entry', entry => {
                  const file = bucket.file(`${remoteDir}/${entry.path}`)

                  entry.pipe(file.createWriteStream())
                  .on('error', err => {
                    console.log(err)
                    reject(err);
                  })
                  .on('finish', () => {
                    console.log(`Finsihed extracting ${remoteDir}/${entry.path}`)
                  });

                  entry.autodrain();

                });
            }
        })

    });

Selhorst answered 23/8, 2018 at 5:28 Comment(2)

Thank you for your example, however I have found a serious flaw with it: You should only call entry.autodrain() if you don't consume the stream. Otherwise you will have corrupted output files. I did, until I changed the code. – Smoothen 6/3, 2019 at 14:46

I can add that you should replace firebase.storage.bucket(object.bucket) with admin.storage.bucket() as firebase is not defined – Braw 23/12, 2019 at 19:40

In shell, you can use the below command to unzip a compressed file

gsutil cat gs://bucket/obj.csv.gz | zcat |  gsutil cp - gs://bucket/obj.csv

Vacuole answered 27/8, 2019 at 9:29 Comment(6)

Hi @Dishant, how can i use zcat in Google Cloud SDK tool? 'zcat' is not recognized as an internal or external command – Cota 28/8, 2019 at 8:34

@AbdurrahmanI.As of now there no functionality available in Google Cloud SDK to unzip a file. The easiest way I found, is to use this shell command – Vacuole 28/8, 2019 at 13:40

@DishantMishra If more than one files present inside zip file then How we can unzip all(we are giving one name-file.csv in destination location) like: gsutil cat gs://bucket/csv_files.zip | unzip | gsutil cp - gs://bucket/one_file.csv. It works but we can't see multiple files. which is present inside csv_files.zip – Kaikaia 5/9, 2019 at 13:19

@BhageshArora The command will not work for the multiple files zipped together as it is redirecting the output to a single file. You may have to download the file, unzip it and copy the extracted files to GCS. You can use the below command (if JDK is installed) to directly download and unzip the files. gsutil cat gs://bucket/obj.zip | jar xvf /dev/stdin – Vacuole 13/9, 2019 at 9:20

I struggled with this for making multiple process and found a simple solution. You might interest. gsutil cat gs://bucket/*/*/*/*.gz | zcat | gsutil cp - gs://bucket/decompressed/result.json – Cota 7/1, 2020 at 12:59

this is streaming unzip,upload ? or first download the entire file first , then apply unzip finally upload back to gcs? – Tunnel 28/12, 2022 at 3:1

If you ended up having a zip file on your Google Cloud Storage bucket because you had to move large files from another server with the gsutil cp command, you could instead gzip when copying and it will be transferred in compressed format and unzippet when arriving to the bucket.

It is built in gsutil cp by using the -Z argument.

E.g.

gsutil cp -Z largefile.txt gs://bucket/largefile.txt

Capitulation answered 29/11, 2019 at 9:14 Comment(2)

If I understand OP's question correctly, they are asking for a tool that will upload a zipped file and then unzip it in the bucket. gsutil cp -Z doesn't do that. It zips the file and leaves it zipped in the bucket. Source: cloud.google.com/storage/docs/gsutil/commands/… – Big 13/3, 2022 at 9:3

Confirmed that this does not address the original question and it does not do the behavior described in the answer. – Pleasant 25/3, 2022 at 18:23

There is no mechanism in the GCS to unzip the files. A feature request regarding the same has already been forwarded to the Google development team.

As an alternative, you can upload the ZIP files to the GCS bucket and then download them to a persistent disk attached to a VM instance, unzip them there, and upload the unzipped files using the gsutil tool.

Umpteen answered 28/3, 2018 at 20:34 Comment(3)

How do I unzip the zip files on the persistent disk attached to a VM instance? – Rosenberger 22/9, 2018 at 9:55

Won't the VM instance have file size limitations? – Fanti 17/12, 2019 at 0:0

The issue was marked as blocked, and some time later unblocked and then closed. Here's another feature request. – Featherhead 14/11, 2022 at 11:49

There are Data flow templates in google Cloud data flow which helps to Zip/unzip the files in cloud storage.Refer below screenshots.

This template stages a batch pipeline that decompresses files on Cloud Storage to a specified location. This functionality is useful when you want to use compressed data to minimize network bandwidth costs. The pipeline automatically handles multiple compression modes during a single execution and determines the decompression mode to use based on the file extension (.bzip2, .deflate, .gz, .zip).

Pipeline requirements

The files to decompress must be in one of the following formats: Bzip2, Deflate, Gzip, Zip.

The output directory must exist prior to pipeline execution.

Azaleah answered 28/9, 2019 at 18:20 Comment(1)

Only works for a single folder not when there are multiple directories within it – Unbrace 24/8, 2020 at 13:15

Enable Dataflow API in your gcloud console
Create a temp dir in your bucket (cant use root).
Replace YOUR_REGION (e.g. europe-west6) and YOUR_BUCKET in the below command and run it with gcloud cli (presumption is gz file is at root - change if not):

gcloud dataflow jobs run unzip \
--gcs-location gs://dataflow-templates-YOUR_REGION/latest/Bulk_Decompress_GCS_Files \
--region YOUR_REGION \
--num-workers 1 \
--staging-location gs://YOUR_BUCKET/temp \
--parameters inputFilePattern=gs://YOUR_BUCKET/*.gz,outputDirectory=gs://YOUR_BUCKET/,outputFailureFile=gs://YOUR_BUCKET/decomperror.txt

Bremer answered 23/1, 2021 at 21:58 Comment(1)

Careful: this combines all files within an archive into one output file. – Kamilahkamillah 26/5, 2021 at 3:18

I'm afraid that by default in Google Cloud no program could do this..., but you can have this functionality, for example, using Python.

Universal method available on any machine where Python is installed (so also on Google Cloud):

You need to enter the following commands:

python

or if you need administrator rights:

sudo python

and then in the Python Interpreter:

>>> from zipfile import ZipFile
>>> zip_file = ZipFile('path_to_file/t.zip', 'r')
>>> zip_file.extractall('path_to_extract_folder')

and finally, press Ctrl+D to exit the Python Interpreter.

The unpacked files will be located in the location you specify (of course, if you had the appropriate permissions for these locations).

The above method works identically for Python 2 and Python 3.

Enjoy it to the fullest! :)

Regina answered 13/9, 2018 at 16:52 Comment(5)

I don't understand why this answer gets negative points. I tried to make it as useful as possible, correct and unique compared to others answers when added this one. If someone in the comment would write me what is wrong with it, I would be grateful. Thank you! – Regina 12/12, 2019 at 8:55

I tested your solution first as it seems easier than the accepted one. It failed for me at the second line when I specified a path gs://mybucket/myfolder/myfile.zip . I don't think ZipFile can access files from the bucket directly – Isleen 24/2, 2020 at 21:3

Thank you for your comment @SylvainGantois. Indeed, my solution assumes that we have access to the zip package within the usual folder structure. I hope that it will be useful to some users at least, just as it was for me. – Regina 25/2, 2020 at 10:59

The point of the question was to unzip from and to Google Storage, not unzipping on a local file system. On a VM with files on a local file system you might as well just use unzip. – Langtry 2/6, 2020 at 17:39

just to comment there is a package gcsfs which enables one to reference a gcs bucket file the same way you would the normal file system. – Phlyctena 19/5, 2022 at 18:51

You can create a Google Cloud Function with a Cloud Storage trigger.

When a new object is created, the function will be triggered.

const functions = require("@google-cloud/functions-framework");
const {Storage} = require("@google-cloud/storage");
const unzip = require("unzip-stream");

functions.cloudEvent("gcs-unzip", async cloudEvent => {

    //console.log(JSON.stringify(cloudEvent, null, 4));

    const zipFile = cloudEvent.data;

    //console.log(JSON.stringify(file, null, 4));

    if (zipFile.contentType === "application/zip") {
        const storage = new Storage();

        async function unzipAndUploadContainedFiles() {
            await storage
                .bucket(zipFile.bucket)
                .file(zipFile.name)
                .createReadStream()
                .pipe(unzip.Parse())
                .on("entry", async function (entry) { //there could be multiple files and even a directory structure in the zip file
                    //console.log(JSON.stringify(entry, null, 4));

                    const gcsTargetFileName = zipFile.name.replace(".zip", "") + "/" + entry.path;
                    if (entry.type === "File") {
                        entry.pipe(storage.bucket(zipFile.bucket).file(gcsTargetFileName).createWriteStream());
                    }
                });
        }

        await unzipAndUploadContainedFiles().catch(err => {
            console.error(err);
        });

    } else {
        console.log("Non-zip file ignored.");
    }

});

Disinfect answered 9/3, 2023 at 12:24 Comment(0)

Another fast way to do it using Python in version 3.2 or higher:

import shutil
shutil.unpack_archive('filename')

The method also allows you to indicate the destination folder:

shutil.unpack_archive('filename', 'extract_dir')

The above method works not only for zip archives, but also for tar, gztar, bztar, or xztar archives.

If you need more options look into documentation of shutil module: shutil.unpack_archive

Regina answered 21/3, 2019 at 16:17 Comment(2)

just tested it. works smoothly. not sure why it was getting negative votes. best solution in this thread according to me – Fever 30/1, 2021 at 13:32

@Fever Because it doesn't work for the google cloud files in the question. – Symbolist 30/11, 2022 at 21:16

To unzip all files within a zip I used this one-liner on the cloud shell terminal:

gsutil cat gs://{bucket_name}/data.zip | for i in $(jar --list); do gsutil cat gs://{bucket_name}/data.zip | jar x $i && cat $i | gsutil cp - gs://{bucket_name}/unzipped/$i && rm ./$i; done;

Amortize answered 3/5, 2023 at 7:30 Comment(0)

Here is a python implementation with automated cloud function extractor

This method will automatically unpack any zipped file you upload your to a specified bucket.

@functions_framework.cloud_event
def unzip_zipped_files(cloud_event: CloudEvent) -> None:
    """Unzip zipped files from the a Google bucket.

    Args:
        cloud_event: An event received when a zipped file
            is uploaded to a specified bucket.
    """
    # get the event
    data = cloud_event.data

    bucket_name = data["bucket"]
    zipped_file = data["name"]
    bucket = storage.Client(project="<your-project-name-here>").bucket(bucket_name)
    
    # get the zipped blob using the path of the zipped file
    zipped_blob = bucket.blob(zipped_file)

    temp_file = tempfile.NamedTemporaryFile(
        delete=False, suffix="." + zipped_blob.name.split(".")[-1]
    )
    # Cloud functions offer some storage to perform downloads to
    zipped_blob.download_to_filename(temp_file.name)
    temp_file.close()
    
    # Now we have the local path to the zipped file (in the cloud funciton)
    zipped_blob_path = temp_file.name

    # Unpack
    shutil.unpack_archive(
        filename=zipped_blob_path, 
        extract_dir="<dir-to-unpack-things-to>",
    )
    
    # If you have multiple files within a single zip file, get the list of files.
    unzipped_blobs_paths = os.listdir("<dir-to-unpack-things-to>")

    if unzipped_blobs_paths == []:
        raise FileNotFoundError(f"No unzipped files found in {zipped_file}")

    # upload them to another bucket (in the same project) 
    bucket = storage.Client(project="<your-project-here>").bucket("Abulubad-unzipped")
    for blob_path in unzipped_blobs_paths:
        blob = bucket.blob(blob_path)
        blob.upload_from_filename(blob_path)

After setting this function in your directory, all you have to do after that is follow some steps in the Deploying the function section using the Cloud function docs and you will have the cloud function up and running.

let me know if you face any issues :)

Boldface answered 15/9, 2023 at 10:32 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Universal method available on any machine where Python is installed (so also on Google Cloud):

Here is a python implementation with automated cloud function extractor

Recommended topics

Hot tags