How do I unzip a .zip file in a Google Cloud Storage bucket? (If there is some other tool like 'CloudBerry Explorer' for AWS, that would be great.)
You can use Python, e.g. from a Cloud Function:
from google.cloud import storage
from zipfile import ZipFile
from zipfile import is_zipfile
import io

def zipextract(bucketname, zipfilename_with_path):

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucketname)

    destination_blob_pathname = zipfilename_with_path

    # Download the whole zip into memory, so this only suits archives that fit
    # in the function's memory.
    blob = bucket.blob(destination_blob_pathname)
    zipbytes = io.BytesIO(blob.download_as_string())

    if is_zipfile(zipbytes):
        with ZipFile(zipbytes, 'r') as myzip:
            for contentfilename in myzip.namelist():
                # Write each archive member back to the bucket under the zip's path.
                contentfile = myzip.read(contentfilename)
                blob = bucket.blob(zipfilename_with_path + "/" + contentfilename)
                blob.upload_from_string(contentfile)

zipextract("mybucket", "path/file.zip") # if the file is gs://mybucket/path/file.zip
Here is some code I created to run as a Firebase Cloud Function. It listens for files uploaded to a bucket with the content type 'application/zip' and extracts them in place.
const functions = require('firebase-functions');
const admin = require("firebase-admin");
const unzip = require('unzipper')

admin.initializeApp();
const storage = admin.storage();

const runtimeOpts = {
  timeoutSeconds: 540,
  memory: '2GB'
}

exports.unzip = functions.runWith(runtimeOpts).storage.object().onFinalize((object) => {
  return new Promise((resolve, reject) => {
    if (object.contentType !== 'application/zip') {
      reject();
    } else {
      const bucket = storage.bucket(object.bucket)
      const remoteFile = bucket.file(object.name)
      const remoteDir = object.name.replace('.zip', '')

      console.log(`Downloading ${object.name}`)

      remoteFile.createReadStream()
        .on('error', err => {
          console.error(err)
          reject(err);
        })
        .on('response', response => {
          // Server connected and responded with the specified status and headers.
        })
        .on('end', () => {
          // The file is fully downloaded.
          console.log("Finished downloading.")
          resolve();
        })
        .pipe(unzip.Parse())
        .on('entry', entry => {
          if (entry.type === 'File') {
            const file = bucket.file(`${remoteDir}/${entry.path}`)
            entry.pipe(file.createWriteStream())
              .on('error', err => {
                console.log(err)
                reject(err);
              })
              .on('finish', () => {
                console.log(`Finished extracting ${remoteDir}/${entry.path}`)
              });
          } else {
            // Skip directories and other non-file entries so the stream keeps flowing.
            entry.autodrain();
          }
        });
    }
  })
});
In a shell, you can use the command below to decompress a gzip-compressed object:
gsutil cat gs://bucket/obj.csv.gz | zcat | gsutil cp - gs://bucket/obj.csv
'zcat' is not recognized as an internal or external command – Cota
gsutil cat gs://bucket/obj.zip | jar xvf /dev/stdin – Vacuole
gsutil cat gs://bucket/*/*/*/*.gz | zcat | gsutil cp - gs://bucket/decompressed/result.json – Cota
If you ended up with a zip file in your Google Cloud Storage bucket because you had to move large files from another server with the gsutil cp command, you could instead gzip when copying; the file will be transferred in compressed format and unzipped when arriving at the bucket.
This is built into gsutil cp via the -Z argument.
E.g.
gsutil cp -Z largefile.txt gs://bucket/largefile.txt
gsutil cp -Z doesn't do that. It zips the file and leaves it zipped in the bucket. Source: cloud.google.com/storage/docs/gsutil/commands/… – Big
There is no mechanism in GCS to unzip files. A feature request for this has already been forwarded to the Google development team.
As an alternative, you can upload the ZIP files to the GCS bucket and then download them to a persistent disk attached to a VM instance, unzip them there, and upload the unzipped files using the gsutil tool.
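For completeness, here is a minimal Python sketch of that workaround, run on the VM (or any machine with the google-cloud-storage package and credentials available). The function name, bucket name, object paths, and working directory are placeholders for illustration, not anything prescribed by GCS:
# Sketch of the VM-based workaround: download the zip to local disk,
# extract it, and upload the members back to the bucket.
# "my-bucket", "path/archive.zip" and "path/archive" are example names.
import os
import zipfile
from google.cloud import storage

def unzip_via_local_disk(bucket_name, zip_blob_name, target_prefix, workdir="/tmp/unzip"):
    client = storage.Client()
    bucket = client.bucket(bucket_name)

    os.makedirs(workdir, exist_ok=True)
    local_zip = os.path.join(workdir, os.path.basename(zip_blob_name))

    # Download the archive to local/persistent disk instead of holding it in memory.
    bucket.blob(zip_blob_name).download_to_filename(local_zip)

    # Extract locally, then upload each extracted file under the target prefix.
    with zipfile.ZipFile(local_zip) as archive:
        archive.extractall(workdir)
        for member in archive.namelist():
            local_path = os.path.join(workdir, member)
            if os.path.isfile(local_path):
                bucket.blob(f"{target_prefix}/{member}").upload_from_filename(local_path)

unzip_via_local_disk("my-bucket", "path/archive.zip", "path/archive")
The same workflow can of course be done with unzip and gsutil cp on the VM; the sketch just keeps everything in one script.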
There are Dataflow templates in Google Cloud Dataflow that help to zip/unzip files in Cloud Storage.
This template stages a batch pipeline that decompresses files on Cloud Storage to a specified location. This functionality is useful when you want to use compressed data to minimize network bandwidth costs. The pipeline automatically handles multiple compression modes during a single execution and determines the decompression mode to use based on the file extension (.bzip2, .deflate, .gz, .zip).
Pipeline requirements
The files to decompress must be in one of the following formats: Bzip2, Deflate, Gzip, Zip.
The output directory must exist prior to pipeline execution.
- Enable the Dataflow API in your Google Cloud console.
- Create a temp dir in your bucket (you can't use the root).
- Replace YOUR_REGION (e.g. europe-west6) and YOUR_BUCKET in the command below and run it with the gcloud CLI (the presumption is that the gz file is at the root - change the pattern if not):
gcloud dataflow jobs run unzip \
--gcs-location gs://dataflow-templates-YOUR_REGION/latest/Bulk_Decompress_GCS_Files \
--region YOUR_REGION \
--num-workers 1 \
--staging-location gs://YOUR_BUCKET/temp \
--parameters inputFilePattern=gs://YOUR_BUCKET/*.gz,outputDirectory=gs://YOUR_BUCKET/,outputFailureFile=gs://YOUR_BUCKET/decomperror.txt
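If you would rather launch the same template programmatically instead of via the gcloud CLI, a sketch along these lines should work using the Google API Python client and the classic Dataflow templates launch endpoint (v1b3). The project ID is a placeholder, and the job name and parameters simply mirror the gcloud command above:
# Sketch: launch the Bulk_Decompress_GCS_Files template via the Dataflow REST API.
# Requires google-api-python-client and application default credentials;
# "my-project", "YOUR_REGION" and "YOUR_BUCKET" are placeholders.
from googleapiclient.discovery import build

def launch_bulk_decompress(project, region, bucket):
    dataflow = build("dataflow", "v1b3")
    template_path = f"gs://dataflow-templates-{region}/latest/Bulk_Decompress_GCS_Files"

    body = {
        "jobName": "unzip",
        "parameters": {
            "inputFilePattern": f"gs://{bucket}/*.gz",
            "outputDirectory": f"gs://{bucket}/",
            "outputFailureFile": f"gs://{bucket}/decomperror.txt",
        },
        "environment": {
            "tempLocation": f"gs://{bucket}/temp",
            "numWorkers": 1,
        },
    }

    # Launch the template in the given region and return the job metadata.
    request = dataflow.projects().locations().templates().launch(
        projectId=project,
        location=region,
        gcsPath=template_path,
        body=body,
    )
    return request.execute()

launch_bulk_decompress("my-project", "YOUR_REGION", "YOUR_BUCKET")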
I'm afraid that, by default, there is no program in Google Cloud that can do this, but you can get this functionality using, for example, Python.
Universal method available on any machine where Python is installed (so also on Google Cloud):
You need to enter the following commands:
python
or if you need administrator rights:
sudo python
and then in the Python Interpreter:
>>> from zipfile import ZipFile
>>> zip_file = ZipFile('path_to_file/t.zip', 'r')
>>> zip_file.extractall('path_to_extract_folder')
and finally, press Ctrl+D to exit the Python Interpreter.
The unpacked files will be located in the location you specify (of course, if you had the appropriate permissions for these locations).
The above method works identically for Python 2 and Python 3.
Enjoy it to the fullest! :)
You can create a Google Cloud Function with a Cloud Storage trigger.
When a new object is created, the function will be triggered.
const functions = require("@google-cloud/functions-framework");
const {Storage} = require("@google-cloud/storage");
const unzip = require("unzip-stream");
functions.cloudEvent("gcs-unzip", async cloudEvent => {
//console.log(JSON.stringify(cloudEvent, null, 4));
const zipFile = cloudEvent.data;
//console.log(JSON.stringify(file, null, 4));
if (zipFile.contentType === "application/zip") {
const storage = new Storage();
async function unzipAndUploadContainedFiles() {
await storage
.bucket(zipFile.bucket)
.file(zipFile.name)
.createReadStream()
.pipe(unzip.Parse())
.on("entry", async function (entry) { //there could be multiple files and even a directory structure in the zip file
//console.log(JSON.stringify(entry, null, 4));
const gcsTargetFileName = zipFile.name.replace(".zip", "") + "/" + entry.path;
if (entry.type === "File") {
entry.pipe(storage.bucket(zipFile.bucket).file(gcsTargetFileName).createWriteStream());
}
});
}
await unzipAndUploadContainedFiles().catch(err => {
console.error(err);
});
} else {
console.log("Non-zip file ignored.");
}
});
Another fast way to do it using Python in version 3.2 or higher:
import shutil
shutil.unpack_archive('filename')
The method also allows you to indicate the destination folder:
shutil.unpack_archive('filename', 'extract_dir')
The above method works not only for zip archives, but also for tar, gztar, bztar, or xztar archives.
If you need more options, look into the documentation of the shutil module: shutil.unpack_archive
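To see which archive formats your Python installation can actually unpack (zip and tar are always available; bztar and xztar depend on the optional bz2 and lzma modules), you can check shutil.get_unpack_formats(). A quick sketch:
import shutil

# Print the (name, extensions, description) tuple for each supported unpack format.
for name, extensions, description in shutil.get_unpack_formats():
    print(name, extensions, description)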
To unzip all files within a zip, I used this one-liner in the Cloud Shell terminal:
gsutil cat gs://{bucket_name}/data.zip | for i in $(jar --list); do gsutil cat gs://{bucket_name}/data.zip | jar x $i && cat $i | gsutil cp - gs://{bucket_name}/unzipped/$i && rm ./$i; done;
Here is a Python implementation of an automated Cloud Function extractor.
This method will automatically unpack any zipped file you upload to a specified bucket.
import os
import shutil
import tempfile

import functions_framework
from cloudevents.http import CloudEvent
from google.cloud import storage


@functions_framework.cloud_event
def unzip_zipped_files(cloud_event: CloudEvent) -> None:
    """Unzip zipped files from a Google bucket.

    Args:
        cloud_event: An event received when a zipped file
            is uploaded to a specified bucket.
    """
    # get the event
    data = cloud_event.data
    bucket_name = data["bucket"]
    zipped_file = data["name"]
    bucket = storage.Client(project="<your-project-name-here>").bucket(bucket_name)

    # get the zipped blob using the path of the zipped file
    zipped_blob = bucket.blob(zipped_file)
    temp_file = tempfile.NamedTemporaryFile(
        delete=False, suffix="." + zipped_blob.name.split(".")[-1]
    )

    # Cloud Functions offer some storage to perform downloads to
    zipped_blob.download_to_filename(temp_file.name)
    temp_file.close()

    # Now we have the local path to the zipped file (in the cloud function)
    zipped_blob_path = temp_file.name

    # Unpack
    shutil.unpack_archive(
        filename=zipped_blob_path,
        extract_dir="<dir-to-unpack-things-to>",
    )

    # If you have multiple files within a single zip file, get the list of files.
    unzipped_blobs_paths = os.listdir("<dir-to-unpack-things-to>")
    if unzipped_blobs_paths == []:
        raise FileNotFoundError(f"No unzipped files found in {zipped_file}")

    # upload them to another bucket (in the same project)
    bucket = storage.Client(project="<your-project-here>").bucket("Abulubad-unzipped")
    for blob_path in unzipped_blobs_paths:
        blob = bucket.blob(blob_path)
        # os.listdir returns bare names, so join them with the extraction directory
        blob.upload_from_filename(os.path.join("<dir-to-unpack-things-to>", blob_path))
After setting up this function in your directory, all you have to do is follow the steps in the 'Deploying the function' section of the Cloud Functions docs, and you will have the Cloud Function up and running.
Let me know if you face any issues :)