gsutil rsync with gzip compression
I'm hosting publicly available static resources in a Google Storage bucket, and I want to use the gsutil rsync command to sync our local version to the bucket, saving bandwidth and time. Part of our build process is to pre-gzip these resources, but gsutil rsync has no way to set the Content-Encoding header. This means we must run gsutil rsync, then immediately run gsutil setmeta to set headers on all of the gzipped file types. This leaves the bucket in a BAD state until that header is set. Another option is to use gsutil cp, passing the -z option, but this requires us to re-upload the entire directory structure every time, and that includes a LOT of image files and other non-gzipped resources, which wastes time and bandwidth.
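Roughly, the current two-step workaround looks like this (bucket name, local directory, and extension patterns are placeholders):

gsutil -m rsync -r ./build gs://your-bucket
gsutil -m setmeta -h "Content-Encoding:gzip" gs://your-bucket/**.html gs://your-bucket/**.css gs://your-bucket/**.js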

Is there an atomic way to accomplish the rsync and set proper Content-Encoding headers?

Centennial answered 1/7, 2015 at 19:32 Comment(1)
Fantastic - didn't know about the -z option to cp.Misread
Assuming you're starting with gzipped source files in source-dir, you can do:

gsutil -h content-encoding:gzip rsync -r source-dir gs://your-bucket

Note: If you do this and then run rsync in the reverse direction, it will decompress and copy all the objects back down:

gsutil rsync -r gs://your-bucket source-dir 

which may not be what you want to happen. Basically, the safest way to use rsync is to simply synchronize objects as-is between source and destination, and not try to set content encodings on the objects.

Orsa answered 1/7, 2015 at 19:56 Comment(3)
The issue there is that only text-based files are compressed, like CSS, JS, HTML, etc.Centennial
Is there an atomic way to do this based on file extension? I don't really see a way to. A good addition to gsutil rsync would be the ability to pass a list of file extensions that a header can be applied to during the rsync. For example, the only files that are normally gzip-encoded are html, css, js, json, xml, svg, txt. This is from the apache config for deflate: httpd.apache.org/docs/current/mod/mod_deflate.htmlCentennial
We chose not to support on-the-fly compression with the rsync command because doing it correctly would require tracking the pre-compressed size and checksum(s) in the object's metadata, and could lead to confusing situations if clients try to do multi-source synchronization. Basically, if you want to compress on the fly you need to use the gsutil cp command.Orsa
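For reference, a cp-based upload that compresses matching extensions on the fly might look like this (directory, bucket, and extension list are placeholders):

gsutil -m cp -r -z html,css,js,json,xml,svg,txt ./static gs://your-bucket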
I'm not completely answering the question, but I came here wondering the same thing while trying to achieve the following:

how to efficiently deploy a static website to Google Cloud Storage

I was able to find an optimized way to deploy my static website from a local folder to a gs bucket:

  • Split my local folder into two folders with the same hierarchy: one containing the content to be gzipped (html, css, js...), the other containing the remaining files
  • Gzip each file in my gzip folder (in place)
  • Call gsutil rsync for each folder to the same gs destination (a sketch of the first two steps follows below)
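A minimal sketch of the split and in-place gzip steps, assuming a build output folder named build/ (a placeholder) and the src/gzip and src/none layout used below, could look like this:

# copy the text assets into src/gzip and everything else into src/none
rsync -a --include='*/' --include='*.html' --include='*.css' --include='*.js' --exclude='*' build/ src/gzip/
rsync -a --exclude='*.html' --exclude='*.css' --exclude='*.js' build/ src/none/

# gzip each file in src/gzip in place, keeping the original file name
find src/gzip -type f -exec sh -c 'gzip -9 "$1" && mv "$1.gz" "$1"' _ {} \;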

Of course, this is only a one-way synchronization: deleted local files are not deleted remotely.

For the gzip folder, the command is:

gsutil -m -h Content-Encoding:gzip rsync -c -r src/gzip gs://dst

forcing the Content-Encoding to gzip.

For the other folder, the command is:

gsutil -m rsync -c -r src/none gs://dst

The -m option enables parallel execution. The -c option is needed to force checksum validation (Why is gsutil rsync re-downloading all our files?), because I was touching each local file in my build process. The -r option is used for recursion.

I even wrote a script for it (in dart): http://tekhoow.blogspot.fr/2016/10/deploying-static-website-efficiently-on.html

Bathhouse answered 12/10, 2016 at 10:39 Comment(0)
