Bulk file restore from Google Cloud Storage

I accidentally ran a delete command on the wrong bucket. Object versioning is turned on, but I don't really understand what steps I should take to restore the files, or, more importantly, how to do it in bulk, as I've deleted a few hundred of them.

I'd appreciate any help.

Crellen answered 7/4, 2017 at 12:51 Comment(0)

To restore hundreds of objects you could do something as simple as:

gsutil cp -AR gs://my-bucket gs://my-bucket

This will copy all objects (including deleted ones) to the live generation using metadata-only copying, i.e., it doesn't require copying the actual bytes. Caveats:

  1. It will leave the deleted generations in place, costing you extra storage.

  2. If your bucket isn't empty, this command will re-copy any live objects on top of themselves (ending up with an extra archived version of each of those as well, also costing you extra storage).

  3. If you want to restore a large number of objects, this simplistic approach will run too slowly - you'd want to parallelize the individual gsutil cp operations. You can't use the gsutil -m option here, because gsutil prevents that in order to preserve generation ordering (e.g., if there were several generations of objects with the same name, copying them in parallel could leave the live generation coming from an unpredictable generation). If you have only one generation of each object, you could parallelize the copying by doing something like:

    gsutil ls -a gs://my-bucket/** | sed 's/\(.*\)\(#[0-9]*\)/gsutil cp \1\2 \1 \&/' > gsutil_script.sh

This generates a listing of all objects (including deleted ones) and transforms it into a sequence of gsutil cp commands that copy those objects (by generation-specific name) back to the live generation in parallel. If the list is long, you'll want to break it into parts so you don't (for example) try to fork 100k processes to do the parallel copying, which would overload your machine; one way to throttle the concurrency is sketched below.
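A possible way to throttle the parallelism without generating a script at all is xargs -P (a sketch, assuming as above a single generation per object and no whitespace in object names; the concurrency level of 8 is arbitrary):

    # Turn each "gs://my-bucket/obj#GEN" line into the two arguments
    # "gs://my-bucket/obj#GEN gs://my-bucket/obj", then let xargs run
    # at most 8 gsutil cp processes at a time (-n 2 = two args per copy).
    gsutil ls -a gs://my-bucket/** \
      | sed 's/\(.*\)\(#[0-9]*\)/\1\2 \1/' \
      | xargs -n 2 -P 8 gsutil cp

If you prefer the generated gsutil_script.sh, split(1) can break it into fixed-size chunks to run one at a time.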

Boogie answered 7/4, 2017 at 16:29 Comment(5)
Thanks Mike for the in-depth answer, wish the documentation was that precise! – Crellen
Actually, when I try to use your command for parallel copying, there's a sed: -e expression #1, char 35: invalid reference \2 on `s' command's RHS error. – Crellen
Sorry about that - the backslash characters in the command got swallowed by the formatting, so I had to escape them. I updated the command to fix this - please try it again. – Boogie
One other problem with your cp -AR command is that it will put all the versioned objects inside a "folder" called my-bucket, hence you will have all of your objects placed in gs://my-bucket/my-bucket. – Exaggerate
You can use gsutil cp -AR gs://my-bucket/* gs://my-bucket to avoid creating a folder named "my-bucket". – Pastiche
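Putting the answer and comments together, a minimal end-to-end sketch (using the example bucket my-bucket, and assuming versioning was already enabled when the delete ran):

    # 1. Confirm versioning is on - restore only works if it was enabled:
    gsutil versioning get gs://my-bucket      # should report Enabled

    # 2. Inspect the archived generations (each ends in #<generation>):
    gsutil ls -a gs://my-bucket/** | head

    # 3. Restore, using the /* form from the comments to avoid a nested
    #    my-bucket/ prefix:
    gsutil cp -AR gs://my-bucket/* gs://my-bucket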
