Bulk file restore from Google Cloud Storage

I accidentally ran a delete command on the wrong bucket. Object versioning is turned on, but I don't really understand what steps I should take to restore the files, or, more importantly, how to do it in bulk, as I've deleted a few hundred of them.

I'd appreciate any help.

Crellen answered 7/4, 2017 at 12:51 Comment(0)

To restore hundreds of objects you could do something as simple as:

gsutil cp -AR gs://my-bucket gs://my-bucket

This will copy all objects (including deleted ones) to the live generation using metadata-only copying, i.e., it doesn't require copying the actual bytes. Caveats:

  1. It will leave the deleted generations in place, costing you extra storage.

  2. If your bucket isn't empty, this command will re-copy any live objects on top of themselves (ending up with an extra archived version of each of those as well, also costing you extra storage).

  3. If you want to restore a large number of objects, this simplistic approach will run too slowly - you'd want to parallelize the individual gsutil cp operations. You can't use the gsutil -m option here, because gsutil prevents that in order to preserve generation ordering (e.g., if there were several generations of objects with the same name, copying them in parallel could leave the live generation coming from an unpredictable generation). If you have only one generation of each object, you could parallelize the copying by doing something like:

    gsutil ls -a gs://my-bucket/** | sed 's/\(.*\)\(#[0-9]*\)/gsutil cp \1\2 \1 \&/' > gsutil_script.sh

This generates a listing of all objects (including deleted ones) and transforms it into a sequence of gsutil cp commands that copy those objects (by generation-specific name) back to the live generation in parallel. If the list is long, you'll want to break it into parts so you don't (for example) try to fork 100k processes to do the parallel copying, which would overload your machine; one way to throttle the concurrency is sketched below.
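A possible way to throttle the parallelism without generating a script at all is xargs -P (a sketch, assuming as above a single generation per object and no whitespace in object names; the concurrency level of 8 is arbitrary):

    # Turn each "gs://my-bucket/obj#GEN" line into the two arguments
    # "gs://my-bucket/obj#GEN gs://my-bucket/obj", then let xargs run
    # at most 8 gsutil cp processes at a time (-n 2 = two args per copy).
    gsutil ls -a gs://my-bucket/** \
      | sed 's/\(.*\)\(#[0-9]*\)/\1\2 \1/' \
      | xargs -n 2 -P 8 gsutil cp

If you prefer the generated gsutil_script.sh, split(1) can break it into fixed-size chunks to run one at a time.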

Boogie answered 7/4, 2017 at 16:29 Comment(5)
Thanks Mike for the in-depth answer, wish the documentation was that precise! – Crellen
Actually, when I try to use your command for parallel copying, there's a sed: -e expression #1, char 35: invalid reference \2 on `s' command's RHS error. – Crellen
Sorry about that - the backslash characters in the command got swallowed by the formatting, so I had to escape them. I updated the command to fix this - please try it again. – Boogie
One other problem with your cp -AR command is that it will put all the versioned objects inside a "folder" called my-bucket, hence you will have all of your objects placed in gs://my-bucket/my-bucket. – Exaggerate
You can use gsutil cp -AR gs://my-bucket/* gs://my-bucket to avoid creating a folder named "my-bucket". – Pastiche
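Putting the answer and comments together, a minimal end-to-end sketch (using the example bucket my-bucket, and assuming versioning was already enabled when the delete ran):

    # 1. Confirm versioning is on - restore only works if it was enabled:
    gsutil versioning get gs://my-bucket      # should report Enabled

    # 2. Inspect the archived generations (each ends in #<generation>):
    gsutil ls -a gs://my-bucket/** | head

    # 3. Restore, using the /* form from the comments to avoid a nested
    #    my-bucket/ prefix:
    gsutil cp -AR gs://my-bucket/* gs://my-bucket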
