Is it possible to mass rename objects on Google Cloud Storage using gsutil (or some other tool)? I am trying to figure out a way to rename a bunch of images from *.JPG to *.jpg.
Here is a native way to do this in bash with an explanation below, line by line of the code:
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /' | while read line; do bash -c "$line"; done
rm src-rename-list.txt; rm dest-rename-list.txt
The solution pushes 2 lists, one for the source and one for the destination file (to be used in the "gsutil mv" command):
gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
The line "gsutil mv " and the two files are concatenated line by line using the below code:
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /'
This then runs each line in a while loop: while read line; do bash -c "$line"; done
Lastly, clean up and delete the files created:
rm src-rename-list.txt; rm dest-rename-list.txt
The above has been tested against a working Google Storage bucket.
https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames
gsutil supports URI wildcards
EDIT
As part of the bucket sub-directory support we changed the * wildcard to match only up to directory boundaries, and introduced the new ** wildcard...
Do you have directories under bucket? if so, maybe you need to go down to each directories or use **.
gsutil -m mv gs://my_bucket/**.JPG gs://my_bucket/**.jpg
or
gsutil -m mv gs://my_bucket/mydir/*.JPG gs://my_bucket/mydir/*.jpg
EDIT
gsutil doesn't support wildcard for destination so far (as of 4/12/'14)
nether API.
so at this moment you need to retrieve list of all JPG files, and rename each files.
python example:
import subprocess
files = subprocess.check_output("gsutil ls gs://my_bucket/*.JPG",shell=True)
files = files.split("\n")[:-1]
for f in files:
subprocess.call("gsutil mv %s %s"%(f,f[:-3]+"jpg"),shell=True)
please note that this would take hours.
gsutil does not support parallelized and mass-copy/rename.
You have two options:
- use a dataflow process to do the operation or
- use GNU parallel to launch it using several processes
If you use GNU Parallel, it is better to deploy a new instance to do the mass copy/rename operation:
- First: - Make a list of files you want to copy/rename (a file with source and destination separated by a space or tab), like this:
gs://origin_bucket/path/file gs://dest_bucket/new_path/new_filename
- Second: Launch a new compute instance
- Third: Login in that instance and install Gnu parallel
sudo apt install parallel
- Third: authorize yourself with google (gcloud auth login) because the service account for compute might not have permissions to move/rename the files.
gcloud auth login
- Make the copy (gsutil cp) or move (gsutil mv) operation with parallel:
parallel -j 20 --colsep ' ' gsutil mv {1} {2} :::: file_with_source_destination_uris.txt
This will make 20 parallel runs of the gsutil cp operation.
parallel
utility! Thanks –
Mammoth Yes, it is possible:
© 2022 - 2024 — McMap. All rights reserved.