Mass rename objects on Google Cloud Storage
D

4

11

Is it possible to mass rename objects on Google Cloud Storage using gsutil (or some other tool)? I am trying to figure out a way to rename a bunch of images from *.JPG to *.jpg.

Dosser answered 27/11, 2014 at 8:49 Comment(0)
G
15

Here is a way to do this natively in bash, followed by a line-by-line explanation of the code:

gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /' | while read line; do bash -c "$line"; done
rm src-rename-list.txt; rm dest-rename-list.txt

The solution builds two lists, one of source names and one of destination names, to be used as the arguments of "gsutil mv":

gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt

The two files are then joined line by line, and "gsutil mv " is prepended to each joined line, using the code below:

paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /'

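Each resulting line is then run as a command in a while loop: while read line; do bash -c "$line"; done. Note that this assumes the object names contain no spaces or shell metacharacters, because each line is handed to bash as-is.

Lastly, delete the temporary files: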

rm src-rename-list.txt; rm dest-rename-list.txt

The above has been tested against a working Google Cloud Storage bucket.
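
The list-building step can also be sketched in Python, which avoids the temporary files and changes only a trailing extension (the sed pattern above would also rewrite a .JPG that appears in the middle of a path). The function name build_rename_pairs and the sample URLs are made up for illustration; each pair would still have to be passed to gsutil mv.

```python
def build_rename_pairs(urls, old_ext=".JPG", new_ext=".jpg"):
    """Return (source, destination) pairs for URLs ending in old_ext.

    Only the trailing extension is changed; URLs that do not end in
    old_ext are skipped.
    """
    pairs = []
    for url in urls:
        url = url.strip()  # drop the newline left over from a listing
        if url.endswith(old_ext):
            pairs.append((url, url[: -len(old_ext)] + new_ext))
    return pairs


if __name__ == "__main__":
    # Hypothetical listing, in the format printed by "gsutil ls".
    listing = [
        "gs://bucket_name/a.JPG",
        "gs://bucket_name/photo.JPG.backup.JPG",
        "gs://bucket_name/notes.txt",
    ]
    for src, dest in build_rename_pairs(listing):
        print("gsutil mv", src, dest)
```

This only prints the commands, so the result can be reviewed before anything is moved.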

Gregg answered 7/10, 2019 at 16:5 Comment(1)
Liked the solution. One small comment: instead of gsutil ls gs://bucket_name/*.JPG | sed 's/\.JPG/\.jpg/g' > dest-rename-list.txt, it is simpler to use cat src-rename-list.txt | sed 's/\.JPG/\.jpg/g' > dest-rename-list.txt - Tumor
G
4

https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames

gsutil supports URI wildcards

EDIT

gsutil 3.0 release note

As part of the bucket sub-directory support we changed the * wildcard to match only up to directory boundaries, and introduced the new ** wildcard...

Do you have directories under the bucket? If so, you may need to go down into each directory or use **.

gsutil -m mv gs://my_bucket/**.JPG gs://my_bucket/**.jpg

or

gsutil -m mv gs://my_bucket/mydir/*.JPG gs://my_bucket/mydir/*.jpg
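
To make the difference between * and ** concrete, here is a small Python sketch of the rule quoted from the release note. It only models that rule by translating a pattern into a regular expression; it is not gsutil's actual matching code, and the function names and object names are made up for illustration.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern using * and ** into a compiled regex.

    "**" matches any characters, including "/";
    "*" matches any characters except "/".
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # \Z anchors the match at the end of the name.
    return re.compile("".join(parts) + r"\Z")


def matches(pattern, name):
    """Return True if the whole name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).match(name) is not None
```

Under this model, matches("*.JPG", "mydir/a.JPG") is False because * stops at the directory boundary, while matches("**.JPG", "mydir/a.JPG") is True.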

EDIT
gsutil does not support wildcards in the destination so far (as of 4/12/'14), and neither does the API.

So at the moment you need to retrieve a list of all the JPG files and rename each file individually.

Python example:

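import subprocess

# List every object ending in .JPG; text=True (Python 3.7+) returns str, not bytes.
listing = subprocess.check_output(["gsutil", "ls", "gs://my_bucket/*.JPG"], text=True)
files = listing.splitlines()
for f in files:
    # Swap the trailing "JPG" for "jpg" and rename the object.
    # Passing a list of arguments avoids shell quoting problems.
    subprocess.call(["gsutil", "mv", f, f[:-3] + "jpg"])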

Please note that this could take hours for a large number of files, because each object is renamed by a separate gsutil call.

Gyrostatics answered 27/11, 2014 at 9:59 Comment(4)
Hmm, that doesn't seem to work. It fails with "CommandException: Destination (gs://my_bucket/*.jpg) must match exactly 1 URL" - Dosser
I edited the answer :) Sorry, I don't have an environment to test in right now. If it doesn't work, I will try some other ways. - Gyrostatics
Nah, the directories are right. I think the issue is that it doesn't do wildcard replacement? - Dosser
Indeed, it doesn't work. It seems you have to rename the files one by one after retrieving the file list, even though that is inefficient. And the GCS API doesn't seem to have a copy/rename function. - Gyrostatics
K
3

gsutil has no built-in way to mass-copy/rename objects in parallel when every object needs its own destination name.

You have two options:

  • use a dataflow process to do the operation or
  • use GNU parallel to launch it using several processes

If you use GNU Parallel, it is better to deploy a new instance to do the mass copy/rename operation:

  • First: Make a list of the files you want to copy/rename (a file with source and destination separated by a single space, which is what --colsep ' ' below expects), like this:
gs://origin_bucket/path/file gs://dest_bucket/new_path/new_filename
  • Second: Launch a new compute instance.
  • Third: Log in to that instance and install GNU Parallel:
sudo apt install parallel
  • Fourth: Authorize yourself with Google (gcloud auth login), because the service account for the compute instance might not have permission to move/rename the files:
gcloud auth login
  • Fifth: Run the copy (gsutil cp) or move (gsutil mv) operation with parallel:
   parallel -j 20 --colsep ' ' gsutil mv {1} {2} :::: file_with_source_destination_uris.txt

This will run up to 20 gsutil mv operations in parallel.
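
If GNU Parallel is not available, the same idea can be sketched in Python with a thread pool. This is only an illustration under assumptions: read_pairs, gsutil_mv and move_all are hypothetical helper names, the list file is expected to hold one "source destination" pair per line, and object names must not contain whitespace.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_pairs(lines):
    """Parse 'source destination' lines into (source, destination) tuples."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            pairs.append((fields[0], fields[1]))
    return pairs


def gsutil_mv(pair):
    """Run 'gsutil mv source destination' and return its exit code."""
    src, dest = pair
    return subprocess.call(["gsutil", "mv", src, dest])


def move_all(pairs, move=gsutil_mv, workers=20):
    """Apply move to every pair using up to `workers` threads.

    Results are returned in the same order as `pairs`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(move, pairs))
```

A possible use is to open file_with_source_destination_uris.txt, pass the file object to read_pairs, and hand the result to move_all; any non-zero value in the returned list marks a move that failed.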

Keef answered 15/3, 2020 at 12:2 Comment(1)
Love the parallel utility! Thanks - Mammoth
R
-1

Yes, it is possible:

Move/rename objects and/or subdirectories

Rae answered 27/11, 2014 at 8:55 Comment(8)
I've seen that, but I'm not quite sure how it helps me mass rename objects? - Dosser
@Andrei But I can't use gsutil + subprocess in my GAE project. - Swill
You don't need gsutil in a GAE project. You can simply retrieve a list of objects in your code and rename them. - Rae
@AndreiVolgin Currently I'm using the GCS client library to copy, list, stat, and delete files stored in GCS. I also need to implement renaming folders. That client library doesn't have any methods for renaming. - Swill
So, what I'm going to do is: 1. Create a new folder. 2. Copy all the contents of the old folder into the new one. 3. Delete the old folder. But it seems like a long process. - Swill
Cloud Storage objects are immutable, so renaming and replacing are the same. - Rae
Not sure if you can help. I tried this method and it didn't find the file with the prefix. It seems to look for an exact match of the path I entered, as opposed to a file with that prefix. - Pruter
Can you provide an example? - Rae
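
The copy-then-delete approach discussed in these comments can be sketched as follows. This assumes the google-cloud-storage Python package, which is not the client library mentioned above; its Bucket.rename_blob method copies an object to a new name and deletes the original. The helper name rename_prefix and the bucket and folder names are hypothetical.

```python
def rename_prefix(bucket, blobs, old_prefix, new_prefix):
    """Rename every blob whose name starts with old_prefix.

    `bucket` is expected to behave like google.cloud.storage.Bucket and
    each item of `blobs` like google.cloud.storage.Blob (it needs a
    `name` attribute). Returns the list of new names.
    """
    renamed = []
    # Materialize the listing first so renames do not affect the iteration.
    for blob in list(blobs):
        if not blob.name.startswith(old_prefix):
            continue
        new_name = new_prefix + blob.name[len(old_prefix):]
        # rename_blob copies the object to new_name and deletes the original.
        bucket.rename_blob(blob, new_name)
        renamed.append(new_name)
    return renamed


# Assumed usage with the google-cloud-storage package (not run here):
#   from google.cloud import storage
#   client = storage.Client()
#   bucket = client.bucket("my_bucket")
#   blobs = client.list_blobs("my_bucket", prefix="old_folder/")
#   rename_prefix(bucket, blobs, "old_folder/", "new_folder/")
```

Because each rename is a copy followed by a delete, the time taken grows with the number and size of the objects under the prefix.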
