How to delete multiple files in GCS except one using gsutil
I currently have this: gsutil ls gs://basty/*_TZ001.*

gs://basty/20201007_TZ001.csv
gs://basty/20201008_TZ001.csv
gs://basty/20201009_TZ001.csv

My problem is that I have a bucket with many files, and I want to delete all of them except one (20201009_TZ001.csv).

I thought of using bash or Python, but I don't know how.

Conscionable answered 21/10, 2020 at 19:5 Comment(0)
R
9

You can filter the results with grep (using the -v flag to invert the match) and then pipe them to xargs:

gsutil ls gs://basty/*_TZ001.* |\
  grep -v 20201009_TZ001.csv |\
  xargs -i{} gsutil rm {}

To be sure that this does precisely what you want, you could first execute a dry-run command:

gsutil ls gs://basty/*_TZ001.* |\
  grep -v 20201009_TZ001.csv |\
  xargs -i{} echo "Will delete: " {}
Rigamarole answered 21/10, 2020 at 19:8 Comment(1)
I get "illegal option -- i" on macOS while testing it. It works if the xargs flag is changed to -I. - Sectionalism
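
Following up on the comment above, here is a minimal sketch of a variant that should work with both GNU and BSD/macOS xargs. It uses -I{} instead of -i{}, and grep -F so that the dots in the file name are matched literally rather than as regex wildcards. The helper name list_deletable is hypothetical; the bucket and file names are the ones from the question.

```shell
# Hypothetical helper: read object URLs on stdin and print every line
# that does NOT contain the name to keep.
# -F treats the pattern as a fixed string, -v inverts the match.
list_deletable() {
  keep="$1"
  grep -vF -- "$keep"
}

# Dry run (only prints what would be deleted):
#   gsutil ls 'gs://basty/*_TZ001.*' | list_deletable 20201009_TZ001.csv |
#     xargs -I{} echo "Will delete: {}"
#
# Real deletion:
#   gsutil ls 'gs://basty/*_TZ001.*' | list_deletable 20201009_TZ001.csv |
#     xargs -I{} gsutil rm {}
```

Quoting the gs:// pattern keeps the local shell from trying to expand the wildcard, so gsutil does the matching itself.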
C
5

I have another solution, based on temporary holds. I recommend creating a new folder inside the bucket. Then add a temporary hold to it like this:

gsutil -m retention temp set gs://BUCKETNAME/FOLDER/

Then, add all the files that you don't want to delete to this folder.

Execute the command:

gsutil rm gs://BUCKETNAME/*

You will see that all the files are erased, skipping FOLDER.

Finally, remove the hold:

gsutil -m retention temp release gs://BUCKETNAME/FOLDER/
Corelli answered 21/10, 2020 at 22:31 Comment(0)
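
A variation on the hold idea, as a sketch rather than a tested recipe: gsutil retention temp set also accepts object URLs, so the hold can be placed directly on the one object to keep. The function name delete_all_but_held and the GSUTIL variable are hypothetical (GSUTIL exists only so the commands can be previewed with something like echo). The sketch assumes that an object under a temporary hold cannot be deleted and that gsutil reports a failure for it, which is ignored here.

```shell
# GSUTIL is a hypothetical override; point it at echo to preview commands.
GSUTIL="${GSUTIL:-gsutil}"

# Usage:
#   delete_all_but_held 'gs://basty/*_TZ001.*' gs://basty/20201009_TZ001.csv
delete_all_but_held() {
  pattern="$1"
  keeper="$2"
  # Place a temporary hold on the single object to keep.
  $GSUTIL retention temp set "$keeper" || return 1
  # Delete everything matching the pattern. The held object is expected
  # to be refused, so a non-zero exit status is tolerated here.
  $GSUTIL -m rm "$pattern" || true
  # Release the hold again.
  $GSUTIL retention temp release "$keeper"
}
```

Because failures of the rm step are swallowed, it is worth running gsutil ls afterwards to confirm that only the kept object remains.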
C
1

There is one more solution:

gsutil ls -r gs://basty/*_TZ001.* | grep -v 20201009_TZ001.csv | gsutil -m rm -I
  • It works very fast because it removes files concurrently rather than one by one (the -m option in the command).
  • It doesn't use holds, which make the whole operation slower.
  • It works for files at any level in the bucket file structure if you replace gs://basty/*_TZ001.* with gs://basty/**
Coronagraph answered 4/11, 2021 at 8:26 Comment(0)
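
The one-liner above can be wrapped in a small guarded function that defaults to a dry run. This is a sketch under assumptions: the function name remove_listed_except and the DRY_RUN variable are hypothetical, and the filter assumes that no object name ends in a colon, since it skips the blank lines and "gs://.../dir/:" headers that gsutil ls -r can print.

```shell
# Usage:
#   gsutil ls -r 'gs://basty/**' | remove_listed_except 20201009_TZ001.csv
#   gsutil ls -r 'gs://basty/**' | DRY_RUN=0 remove_listed_except 20201009_TZ001.csv
# With DRY_RUN unset or set to 1, the URLs are only printed.
remove_listed_except() {
  keep="$1"
  # Drop the keeper (fixed-string match), blank lines and directory headers.
  targets=$(grep -vF -- "$keep" | grep -v -e '^$' -e ':$' || true)
  if [ -z "$targets" ]; then
    echo "Nothing to delete" >&2
    return 0
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$targets" | sed 's/^/Will delete: /'
  else
    # -I makes gsutil rm read the URLs from stdin; -m removes them in parallel.
    printf '%s\n' "$targets" | gsutil -m rm -I
  fi
}
```

Defaulting to the dry run means an accidental invocation only prints the list; deletion has to be requested explicitly with DRY_RUN=0.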
