How to retrieve the most recent file in cloud storage bucket?
B

5

16

Is this something that can be done with gsutil?

https://cloud.google.com/storage/docs/gsutil/commands/ls does not seem to mention any sorting functionality - only filtering by a date - which wouldn't work for my use case.

Burgonet answered 18/8, 2017 at 16:3 Comment(3)
Possible duplicate of Google Cloud Storage: How to get list of new files in bucket/folder using gsutilSupposing
That is asking for selecting by a certain date. That seems to be filtering functionality.Burgonet
Where is filter by date?Optime
E
10

Hello, this still doesn't seem to exist in gsutil, but there is a workaround described in another post.

The command used is this one:

gsutil ls -l gs://[bucket-name]/ | sort -k 2

Because this sorts the listing by its date column, the most recent object ends up at the bottom of the output. If you only need that one entry, pipe the result through another command to pick out the last line.
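A minimal sketch of that extra pipe, assuming the listing has the usual `gsutil ls -l` layout of size, timestamp, and URL, followed by a `TOTAL:` summary line. The function name `newest_object` is hypothetical; it reads the listing on stdin so it can be tried on saved output as well.

```shell
# newest_object (hypothetical helper): read `gsutil ls -l` output on stdin
# and print the line of the object with the latest timestamp.
newest_object() {
  # Keep only lines whose second field looks like an ISO 8601 timestamp.
  # This drops the "TOTAL: ..." summary line and any blank lines.
  awk '$2 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/' |
    # ISO 8601 timestamps sort chronologically as plain strings;
    # -b ignores the blanks in front of the timestamp column.
    LC_ALL=C sort -b -k 2,2 |
    tail -n 1
}
```

Usage would be along the lines of `gsutil ls -l gs://[bucket-name]/ | newest_object`.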

Earnest answered 16/1, 2019 at 14:10 Comment(1)
Just to note, per GCS docs, "There is no guarantee that generation numbers increase for successive versions, only that each new version has a unique generation number"Twink
S
3
gsutil ls -l gs://<bucket-name> | sort -k 2 | tail -n 2 | head -1 | cut -d ' ' -f 7

It will not work well if there are fewer than two objects in the bucket, though.
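One way to avoid depending on the number of lines is to pick the newest entry by inspecting the timestamp column directly, and to signal failure when no object lines are present. This is only a sketch: `newest_url` is a hypothetical name, and it assumes each object line is laid out as size, timestamp, URL.

```shell
# newest_url (hypothetical helper): read `gsutil ls -l` output on stdin and
# print the URL of the newest object. Returns non-zero if no object lines
# were found, so callers can detect an empty listing.
newest_url() {
  awk '
    # Object lines start with a numeric size and have at least three fields;
    # the "TOTAL: ..." line and bare prefix lines do not match.
    $1 ~ /^[0-9]+$/ && NF >= 3 {
      ts = $2
      if (ts > best) {            # string comparison works for ISO 8601
        best = ts
        line = $0
        # Strip the size and timestamp columns, keeping the full URL
        # even if the object name contains spaces.
        sub(/^[ \t]*[0-9]+[ \t]+[^ \t]+[ \t]+/, "", line)
        url = line
      }
    }
    END {
      if (url == "") exit 1
      print url
    }
  '
}
```

For example, `gsutil ls -l gs://<bucket-name> | newest_url || echo "no objects found"`.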

Sakai answered 27/5, 2019 at 17:36 Comment(0)
H
1

When gsutil is run from the host machine, this populates the response array with the most recent entry:

response=(`gsutil ls -l gs://some-bucket-name|sort -k 2|tail -2|head -1`)

Or, when gsutil is run from a Docker container:

response=(`docker run --name some-container-name --rm --volumes-from gcloud-config -it google/cloud-sdk:latest gsutil ls -l gs://some-bucket-name|sort -k 2|tail -2|head -1`)

Afterwards, to get the whole response, run:

echo ${response[@]}

which prints, for example:

33 2021-08-11T09:24:55Z gs://some-bucket-name/filename-37.txt

Or, to get a single field from the response (e.g. the filename), run:

echo ${response[2]}

which prints only the filename:

gs://some-bucket-name/filename-37.txt
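As an alternative to indexing into an array, the three fields can be split with `read`, which keeps a URL containing spaces in one piece. This is a sketch under the assumption that the line has the size, timestamp, URL layout shown above; `parse_listing_line` is a hypothetical name.

```shell
# parse_listing_line (hypothetical helper): split one `gsutil ls -l` line,
# passed as the first argument, into size, creation time and URL.
parse_listing_line() {
  printf '%s\n' "$1" | {
    # read strips leading whitespace and splits on blanks; the last
    # variable receives the remainder of the line, spaces included.
    read -r size created url
    printf 'size=%s\ncreated=%s\nurl=%s\n' "$size" "$created" "$url"
  }
}
```

It could be called as `parse_listing_line "$(gsutil ls -l gs://some-bucket-name | sort -k 2 | tail -2 | head -1)"`.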
Haswell answered 12/8, 2021 at 13:16 Comment(0)
C
0

For my use case, I wanted to find the most recent directory in my bucket. I number them in ascending order (with leading zeros), so all I need to get the most recent one is this:

gsutil ls -l gs://[bucket-name] | sort | tail -n 1 | cut -d '/' -f 4
  1. list the directory
  2. sort alphabetically (probably unnecessary)
  3. take the last line
  4. tokenise it with "/" delimiter
  5. get the 4th token, which is the directory name
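The steps above can be wrapped in a small function. This is a sketch, assuming top-level prefixes are listed as lines ending in `/` (for example `gs://[bucket-name]/0004/`); `latest_dir` is a hypothetical name, and it reads the listing on stdin.

```shell
# latest_dir (hypothetical helper): read a bucket listing on stdin and print
# the name of the top-level prefix ("directory") that sorts last.
latest_dir() {
  # Trim leading whitespace and keep only lines ending in "/", which
  # skips object lines and any summary line.
  sed -n 's/^[[:space:]]*//; /\/$/p' |
    # Byte-wise sort, so zero-padded numbers order correctly.
    LC_ALL=C sort |
    tail -n 1 |
    # gs://bucket/name/ split on "/" gives: "gs:", "", bucket, name
    cut -d '/' -f 4
}
```

Usage: `gsutil ls -l gs://[bucket-name] | latest_dir`.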
Campfire answered 18/8, 2021 at 7:18 Comment(5)
Read this link regarding sequentially naming objects: cloud.google.com/storage/docs/best-practices#naming Avoid using sequential object names such as timestamp-based object names if you are uploading many objects in parallel. Objects with sequential names are stored consecutively, so they are likely to hit the same backend server. When this happens, throughput is constrained. In order to achieve optimal throughput, add the hash of the sequence number as part of the object name to make it non-sequential.Roentgen
I've been doing it this way for years with no issues... I have root folders 0001 0002 0003 0004 etc; each of those is limited to 75GB in size; when it fills, I move on to the next one. The filenames WITHIN the folders, are md5 hashes of the file contents, so maybe that's suitable given the wording above?Campfire
Cloud Storage does not have folders. What you think is a folder is just a prefix that is part of the object name. Buckets are a flat namespace. Unless you need optimum performance, this probably does not matter for you. For customers that require high performance for millions/billions of objects: Objects with sequential names are stored consecutively, so they are likely to hit the same backend server. I commented on your answer so that others do not copy your naming scheme without understanding the impact on performance.Roentgen
I know that, but I'm using this as a backup of my server. I should have clarified that I meant that's my file structure on the server.Campfire
I am not trying to inform you. I am commenting for future readers of your answer.Roentgen
O
0

Most users will want to retrieve objects recursively, in which case you need to use -r and /** syntax, e.g.

gsutil ls -r -l 'gs://[bucket-name]/**' | sort -k 2
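With a recursive listing the output can get long, so it may help to keep only the few newest entries. A sketch, assuming the same size, timestamp, URL layout per object line; `newest_n` is a hypothetical name and the default of 5 is arbitrary.

```shell
# newest_n (hypothetical helper): read `gsutil ls -l` output on stdin and
# print the N most recent object lines, newest first (N defaults to 5).
newest_n() {
  # Keep object lines only: numeric size in field 1, at least 3 fields.
  awk '$1 ~ /^[0-9]+$/ && NF >= 3' |
    # Sort on the timestamp column in reverse, ignoring leading blanks.
    LC_ALL=C sort -b -r -k 2,2 |
    head -n "${1:-5}"
}
```

Usage: `gsutil ls -r -l 'gs://[bucket-name]/**' | newest_n 3`.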
Outstare answered 12/2, 2024 at 16:51 Comment(0)
