How to check whether a given object exists in a Google Cloud Storage bucket from bash

I would like to programmatically check whether an object exists in a particular Google Cloud Storage bucket. Depending on whether the object is available, I would perform further operations.

I have gone through https://cloud.google.com/storage/docs/gsutil/commands/stat, and the doc mentions that "gsutil -q" is useful for writing scripts, because the exit status will be 0 for an existing object and 1 for a non-existent object. But when I use the command, it does not work properly. Has anyone tried this before?

#!/bin/bash
gsutil -q stat gs://<bucketname>/object

return_value=$?

if [ $return_value != 0 ]; then
    echo "folder exist"
else
    echo "folder does not exist"
fi
Marginal answered 8/2, 2018 at 2:48 Comment(0)
B
5

I see that you have already found the answer to your issue. However, I will post this answer to give more context on how the gsutil stat command works and why your code was not working.

gsutil is a Python application that is used for accessing and working with Cloud Storage using the Command Line Interface. It has many commands available, and the one that you used is gsutil stat, which outputs the metadata of an object retrieving the minimum possible data without having to list all the objects in a bucket. This command is also strongly consistent, which makes it an appropriate solution for certain types of applications.

Running the gsutil stat gs://<BUCKET_NAME>/<BUCKET_OBJECT> command returns something like the following:

gs://<BUCKET_NAME>/<BUCKET_OBJECT>.png:
    Creation time:          Tue, 06 Feb 2018 14:49:58 GMT
    Update time:            Tue, 06 Feb 2018 14:49:58 GMT
    Storage class:          MULTI_REGIONAL
    Content-Length:         6119
    Content-Type:           image/png
    Hash (crc32c):          <CRC32C_HASH>
    Hash (md5):             <MD5_HASH>
    ETag:                   <ETAG>
    Generation:             <TIMESTAMP>
    Metageneration:         1

However, if you run it with the -q option, it exits with status 0 if the object exists and 1 if it does not, which makes it useful for writing scripts such as the one you shared.

Finally, there are some additional points to keep in mind when working with subdirectories inside a bucket:
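The exit-status behavior described above can be wrapped in a small helper. This is only a minimal sketch: object_exists and check_object are hypothetical names, not part of gsutil, and the sketch assumes that any non-zero status means "not found", although a non-zero status may also come from other failures such as missing permissions.

```shell
#!/bin/bash
# object_exists and check_object are hypothetical helper names.

# Succeeds (status 0) when gsutil stat finds the object at the given URL.
object_exists() {
    local url="$1"
    # -q suppresses the metadata output; only the exit status is used.
    gsutil -q stat "$url"
}

# Prints a message based on the exit status of object_exists.
check_object() {
    if object_exists "$1"; then
        echo "object exists"
    else
        # Assumption: any non-zero status is treated as "not found".
        echo "object does not exist"
    fi
}
```

It could then be called as check_object gs://<BUCKET_NAME>/<BUCKET_OBJECT>. Testing the command directly in the if statement avoids comparing $? by hand, which is where the condition is easy to invert.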

  • A command such as gsutil -q stat gs://my_bucket/my_subdirectory will retrieve the data of an object called my_subdirectory, not of a directory itself.
  • A command such as gsutil -q stat gs://my_bucket/my_subdirectory/ will operate over the subdirectory itself, and not over the nested files, so it will just tell you whether the subdirectory exists or not (this is why your code was failing).
  • You have to use something like gsutil -q stat gs://my_bucket/my_subdirectory/my_nested_file.txt in order to retrieve the metadata of a file nested under a subdirectory.

So, in short, your issue was that you were not specifying the paths correctly. It is not that gsutil is too sensitive about paths; this behavior is working as intended. You may have a file and a folder with the same name, and you should be able to retrieve either of them, so you need the trailing / to indicate whether you mean a directory or a file:

gs://my_bucket/
  |_ my_subdirectory        #This is a file
  |_ my_subdirectory/       #This is a folder
     |_ my_nested_file.txt  #This is a nested file
Blevins answered 13/2, 2018 at 9:22 Comment(1)
One additional bit: GCS has no concept of "directories", even though gsutil tries to create that illusion through using object name prefixes. gsutil stat operates only on full object names, not "directories"/prefixes. When evaluating each argument, gsutil actually removes the trailing slash (if present). So running gsutil stat against "gs://bucket/dir" and "gs://bucket/dir/" both check whether "gs://bucket/dir" exists. Further, gsutil stat gs://bucket/dir will return a failure code if that exact object does not exist, even if something with that prefix (e.g. gs://bucket/dir/obj) does. - Garboil
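Since gsutil stat only matches exact object names, a "folder" check has to ask whether any object lives under a prefix. One possible approach, swapped in here and not taken from the answers, is to list the prefix with gsutil ls. The sketch below assumes that gsutil ls exits with a non-zero status when the URL matches no objects; prefix_exists and check_prefix are hypothetical names.

```shell
#!/bin/bash
# prefix_exists and check_prefix are hypothetical helper names.

# Succeeds when at least one object is listed under the given prefix.
# Assumption: "gsutil ls" exits non-zero when the URL matches nothing.
prefix_exists() {
    local prefix="${1%/}"   # drop one trailing slash, if present
    # Discard the listing and any error text; only the status matters.
    gsutil ls "${prefix}/" >/dev/null 2>&1
}

# Prints a message based on the exit status of prefix_exists.
check_prefix() {
    if prefix_exists "$1"; then
        echo "prefix has objects"
    else
        echo "prefix has no objects"
    fi
}
```

It could be called as check_prefix gs://my_bucket/my_subdirectory, with or without the trailing slash.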
S
3

You have the conditional check inverted: exit status 0 means success, i.e., the gsutil stat command found the given object.

Supplement answered 8/2, 2018 at 5:19 Comment(2)
@mike.schwartz - thanks for the reply, but I do not think that is the issue, as I already tested this by inverting the conditional check. For example, I compared $return_value against 0 and checked an object that physically exists in the GCS bucket; with the inverted condition, my script correctly says the object exists. BUT if I look for some other object that does not exist in the bucket, it still says the object exists. So it seems there is some issue with the status value that gsutil -q stat returns. - Marginal
gsutil -q stat works for me:

    % gsutil mb gs://test-bucket-abc123
    Creating gs://test-bucket-abc123/...
    % echo "hi" | gsutil cp - gs://test-bucket-abc123/hi
    Copying from <STDIN>...
    / [1 files][ 0.0 B/ 0.0 B]
    Operation completed over 1 objects.
    % gsutil -q stat gs://test-bucket-abc123/hi
    % return_value=$?
    % echo $return_value
    0
    % gsutil -q stat gs://test-bucket-abc123/non-existent
    % return_value=$?
    % echo $return_value
    1

- Supplement
M
2

The issue is that we need a trailing / after the object name so that the gsutil -q stat command recognizes the path properly. If I remove the /, it does not work. I am surprised that Google is so sensitive about paths.

#!/bin/bash
gsutil -q stat gs://<bucketname>/object/

return_value=$?

if [ $return_value = 0 ]; then
    echo "folder exist"
else
    echo "folder does not exist"
fi
Marginal answered 8/2, 2018 at 20:55 Comment(0)
