I see that you have already found the answer to your issue. However, I am posting this answer to give more context on how the gsutil stat command works and why your code was not working.
gsutil is a Python application for accessing and working with Cloud Storage from the command line. It has many commands available, and the one that you used is gsutil stat, which outputs the metadata of an object while retrieving as little data as possible and without listing all the objects in the bucket. This command is also strongly consistent, which makes it an appropriate solution for certain types of applications.
Running gsutil stat gs://<BUCKET_NAME>/<BUCKET_OBJECT> returns something like the following:
gs://<BUCKET_NAME>/<BUCKET_OBJECT>.png:
Creation time: Tue, 06 Feb 2018 14:49:58 GMT
Update time: Tue, 06 Feb 2018 14:49:58 GMT
Storage class: MULTI_REGIONAL
Content-Length: 6119
Content-Type: image/png
Hash (crc32c): <CRC32C_HASH>
Hash (md5): <MD5_HASH>
ETag: <ETAG>
Generation: <TIMESTAMP>
Metageneration: 1
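However, if you run it with the top-level -q option (gsutil -q stat gs://<BUCKET_NAME>/<BUCKET_OBJECT>), it prints nothing and reports the result through its exit status instead: 0 if the object exists, or 1 if it does not. This makes it useful for writing scripts such as the one you shared.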
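As a minimal sketch of how that exit status can drive a script: the function name object_exists is a hypothetical helper, and the URL in the usage comment is a placeholder. Keep in mind that a non-zero status may also come from other failures, such as missing credentials.

```shell
# Hypothetical helper: succeeds (exit status 0) when the object exists.
# -q is a top-level gsutil option, so it goes before the stat subcommand.
object_exists() {
  gsutil -q stat "$1"
}

# Example use in a script (placeholder URL):
#   if object_exists "gs://my_bucket/my_subdirectory/my_nested_file.txt"; then
#     echo "object exists"
#   else
#     echo "object not found"
#   fi
```

Quoting "$1" keeps object names that contain spaces intact.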
Finally, there are some additional points to keep in mind when working with subdirectories inside a bucket:
- A command such as gsutil -q stat gs://my_bucket/my_subdirectory will retrieve the data of an object called my_subdirectory, not of a directory.
- A command such as gsutil -q stat gs://my_bucket/my_subdirectory/ will operate on the subdirectory itself, not on the files nested under it, so it will only tell you whether the subdirectory exists (this is why your code was failing).
- You have to use something like gsutil -q stat gs://my_bucket/my_subdirectory/my_nested_file.txt in order to retrieve the metadata of a file nested under a subdirectory.
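Since the three cases above differ only in how the path is written, it can help to build the full object URL in one place. The following is a rough sketch in bash; gcs_url is a hypothetical helper (not part of gsutil), and it assumes the first argument is a bare bucket name without the gs:// scheme.

```shell
# Hypothetical helper: builds gs://<bucket>/<part>/<part>... from its arguments,
# trimming stray leading and trailing slashes from every part after the bucket.
gcs_url() {
  local bucket="$1"
  shift
  local url="gs://${bucket}"
  local part
  for part in "$@"; do
    # Remove leading slashes, then trailing slashes.
    while [ "${part#/}" != "$part" ]; do part="${part#/}"; done
    while [ "${part%/}" != "$part" ]; do part="${part%/}"; done
    # Skip parts that were empty or consisted only of slashes.
    if [ -n "$part" ]; then
      url="${url}/${part}"
    fi
  done
  printf '%s\n' "$url"
}

# gcs_url my_bucket my_subdirectory/ my_nested_file.txt
# prints: gs://my_bucket/my_subdirectory/my_nested_file.txt
```

The result can then be passed to the stat command, for example gsutil -q stat "$(gcs_url my_bucket my_subdirectory/ my_nested_file.txt)".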
So, in short, your issue was that your paths were not specified precisely. It is not that gsutil is overly sensitive about paths; this behavior is working as intended. You may have a file and a folder with the same name, as in the following situation, and you should be able to retrieve either of them, which requires the trailing / to indicate whether you mean the directory or the file:
gs://my_bucket/
|_ my_subdirectory            #This is a file
|_ my_subdirectory/           #This is a folder
   |_ my_nested_file.txt      #This is a nested file
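For a layout like the one above, a script may want to know whether a name refers to an object, a folder-like prefix, or nothing at all. The sketch below is one possible approach, not something gsutil provides: describe_path is a hypothetical helper, and it assumes that gsutil ls exits with a non-zero status when the URL matches no objects. It strips any trailing slash itself, so the result does not depend on how gsutil stat treats one.

```shell
# Hypothetical helper: prints "object", "prefix", or "missing" for a gs:// URL.
describe_path() {
  # Drop one trailing slash, if present, so both spellings are handled alike.
  local url="${1%/}"
  if gsutil -q stat "$url" 2>/dev/null; then
    # An object with exactly this name exists.
    echo "object"
  elif gsutil ls "${url}/" >/dev/null 2>&1; then
    # Nothing has this exact name, but something is listed under "<url>/".
    echo "prefix"
  else
    echo "missing"
  fi
}

# Example (placeholder URL):
#   describe_path "gs://my_bucket/my_subdirectory"
```

When both a file and a folder share the name, this sketch reports "object" because the exact-name check runs first; swap the two branches if the folder should take priority.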
gsutil stat operates only on full object names, not "directories"/prefixes. When evaluating each argument, gsutil actually removes the trailing slash (if present). So running gsutil stat against "gs://bucket/dir" and "gs://bucket/dir/" both check whether "gs://bucket/dir" exists. Further, gsutil stat gs://bucket/dir will return a failure code if that exact object does not exist, even if something with that prefix (e.g. gs://bucket/dir/obj) does. – Garboil