Gsutil - How can I check if a file exists in a GCS bucket (a sub-directory) using Gsutil
S

6

25

I have a GCS bucket containing some files in the path

gs://main-bucket/sub-directory-bucket/object1.gz

I would like to programmatically check if the sub-directory bucket contains one specific file. I would like to do this using gsutil.

How could this be done?

Senghor answered 30/3, 2015 at 22:24 Comment(0)
W
14

If your script allows for non-zero exit codes, then:

#!/bin/bash

file_path=gs://main-bucket/sub-directory-bucket/object1.gz
gsutil -q stat "$file_path"
status=$?

if [[ $status == 0 ]]; then
  echo "File exists"
else
  echo "File does not exist"
fi

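But if your script is set to exit on error (for example with set -e or an ERR trap, as below), a bare failing gsutil call aborts the script before you can inspect its exit code. Here is an alternative that captures the failure instead: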

#!/bin/bash
trap 'exit' ERR

file_path=gs://main-bucket/sub-directory-bucket/object1.gz
result=$(gsutil -q stat "$file_path" || echo 1)
if [[ $result != 1 ]]; then
  echo "File exists"
else
  echo "File does not exist"
fi
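
Both variants above can be folded into one small helper. This is only a sketch: the function name gcs_object_exists is made up for illustration, and it relies on the standard bash behavior that a command used as an if condition triggers neither set -e nor an ERR trap.

```shell
#!/bin/bash

# Hypothetical helper (the name is illustrative, not part of gsutil).
# Returns 0 if "gsutil -q stat" succeeds for the given URL, 1 otherwise.
# Because gsutil runs as the condition of an "if", a non-zero exit
# does not abort a script that uses "set -e" or "trap ... ERR".
gcs_object_exists() {
  local url=$1
  if gsutil -q stat "$url"; then
    return 0
  else
    return 1
  fi
}

# Example use (commented out so that sourcing this file has no side effects):
# if gcs_object_exists "gs://main-bucket/sub-directory-bucket/object1.gz"; then
#   echo "File exists"
# else
#   echo "File does not exist"
# fi
```

Note that any non-zero exit from gsutil (including, presumably, authentication or network failures) is reported as "does not exist" here, just as in the scripts above.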

Whore answered 25/8, 2020 at 23:13 Comment(2)
This should be the correct answer, as it explains the case of exit codes as well. - Rotatory
Instead of setting a trap, you could use something like if [ "$(gsutil -q stat $file_path ; echo $?)" = 0 ] - Stretchy
R
12

You can use the gsutil stat command.

Ratoon answered 30/3, 2015 at 22:47 Comment(3)
Thank you jterrace. I did check out gsutil stat, especially the gsutil -q stat option. It looks perfect for my use case. However, Google says that we can only use gsutil -q stat on objects within the main directory. That is, it will not work for objects contained within sub-directories. Is there any other way to check if an object within a sub-directory exists? Thanks! - Senghor
Subdirectories don't really exist. Please see cloud.google.com/storage/docs/gsutil/addlhelp/… - Isodynamic
@Senghor - it's talking specifically about directories themselves, not the objects inside, e.g. gsutil stat gs://bucket/dir/subdir/foo.txt would work fine. I'll file a bug about updating the docs to make it clearer. - Ratoon
C
11

Use the gsutil stat command. To check whether a sub-directory contains any files at all, use a wildcard (*).

For example:

gsutil -q stat gs://some-bucket/some-subdir/*; echo $?

In your case:

gsutil -q stat gs://main-bucket/sub-directory-bucket/*; echo $?

An exit code of 0 means at least one matching object exists; 1 means none does.

Cathartic answered 18/6, 2018 at 12:0 Comment(0)
F
3

There is also gsutil ls (https://cloud.google.com/storage/docs/gsutil/commands/ls)

e.g.

gsutil ls gs://my-bucket/foo.txt

Output is either that same filepath or "CommandException: One or more URLs matched no objects."

Firkin answered 3/9, 2018 at 9:16 Comment(0)
S
1

You can also simply run the ls command and count the lines of output.

A count of 0 means the file is not there; a count of 1 means it exists.

file_exists=$(gsutil ls gs://my_bucket/object1.gz | wc -l)

The same could be used for many files of course.

files_number=$(gsutil ls gs://my_bucket/object* | wc -l)
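
The counting approach can be wrapped in a function so the result is easy to reuse. This is a rough sketch: gcs_count_matches is a hypothetical name, and it assumes the pattern matches objects rather than "directory" prefixes, since extra listing lines would inflate the count.

```shell
#!/bin/bash

# Hypothetical helper (the name is illustrative, not part of gsutil).
# Prints the number of output lines from "gsutil ls" for the given pattern.
# stderr is discarded so the "matched no objects" message is not shown;
# "|| true" keeps the assignment from failing under "set -e -o pipefail".
gcs_count_matches() {
  local pattern=$1
  local count
  count=$(gsutil ls "$pattern" 2>/dev/null | wc -l) || true
  # Arithmetic expansion strips any padding that some wc versions add.
  echo $((count))
}

# Example use (commented out so that sourcing this file has no side effects):
# if [ "$(gcs_count_matches 'gs://my_bucket/object*')" -gt 0 ]; then
#   echo "At least one matching file exists"
# fi
```

Discarding stderr also hides unrelated failures such as permission errors, so a count of 0 should be read as "nothing was listed" rather than proof that the object is absent.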
Sardinia answered 6/10, 2021 at 19:9 Comment(1)
This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review - Bing
P
0

If you want to act on the result of that check (for example, load a bq table only when a directory contains parquet files):

if gsutil -q stat 'gs://dir/*.parquet'; then bq load ... ; fi
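
The same idea can be generalized so that any command runs only when the pattern matches something. This is a minimal sketch under stated assumptions: run_if_objects_exist is a hypothetical name, and the command to run is whatever you pass after the pattern.

```shell
#!/bin/bash

# Hypothetical helper (the name is illustrative, not part of gsutil).
# Usage: run_if_objects_exist PATTERN COMMAND [ARGS...]
# Runs COMMAND only if "gsutil -q stat" succeeds for PATTERN.
run_if_objects_exist() {
  local pattern=$1
  shift
  if gsutil -q stat "$pattern"; then
    # At least one object matched: run the supplied command as given.
    "$@"
  else
    # Nothing matched (or gsutil failed): skip without signalling an error.
    echo "No objects match $pattern; skipping" >&2
    return 0
  fi
}

# Example use (commented out so that sourcing this file has no side effects):
# run_if_objects_exist 'gs://dir/*.parquet' echo "parquet files found"
```

Quoting the pattern keeps the local shell from trying to expand the wildcard, leaving the matching to gsutil.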

Pigeonhearted answered 25/11, 2019 at 11:27 Comment(0)
