All the responses here have a piece of the answer, but you need to combine three things: the prefix, the delimiter, and the prefixes attribute of a loaded list_blobs(...) iterator. Let me throw down the code to get the 100 top-level folders, and then we'll walk through it.
import google.cloud.storage as gcs
client = gcs.Client()
blobs = client.list_blobs(
    bucket_or_name=BUCKET_NAME,
    prefix="",
    delimiter="/",
    max_results=1
)
next(blobs, ...)  # Force list_blobs to make the API call (lazy loading)
# prefixes is now a populated set; convert it to a list so it can be sliced
print(list(blobs.prefixes)[:100])
In the first eight lines we build the GCS client and make the client.list_blobs(...) call. In your question you mention the bucket.list_blobs(...) method. As of version 1.43 this still works, but the page on Buckets in the docs says it is now deprecated. The only difference is the keyword arg bucket_or_name, on line 4.
We want folders at the top level, so we don't actually need to specify prefix at all. However, it is useful for other readers to know that if you wanted to list the folders inside a top-level directory stuff, you should specify it with a trailing slash. The kwarg would then become prefix="stuff/".
Someone already mentioned the delimiter kwarg, but to reiterate: you should specify it so GCS knows how to interpret the blob names as directories. Simple enough.
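To make the interplay of prefix and delimiter concrete, here is a minimal offline sketch of the grouping rule as I understand it from the GCS docs. It does not call GCS at all; simulate_listing and the sample names are hypothetical and exist only for illustration.

```python
def simulate_listing(names, prefix="", delimiter="/"):
    """Mimic how a delimiter listing splits names into items and prefixes."""
    items, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # outside the requested "directory"
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            items.append(name)  # a blob directly under the prefix
        else:
            # Everything up to and including the first delimiter is rolled up.
            prefixes.add(prefix + rest[:cut + len(delimiter)])
    return items, prefixes


names = ["a.txt", "stuff/x.txt", "stuff/sub/y.txt", "other/z.txt"]
print(simulate_listing(names))                   # top level
print(simulate_listing(names, prefix="stuff/"))  # inside stuff/
```

With the empty prefix, a.txt is the only item and the prefixes are other/ and stuff/. With prefix="stuff/", the only item is stuff/x.txt and the only prefix is stuff/sub/, which is why the trailing slash matters.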
The max_results=1 is for efficiency. Remember that we don't want blobs here; we want only folder names. Therefore, if we tell GCS to stop looking once it finds a single blob, it might be faster. In practice, I have not found this to be the case, but it could easily matter if you have vast numbers of blobs or if the storage is Coldline or similar. YMMV. Consider it optional. Be aware that the JSON API documents maxResults as a cap on the combined number of items and prefixes in a page, so if fewer prefixes come back than you expect, drop this argument.
The blobs object returned is a lazy-loading iterator, which means it won't load anything, not even its own members, until the first API call is made. To trigger this first call, we ask for the next element in the iterator. In your case, you know you have at least one file, so simply calling next(blobs) will work. It fetches the blob at the front of the line and then throws it away.
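The lazy-loading behaviour can be imitated without touching GCS. The sketch below uses a hypothetical LazyListing class as a stand-in for the real iterator; it only illustrates the idea that prefixes stays empty until the first element is requested, and it is not how the library is implemented.

```python
class LazyListing:
    """Hypothetical stand-in for the iterator returned by list_blobs."""

    def __init__(self, items, prefixes):
        self._items = list(items)
        self._pending_prefixes = set(prefixes)
        self.prefixes = set()  # empty until the first simulated API call
        self._loaded = False
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if not self._loaded:
            # Simulated API call: the page arrives and prefixes is filled in.
            self.prefixes.update(self._pending_prefixes)
            self._loaded = True
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item


listing = LazyListing(["a.txt"], {"dir0/", "dir1/"})
before = set(listing.prefixes)  # nothing fetched yet
first = next(listing)           # triggers the simulated fetch
after = set(listing.prefixes)
print(before, first, sorted(after))
```

Reading prefixes before the first next(...) gives an empty set, which is exactly the trap people fall into with the real iterator.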
However, if you cannot guarantee at least one blob, then next(blobs), which needs to return something from the iterator, will raise a StopIteration exception. To get around this, we pass the ellipsis ... as the default value.
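The default-value trick is plain Python and can be checked on an empty iterator with no GCS involved. A minimal sketch:

```python
raised = False
try:
    next(iter([]))  # no default: an exhausted iterator raises
except StopIteration:
    raised = True

# With a second argument, next() returns it instead of raising.
result = next(iter([]), ...)

print(raised, result is Ellipsis)
```

Any sentinel works as the default; the ellipsis is just short, and None would do the same job since the returned value is thrown away.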
Now that the member of blobs we want, prefixes, is loaded, we print out the first 100. Since the set is converted to a list before printing, the output will be something like this (in no particular order):
['dir0/', 'dir1/', 'dir2/', ...]
prefixes returned. I do not have an example to post. Google Cloud Storage uses prefix and separator to facilitate listing objects. Hopefully, this tip will help you. – Taryntaryne
prefixes". This won't load, however, until you iterate the first element of list_blobs. PS it's actually delimiter, not separator, on GCS, but we know what you mean. – Troublous