Google Cloud Storage List Blob objects with specific file name

Using google.cloud.storage and list_blobs I can get the list of files from a specific bucket, but I want to filter for specific files (name*.ext) in the bucket. I was not able to find an exact solution.

For example: bucket_name=data, prefix_folder_name=sales; within the prefix folder I have a list of invoices with metadata. I want to get the specific invoices and their metadata (name*.csv & name*.meta). Also, if I loop over all blobs in that folder to pick out the selected files, it is a huge volume of data and it may affect performance.

It would be great if someone could help me with a solution.

bucket = gcs_client.get_bucket(bucket_name)
all_blobs = bucket.list_blobs(prefix=prefix_folder_name)
for blob in all_blobs:
    print(blob.name)
Subito answered 8/7, 2020 at 17:21 Comment(0)

According to the google-cloud-storage documentation, Blob objects have a name attribute, so you can filter them on that attribute.

from google.cloud import storage

# storage_client = gcs client
storage_client = storage.Client()

# bucket_name = "your-bucket-name"
# Note: Client.list_blobs requires at least package version 1.17.0.
blobs = storage_client.list_blobs(bucket_name)

# filter_dir = "filter-string"
[blob.name for blob in blobs if filter_dir in blob.name]
Avionics answered 26/11, 2020 at 13:7 Comment(1)
It's clearly stated in the question that just getting all the objects and looping to match a filter would not be performant. – Jampacked

You can do this with the match_glob parameter, e.g.

bucket.list_blobs(match_glob='*.ext')
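
As a sketch, applied to the question's example layout ("data" and "sales" are the bucket and folder names from the question; match_glob requires a reasonably recent google-cloud-storage release):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("data")

# match_glob is compared against the full object name, so the folder
# can be encoded directly in the pattern.
csv_blobs = bucket.list_blobs(match_glob="sales/name*.csv")
meta_blobs = bucket.list_blobs(match_glob="sales/name*.meta")

for blob in csv_blobs:
    print(blob.name, blob.size, blob.updated)
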
Cotyledon answered 2/8, 2023 at 11:20 Comment(0)

It doesn't allow you to filter, but you can use the fields parameter to return just the names of the objects, limiting the amount of data returned and making it easy to filter.
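
A rough sketch of that approach, assuming the bucket and prefix names from the question; fields limits the response to the object names (plus the page token the iterator needs), while the actual matching still happens client-side:

from google.cloud import storage

client = storage.Client()

# Return only object names (and the token used for paging), so the
# listing payload stays small.
blobs = client.list_blobs(
    "data",
    prefix="sales/",
    fields="items(name),nextPageToken",
)

csv_names = [blob.name for blob in blobs if blob.name.endswith(".csv")]
print(csv_names)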

Siberia answered 8/7, 2020 at 23:55 Comment(0)

You can filter on a prefix, but to filter more specifically (e.g., for objects ending with a given name extension) you have to implement client-side filtering logic. That's what gsutil does when you run a command like:

gsutil ls gs://your-bucket/abc*.txt
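
A minimal sketch of that client-side approach, assuming the bucket and folder names from the question:

from google.cloud import storage

client = storage.Client()

# Narrow the listing server-side with the prefix, then apply the
# finer extension match in the client, as gsutil does.
blobs = client.list_blobs("data", prefix="sales/name")
matches = [blob for blob in blobs if blob.name.endswith((".csv", ".meta"))]

for blob in matches:
    print(blob.name)
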
Queeniequeenly answered 8/7, 2020 at 18:48 Comment(0)

Expanding on @michael-vehrs' answer, if you need to look at all directory levels, be sure to prepend **/.

from google.cloud import storage
bucket = storage.Client().bucket("bucket-name")
bucket.list_blobs(match_glob="**/*.ext")
Fahrenheit answered 3/11, 2023 at 15:15 Comment(0)

You can use the following, with name and .ext as the filters for the files:

all_blobs = bucket.list_blobs()
fileList = [file.name for file in all_blobs if '.ext' in file.name and 'name' in file.name]

for file in fileList:
    print(file)

Here name is the file-name filter and .ext is your extension filter.

Anabaptist answered 30/4, 2021 at 11:59 Comment(0)
