How to list all blobs inside of a specific subdirectory in Azure Cloud Storage using Python?
I worked through the example code from the Azure docs https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

from azure.storage.blob import BlockBlobService
account_name = "x"
account_key = "x"
top_level_container_name = "top_container"

blob_service = BlockBlobService(account_name, account_key)

print("\nList blobs in the container")
generator = blob_service.list_blobs(top_level_container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)

Now I would like to walk the container at a finer grain. My container top_level_container_name has several virtual subdirectories:

  • top_level_container_name/dir1
  • top_level_container_name/dir2
  • etc in that pattern

I would like to be able to list all of the blobs that are inside just one of those directories. For instance

  • dir1/a.jpg
  • dir1/b.jpg
  • etc

How do I get a generator of just the contents of dir1 without having to walk all of the other dirs? (I would also take a list or dictionary)

I tried appending /dir1 to the container name, i.e. top_level_container_name = "top_container/dir1", but that didn't work: I get back the error azure.common.AzureHttpError: The requested URI does not represent any resource on the server. ErrorCode: InvalidUri

The docs do not even seem to have any info on BlockBlobService.list_blobs(): https://learn.microsoft.com/en-us/python/api/azure.storage.blob.blockblobservice.blockblobservice?view=azure-python

Update: list_blobs() comes from https://github.com/Azure/azure-storage-python/blob/ff51954d1b9d11cd7ecd19143c1c0652ef1239cb/azure-storage-blob/azure/storage/blob/baseblobservice.py#L1202

Huntsville answered 3/7, 2018 at 0:15 Comment(0)
41

Please try something like:

generator = blob_service.list_blobs(top_level_container_name, prefix="dir1/")

This should list the blobs and folders in the dir1 virtual directory.

If you want to list all blobs inside the dir1 virtual directory (a flat, recursive listing), please try something like:

generator = blob_service.list_blobs(top_level_container_name, prefix="dir1/", delimiter="")

For more information, please see this link.
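As a side note on the semantics: prefix restricts the listing server-side, while delimiter controls whether entries below the prefix are collapsed into virtual folders. A minimal pure-Python simulation of that behavior (no Azure call is made; the blob names are made up for illustration):

```python
# Sketch of how prefix/delimiter listing behaves. No Azure call is made;
# the blob names below are invented purely for illustration.

def list_blobs_sim(names, prefix="", delimiter=""):
    """Mimic flat vs. hierarchical listing over a set of blob names."""
    matches = [n for n in names if n.startswith(prefix)]
    if not delimiter:
        return sorted(matches)  # flat: every blob under the prefix
    results = set()
    for n in matches:
        rest = n[len(prefix):]
        if delimiter in rest:   # collapse deeper entries into "folders"
            results.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            results.add(n)
    return sorted(results)

names = ["dir1/a.jpg", "dir1/b.jpg", "dir1/sub/c.jpg", "dir2/d.jpg"]
print(list_blobs_sim(names, prefix="dir1/"))
# flat listing: ['dir1/a.jpg', 'dir1/b.jpg', 'dir1/sub/c.jpg']
print(list_blobs_sim(names, prefix="dir1/", delimiter="/"))
# hierarchical: ['dir1/a.jpg', 'dir1/b.jpg', 'dir1/sub/']
```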

Shirleenshirlene answered 3/7, 2018 at 1:51 Comment(2)
This returns: <azure.storage.common.models.ListGenerator at 0x7fd2204f5cd0>Arose
@Arose as intended (list_blobs returns a generator object). Now iterate over the generator as you would any iterableHymeneal
34

I was not able to import BlockBlobService; BlobServiceClient seems to be the new alternative (azure-storage-blob v12). Following the official doc, I found this:

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

Create a Blob Storage Account client

connect_str = "<connection string>"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

Create a container client

container_name="dummy"
container_client=blob_service_client.get_container_client(container_name)

This will list all blobs in the container inside the dir1 folder/directory:

blob_list = container_client.list_blobs(name_starts_with="dir1/")
for blob in blob_list:
    print("\t" + blob.name)
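The service can only filter by name prefix, so narrowing the listing to a particular extension has to happen client-side after listing. A small sketch, with made-up blob names standing in for the list_blobs result:

```python
# list_blobs filters server-side by prefix only; filter by extension
# client-side. The names below are made up; in real code they would come
# from [b.name for b in container_client.list_blobs(name_starts_with="dir1/")].

def filter_by_suffix(blob_names, suffix):
    """Keep only blob names ending with the given suffix (e.g. '.json')."""
    return [n for n in blob_names if n.endswith(suffix)]

blob_names = ["dir1/a.json", "dir1/b.jpg", "dir1/c.json"]
print(filter_by_suffix(blob_names, ".json"))  # ['dir1/a.json', 'dir1/c.json']
```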
Duel answered 21/6, 2020 at 12:43 Comment(2)
Please provide a small writeup for the solution providedKolb
can you limit results to only say .json files ?Caulfield
8

The module azurebatchload provides for this and more. You can filter on folder or filenames, plus choose to get the result in various formats:

  • list
  • dictionary with extended info
  • pandas dataframe

1. List a whole container with just the filenames as a list.

from azurebatchload import Utils

list_blobs = Utils(container='containername').list_blobs()

2. List a whole container with just the filenames as a dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   dataframe=True
).list_blobs()

3. List a folder in a container.

from azurebatchload import Utils

list_blobs = Utils(
   container='containername',
   name_starts_with="foldername/"
).list_blobs()

4. Get extended information about a folder.

from azurebatchload import Utils

dict_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True
).list_blobs()

5. Get extended information about a folder, returned as a pandas dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True,
   dataframe=True
).list_blobs()

disclaimer: I am the author of the azurebatchload module.

Artair answered 9/4, 2021 at 15:5 Comment(3)
Does it support Service Principal secret based authentication instead of connectionString?Therese
@Erfan, every time I run I get the error "If AZURE_STORAGE_CONNECTION_STRING is not set as env variable AZURE_STORAGE_KEY and AZURE_STORAGE_ACCOUNT have to be set", and I am not sure how to configure these two. Kindly helpFolia
Several ways, easiest way is os.environ["AZURE_STORAGE_CONNECTION_STRING"] = "you connection string" in your python script @FoliaArtair
2

To get the blob files inside a directory or subdirectory as file paths:

from azure.storage.blob import BlockBlobService
blob_service = BlockBlobService(account_name, account_key)
blobfile = []
generator = blob_service.list_blobs(container_name, prefix="filepath/", delimiter="")
for blob in generator:
    blobname = blob.name.split('/')[-1]
    blobfile.append(blobname)
    print("\t Blob name: " + blob.name)
print(blobfile)

Use delimiter="/" in the code above to get the blobs grouped as virtual folders instead.
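The blob.name.split('/')[-1] step above just keeps the last path segment of each blob name. As a standalone illustration (the names are made up):

```python
# Strip the virtual-directory part of a blob name, keeping only the file
# name, as the loop above does with blob.name.split('/')[-1].

def base_name(blob_name):
    return blob_name.split("/")[-1]

print(base_name("filepath/sub/a.jpg"))  # a.jpg
print(base_name("a.jpg"))               # a.jpg (no directory part)
```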

Cud answered 28/7, 2021 at 9:58 Comment(0)
0

The parameter is name_starts_with. The code will look like this: container.list_blobs(name_starts_with=prefix_value), where prefix_value would be "dir1/" for a dir1 virtual directory inside the container.

Please check the documentation: https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python#azure-storage-blob-containerclient-list-blobs

Dyadic answered 15/10, 2022 at 2:48 Comment(0)
