I need to list all of the blobs in an Azure Blob Storage container. The container holds roughly 200,000 blobs, and for each one I need the blob name, the last modified date, and the blob size.
Following the documentation for the Azure Java SDK V12, the following code should work:
BlobServiceClient blobServiceClient = new BlobServiceClientBuilder().connectionString(AzureBlobConnectionString).buildClient();
String containerName = "container1";
BlobContainerClient containerClient = blobServiceClient.getBlobContainerClient(containerName);
System.out.println("\nListing blobs...");
// List the blob(s) in the container.
for (BlobItem blobItem : containerClient.listBlobs()) {
    System.out.println("\t" + blobItem.getName());
}
However, when executed, this application just seems to hang indefinitely. If I open PowerShell and run the following command:
Get-AzStorageBlob -Container container1 -Context $ctx
I get the expected result within about 3 minutes.
I've given the code example upwards of an hour to execute, yet it never produces any output. I then tried restricting the data being requested, as per the documentation, and setting a 5-minute timeout:
BlobServiceClient blobServiceClient = new BlobServiceClientBuilder().connectionString(AzureBlobConnectionString).buildClient();
String containerName = "container1";
BlobContainerClient containerClient = blobServiceClient.getBlobContainerClient(containerName);
System.out.println("\nListing blobs...");
ListBlobsOptions options = new ListBlobsOptions()
    .setMaxResultsPerPage(10)
    .setDetails(new BlobListDetails()
        .setRetrieveDeletedBlobs(false)
        .setRetrieveSnapshots(true));
Duration duration = Duration.ofMinutes(5);
containerClient.listBlobs(options, duration).forEach(blob ->
    System.out.printf("Name: %s, Directory? %b, Deleted? %b, Snapshot ID: %s%n",
        blob.getName(),
        blob.isPrefix(),
        blob.isDeleted(),
        blob.getSnapshot()));
However, this timed out with the following exception:
Exception in thread "main" reactor.core.Exceptions$ReactiveException: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 300000ms in 'flatMap' (and no fallback has been configured)
    at reactor.core.Exceptions.propagate(Exceptions.java:366)
    at reactor.core.publisher.BlockingIterable$SubscriberIterator.hasNext(BlockingIterable.java:168)
    at java.lang.Iterable.forEach(Iterable.java:74)
    at AzureManagement.AzureControl.listAllBlobs(AzureControl.java:42)
    at Main.main(Main.java:8)
I understand there used to be a method called "listBlobsSegmented"; however, it does not appear to exist in V12 of the Azure SDK for Java.
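To be clear about what I mean by segmented listing: results are pulled one page at a time, with a continuation token saying where the next page starts, so that no single call has to return all 200,000 entries. Below is a minimal sketch of that pattern against an in-memory stand-in. Note that PageSource, Page, inMemorySource and listAll are hypothetical names I made up for illustration; none of them are Azure SDK types, and the token format (the next start index) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentedListingSketch {

    /** One page of results plus the token for the next page (null when there are no more pages). */
    public static final class Page {
        public final List<String> names;
        public final String continuationToken;

        public Page(List<String> names, String continuationToken) {
            this.names = names;
            this.continuationToken = continuationToken;
        }
    }

    /** Hypothetical stand-in for a service that returns results one page at a time. */
    public interface PageSource {
        Page fetch(String continuationToken, int pageSize);
    }

    /** In-memory source with `total` fake blob names; the token is simply the next start index. */
    public static PageSource inMemorySource(int total) {
        return (token, pageSize) -> {
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + pageSize, total);
            List<String> names = new ArrayList<>();
            for (int i = start; i < end; i++) {
                names.add("blob-" + i);
            }
            String next = (end < total) ? String.valueOf(end) : null;
            return new Page(names, next);
        };
    }

    /** Keeps requesting pages until the source stops returning a continuation token. */
    public static List<String> listAll(PageSource source, int pageSize) {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            Page page = source.fetch(token, pageSize);
            all.addAll(page.names);
            token = page.continuationToken;
        } while (token != null);
        return all;
    }

    public static void main(String[] args) {
        // 12 fake blobs fetched in pages of 5, i.e. three round trips.
        List<String> names = listAll(inMemorySource(12), 5);
        System.out.println(names.size() + " names, first = " + names.get(0));
    }
}
```

With this shape, progress is visible after every page, and a failure partway through can resume from the last token instead of starting over, which is the behaviour I am hoping to get from V12.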
If anybody has any ideas on how to list the blobs in the container effectively and efficiently, I would very much appreciate it!
Thanks.