How to count the total number of objects in an S3 bucket using Java?

I am trying to find a faster way to count all of the objects in an S3 bucket using the AWS SDK for Java.

private static int getBucketFileCount(AmazonS3 s3, ListObjectsV2Request req) {
   ListObjectsV2Result result;
   int fileCount = 0;
   log.info("Counting s3 files");

   do {
      result = s3.listObjectsV2(req);
      for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {
         fileCount++;
      }
      req.setContinuationToken(result.getNextContinuationToken());

   } while (result.isTruncated());
   return fileCount;
}

However, this method is very slow, and I have not been able to find a faster way. I found another question that partly helps, "Will getObjectSummaries get the count of objects stored in a S3 Bucket?", but I cannot work out exactly how to apply its answer.

How do I use the getNextMarker() function with my current implementation? What do I need to change?

Magnoliamagnoliaceous answered 23/8, 2017 at 20:56 Comment(5)
I noticed you have fileCount += result.getKeyCount() as well as fileCount++ inside a for loop. Are you double counting? - Branch
Ah, my bad, that was something I forgot to remove. - Magnoliamagnoliaceous
If you don't require an up-to-date count value, Amazon S3 Inventory is a scheduled alternative to the Amazon S3 synchronous List API operation. It provides a comma-separated values (.csv) flat-file output of your objects and their corresponding metadata on a daily or weekly basis. - Codding
See #28114105 for an answer. - Circumfuse
@AmitKhandelwal I already linked to that post. I still do not understand exactly how to use those functions, since I am not well versed in the AWS SDK. - Magnoliamagnoliaceous

A fast and cheap way to get the number of objects in a bucket is to read the NumberOfObjects CloudWatch metric for that bucket. I believe it is published about once a day, so the value can lag behind the live contents of the bucket:

    long offsetInMilliseconds = 1000 * 60 * 60 * 24;
    Date endDate = new Date();
    Date startDate = new Date(endDate.getTime() - offsetInMilliseconds);

    Dimension dimension = new Dimension()
        .withName("BucketName")
        .withValue(bucketName);
    Dimension storageTypeDimension = new Dimension()
        .withName("StorageType")
        .withValue("AllStorageTypes");

    GetMetricStatisticsRequest request = new GetMetricStatisticsRequest()
        .withStartTime(startDate)
        .withEndTime(endDate)
        .withPeriod(86400)
        .withDimensions(dimension, storageTypeDimension)
        .withMetricName("NumberOfObjects")
        .withNamespace("AWS/S3")
        .withStatistics(Statistic.Maximum);

    GetMetricStatisticsResult result = cloudWatch.getMetricStatistics(request);

    if (!result.getDatapoints().isEmpty()) {
        double maximumNumberOfObjects = result.getDatapoints().get(0).getMaximum();
        System.out.println("Maximum number of objects: " + maximumNumberOfObjects);
    } else {
        System.out.println("No data available.");
    }
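
One caveat, as far as I know: GetMetricStatistics does not promise that datapoints come back in chronological order, so get(0) is only safe when the query window yields a single datapoint. If you widen the window, pick the newest datapoint explicitly. Below is a minimal sketch of that selection step that runs without the SDK. The Point class is a hypothetical stand-in for the SDK's Datapoint (timestamp plus Maximum), and the sample values are made up for illustration.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class LatestDatapointSketch {

    // Hypothetical stand-in for the SDK's Datapoint: a timestamp and the Maximum statistic.
    static final class Point {
        final Instant timestamp;
        final double maximum;

        Point(Instant timestamp, double maximum) {
            this.timestamp = timestamp;
            this.maximum = maximum;
        }
    }

    // Returns the datapoint with the most recent timestamp, or empty if there are none.
    static Optional<Point> latest(List<Point> points) {
        return points.stream().max(Comparator.comparing((Point p) -> p.timestamp));
    }

    public static void main(String[] args) {
        // Made-up sample data, deliberately out of chronological order.
        List<Point> points = Arrays.asList(
                new Point(Instant.parse("2024-04-21T00:00:00Z"), 1200.0),
                new Point(Instant.parse("2024-04-22T00:00:00Z"), 1250.0),
                new Point(Instant.parse("2024-04-20T00:00:00Z"), 1100.0));

        Optional<Point> newest = latest(points);
        if (newest.isPresent()) {
            System.out.println("Latest object count: " + (long) newest.get().maximum);
        } else {
            System.out.println("No data available.");
        }
    }
}
```

With the real SDK the same idea would apply to result.getDatapoints(), comparing on each datapoint's timestamp instead of the hypothetical Point field.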
Rattling answered 22/4, 2024 at 22:12 Comment(0)

You can build a request with the ListObjectsV2Request builder and pass it to S3Client#listObjectsV2. Each response is one page of results, so count the contents of every page and keep following the continuation token until none is returned.

Sample code:

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;

public class Main {

    public static void main(String[] args) {
        String bucketName = "Replace-this-string-with-your-bucket-name";

        S3Client s3 = S3Client.builder()
                .region(Region.US_EAST_1)
                .credentialsProvider(ProfileCredentialsProvider.create())
                .build();

        // long, so a very large bucket cannot overflow the counter
        long count = 0;
        String continuationToken = null;

        do {
            // A null token requests the first page; later iterations resume where the last page ended
            ListObjectsV2Request listObjectsRequest = ListObjectsV2Request.builder()
                    .bucket(bucketName)
                    .continuationToken(continuationToken)
                    .build();

            ListObjectsV2Response response = s3.listObjectsV2(listObjectsRequest);

            // Add the size of this page instead of looping over every object
            count += response.contents().size();

            continuationToken = response.nextContinuationToken();

        } while (continuationToken != null);

        System.out.println("Total number of objects in the bucket: " + count);
    }
}
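
If you want to check the token-following loop without AWS credentials, here is a minimal sketch of the same pattern against an in-memory fake. Everything in it is hypothetical and only for illustration: Page stands in for ListObjectsV2Response, fakeBucket stands in for the S3 call, and the token is simply the next start index. It does not make the real listing any faster, since S3 returns at most 1,000 keys per ListObjectsV2 call and the pages have to be fetched one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class PagedCountSketch {

    // Hypothetical stand-in for ListObjectsV2Response: the keys on one page plus the next token.
    static final class Page {
        final List<String> keys;
        final String nextToken; // null when there are no more pages

        Page(List<String> keys, String nextToken) {
            this.keys = keys;
            this.nextToken = nextToken;
        }
    }

    // Follows continuation tokens until the source reports no further page, summing page sizes.
    static long countAll(Function<String, Page> fetchPage) {
        long count = 0;
        String token = null; // null asks for the first page
        do {
            Page page = fetchPage.apply(token);
            count += page.keys.size();
            token = page.nextToken;
        } while (token != null);
        return count;
    }

    // In-memory fake serving `total` keys in pages of `pageSize`; the token is the next start index.
    static Function<String, Page> fakeBucket(int total, int pageSize) {
        return token -> {
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + pageSize, total);
            List<String> keys = new ArrayList<>();
            for (int i = start; i < end; i++) {
                keys.add("key-" + i);
            }
            return new Page(keys, end < total ? String.valueOf(end) : null);
        };
    }

    public static void main(String[] args) {
        // 2,500 fake keys served in pages of 1,000 take three round trips.
        System.out.println("Total: " + countAll(fakeBucket(2500, 1000)));
    }
}
```

Swapping fakeBucket for a function that calls s3.listObjectsV2 with the given token would give the same loop as the sample above.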

Some useful references:

  1. Provide temporary credentials to the AWS SDK for Java
  2. ListObjectsV2Request#continuationToken
Diner answered 21/9, 2024 at 17:13 Comment(0)