Why am I getting AWS S3 503 Slow Down exceptions when I'm using random prefixes per documentation?

I have a Lambda, using the AWS Java SDK, that is triggered via SQS. Each SQS message contains at least one S3 bucket/key combination. The Lambda reads the specified file, which is full of Tomcat log events, sorts the events by session ID, and then writes each session's events to a single, common destination S3 bucket.

Reads from the source buckets never hit any service limits, so there is no problem there.

Writes to the single destination bucket result in thousands of 503 Slow Down exceptions. To get around this I am writing to more than 20,000 VERY random partitions in the destination bucket, but I still get these errors.

My most recent test showed that during any particular second the lambdas were only writing about 1,800 files TOTAL to the bucket, and those writes were spread evenly over many partitions.

Since the AWS documentation states that S3 supports 3,500 write (PUT/COPY/POST/DELETE) requests per second per partition, there should be no reason to get a 503 Slow Down exception. Because these writes happen from up to 1,000 concurrently running Lambdas, I have no good way to coordinate any kind of exponential backoff strategy across them, and I really shouldn't need to: it doesn't make sense for S3 to issue these exceptions, and the application relies on maximum concurrent processing.
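
The only backoff I can apply is per client, via the SDK's built-in retry policy. A sketch of what I mean (illustrative numbers; assuming the v1 Java SDK, whose default retry condition already treats 503 Slow Down as a throttling error):

import com.amazonaws.ClientConfiguration;
import com.amazonaws.retry.PredefinedBackoffStrategies;
import com.amazonaws.retry.PredefinedRetryPolicies;
import com.amazonaws.retry.RetryPolicy;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Sketch: a client whose requests retry with full-jitter exponential backoff.
// Base delay 100 ms, capped at 20 s, up to 10 retries; values are illustrative.
static AmazonS3 buildThrottleTolerantClient() {
    RetryPolicy retryPolicy = new RetryPolicy(
            PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION, // already retries 503 Slow Down
            new PredefinedBackoffStrategies.FullJitterBackoffStrategy(100, 20_000),
            10,     // maxErrorRetry
            true);  // honor this maxErrorRetry value
    return AmazonS3ClientBuilder.standard()
            .withClientConfiguration(new ClientConfiguration().withRetryPolicy(retryPolicy))
            .build();
}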

So, long story short: 200+ concurrent Lambdas writing ~1,800 small files per second, through 200+ distinct AmazonS3 clients and across ~30,000 partitions, are throwing thousands of S3 Slow Down exceptions. How can I get rid of them?

I have tried several different partition counts, from 5,000 up to 32,000, to distribute the writes. No change.

Partition examples: 80e49ec0903f1e8fa1a8/ a5468f4a8a184cd13696/ cf4d516ca85abafb7b26/
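
For illustration, the partitions are random hex strings of that shape, generated along these lines (simplified sketch; randomPartition is a made-up name, not the actual helper):

import java.security.SecureRandom;

// Sketch: 10 random bytes rendered as 20 hex characters, matching the examples above.
private static final SecureRandom RANDOM = new SecureRandom();

static String randomPartition() {
    byte[] bytes = new byte[10];
    RANDOM.nextBytes(bytes);
    StringBuilder sb = new StringBuilder(21);
    for (byte b : bytes) {
        sb.append(String.format("%02x", b));
    }
    return sb.append('/').toString();
}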

protected PutObjectResult saveFile(AmazonS3 s3, String bucket, String partition, List<Packet> packets) {

        // Force Signature Version 4 via the (deprecated) system-property switch.
        //noinspection deprecation
        System.setProperty(SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY, "true");

        String fileName = Utils.getIntermediateName(partition);

        // Concatenate every packet's prepared bytes into a single in-memory payload.
        byte[] content = packets.stream().map(Packet::getMessage).map(Utils::preparePacket)
                .filter(Objects::nonNull).collect(ByteArrayOutputStream::new, (baos, bytes) -> {
                    try {
                        baos.write(bytes);
                    } catch (IOException e1) {
                        log("Could not write to a stream: " + e1.toString());
                    }
                }, (baos1, baos2) -> baos1.write(baos2.toByteArray(), 0, baos2.size())).toByteArray();

        PutObjectResult result = null;

        try (InputStream stream = new ByteArrayInputStream(content)) {
            ObjectMetadata metadata = new ObjectMetadata();
            metadata.setContentLength(content.length);

            PutObjectRequest request = new PutObjectRequest(bucket, fileName, stream, metadata);
            request.setCannedAcl(CannedAccessControlList.BucketOwnerFullControl);
            result = s3.putObject(request);
        } catch (Exception e) {
            // The 503 Slow Down exceptions surface here; they are logged and null is returned.
            log("In saveFile: " + e.toString());
        }
        return result;
    }

Expected results are that I get no 503 Slow Down exceptions, as I am way, way under the documented limit of 3,500 writes per second per partition.

Actual results are that I get thousands of 503 Slow Down exceptions.

Milord answered 6/8, 2019 at 19:9 Comment(2)
Do you have logging enabled for the target bucket? You might confirm that you don't have some kind of write amplification/multiplication scenario going on, where each write to the bucket triggers another write via S3 events, ad infinitum. Having ruled that out, how long have you been writing to this bucket at the current level? – Dogmatics
Brand new bucket, no events configured on it at all. – Milord

You are on the right track in pushing randomness as far left in the key as possible. However, AWS's partition logic is opaque: we don't know how many partitions a bucket is split into, or where in the key AWS draws the line when splitting a partition. Note that '/' is not a special character for these purposes; S3 might split at the 3rd, 8th, or 20th character. We also don't get to know how 'wide' those partitions are.

The system should autoscale behind the scenes, but if it isn't, you can file a support ticket (if you have that level of support) to ask the S3 team to split the bucket into more partitions, and potentially even to apply some manual partitioning.
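
While that scaling catches up, AWS's general guidance is to retry throttled requests with jittered exponential backoff. A minimal sketch of the idea for the v1 Java SDK (the wrapper and its numbers are illustrative, not an official API):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

import java.io.ByteArrayInputStream;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: retry a small-object put on 503 Slow Down with full-jitter backoff.
// A fresh stream is built per attempt so a retry never reuses a half-read stream.
static PutObjectResult putWithBackoff(AmazonS3 s3, String bucket, String key, byte[] content)
        throws InterruptedException {
    for (int attempt = 0; ; attempt++) {
        try {
            ObjectMetadata metadata = new ObjectMetadata();
            metadata.setContentLength(content.length);
            return s3.putObject(new PutObjectRequest(
                    bucket, key, new ByteArrayInputStream(content), metadata));
        } catch (AmazonS3Exception e) {
            if (e.getStatusCode() != 503 || attempt >= 9) throw e; // give up after 10 tries
            long cap = Math.min(20_000L, 100L << attempt);          // exponential ceiling
            Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1)); // full jitter
        }
    }
}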

Cruiser answered 12/6, 2024 at 18:27 Comment(0)
