I am running a batch job in AWS which consumes messages from a SQS queue and writes them to a Kafka topic using akka. I've created a Sqs Async Client with the following parameters:
private static SqsAsyncClient getSqsAsyncClient(final Config configuration, final String awsRegion) {
var asyncHttpClientBuilder = NettyNioAsyncHttpClient.builder()
.maxConcurrency(100)
.maxPendingConnectionAcquires(10_000)
.connectionMaxIdleTime(Duration.ofSeconds(60))
.connectionTimeout(Duration.ofSeconds(30))
.connectionAcquisitionTimeout(Duration.ofSeconds(30))
.readTimeout(Duration.ofSeconds(30));
return SqsAsyncClient.builder()
.region(Region.of(awsRegion))
.httpClientBuilder(asyncHttpClientBuilder)
.endpointOverride(URI.create("https://sqs.us-east-1.amazonaws.com/000000000000")).build();
}
private static SqsSourceSettings getSqsSourceSettings(final Config configuration) {
final SqsSourceSettings sqsSourceSettings = SqsSourceSettings.create().withCloseOnEmptyReceive(false);
if (configuration.hasPath(ConfigPaths.SqsSource.MAX_BATCH_SIZE)) {
sqsSourceSettings.withMaxBatchSize(10);
}
if (configuration.hasPath(ConfigPaths.SqsSource.MAX_BUFFER_SIZE)) {
sqsSourceSettings.withMaxBufferSize(1000);
}
if (configuration.hasPath(ConfigPaths.SqsSource.WAIT_TIME_SECS)) {
sqsSourceSettings.withWaitTime(Duration.of(20, SECONDS));
}
return sqsSourceSettings;
}
But, whilst running my batch job I get the following AWS SDK exception:
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
The exception still seems to occur even after I try tweaking the parameters mentioned here:
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate. Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout. If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests
Has anyone run into this issue before?