The main reason for slow cold-start times with a Java Lambda is the need to load classes and initialize objects. For simple programs this can be very fast: a Lambda that does nothing other than print "Hello, World" will run in ~40 ms, which is similar to the Python runtime. On the other hand, a Spring app will take much more time to start up, because even a simple Spring app loads thousands of classes before it does anything useful.
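For reference, a minimal handler of that sort looks something like this (a sketch; the class and method names are my own and only need to match the function's handler setting):

import com.amazonaws.services.lambda.runtime.Context;

// a do-nothing handler: there is almost nothing to load beyond the runtime itself
public class HelloWorld
{
    public void handler(Object input, Context context)
    {
        System.out.println("Hello, World");
    }
}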
While the obvious way to reduce your cold-start times is to reduce the number of classes that you need to load, this is rarely easy to do, and often not possible. For example, if you're writing a web-app in Spring there's no way around initializing the Spring application context before processing a web request.
If that's not an option, and you're using the Maven Shade plugin to produce an "uber-JAR", you should switch to the Assembly plugin as I describe here. The reason is that Lambda unpacks your deployment bundle, so an "uber-JAR" turns into lots of tiny classfiles that have to be individually opened.
Lastly, increase your memory allotment. This is without question the best thing you can do for Lambda performance, Java or otherwise. First, because increasing memory reduces the amount of work that the Java garbage collector has to do. Second, because the amount of CPU that your Lambda gets scales with its memory allotment: you don't get a full virtual CPU until 1,769 MB. I recommend giving a Java app 2 GB; the cost of the larger allotment is usually offset by the shorter run time.
One thing I would not do is pay for provisioned concurrency. If you want a machine up and running all the time, use ECS/EKS/EC2 instead. And recognize that provisioned concurrency doesn't eliminate cold starts: if demand spikes beyond what you've provisioned, you're still going to get them.
Update: I spent some time over the holiday quantifying various performance improvement techniques. The full writeup is here, but the numbers are worth repeating.
My example program was, like the OP's, a "do nothing" that just created an SDK client and used it to invoke an API:
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.logs.AWSLogs;
import com.amazonaws.services.logs.AWSLogsClientBuilder;

// the class name is arbitrary; Lambda finds the handler via the function configuration
public class ColdStartTimer
{
    public void handler(Object ignored, Context context)
    {
        long start = System.currentTimeMillis();
        // building the SDK client is where most of the cold-start time goes
        AWSLogs client = AWSLogsClientBuilder.defaultClient();
        long clientCreated = System.currentTimeMillis();
        client.describeLogGroups();
        long apiInvoked = System.currentTimeMillis();
        System.err.format("time to create SDK client = %6d\n", (clientCreated - start));
        System.err.format("time to make API call = %6d\n", (apiInvoked - clientCreated));
    }
}
I ran this with different memory sizes, forcing a cold start each time. All times are in milliseconds:
| | 512 MB | 1024 MB | 2048 MB | 4096 MB |
|-------------------|---------|---------|---------|---------|
| Create client | 5298 | 2493 | 1272 | 1019 |
| Invoke API call | 3844 | 2023 | 1061 | 613 |
| Billed duration | 9213 | 4555 | 2349 | 1648 |
As I said above, the primary benefit that you get from increasing memory is that you increase CPU at the same time. Creating and initializing an SDK client is CPU-intensive, so the more CPU you can give it, the better.
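These numbers also back up the earlier point about cost. Lambda's compute charge is billed by GB-second (memory times billed duration), so the 512 MB run costs roughly 0.5 GB × 9.2 s ≈ 4.6 GB-seconds, the 1024 MB run 1 GB × 4.6 s ≈ 4.6 GB-seconds, and the 2048 MB run 2 GB × 2.3 s ≈ 4.7 GB-seconds. Up to 2 GB you pay about the same for a much faster cold start; only at 4 GB (4 GB × 1.6 s ≈ 6.6 GB-seconds) does the larger allotment start to cost more.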
Update 2: this morning I tried compiling a simple AWS program into a native executable with GraalVM. The build took several minutes, and even then it produced a "fallback image" (which embeds a JDK) because of the AWS SDK's dependencies. When I compared runtimes, there was no difference from running on the standard Java runtime.
Bottom line: use Java for things that will run long enough to benefit from HotSpot. Use a different language (Python, JavaScript, perhaps Go) for things that are short-running and need low latency.