The main reason for slow cold-start times with a Java Lambda is the need to load classes and initialize objects. For simple programs this can be very fast: a Lambda that does nothing other than print "Hello, World" will run in ~40 ms, which is similar to the Python runtime. On the other hand, a Spring app will take much more time to start up, because even a simple Spring app loads thousands of classes before it does anything useful.
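For reference, a minimal handler of that sort looks something like this (a sketch; the class and method names are my own and only need to match the function's handler setting):

import com.amazonaws.services.lambda.runtime.Context;

// a do-nothing handler: there is almost nothing to load beyond the runtime itself
public class HelloWorld
{
    public void handler(Object input, Context context)
    {
        System.out.println("Hello, World");
    }
}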
While the obvious way to reduce your cold-start times is to reduce the number of classes that you need to load, this is rarely easy to do, and often not possible. For example, if you're writing a web-app in Spring there's no way around initializing the Spring application context before processing a web request.
If that's not an option, and you're using the Maven Shade plugin to produce an "uber-JAR", you should switch to the Assembly plugin as I describe here. The reason is that Lambda unpacks your deployment bundle, so an "uber-JAR" turns into lots of tiny classfiles that have to be individually opened.
Lastly, increase your memory allotment. This is without question the best thing you can do for Lambda performance, Java or otherwise. First, because increasing memory reduces the amount of work that the Java garbage collector has to do. Second, because the amount of CPU that your Lambda gets scales with its memory allotment: you don't get a full virtual CPU until 1,769 MB. I recommend giving a Java app 2 GB; the cost of the larger allotment is usually offset by the shorter run time.
One thing I would not do is pay for provisioned concurrency. If you want a machine up and running all the time, use ECS/EKS/EC2 instead. And recognize that provisioned concurrency doesn't eliminate cold starts: if demand spikes beyond what you've provisioned, you're still going to get them.
Update: I spent some time over the holiday quantifying various performance improvement techniques. The full writeup is here, but the numbers are worth repeating.
My example program was, like the OP's, a "do nothing" that just created an SDK client and used it to invoke an API:
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.logs.AWSLogs;
import com.amazonaws.services.logs.AWSLogsClientBuilder;

// the class name is arbitrary; Lambda finds the handler via the function configuration
public class ColdStartTimer
{
    public void handler(Object ignored, Context context)
    {
        long start = System.currentTimeMillis();
        // building the SDK client is where most of the cold-start time goes
        AWSLogs client = AWSLogsClientBuilder.defaultClient();
        long clientCreated = System.currentTimeMillis();
        client.describeLogGroups();
        long apiInvoked = System.currentTimeMillis();
        System.err.format("time to create SDK client = %6d\n", (clientCreated - start));
        System.err.format("time to make API call = %6d\n", (apiInvoked - clientCreated));
    }
}
I ran this with different memory sizes, forcing a cold start each time. All times are in milliseconds:
| | 512 MB | 1024 MB | 2048 MB | 4096 MB |
|-------------------|---------|---------|---------|---------|
| Create client | 5298 | 2493 | 1272 | 1019 |
| Invoke API call | 3844 | 2023 | 1061 | 613 |
| Billed duration | 9213 | 4555 | 2349 | 1648 |
As I said above, the primary benefit that you get from increasing memory is that you increase CPU at the same time. Creating and initializing an SDK client is CPU-intensive, so the more CPU you can give it, the better.
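These numbers also back up the earlier point about cost. Lambda's compute charge is billed by GB-second (memory times billed duration), so the 512 MB run costs roughly 0.5 GB × 9.2 s ≈ 4.6 GB-seconds, the 1024 MB run 1 GB × 4.6 s ≈ 4.6 GB-seconds, and the 2048 MB run 2 GB × 2.3 s ≈ 4.7 GB-seconds. Up to 2 GB you pay about the same for a much faster cold start; only at 4 GB (4 GB × 1.6 s ≈ 6.6 GB-seconds) does the larger allotment start to cost more.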
Update 2: this morning I tried compiling a simple AWS program into a native executable with GraalVM. The build took several minutes, and even then it produced a "fallback image" (which embeds a JDK) because of the AWS SDK's dependencies. When I compared runtimes, there was no difference from running on the standard Java runtime.
Bottom line: use Java for things that will run long enough to benefit from HotSpot. Use a different language (Python, JavaScript, perhaps Go) for things that are short-running and need low latency.