JVM startup performance can be improved in many ways: CDS, straight-up tuning, CPU pinning and jlink (since JDK 9) can be good options. AOT (JDK 9, Linux-only) has some rough edges but can be tuned to help small applications.
CDS
Class Data Sharing was developed for the client VM back in 1.5, but has since been much improved to work on all variants of HotSpot (all GCs since 8), bringing a substantial boost to startup performance when enabled.
CDS is always enabled on 32-bit JRE client installs, but might need some manual steps to be enabled elsewhere:
- Run java -Xshare:dump to generate a CDS shared archive
- Add -Xshare:auto to your command so the archive is used whenever it is available
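The two steps above can be wrapped in a small script. This is a minimal sketch that assumes a POSIX shell and JAVA_HOME pointing at the JDK whose archive you want to (re)generate. The function names cds_dump, cds_run and cds_check are hypothetical.

```shell
#!/bin/sh
# Sketch of the CDS steps above; function names are hypothetical.

# One-time step: (re)generate the shared archive for this JDK. The archive is
# written inside the JDK installation, so this may need extra permissions.
cds_dump() {
  "${JAVA_HOME:?JAVA_HOME must point at a JDK}/bin/java" -Xshare:dump
}

# Every launch: use the archive if it can be mapped, otherwise run without it.
cds_run() {
  "${JAVA_HOME:?JAVA_HOME must point at a JDK}/bin/java" -Xshare:auto "$@"
}

# Check: when sharing is active, the VM line printed by -version
# includes "sharing" (e.g. "..., mixed mode, sharing").
cds_check() {
  cds_run -version 2>&1 | grep -q 'sharing'
}
```

For example, run cds_dump once, then start the application with cds_run -cp app.jar com.example.Main (app.jar and com.example.Main are placeholders). If you would rather have the JVM refuse to start than silently run without the archive, use -Xshare:on instead of -Xshare:auto. That is handy when verifying a setup.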
Other tuning
While -client may or may not actually do anything (the JVM on many systems doesn't ship with a client VM; none has since JDK 9, actually), there are numerous ways to tune a HotSpot server JVM to behave more like a client VM:
- Use only the C1 (client) compiler: -XX:TieredStopAtLevel=1
- Use as few compiler threads as possible: -XX:CICompilerCount=1
- Use the single-threaded serial GC: -XX:+UseSerialGC
- Limit heap usage (especially on large systems), e.g., -Xmx512m
This should be enough to boost startup for a small, short-running application, but may have very negative effects on peak performance. You can of course go even further by disabling features you might not be using, such as -XX:-UsePerfData (which disables some runtime information retrievable using MXBeans and jvmstat).
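Collected into a single launch command, the settings above might look like the following. This is a minimal sketch that assumes a POSIX shell and JAVA_HOME pointing at a JDK. CLIENT_LIKE_OPTS and run_client_like are hypothetical names, and whether giving up peak performance is acceptable depends on the application.

```shell
#!/bin/sh
# Client-like settings from the list above, plus -XX:-UsePerfData.
# -Xmx512m is only the example value; size the heap for your application.
CLIENT_LIKE_OPTS="-XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+UseSerialGC -Xmx512m -XX:-UsePerfData"

# Hypothetical wrapper for a small, short-running application.
run_client_like() {
  # CLIENT_LIKE_OPTS is expanded unquoted on purpose, so the shell splits it
  # into separate arguments (none of the flags contain spaces or wildcards).
  "${JAVA_HOME:?JAVA_HOME must point at a JDK}/bin/java" $CLIENT_LIKE_OPTS "$@"
}
```

To confirm the settings were actually applied, run run_client_like -XX:+PrintFlagsFinal -version. This prints the final value of every VM option.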
Advanced
jlink is a new tool available in Java 9 which allows you to build custom runtime images. If your console application only uses a small subset of JDK modules, a custom runtime can be made very small, which can improve startup times further. A minimal image including only the java.base module might boost startup times by ~10-20ms, depending on hardware and other tuning:
$JAVA_HOME/bin/jlink --add-modules java.base --module-path $JAVA_HOME/jmods --output jbase
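Building and checking such an image could look like the sketch below. It assumes JDK 9 or later with its jmods directory present, and the function names are hypothetical. --strip-debug, --no-header-files and --no-man-pages are existing jlink options that reduce image size. Any startup effect they have is something to measure rather than assume.

```shell
#!/bin/sh
# Build a java.base-only runtime image into ./jbase (same as the command above,
# plus three standard jlink options that make the image smaller on disk).
build_jbase() {
  "${JAVA_HOME:?JAVA_HOME must point at a JDK}/bin/jlink" \
    --module-path "$JAVA_HOME/jmods" \
    --add-modules java.base \
    --strip-debug --no-header-files --no-man-pages \
    --output jbase
}

# The image ships its own launcher; for this image the module list
# should contain java.base only.
list_jbase_modules() {
  ./jbase/bin/java --list-modules
}
```

jlink refuses to write into an existing output directory, so remove jbase before rebuilding. The image can only run applications that need nothing beyond java.base. One way to check which modules an application uses is jdeps --list-deps app.jar (app.jar is a placeholder).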
(Linux only) Java 9 introduces an experimental AOT compiler, jaotc, which can be used to help applications boot faster and spend far fewer cycles doing so. Out of the box it might slow down immediate startup (the AOT'd code is a shared library that adds its own I/O overhead, and it doesn't support the Serial GC), but with careful tuning we've seen it reduce startup times of small applications by 15-30%.
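For reference, the sketch below shows the basic jaotc workflow as documented in JEP 295. You compile classes or whole modules into shared libraries, then pass them to the JVM with -XX:AOTLibrary. HelloWorld, the library names and the function names are placeholders. This is only the starting point, not the careful tuning mentioned above. AOT flags changed in later releases, and jaotc was removed again in JDK 17 (JEP 410), so check the documentation for the JDK you actually run.

```shell
#!/bin/sh
# Sketch of the jaotc workflow from JEP 295 (JDK 9, Linux x64).
# HelloWorld is a placeholder class; the library names are arbitrary.
aot_compile() {
  jaotc="${JAVA_HOME:?JAVA_HOME must point at a JDK}/bin/jaotc"
  # Compile the application class into a shared library.
  "$jaotc" --output libHelloWorld.so HelloWorld.class || return
  # Compile java.base too (this step can take a long time).
  "$jaotc" --output libjava.base.so --module java.base
}

# Run with both libraries loaded. Per the note above, AOT code doesn't work
# with the Serial GC, so avoid combining this with -XX:+UseSerialGC.
aot_run() {
  "${JAVA_HOME:?JAVA_HOME must point at a JDK}/bin/java" \
    -XX:AOTLibrary=./libHelloWorld.so,./libjava.base.so HelloWorld
}
```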
CPU pinning: on large systems, we've seen effects that stem from cache coherence traffic between sockets, and binding to a single CPU node can cut startup times considerably. On Linux, something like numactl --cpunodebind=0 $JAVA_HOME/bin/java ... should do the trick.
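Here is a slightly fuller version of that command. It is a sketch that assumes Linux with numactl installed and JAVA_HOME set; NODE and the function names are hypothetical. Adding --membind keeps memory allocations on the same node, a step beyond the CPU binding described above, so it is worth measuring with and without it.

```shell
#!/bin/sh
# Show the NUMA nodes and which CPUs/memory belong to each one.
show_numa_nodes() {
  numactl --hardware
}

# Run the JVM on the CPUs of a single node (NODE, default 0) and allocate
# its memory from that same node as well.
run_on_node() {
  node="${NODE:-0}"
  numactl --cpunodebind="$node" --membind="$node" \
    "${JAVA_HOME:?JAVA_HOME must point at a JDK}/bin/java" "$@"
}
```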
All in all, we've been able to get minimal applications to execute in as little as 35ms (JDK 9 GA). Various startup optimizations have gone into the JDK 10 branch, and I'm now seeing numbers as low as 28ms.
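Numbers like these depend heavily on hardware and on how they are measured. A crude way to compare configurations on your own machine is to average the wall-clock time of repeated runs, as in this sketch. avg_ms is a hypothetical helper, and its result includes process creation and JVM teardown, not just startup.

```shell
#!/bin/sh
# Crude wall-clock timer: run a command N times, print the average in ms.
# Relies on GNU date's %N (nanoseconds), i.e. Linux with GNU coreutils.
avg_ms() {
  runs="$1"
  shift
  start=$(date +%s%N)
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1
    i=$((i + 1))
  done
  end=$(date +%s%N)
  echo $(( (end - start) / runs / 1000000 ))
}
```

For example, compare avg_ms 20 "$JAVA_HOME/bin/java" -Xshare:off -cp . Hello against the same command with -Xshare:auto, where Hello is a placeholder hello-world class. For anything beyond a rough comparison, use more runs and account for warm-up effects such as a cold file-system cache.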