Why does the JVM require warmup?

Asked 24/3, 2016 at 10:43 Answered 24/3, 2016 at 19:56

Solved java garbage-collection jvm low-latency hft

I understand that in the Java virtual machine (JVM), warmup is potentially required as Java loads classes using a lazy loading process and as such you want to ensure that the objects are initialized before you start the main transactions. I am a C++ developer and have not had to deal with similar requirements.

However, the parts I am not able to understand are the following:

Which parts of the code should you warm up?
Even if I warm up some parts of the code, how long does it remain warm (assuming this term only means how long your class objects remain in-memory)?
How does it help if I have objects which need to be created each time I receive an event?

Consider for an example an application that is expected to receive messages over a socket and the transactions could be New Order, Modify Order and Cancel Order or transaction confirmed.

Note that the application is about High Frequency Trading (HFT) so performance is of extreme importance.

Gesundheit answered 24/3, 2016 at 10:43 Comment(2)

You may find -XX:+PrintCompilation useful for tracing compilation behavior. You also might want to contact Oracle; they're working on an AOT compiler, currently only offered to commercial customers AIUI. I think some other JVM vendors also offer AOT. – Irish 24/3, 2016 at 14:46

Also see these links: jvm options on Windows, jvm options on Linux, Solaris, Mac – Trinitrotoluene 24/3, 2016 at 22:34

Which parts of the code should you warm up?

Usually, you don't have to do anything. However, for a low-latency application, you should warm up the critical path in your system. You should have unit tests, so I suggest you run those on startup to warm up the code.
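A minimal sketch of what warming up the critical path at startup could look like. The `OrderHandler` class, its `onMessage` method, the sample messages and the iteration count of 20,000 are all hypothetical and only stand in for the real code path; the synthetic messages should resemble production traffic as closely as possible.

```java
import java.nio.charset.StandardCharsets;

final class StartupWarmup {
    /** Runs the handler against representative synthetic messages before real traffic arrives. */
    static long warmUp(OrderHandler handler, byte[][] samples, int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += handler.onMessage(samples[i % samples.length]);
        }
        return sink; // returned so the work has an observable result
    }

    public static void main(String[] args) {
        // Assumed message formats, for illustration only.
        byte[][] samples = {
            "NEW|id=1|px=100|qty=5".getBytes(StandardCharsets.US_ASCII),
            "MOD|id=1|px=101|qty=5".getBytes(StandardCharsets.US_ASCII),
            "CXL|id=1".getBytes(StandardCharsets.US_ASCII),
        };
        long sink = warmUp(new OrderHandler(), samples, 20_000);
        System.out.println("warm-up finished, sink=" + sink);
    }
}

/** Hypothetical handler standing in for the real critical path. */
final class OrderHandler {
    private long checksum;

    /** Folds the message bytes into a running checksum and returns it. */
    long onMessage(byte[] message) {
        long sum = 0;
        for (byte b : message) {
            sum = sum * 31 + b;
        }
        checksum += sum;
        return checksum;
    }
}
```

In a real system the warm-up messages must not leak into production state, so any state touched during warm-up would need to be reset or isolated afterwards.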

Even once your code is warmed up, you have to ensure your CPU caches stay warm as well. You can see a significant slow down in performance after a blocking operation e.g. network IO, for up to 50 micro-seconds. Usually this is not a problem but if you are trying to stay under say 50 micro-seconds most of the time, this will be a problem most of the time.

Note: Warmup can allow Escape Analysis to kick in and place some objects on the stack. This means such objects don't need to be optimised away. It is better to memory profile your application before optimising your code.

Even if I warm up some parts of the code, how long does it remain warm (assuming this term only means how long your class objects remain in-memory)?

There is no time limit. It depends on whether the JIT detects that an assumption it made when optimising the code has turned out to be incorrect, in which case it deoptimises and later recompiles that code.

How does it help if I have objects which need to be created each time I receive an event?

If you want low latency, or high performance, you should create as few objects as possible. I aim to produce less than 300 KB/sec of garbage. With this allocation rate you can have an Eden space large enough to need a minor collection only once a day.
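To make the allocation budget concrete, here is the arithmetic as a small sketch. It assumes 1 KB = 1024 bytes and a constant allocation rate, neither of which the answer states explicitly. At 300 KB/sec a full day produces roughly 24.7 GiB of garbage, so the Eden space (sized through the young generation, for example with -Xmn) would need to be at least that large to go a day between minor collections.

```java
final class EdenBudget {
    /** Bytes allocated over a period at a constant allocation rate. */
    static long bytesAllocated(long bytesPerSecond, long seconds) {
        return bytesPerSecond * seconds;
    }

    public static void main(String[] args) {
        long rate = 300L * 1024;        // 300 KB/sec, assuming 1 KB = 1024 bytes
        long day = 24L * 60 * 60;       // 86,400 seconds
        long total = bytesAllocated(rate, day);
        double gib = total / (1024.0 * 1024 * 1024);
        System.out.println(total + " bytes per day, about " + Math.round(gib * 10) / 10.0 + " GiB");
    }
}
```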

Consider for an example an application that is expected to receive messages over a socket and the transactions could be New Order, Modify Order and Cancel Order or transaction confirmed.

I suggest you re-use objects as much as possible, though if it's under your allocation budget, it may not be worth worrying about.
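One common way to re-use objects is to overwrite a single mutable instance per thread instead of allocating a new one per message. The sketch below assumes a hypothetical `OrderEvent` and `Decoder`; it is an illustration of the idea, not the approach used by any particular library.

```java
final class ReusableEventDemo {
    /** Hypothetical mutable event, overwritten for each incoming message. */
    static final class OrderEvent {
        long orderId;
        long price;
        int quantity;

        void set(long orderId, long price, int quantity) {
            this.orderId = orderId;
            this.price = price;
            this.quantity = quantity;
        }
    }

    /** Single-threaded decoder that hands out the same instance on every call. */
    static final class Decoder {
        private final OrderEvent event = new OrderEvent();

        OrderEvent decode(long orderId, long price, int quantity) {
            event.set(orderId, price, quantity);
            return event; // contents are valid only until the next decode() call
        }
    }

    public static void main(String[] args) {
        Decoder decoder = new Decoder();
        OrderEvent first = decoder.decode(1, 100, 5);
        OrderEvent second = decoder.decode(2, 101, 6);
        System.out.println("same instance reused: " + (first == second));
    }
}
```

The trade-off is that callers must not hold on to the returned event across calls; anything that needs to outlive the callback has to be copied.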

Note that the application is about High Frequency Trading (HFT) so performance is of extreme importance.

You might be interested in our open source software which is used for HFT systems at different Investment Banks and Hedge Funds.

http://chronicle.software/

My production application is used for High frequency trading and every bit of latency can be an issue. It is kind of clear that at startup if you don't warmup your application, it will lead to high latency of few millis.

In particular you might be interested in https://github.com/OpenHFT/Java-Thread-Affinity as this library can help reduce scheduling jitter in your critical threads.

And also it is said that the critical sections of code which require warmup should be run (with fake messages) at least 12K times for them to work in an optimized manner. Why and how does it work?

Code is compiled using background thread(s). This means that even though a method might be eligible for compiling to native code, it doesn't mean that it has been compiled yet, especially on startup when the compiler is already pretty busy. 12K is not unreasonable, but it could be higher.
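Because compilation happens in the background, it can help to observe whether the JIT is still busy. The sketch below uses the standard `java.lang.management.CompilationMXBean` to read accumulated compilation time before and after a warm-up loop; the `work` method and the loop count are hypothetical placeholders. It is only a coarse signal: it shows that compilation took place, not which methods were compiled.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class JitActivityProbe {
    /** Returns accumulated JIT compilation time in ms, or -1 if the JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    /** Hypothetical workload standing in for the critical path. */
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();
        long sink = 0;
        for (int i = 0; i < 50_000; i++) {
            sink += work(100);
        }
        long after = totalCompilationMillis();
        System.out.println("JIT time before=" + before + " ms, after=" + after + " ms, sink=" + sink);
    }
}
```

If the value is still climbing between two samples taken a short time apart, the compiler threads have probably not finished with the warm-up work yet.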

Killy answered 24/3, 2016 at 17:20 Comment(6)

Thanks @PeterLawrey for such detailed explanation. Only other thing that I would like to ask as follow-up question is whether it can be logged or monitored which section of your code is warmed up. – Gesundheit 25/3, 2016 at 0:43

@Gesundheit every part of the critical path should be warmed up including the TCP connection esp those which don't get called often. I suggest Chronicle Queue for low latency logging and persisted messaging. – Killy 25/3, 2016 at 10:6

@PeterLawrey sir i remember in one of your videos you had some solution (probably developed by you) that was assisting the warmup, i remember something like it would save the profile and on next startup jvm would precompile some methods using previous profile hence not waiting for 10k executions... i might be completely off because this was long time ago... Did you have something for jvm warmups? – Nutriment 6/8, 2017 at 13:22

@Nutriment I have tool which uses the WhiteBox class, however a) it's not supported and b) doesn't seem to help much. The best solution is to warm up the code with a realistic load yourself. i.e. use one of your load tests. – Killy 7/8, 2017 at 16:26

@PeterLawrey "Even once your code is warmed up, you have to ensure your CPU caches stay warm as well" how to ensure that CPU cache stays warm? – Boschbok 22/2, 2021 at 8:46

@GovindaSakhare turn off power management, use isolated CPUs, use the "performance" governor, busy wait, periodically run the code to keep it in cache. Here is an article I wrote about how the same piece of code can run at very different speeds depending on what is run before it. chronicle.software/… – Killy 1/3, 2021 at 11:36
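The "busy wait" and "periodically run the code" suggestions can be sketched as a non-blocking poll loop that invokes the handler on a dummy message after a run of empty polls. Everything here is an assumption made for illustration: the queue type, the idle limit, the bounded `maxPolls` (a real loop would run until shutdown), and the requirement that the handler has no side effects for the dummy value. `Thread.onSpinWait()` needs Java 9 or later.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.LongUnaryOperator;

final class BusySpinLoop {
    /**
     * Polls the queue without blocking. After idleLimit consecutive empty polls it runs the
     * handler on a dummy value so that its code and data stay in the CPU caches.
     * Stops after maxPolls polls and returns the number of dummy invocations.
     */
    static int run(Queue<Long> inbound, LongUnaryOperator handler, int idleLimit, int maxPolls) {
        int idle = 0;
        int dummyRuns = 0;
        for (int polls = 0; polls < maxPolls; polls++) {
            Long msg = inbound.poll();
            if (msg != null) {
                handler.applyAsLong(msg);  // real message
                idle = 0;
            } else if (++idle >= idleLimit) {
                handler.applyAsLong(0L);   // dummy message, result ignored
                dummyRuns++;
                idle = 0;
            } else {
                Thread.onSpinWait();       // hint to the CPU that this is a spin loop
            }
        }
        return dummyRuns;
    }

    public static void main(String[] args) {
        Queue<Long> inbound = new ConcurrentLinkedQueue<>();
        inbound.add(42L);
        int dummyRuns = run(inbound, v -> v * 2, 1_000, 10_000);
        System.out.println("dummy invocations while idle: " + dummyRuns);
    }
}
```

A spinning thread occupies a core permanently, which is why this is normally combined with isolated CPUs and thread pinning as mentioned in the comment.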

Warming refers to having a piece of code run enough times that the JVM stops interpreting and compiles to native (at least for the first time). Generally that's something you don't want to do. The reason is that the JVM gathers statistics about the code in question that it uses during code generation (akin to profile guided optimizations). So if a code chunk in question is "warmed" with fake data which has different properties than the real data you could well be hurting performance.

EDIT: Since the JVM cannot perform whole-program static analysis (it can't know what code is going to be loaded by the application), it instead makes some guesses about types from the statistics it has gathered. As an example, when calling a virtual function (in C++ speak) at a particular call site, if it determines that all receiver types seen there share the same implementation, the call is promoted to a direct call (or even inlined). If that assumption is later proven wrong, the old code must be "uncompiled" to behave properly. AFAIK HotSpot classifies call sites as monomorphic (single implementation), bimorphic (exactly two, transformed into if (imp1-type) {imp1} else {imp2}) and fully polymorphic (virtual dispatch).

And there's another case in which recompiling occurs: when you have tiered compilation. The first tier spends less time trying to produce good code, and if the method is "hot enough" then the more expensive compile-time code generator kicks in.
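The bimorphic case described above can be illustrated by writing the transformation out by hand. This is only a source-level picture of the idea; the `Pricer` interface and both implementations are hypothetical, and the JIT performs the equivalent rewrite on compiled code, not on Java source.

```java
final class CallSiteShapes {
    interface Pricer {
        long price(long qty);
    }

    static final class FlatPricer implements Pricer {
        public long price(long qty) { return qty * 100; }
    }

    static final class DiscountPricer implements Pricer {
        public long price(long qty) { return qty * 90; }
    }

    /** The call as written in the source: a virtual (interface) call. */
    static long viaInterface(Pricer p, long qty) {
        return p.price(qty);
    }

    /** Hand-written picture of a bimorphic call site: type checks guarding inlined bodies. */
    static long bimorphicByHand(Pricer p, long qty) {
        if (p.getClass() == FlatPricer.class) {
            return qty * 100;            // inlined body of FlatPricer.price
        } else if (p.getClass() == DiscountPricer.class) {
            return qty * 90;             // inlined body of DiscountPricer.price
        }
        return p.price(qty);             // unexpected type: fall back to virtual dispatch
    }

    public static void main(String[] args) {
        Pricer[] pricers = { new FlatPricer(), new DiscountPricer() };
        for (Pricer p : pricers) {
            System.out.println(viaInterface(p, 7) == bimorphicByHand(p, 7));
        }
    }
}
```

This is also why warming up with unrepresentative data can hurt: if only one implementation is seen during warm-up, the call site is compiled as monomorphic and has to be recompiled when the second type shows up in production.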

Hebrides answered 24/3, 2016 at 11:0 Comment(7)

I guess the number of times could be 12K times at the least. Once JVM compiles the code to native, is it assured to remain in this state for the rest of the processing time? – Gesundheit 24/3, 2016 at 11:5

@Gesundheit No. JIT is free to "uncompile" (and recompile) code if it deems it necessary. – Hereabouts 24/3, 2016 at 11:8

@Hereabouts - Are there any techniques / methods that one can follow during development in order to prolong the time that JIT does not uncompile / recompile the code? – Gesundheit 24/3, 2016 at 11:20

@Gesundheit You can control JIT with runtime parameters, but it's not my strongest area of expertise. Try them out and see. – Hereabouts 24/3, 2016 at 11:24

This is old but might be able to answer some of your questions: slideshare.net/ZeroTurnaround/… – Hebrides 24/3, 2016 at 12:24

Updated my answer with an overview of basic uncompiling/recompiling issues – Hebrides 24/3, 2016 at 13:9

Generally speaking, you should assume the JIT is smarter than you and keeps the important parts of the program as optimized as they need to be. – Honan 24/3, 2016 at 16:25

Warm-up is rarely required. It's relevant when doing for example performance tests, to make sure that the JIT warm-up time doesn't skew the results.

In normal production code you rarely see code that's meant for warm-up. The JIT will warm up during normal processing, so there's very little advantage to introduce additional code just for that. In the worst case you might be introducing bugs, spending extra development time and even harming performance.

Unless you know for certain that you need some kind of warm-up, don't worry about it. The example application you described certainly doesn't need it.

Hereabouts answered 24/3, 2016 at 10:57 Comment(4)

This is perhaps not true. My production application is used for High frequency trading and every bit of latency can be an issue. It is kind of clear that at startup if you don't warmup your application, it will lead to high latency of few millis. Once warmed up and after JVM has optimized, the code gives the right level of performance. I am interested to find out why and how? – Gesundheit 24/3, 2016 at 11:1

@Gesundheit If you're writing HFT code in Java, you're almost certainly using custom techniques such as Chronicle to manage resources manually, and none of the standard JVM advice applies. – Waldner 24/3, 2016 at 11:5

If you're dealing with HFT, then you should say that in your question. It's an entirely different beast, and normal Java rules don't necessarily apply anymore. You might want to look at OpenHFT for additional information (it's the work of a StackOverflow frequent flyer Mr. Peter Lawrey). – Hereabouts 24/3, 2016 at 11:7

@chrylis - yes, there are various techniques that we have used for object creation or so called for reduced gc. Nevertheless, warmup is still done and is required. – Gesundheit 24/3, 2016 at 11:9

Why JVM requires warmup?

Modern (J)VMs gather statistics at runtime about which code is used most often and how it is used. One example (of hundreds if not thousands) is the optimization of calls to virtual functions (in C++ lingo) which have only one implementation. Those statistics can, by their definition, only be gathered at run time.

Class loading itself is part of the warm-up as well, but it obviously happens automatically before the execution of code inside those classes, so there is not much to worry about.

Which parts of the code should you warmup?

The part that is crucial for the performance of your application. The important part is to "warm it up" just the same way as it is used during normal usage, otherwise the wrong optimizations will be done (and undone later on).

Even if I warmup some parts of the code, how long does it remain warm (assuming this term only means how long your class objects remain in-memory)?

This is really hard to say. Basically, the JIT compiler constantly monitors execution and performance. If some threshold is reached it will try to optimize things. It will then continue to monitor performance to verify that the optimization actually helps. If not, it might unoptimize the code. Things can also happen that invalidate optimizations, like the loading of new classes. I'd consider those things not predictable, at least not based on a Stack Overflow answer, but there are tools that tell you what the JIT is doing: https://github.com/AdoptOpenJDK/jitwatch

How does it help if I have objects which need to be created each time I receive an event.

One simple example could be: you create objects inside a method, since a reference leaves the scope of the method, those objects will get stored on the heap, and eventually collected by the garbage collector. If the code using those objects is heavily used, it might end up getting inlined in a single big method, possibly reordered beyond recognition, until these Objects only live inside this method. At that point they can be put on the stack and get removed when the method exits. This can save huge amounts of garbage collection and will only happen after some warm up.
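A small sketch of the difference between an object that stays inside a method and one that escapes. The `Point` class and both methods are hypothetical. Whether the JIT actually removes the allocation depends on inlining and escape analysis at run time; in HotSpot the mechanism is, as far as I know, scalar replacement (the fields end up in registers or stack slots) rather than a literal stack-allocated object.

```java
final class EscapeSketch {
    static final class Point {
        final long x;
        final long y;

        Point(long x, long y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Holds a reference so that objects stored here escape their creating method. */
    static Point lastDelta;

    /** The Point never leaves this method, so the allocation is a candidate for elimination. */
    static long distanceSquared(long x1, long y1, long x2, long y2) {
        Point d = new Point(x2 - x1, y2 - y1);
        return d.x * d.x + d.y * d.y;
    }

    /** Same computation, but the reference is published to a field, so it must live on the heap. */
    static long distanceSquaredEscaping(long x1, long y1, long x2, long y2) {
        Point d = new Point(x2 - x1, y2 - y1);
        lastDelta = d;
        return d.x * d.x + d.y * d.y;
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sink += distanceSquared(0, 0, i, i + 1);
        }
        System.out.println("sink=" + sink);
    }
}
```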

With all that said: I'm skeptical of the notion that one needs to do anything special for warming up. Just start your application and use it, and the JIT compiler will do its thing just fine. If you experience problems, then learn what the JIT does with your application and how to fine-tune that behavior or how to write your application so that it benefits the most.

The only case where I actually know about the need for warm up are benchmarks. Because if you neglect it there you will get bogus results almost guaranteed.
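To illustrate why benchmarks need a warm-up phase, here is a deliberately naive hand-rolled timing loop that measures the same work before and after warm-up. The workload and iteration counts are arbitrary assumptions, and the numbers it prints vary from machine to machine and run to run. For real measurements a harness such as JMH is the usual choice, since it handles warm-up iterations and dead-code elimination for you.

```java
final class NaiveBenchmark {
    /** Hypothetical workload: sum of squares below n. */
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Times the given number of calls and returns the elapsed nanoseconds. */
    static long timeNanos(int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += work(1_000);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println("unreachable in practice; keeps sink observable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long cold = timeNanos(2_000);     // mostly interpreted code
        timeNanos(50_000);                // warm-up phase, result discarded
        long warm = timeNanos(2_000);     // same work after the JIT has had a chance to compile
        System.out.println("cold: " + cold + " ns, warm: " + warm + " ns");
    }
}
```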

Typebar answered 24/3, 2016 at 11:9 Comment(0)

It is all about the JIT compiler, which the JVM uses to optimize bytecode at run time (because javac can't apply advanced or aggressive optimization techniques, due to the platform-independent nature of the bytecode).

1. You can warm up the code that will process your messages. Actually, in most cases you don't need to do it with special warm-up cycles: just let the application start and process some of the first messages. The JVM will try to do its best to analyse code execution and make optimizations :) Manual warm-up with fake samples can yield even worse results.
2. Code will be optimized after some amount of time and will stay optimized until some event in the program flow degrades the state of the code (after which the JIT compiler will try to optimize the code again; this process never ends).
3. Short-lived objects are subject to optimization too, but generally it should help your tenured message-processing code to be more efficient.

Malone answered 24/3, 2016 at 10:49 Comment(3)

I came across this JVM flag -XX:CompileThreshold which is set by default to 10000. Does this have anything to do with what you mentioned in #1? – Gesundheit 24/3, 2016 at 10:54

And also it is said that the critical sections of code which require warmup should be run (with fake messages) at least 12K times for them to work in an optimized manner. Why and how does it work? – Gesundheit 24/3, 2016 at 10:56

It is not recommended to change default JIT options if you're not sure what you are doing. The Sun and Oracle teams have great expertise and a large code base to empirically find the values which are generally good and offer the best efficiency. Of course you can lower the CompileThreshold value, but there are always drawbacks: larger memory consumption, for example – Malone 24/3, 2016 at 11:8

Which parts of the code should you warmup?

There is no answer to this question in general. It depends entirely on your application.

Even if I warmup some parts of the code, how long does it remain warm (assuming this term only means how long your class objects remain in-memory)?

Objects remain in memory for as long as your program has a reference to them, absent any special weak-reference use or something similar. Learning about when your program "has a reference" to something can be a little more obscure than you might think at first glance, but it is the basis for memory management in Java and worth the effort.

How does it help if I have objects which need to be created each time I receive an event.

This is entirely dependent on the application. There is no answer in general.

I encourage you to study and work with Java to understand things like classloading, memory management, and performance monitoring. It takes some amount of time to instantiate an object; in general it takes more time to load a class (which, of course, is usually done far less often). Usually, once a class is loaded, it stays in memory for the life of the program -- this is the sort of thing that you should understand, not just get an answer to.

There are also techniques to learn if you don't know them already. Some programs use "pools" of objects, instantiated before they're actually needed, then handed off to do processing once the need arises. This allows a time-critical portion of the program to avoid the time spent instantiating during the time-critical period. The pools maintain a collection of objects (10? 100? 1000? 10000?), and instantiate more if needed, etc. But managing the pools is a significant programming effort, and, of course, you occupy memory with the objects in the pools.

It would be entirely possible to use up enough memory to trigger garbage collection much more often, and SLOW THE SYSTEM YOU WERE INTENDING TO SPEED UP. This is why you need to understand how it works, not just "get an answer".
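A minimal sketch of the pool idea, assuming a single-threaded user; the `SimplePool` class is hypothetical and not taken from any library. Objects are created up front, handed out on `acquire`, and returned on `release`; if the pool runs dry it falls back to allocating.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    /** Pre-instantiates the given number of objects before the time-critical period. */
    SimplePool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        for (int i = 0; i < preallocate; i++) {
            free.push(factory.get());
        }
    }

    /** Hands out a pooled object, allocating a new one only if the pool is empty. */
    T acquire() {
        T pooled = free.poll();
        return pooled != null ? pooled : factory.get();
    }

    /** Returns an object to the pool; the caller must reset its state and stop using it. */
    void release(T object) {
        free.push(object);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 100);
        StringBuilder sb = pool.acquire();
        sb.append("NEW|id=1");
        sb.setLength(0);       // reset before handing it back
        pool.release(sb);
        System.out.println("available: " + pool.available());
    }
}
```

This version is not thread-safe, and every pooled object is long-lived heap data, which is exactly the memory cost the answer warns about.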

Another consideration -- by far most of the effort put into making programs faster is wasted, as in not needed. Without extensive experience with the application being considered, and/or measurement of the system, you simply do not know where (or whether) optimization will even be noticeable. System/program design that avoids pathological cases of slowness IS useful, and doesn't take nearly the time and effort of 'optimization'. Most of the time it is all any of us need.

-- edit -- add just-in-time compilation to the list of things to study and understand.

Uppercase answered 24/3, 2016 at 11:6 Comment(1)

thank you for this. I will take each of the subjects you have mentioned one at a time. – Gesundheit 24/3, 2016 at 11:51

I always pictured it like the following:

You (as a C++ developer) could imagine an automated iterative approach by the JVM compiling/hotloading/replacing various bits and pieces with (the imaginary analog of) gcc -O0,-O1,-O2,-O3 variants (and sometimes reverting them if it deems it necessary).

I'm sure this is not strictly what happens, but it might be a useful analogy for a C++ dev.

On a standard JVM, the number of invocations it takes for a snippet to be considered for JIT compilation is set by -XX:CompileThreshold, which is 1500 by default for the client compiler and 10000 for the server compiler. (Sources and JVM versions vary, but I think that's for JVM 8, where the flag is ignored when tiered compilation is enabled.)

Further, a book which I have at hand states under Host Performance JIT Chapter (p59) that the following optimizations are done during JIT:
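Since the default differs between JVM builds and versions, one option is to ask the running JVM for its value. The sketch below uses the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`, so it assumes a HotSpot-based JVM; the `flagValue` helper is hypothetical and returns null when the flag cannot be read.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class ShowCompileThreshold {
    /** Returns the current value of a HotSpot flag as text, or null if it is unavailable. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return null; // unknown flag or bean not available on this JVM
        }
    }

    public static void main(String[] args) {
        System.out.println("CompileThreshold = " + flagValue("CompileThreshold"));
        System.out.println("TieredCompilation = " + flagValue("TieredCompilation"));
    }
}
```

Printing TieredCompilation alongside it is useful because, with tiered compilation enabled, the thresholds that actually apply are the per-tier ones rather than CompileThreshold.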

Inlining
Lock elimination
Virtual call elimination
Non-volatile memory write elimination
Native code generation

EDIT:

regarding comments

I think 1500 may be just enough to hint to JIT that it should compile the code into native and stop interpreting. would you agree?

I don't know if it's just a hint, but since OpenJDK is open source, let's look at the various limits and numbers in globals.hpp#l3559@ver-a801bc33b08c (for jdk8u)

(I'm not a JVM dev; this might be completely the wrong place to look)

Compiling a code into native does not necessarily mean it is also optimized.

To my understanding - true; especially if you mean -Xcomp (force compile) - this blog even states that it prevents the jvm from doing any profiling - hence optimizing - if you do not run -Xmixed (the default).

So a timer kicks in to sample frequently accessed native code and optimize the same. Do you know how we can control this timer interval?

I really don't know the details, but the globals.hpp I linked does indeed define some frequency intervals.

Julijulia answered 24/3, 2016 at 19:56 Comment(1)

Thanks. This is an easy and useful analogy. I have two questions in this regard. 1. I think 1500 may be just enough to hint to JIT that it should compile the code into native and stop interpreting. would you agree? 2. Compiling a code into native does not necessarily mean it is also optimized. So a timer kicks in to sample frequently accessed native code and optimize the same. Do you know how we can control this timer interval? – Gesundheit 25/3, 2016 at 1:13
