Performance Techniques
First, it depends on which JVM you are talking about, since there are several - but I'm going to assume you mean Oracle HotSpot (and in any case, the other top-tier JVMs will use similar techniques).
For that JVM, this list from the HotSpot internal wiki provides a great start (and the child pages go into detail on some of the more interesting techniques). If you are just looking for a laundry list of tricks, the wiki has that too, although to make sense of them you'll probably have to google the individual terms.
Not all of these have been implemented recently, but some of the big ones have (range check elision, escape analysis, superword optimizations) - at least for a loose definition of "recently".
Next, let's take a look at the relative performance picture when it comes to C/C++ vs Java, and why the techniques above either help narrow the gap or, in some cases, actually give Java an intrinsic advantage over native-compiled languages.
Java vs C/C++
At a high level, the optimizations are a mix of things that you'd see in any decent compiler for native languages like C and C++, along with things that are needed to reduce the impact of Java/JVM-specific features and safety checks, such as:
- Escape analysis, which mitigates (somewhat) the lack of stack allocation for objects
- Inline caches, and class hierarchy analysis, which mitigate "every function is virtual"
- Range check elimination, which mitigates "every array access is range checked"
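To make the last item concrete, here's a minimal sketch (class and method names are mine, not from the wiki) of the kind of loop where HotSpot's JIT can eliminate the per-access bounds checks: because the loop condition already proves `0 <= i < a.length`, the compiled code doesn't need to re-check on every access.

```java
// Hot loop over an array: semantically every a[i] access is bounds-checked,
// but HotSpot's C2 compiler can prove 0 <= i < a.length from the loop
// bounds and eliminate the per-element checks in the compiled code.
public class SafetyCheckDemo {
    static long sum(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];   // range check provably redundant here
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sum(data)); // 499500
    }
}
```

The same shape of reasoning applies to the other two bullets: a call site that only ever sees one receiver class can be devirtualized via an inline cache, and an object that provably never escapes a method is a candidate for escape analysis.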
Many of these JVM-specific* optimizations only help bring the JVM up to parity with native languages, in that they are addressing hurdles the native languages don't have to deal with. A few optimizations, however, are things that a statically compiled language can't manage (or can manage in some cases only with profile-guided optimization, which is rare and is necessarily one-size-fits-all anyway):
- Dynamic inlining of only the hottest code
- Code generation based on actual branch/switch frequencies
- Dynamic generation of CPU/instruction set aware code (even CPU features released after the code was compiled!)1
- Elision of never-executed code
- Injection of pre-fetch instructions interleaved with application code
- The whole family of techniques supported by safepointing
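Here's a hedged sketch (the class is hypothetical; the mechanism is HotSpot's real "uncommon trap" machinery) of how branch-frequency-based code generation and never-executed-code elision show up in practice:

```java
// HotSpot profiles this method in the interpreter/C1 before C2 compiles it.
// If the error branch is never taken during profiling, C2 can compile the
// method as straight-line code and replace the rare branch with an
// "uncommon trap": if it is ever hit, execution deoptimizes back to the
// interpreter, and the method is later recompiled with the updated profile.
public class BranchProfileDemo {
    static int process(int value) {
        if (value < 0) {                 // never seen during warmup
            throw new IllegalArgumentException("negative: " + value);
        }
        return value * 2;                // the only path compiled "for real"
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = 0; i < 100_000; i++) {  // warm up so the JIT kicks in
            acc += process(i);
        }
        System.out.println(acc);
    }
}
```

A static compiler must emit code for both paths up front; the JVM can gamble on the profile and pay the deoptimization cost only if the gamble turns out wrong.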
The consensus seems to be that Java often produces code similar in speed to good C++ compilers at a moderate optimization level, such as gcc -O2, although a lot depends on the exact benchmark. Modern JVMs like HotSpot tend to excel at low-level array traversal and math (as long as the competing compiler isn't vectorizing - that's hard to beat), and in allocation-heavy scenarios where the competing code does a similar number of allocations (JVM object allocation plus GC is generally faster than malloc). They fall behind when the memory footprint typical of Java applications is a factor, where stack allocation is heavily used, or where vectorizing compilers or intrinsics tip the scales toward the native code.
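The allocation point is where escape analysis earns its keep. Below is a sketch (class names are mine) of the pattern: a short-lived object per iteration that a C programmer would put on the stack. If HotSpot proves the object never escapes, it can scalar-replace it, so the "allocation" costs nothing at all.

```java
// A short-lived object allocated per call. If escape analysis proves the
// Point never escapes distance(), HotSpot can scalar-replace it: no heap
// allocation, no GC pressure -- the fields effectively live in registers.
public class AllocationDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double distance(double x, double y) {
        Point p = new Point(x, y);          // candidate for scalar replacement
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    public static void main(String[] args) {
        double total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += distance(i, i);        // a million nominal allocations
        }
        System.out.println(total);
    }
}
```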
If you search for Java vs C performance, you'll find plenty of people who have tackled this question, with varying levels of rigor. Here's the first one I stumbled across, which seems to show a rough tie between gcc and HotSpot (even at -O3 in this case). This post and the linked discussions are probably a better start if you want to see how a single benchmark can go through several iterations in each language, leapfrogging each other - and they show some of the limits of optimization on both sides.
*well not really JVM-specific - most would also apply to other safe or managed languages like the CLR
1 This particular optimization is becoming more and more relevant as new instruction sets (particularly SIMD instructions, but there are others) are being released with some frequency. Automatic vectorization can speed up some code massively, and while Java has been slow off the mark here, it is at least catching up a bit.
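For reference, here's the shape of loop HotSpot's superword pass is designed to auto-vectorize (the class is my own illustration): unit-stride array accesses, no cross-iteration dependence, simple arithmetic. Whether it actually compiles to SIMD instructions depends on the JVM version and the host CPU; flags like -XX:+PrintCompilation (or -XX:+PrintAssembly with a disassembler plugin) can be used to check.

```java
// A loop shaped for HotSpot's superword (auto-vectorization) optimization:
// unit-stride accesses, independent iterations, plain arithmetic.
public class VectorizableLoop {
    // y[i] = a * x[i] + y[i], the classic "axpy" kernel
    static void axpy(float a, float[] x, float[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = new float[8], y = new float[8];
        for (int i = 0; i < 8; i++) { x[i] = i; y[i] = 1; }
        axpy(2.0f, x, y);
        System.out.println(y[3]); // 2*3 + 1 = 7.0
    }
}
```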