Performance Techniques
First, it depends on which JVM you are talking about, since there are several - but I'm going to assume you mean Oracle HotSpot (and in any case, the other top-tier JVMs will use similar techniques).
For that JVM, this list from the HotSpot internal wiki provides a great start (and the child pages go into detail on some of the more interesting techniques). If you are just looking for a laundry list of tricks, the wiki has that too, although to make sense of them you'll probably have to google the individual terms.
Not all of these have been implemented recently, but some of the big ones have (range check elision, escape analysis, superword optimizations) - at least for a loose definition of "recently".
Next, let's take a look at the relative performance picture when it comes to C/C++ vs Java, and why the techniques above either help narrow the gap or, in some cases, actually give Java an intrinsic advantage over native-compiled languages.
Java vs C/C++
At a high level, the optimizations are a mix of things that you'd see in any decent compiler for native languages like C and C++, along with things that are needed to reduce the impact of Java/JVM-specific features and safety checks, such as:
- Escape analysis, which mitigates (somewhat) the lack of stack allocation for objects
- Inline caches, and class hierarchy analysis, which mitigate "every function is virtual"
- Range check elimination, which mitigates "every array access is range checked"
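To make the last item concrete, here's a minimal sketch (class and method names are mine, not from the wiki) of the kind of loop where HotSpot's JIT can eliminate the per-access bounds checks: because the loop condition already proves `0 <= i < a.length`, the compiled code doesn't need to re-check on every access.

```java
// Hot loop over an array: semantically every a[i] access is bounds-checked,
// but HotSpot's C2 compiler can prove 0 <= i < a.length from the loop
// bounds and eliminate the per-element checks in the compiled code.
public class SafetyCheckDemo {
    static long sum(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];   // range check provably redundant here
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sum(data)); // 499500
    }
}
```

The same shape of reasoning applies to the other two bullets: a call site that only ever sees one receiver class can be devirtualized via an inline cache, and an object that provably never escapes a method is a candidate for escape analysis.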
Many of these JVM-specific* optimizations only help bring the JVM up to parity with native languages, in that they are addressing hurdles the native languages don't have to deal with. A few optimizations, however, are things that a statically compiled language can't manage (or can manage in some cases only with profile-guided optimization, which is rare and is necessarily one-size-fits-all anyway):
- Dynamic inlining of only the hottest code
- Code generation based on actual branch/switch frequencies
- Dynamic generation of CPU/instruction set aware code (even CPU features released after the code was compiled!)1
- Elision of never-executed code
- Injection of pre-fetch instructions interleaved with application code
- The whole family of techniques supported by safepointing
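Here's a hedged sketch (the class is hypothetical; the mechanism is HotSpot's real "uncommon trap" machinery) of how branch-frequency-based code generation and never-executed-code elision show up in practice:

```java
// HotSpot profiles this method in the interpreter/C1 before C2 compiles it.
// If the error branch is never taken during profiling, C2 can compile the
// method as straight-line code and replace the rare branch with an
// "uncommon trap": if it is ever hit, execution deoptimizes back to the
// interpreter, and the method is later recompiled with the updated profile.
public class BranchProfileDemo {
    static int process(int value) {
        if (value < 0) {                 // never seen during warmup
            throw new IllegalArgumentException("negative: " + value);
        }
        return value * 2;                // the only path compiled "for real"
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = 0; i < 100_000; i++) {  // warm up so the JIT kicks in
            acc += process(i);
        }
        System.out.println(acc);
    }
}
```

A static compiler must emit code for both paths up front; the JVM can gamble on the profile and pay the deoptimization cost only if the gamble turns out wrong.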
The consensus seems to be that Java often produces code similar in speed to good C++ compilers at a moderate optimization level, such as gcc -O2, although a lot depends on the exact benchmark. Modern JVMs like HotSpot tend to excel at low-level array traversal and math (as long as the competing compiler isn't vectorizing - that's hard to beat), and in allocation-heavy scenarios where the competing code does a similar number of allocations (JVM object allocation plus GC is generally faster than malloc). They fall behind when the memory footprint typical of Java applications is a factor, where stack allocation is heavily used, or where vectorizing compilers or intrinsics tip the scales toward the native code.
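The allocation point is where escape analysis earns its keep. Below is a sketch (class names are mine) of the pattern: a short-lived object per iteration that a C programmer would put on the stack. If HotSpot proves the object never escapes, it can scalar-replace it, so the "allocation" costs nothing at all.

```java
// A short-lived object allocated per call. If escape analysis proves the
// Point never escapes distance(), HotSpot can scalar-replace it: no heap
// allocation, no GC pressure -- the fields effectively live in registers.
public class AllocationDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double distance(double x, double y) {
        Point p = new Point(x, y);          // candidate for scalar replacement
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    public static void main(String[] args) {
        double total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += distance(i, i);        // a million nominal allocations
        }
        System.out.println(total);
    }
}
```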
If you search for Java vs C performance, you'll find plenty of people who have tackled this question, with varying levels of rigor. Here's the first one I stumbled across, which seems to show a rough tie between gcc and HotSpot (even at -O3 in this case). This post and the linked discussions are probably a better start if you want to see how a single benchmark can go through several iterations in each language, leapfrogging each other - and they show some of the limits of optimization on both sides.
*well not really JVM-specific - most would also apply to other safe or managed languages like the CLR
1 This particular optimization is becoming more and more relevant as new instruction sets (particularly SIMD instructions, but there are others) are being released with some frequency. Automatic vectorization can speed up some code massively, and while Java has been slow off the mark here, it is at least catching up a bit.
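For reference, here's the shape of loop HotSpot's superword pass is designed to auto-vectorize (the class is my own illustration): unit-stride array accesses, no cross-iteration dependence, simple arithmetic. Whether it actually compiles to SIMD instructions depends on the JVM version and the host CPU; flags like -XX:+PrintCompilation (or -XX:+PrintAssembly with a disassembler plugin) can be used to check.

```java
// A loop shaped for HotSpot's superword (auto-vectorization) optimization:
// unit-stride accesses, independent iterations, plain arithmetic.
public class VectorizableLoop {
    // y[i] = a * x[i] + y[i], the classic "axpy" kernel
    static void axpy(float a, float[] x, float[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = new float[8], y = new float[8];
        for (int i = 0; i < 8; i++) { x[i] = i; y[i] = 1; }
        axpy(2.0f, x, y);
        System.out.println(y[3]); // 2*3 + 1 = 7.0
    }
}
```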