Why is Java faster if it repeats the same code?
Asked Answered
C

3

11

Given the following code:

public class Test{

    static int[] big = new int [10000];

    public static void main(String[] args){
        long time;
        for (int i = 0; i < 16; i++){
            time = System.nanoTime();
            getTimes();
            System.out.println(System.nanoTime() - time);
        }    
    }
    public static void getTimes(){
        int d;
        for (int i = 0; i < 10000; i++){
            d = big[i];
        }    
    }
}

The output shows a decreasing duration trend:

171918
167213
165930
165502
164647
165075
203991
70563
45759
43193
45759
44476
45759
52601
47897
48325

Why is the same code in getTimes being executed in less than one third of the time after it has been executed 8 times or more? (Edit: It does not always happen at the 8th time, but from the 5th to 10th)

Cassey answered 15/6, 2015 at 11:35 Comment(12)
Can you consistently reproduce this behaviour?Waits
I did this very often, there is always a point where the time gets much smaller.Cassey
Looks like the JIT to me. A very good demonstration of why you should never benchmark Java by hand but use appropriate microbenchmarking tools such as JMH.Amphibolite
You're not using d for anything, so the whole loop could be simply removed by the JIT compiler as dead code. See also avoiding benchmarking pitfalls.Chrissychrist
@MickMnemonic and that would make it faster on the 8th (or 5th to 10th) run?Coopt
And why isn't it removed in the first or second iteration, but only around the 8th?Cassey
@VirajNalawade why second? Why not 10'000th? Because the JIT will optimise heavily used code more. If I run a loop twice then dead code analysis is likely more costly than just running it. Once I run it some threshold number of times, the JIT will begin optimizing the code. This threshold is, to some extent, configurable.Amphibolite
The JIT compiler will compile code that is used often enough to provide a speedup compared to just interpreting the code. That it happens between the 5th and 10th iteration might just depend on the time that compilation needs, i.e. when the compiled version becomes available.Pyrogallate
I'm even surprised that the JIT didn't optimize away getTimes entirely, since it doesn't have any lasting effect.Ainsley
Maybe you can use the options described here to track what's going on (i.e. if and when the JIT kicks in etc.): #1503979Pyrogallate
This question could probably be considered as "nearly duplicate" of others (related, although probably not the best matching one...). So just as a hint: Let the loop run for (int i=0; i<5000; i++), and start it with java -server YourProgram, to see how your "code" seems to be executed in 977 nanoseconds (on windows, at least). (This is the time that light takes to travel 300 meters, by the way)Backslide
Added a "slightly" more detailed answer. Comments are welcome.Blindly
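
Picking up the dead-code and warm-up points from the comments above, here is a minimal sketch (not part of the original question) of a variant in which the loop result stays live, so the JIT cannot simply discard the array reads as dead code; for serious measurements a harness such as JMH is still the better choice.

public class TestSum {

    static int[] big = new int[10000];

    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 16; i++) {
            long time = System.nanoTime();
            sink += getTimes();                      // consume the result so the loop stays live
            System.out.println(System.nanoTime() - time);
        }
        System.out.println("sink = " + sink);        // keep the accumulated value observable
    }

    static int getTimes() {
        int sum = 0;
        for (int i = 0; i < 10000; i++) {
            sum += big[i];                           // the read now feeds a value that is used
        }
        return sum;
    }
}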
B
13

That what you see is the result of some JIT optimization should be clear by now from all the comments you received. But what is really happening, and why is that code almost always optimized after roughly the same number of iterations of the outer for loop?

I'll try to answer both questions, but please remember that everything explained here applies only to Oracle's HotSpot VM. There is no Java specification that defines how a JVM's JIT should behave.

First of all, let's see what the JIT is doing by running that Test program with an additional flag (a plain JVM is enough for this; there is no need to load the debugging shared library that some of the UnlockDiagnosticVMOptions options require):

java -XX:+PrintCompilation Test

The execution completes with this output (removing a few lines at the beginning that show that other methods are being compiled):

[...]
195017
184573
184342
184262
183491
189494
    131   51%      3       Test::getTimes @ 2 (22 bytes)
245167
    132   52       3       Test::getTimes (22 bytes)
165144  

65090
    132   53       1       java.nio.Buffer::limit (5 bytes)
59427
    132   54%      4       Test::getTimes @ 2 (22 bytes)  
75137
48110    
    135   51%     3        Test::getTimes @ -2 (22 bytes)   made not entrant

    142   55       4       Test::getTimes (22 bytes)
150820
86951
90012
91421

The printlns from your code are interleaved with diagnostic information related to the compilation the JIT is performing. Looking at a single line:

131    51%      3       Test::getTimes @ 2 (22 bytes)

Each column has the following meaning:

  1. Timestamp
  2. Compilation Id (with additional attributes if needed)
  3. Tiered compilation level
  4. Method short name (with @osr_bci if available)
  5. Compiled method size

Keeping only the lines related to getTimes:

    131   51%      3       Test::getTimes @ 2 (22 bytes)
    132   52       3       Test::getTimes (22 bytes)
    132   54%      4       Test::getTimes @ 2 (22 bytes)     
    135   51%      3       Test::getTimes @ -2 (22 bytes)   made not entrant
    142   55       4       Test::getTimes (22 bytes)

It's clear that getTimes is being compiled more than once, but every time it's compiled in a different way.

The % symbol means that on-stack replacement (OSR) has been performed: the 10k loop contained in getTimes has been compiled in isolation from the rest of the method, and the JVM replaced that section of the method's code with the compiled version. The osr_bci is an index that points to this new compiled block of code.
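
To make the OSR case concrete, here is a small sketch of my own (not from this answer): a method that is entered only once but spends a long time in a loop is exactly the situation OSR handles, since a plain invocation counter would never trigger compilation for it.

// Hypothetical illustration of when OSR matters: main is invoked only once,
// yet its loop is clearly worth compiling. Running this with
// -XX:+PrintCompilation typically shows an entry for the method marked
// with '%', i.e. compiled via on-stack replacement while it is still running.
public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000_000; i++) {  // single invocation, very hot loop
            sum += i;
        }
        System.out.println(sum);
    }
}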

The next compilation is a classic JIT compilation that compiles the whole getTimes method (the size is still the same because there is nothing else in that method besides the loop).

The third time, another OSR is performed, but at a different tier level. Tiered compilation was added in Java 7 and basically allows the JVM to choose between the client and server JIT modes at runtime, switching freely between the two when necessary. Client mode performs a simpler set of optimization strategies, while server mode is able to apply more sophisticated optimizations that, on the other hand, have a bigger cost in terms of time spent compiling.

I will not go into the details of the different modes or of tiered compilation; if you need additional information I recommend Java Performance: The Definitive Guide by Scott Oaks, and also check this question, which explains what changes between the levels.

Back to the output of PrintCompilation: the gist here is that, from a certain point in time, a sequence of compilations of increasing complexity is performed until the method becomes apparently stable (i.e. the JIT doesn't compile it again).
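
One way to see that compilation work really happens during those first iterations, without reading the PrintCompilation output, is the standard CompilationMXBean. The sketch below is my own addition (the class name TestWithJitTime and the output format are invented for illustration); on HotSpot the reported compilation time typically grows during the early iterations and then flattens once getTimes is stable.

import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical companion to the original Test: next to each measured duration
// it prints how much time the JIT has spent compiling so far.
public class TestWithJitTime {

    static int[] big = new int[10000];

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        for (int i = 0; i < 16; i++) {
            long time = System.nanoTime();
            getTimes();
            long elapsed = System.nanoTime() - time;
            long jitMillis = (jit != null && jit.isCompilationTimeMonitoringSupported())
                    ? jit.getTotalCompilationTime() : -1;   // -1 if not supported
            System.out.println(elapsed + " ns, total JIT time so far: " + jitMillis + " ms");
        }
    }

    static void getTimes() {
        int d;
        for (int i = 0; i < 10000; i++) {
            d = big[i];
        }
    }
}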

So, why does all this start at that certain point in time, after 5-10 iterations of the main loop?

Because the inner getTimes loop has become "hot".

The HotSpot VM usually considers "hot" those methods that have been invoked at least 10k times (that's the historical default threshold; it can be changed using -XX:CompileThreshold=<num>, and with tiered compilation there are now multiple thresholds). In the case of OSR, I'm guessing it's performed when a block of code is deemed "hot" enough, in terms of absolute or relative execution time, inside the method that contains it.
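
As a rough experiment (not something done in the original answer), the effect of that threshold is easiest to observe with tiered compilation switched off, since -XX:CompileThreshold is only honoured in that mode:

java -XX:-TieredCompilation -XX:CompileThreshold=100 -XX:+PrintCompilation Test

With a low threshold like this, the drop in the printed durations should move closer to the start of the output; with a very large one, the method stays interpreted for longer.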

Additional References

PrintCompilation Guide by Krystal Mok

Java Performance: The Definitive Guide

Blindly answered 15/6, 2015 at 21:31 Comment(1)
I second the recommendation, Java Performance: TPG is indeed invaluable whether you really need to squeeze out more performance from your application or you're just nosey.Vaporize
L
4

The JIT (Just-In-Time) compiler of the virtual machine optimizes the interpretation of the Java bytecode. For example, if you have an if() statement that is false in about 99% of cases, the JIT optimizes your code for the false case, which can make the rare true cases slower.
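
To make that a bit more concrete, here is a minimal sketch (my own illustration, not part of this answer) of a branch that is almost never taken. After warm-up, the profile-guided compiled code favours the common path, and if the compiler speculated that the rare branch never happens, the first time it does happen can trigger a deoptimization (an "uncommon trap") back to slower execution.

// Hypothetical illustration: the branch below is true only once in a thousand calls.
// HotSpot's profiling sees it as almost always false and lays out the compiled
// code for the common path; the rare path may be handled more slowly.
public class RareBranch {

    static long slowPathHits = 0;

    static long work(int i) {
        if (i % 1000 == 0) {      // rare path, taken in ~0.1% of calls
            slowPathHits++;
            return i * 3L;
        }
        return i;                 // common path, what the JIT optimizes for
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 1; i <= 10_000_000; i++) {
            sum += work(i);
        }
        System.out.println(sum + " (slow path taken " + slowPathHits + " times)");
    }
}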

Learned answered 15/6, 2015 at 11:43 Comment(2)
Could you please provide some reference?Coopt
Read Inside the Java 2 Virtual Machine by Bill Venners. Though the Java version is old, it will give you a good idea of the internal workings of the JVM.Eavesdrop
S
0

Example: Code Before Optimization

class A {
  B b;
  public void newMethod() {
    int y = b.get();   // calling get()
    // ...do stuff...
    int z = b.get();   // calling it again
    int sum = y + z;
  }
}
class B {
  int value;
  final int get() {
    return value;
  }
}

Example: Code After Optimization

class A {
  B b;
  public void newMethod() {
    int y = b.value;   // get() has been inlined into a direct field access
    // ...do stuff...
    int sum = y + y;   // the second, redundant call has been eliminated
  }
}
class B {
  int value;
  final int get() {
    return value;
  }
}

Originally, the code contained two calls to the b.get() method. After optimization, the two method calls are optimized into a single variable-copy operation; that is, the optimized code does not need to perform a method call to acquire the field value of class B.
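
If you want to check whether this kind of inlining actually happens for your own methods, HotSpot can report its inlining decisions. As a rough pointer (not something mentioned in the original answer), running

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining YourProgram

prints, for each compiled method, which callees were inlined and why others were not.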


Spool answered 16/6, 2015 at 4:52 Comment(0)
