Why do two consecutive calls to the same method yield different times for execution?

Here is some sample code:

    public class TestIO {
        public static void main(String[] str) {
            TestIO t = new TestIO();
            t.fOne();
            t.fTwo();
            t.fOne();
            t.fTwo();
        }

        public void fOne() {
            long t1, t2;
            t1 = System.nanoTime();
            int i = 10;
            int j = 10;
            int k = j * i;
            System.out.println(k);
            t2 = System.nanoTime();
            System.out.println("Time taken by 'fOne' ... " + (t2 - t1));
        }

        public void fTwo() {
            long t1, t2;
            t1 = System.nanoTime();
            int i = 10;
            int j = 10;
            int k = j * i;
            System.out.println(k);
            t2 = System.nanoTime();
            System.out.println("Time taken by 'fTwo' ... " + (t2 - t1));
        }
    }

This gives the following output:

    100
    Time taken by 'fOne' ... 390273
    100
    Time taken by 'fTwo' ... 118451
    100
    Time taken by 'fOne' ... 53359
    100
    Time taken by 'fTwo' ... 115936
    Press any key to continue . . .

Why does it take significantly more time to execute the same method the first time than on subsequent calls?

I tried passing -XX:CompileThreshold=1000000 on the command line, but it made no difference.

Ha answered 29/4, 2009 at 22:48 Comment(0)

There are several reasons. The JIT (Just In Time) compiler may not have run. The JVM can do optimizations that differ between invocations. You're measuring elapsed time, so maybe something other than Java is running on your machine. The processor and RAM caches are probably "warm" on subsequent invocations.

You really need to make multiple invocations (in the thousands) to get an accurate per method execution time.
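
A minimal sketch of that idea (the class, the helper name, and the iteration counts are mine, not from the question):

    public class AverageTiming {
        // Rough hand-rolled approach: warm up first, then average over many calls.
        static long averageNanos(Runnable task, int iterations) {
            for (int i = 0; i < 10000; i++) {      // warm-up: let class loading and the JIT settle
                task.run();
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                task.run();
            }
            return (System.nanoTime() - start) / iterations;  // mean nanoseconds per call
        }

        public static void main(String[] args) {
            Runnable work = new Runnable() {
                public void run() {
                    int k = 10 * 10;               // the work from the question, minus the println
                }
            };
            System.out.println("avg ns/call: " + averageNanos(work, 100000));
        }
    }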

Jaredjarek answered 29/4, 2009 at 22:51 Comment(0)

The issues mentioned in the other answers and the unpredictability of the JIT are true, but one more factor is the class loader:

The first call to fOne differs radically from the later ones, because that is what triggers the first call to System.out.println, which means that is when the class loader loads, from disk or the file system cache (usually it's cached), all the classes needed to print the text. Give the parameter -verbose:class to the JVM to see how many classes are actually loaded during this small program.

I've noticed similar behavior when running unit tests - the first test to call a big framework takes much longer (in the case of Guice, about 250 ms on a C2Q6600), even though the test code is the same, because the first invocation is when hundreds of classes are loaded by the class loader.

Since your example program is so short, the overhead probably comes from the very early JIT optimizations and the class loading activity. The garbage collector will probably not even start before the program has ended.


Update:

Now I've found a reliable way to find out what is really taking the time. Nobody had spotted it yet, although it's closely related to class loading: it was the dynamic linking of native methods!

I modified the code as follows, so that the logs would show when the tests begin and end (by looking at when those empty marker classes are loaded).

    TestIO t = new TestIO();
    new TestMarker1();
    t.fOne();
    t.fTwo();
    t.fOne();
    t.fTwo();
    new TestMarker2();
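
(The marker classes are presumably nothing more than empty classes whose only job is to show up in the class-loading log, e.g.:)

    class TestMarker1 {}
    class TestMarker2 {}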

The command for running the program, with the right JVM parameters that show what is really happening:

java -verbose:class -verbose:jni -verbose:gc -XX:+PrintCompilation TestIO

And the output:

* snip 493 lines *
[Loaded java.security.Principal from shared objects file]
[Loaded java.security.cert.Certificate from shared objects file]
[Dynamic-linking native method java.lang.ClassLoader.defineClass1 ... JNI]
[Loaded TestIO from file:/D:/DEVEL/Test/classes/]
  3       java.lang.String::indexOf (166 bytes)
[Loaded TestMarker1 from file:/D:/DEVEL/Test/classes/]
[Dynamic-linking native method java.io.FileOutputStream.writeBytes ... JNI]
100
Time taken by 'fOne' ... 155354
100
Time taken by 'fTwo' ... 23684
100
Time taken by 'fOne' ... 22672
100
Time taken by 'fTwo' ... 23954
[Loaded TestMarker2 from file:/D:/DEVEL/Test/classes/]
[Loaded java.util.AbstractList$Itr from shared objects file]
[Loaded java.util.IdentityHashMap$KeySet from shared objects file]
* snip 7 lines *

And the reason for that time difference is this line:

    [Dynamic-linking native method java.io.FileOutputStream.writeBytes ... JNI]

We can also see that the JIT compiler is not affecting this benchmark. Only three methods are compiled (such as java.lang.String::indexOf in the above snippet), and they are all compiled before fOne is called.

Thus answered 29/4, 2009 at 22:48 Comment(0)

  1. The code tested is quite trivial. The most expensive action it takes is

         System.out.println(k);

     so what you are measuring is how fast the debug output is written. That varies widely, and may even depend on the position of the output window on the screen, whether it needs to scroll, its size, and so on (see the sketch after this list for a way to time only the computation).

  2. The JIT/HotSpot compiler incrementally optimizes often-used code paths.

  3. The processor optimizes for expected code paths: paths that are used more often execute faster.

  4. Your sample size is way too small. Such microbenchmarks usually need a warm-up phase; see, for example, the article "Java is really fast at doing nothing" for how extensively this should be done.
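
Regarding point 1, you can see how much of the measured time is the println itself by timing only the computation (a sketch, not from the original question):

    public class ComputeOnly {
        public static void main(String[] args) {
            long t1 = System.nanoTime();
            int k = 10 * 10;           // the "work" being measured; the JIT may constant-fold
                                       // this, which is itself a classic microbenchmark pitfall
            long t2 = System.nanoTime();
            System.out.println(k);     // I/O moved outside the timed region
            System.out.println("compute only ... " + (t2 - t1));
        }
    }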

Cartwell answered 29/4, 2009 at 23:7 Comment(0)

In addition to JITting, other factors could be:

  • The process's output stream blocking when you call System.out.println
  • Your process getting scheduled out by another process
  • The garbage collector doing some work on a background thread

If you want to get good benchmarks, you should

  • Run the code you're benchmarking a large number of times, several thousand at least, and calculate the average time.
  • Ignore the times of the first several calls (due to JITting, etc.)
  • Disable the GC if you can; this may not be an option if your code generates a lot of objects.
  • Take the logging (println calls) out of the code being benchmarked.

There are benchmarking libraries on several platforms that will help you do this stuff; they can also calculate standard deviations and other statistics.
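
On the JVM, one such library is JMH (the Java Microbenchmark Harness, which appeared a few years after this answer was written). A minimal sketch, assuming JMH and its annotation processor are on the classpath:

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Warmup(iterations = 5)        // warm-up iterations are discarded
    @Measurement(iterations = 10)  // measured iterations are averaged, with error bounds
    @State(Scope.Thread)
    public class MultiplyBenchmark {
        @Benchmark
        public int multiply() {
            int i = 10, j = 10;
            return j * i;          // returning the result keeps the JIT from eliminating it
        }
    }

Built and run through the JMH runner (for example via the official Maven archetype), this reports the mean time per call along with statistics, with warm-up handled automatically.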

Schwartz answered 29/4, 2009 at 23:11 Comment(0)

The most likely culprit is the JIT (just-in-time) HotSpot compiler. Basically, the first time a piece of code is executed, the JVM compiles it to machine code, which is then reused on future executions.

Sedulity answered 29/4, 2009 at 22:51 Comment(0)

I think it is because, by the second run, the generated code had already been optimized during the first run.

Douai answered 29/4, 2009 at 22:52 Comment(0)

As has been suggested, JIT could be the culprit, but so could I/O wait time as well as resource wait time if other processes on the machine were using resources at that moment.

The moral of this story is that microbenchmarking is a hard problem, especially for Java. I don't know why you're doing this, but if you're trying to choose between two approaches to a problem, don't measure them this way. Use the strategy design pattern (a sketch follows below), run your entire program with the two different approaches, and measure the whole system. That evens out little bumps in processing time over the long run and gives you a much more realistic view of how much the performance of your entire app is bottlenecked at that point (hint: it's probably less than you think).
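
A sketch of that idea (all the names here are invented for illustration):

    // Swap whole implementations behind one interface and time the real
    // workload end to end, instead of microbenchmarking a few lines.
    interface Multiplier {
        int multiply(int i, int j);
    }

    class DirectMultiplier implements Multiplier {
        public int multiply(int i, int j) { return i * j; }
    }

    class RepeatedAddMultiplier implements Multiplier {
        public int multiply(int i, int j) {
            int sum = 0;
            for (int n = 0; n < j; n++) sum += i;
            return sum;
        }
    }

    public class WholeProgramTiming {
        public static void main(String[] args) {
            Multiplier m = new DirectMultiplier();  // swap in RepeatedAddMultiplier to compare
            long start = System.nanoTime();
            long checksum = 0;
            for (int n = 0; n < 10000000; n++) {
                checksum += m.multiply(10, 10);     // stand-in for the real workload
            }
            System.out.println(checksum + " in " + (System.nanoTime() - start) + " ns");
        }
    }

Run the full application once with each implementation and compare wall-clock times; incidental costs such as class loading, JIT compilation, and I/O average out over the whole run.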

Voltage answered 29/4, 2009 at 23:12 Comment(0)

Well, the most probable answer is initialization. JIT is surely not the right answer, as it takes many more cycles before it starts to optimize. But on the very first call there can be:

  • looking up classes (cached, so no second lookup is needed)
  • loading classes (once loaded, they stay in memory)
  • linking additional code from native libraries (native code is cached)
  • finally, loading the code to be executed into the CPU's L1 cache. This is the most probable source of the speedup you're seeing, and at the same time a reason why the benchmark (being a microbenchmark) doesn't say much. If your code is small enough, the second invocation of a loop can run entirely from inside the CPU, which is fast. In the real world this doesn't happen, because programs are bigger and the reuse of the L1 cache is far from that extensive.

Squish answered 30/4, 2009 at 7:37 Comment(0)
