I know micro-benchmarking is hard. I'm not trying to build a poor micro-benchmark. Rather, I have run into this problem when making (what I thought to be) harmless refactoring. There is stripped down demo of the problem below.
The program builds an ArrayList of ten thousand random integers and then finds sum of the elements. In the example, summing is repeated a million times to improve signal v. noise ratio in the measurement of elapsed time. In real program, there's a million of slightly different lists, but the problematic effect holds regardless.
App#arraySumInlined
is the method version before refactoring with summing kept inline in the loop body.App#arraySumSubFunctionCall
is the method version having loop body extracted into separate method.
Now, the surprising thing (to me) is that arraySumInlined
takes ~7 sec, but arraySumSubFunctionCall
takes ~42 sec. This seems to me like an impressive enough difference.
If I uncomment both arraySumInlined
and arraySumSubFunctionCall
then they complete in ~7 sec each. I.e. arraySumSubFunctionCall
ceases to be that slow.
What's going on here? Are there any broader implications? E.g. never before have I thought of extract method refactoring as of something that could turn 7 sec method call into 42 sec one.
While researching this, I have found several questions involving JIT (e.g. Java method call performance and Why does this code using streams run so much faster in Java 9 than Java 8?), but they seem to deal with opposite cases: inlined code performing worse than code in a separate method.
Environment details: Windows 10 x64, Intel Core i3-6100.
λ java -version
openjdk version "11.0.4" 2019-07-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.4+11)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.4+11, mixed mode)
λ javac -version
javac 11.0.4
import java.util.ArrayList;
import java.util.Random;
import java.util.concurrent.TimeUnit;
public class App {
public static void main(String[] args) {
final int size = 10_000;
final int iterations = 1_000_000;
final var data = integerListWithRandomValues(size);
//arraySumInlined(iterations, data);
arraySumSubFunctionCall(iterations, data);
}
private static void arraySumSubFunctionCall(int iterations,
final ArrayList<Integer> data) {
final long start = System.nanoTime();
long result = 0;
for (int i = 0; i < iterations; ++i) {
result = getSum(data);
}
final long end = System.nanoTime();
System.out.println(String.format("%f sec (%d)",
TimeUnit.NANOSECONDS.toMillis(end - start) / 1000.0, result));
}
private static void arraySumInlined(int iterations,
final ArrayList<Integer> data) {
final long start = System.nanoTime();
long result = 0;
for (int i = 0; i < iterations; ++i) {
result = data.stream().mapToInt(e -> e).sum();
}
final long end = System.nanoTime();
System.out.println(String.format("%f sec (%d)",
TimeUnit.NANOSECONDS.toMillis(end - start) / 1000.0, result));
}
private static int getSum(final ArrayList<Integer> data) {
return data.stream().mapToInt(e -> e).sum();
}
private static ArrayList<Integer> integerListWithRandomValues(final int size) {
final var result = new ArrayList<Integer>();
final var r = new Random();
for (int i = 0; i < size; ++i) {
result.add(r.nextInt());
}
return result;
}
}
-XX:MaxInlineLevel=15
. – Sonnierresult
into accumulator instead of effectively using only the value from the last iteration. – Branch-XX:MaxInlineLevel=15
does make performance of both versions about the same. – Branch-XX:InlineSmallCode
to a smaller value also seems to reduce run time. – Epicenter