Different benchmarking results between forks in JMH
In short, my question is: why can JMH benchmarking results be stable within a fork, yet differ significantly between forks?

I have observed this in many benchmarks (usually ones involving processing of a data set). Here's a simple example:

import static java.util.concurrent.TimeUnit.*;
import static java.util.stream.Collectors.*;
import java.util.*;

import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 5, time = 1, timeUnit = SECONDS)
@Measurement(iterations = 15, time = 1, timeUnit = SECONDS)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(MICROSECONDS)
@Fork(50)
@State(Scope.Benchmark)
public class AvgTest {
    // Fixed seed, so every fork processes exactly the same data
    private long[] longs = new Random(1).longs(1000).toArray();

    @Benchmark
    public void test(Blackhole bh) {
        // Box the longs and compute their average via a Collector
        bh.consume(Arrays.stream(longs).boxed().collect(averagingLong(x -> x)));
    }
}
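For completeness, a benchmark like this can be launched programmatically via the JMH Runner API (a minimal sketch; the AvgTestRunner class name is just an illustration, and all warmup/measurement/fork settings come from the annotations above):

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class AvgTestRunner {
    public static void main(String[] args) throws RunnerException {
        // Select the AvgTest benchmark by its class name; warmup, measurement
        // and fork counts are taken from the annotations on AvgTest itself.
        Options opt = new OptionsBuilder()
                .include(AvgTest.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}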

I use 5 one-second warmup iterations and 15 one-second measurement iterations. The whole procedure is repeated 50 times (with a JVM restart each time), as specified by @Fork(50). A typical fork looks like this:

# Run progress: 8,00% complete, ETA 00:15:34
# Fork: 5 of 50
# Warmup Iteration   1: 10,752 us/op
# Warmup Iteration   2: 5,504 us/op
# Warmup Iteration   3: 5,107 us/op
# Warmup Iteration   4: 5,144 us/op
# Warmup Iteration   5: 5,157 us/op
Iteration   1: 5,140 us/op
Iteration   2: 5,157 us/op
Iteration   3: 5,148 us/op
Iteration   4: 5,143 us/op
Iteration   5: 5,153 us/op
Iteration   6: 5,148 us/op
Iteration   7: 5,151 us/op
Iteration   8: 5,143 us/op
Iteration   9: 5,143 us/op
Iteration  10: 5,138 us/op
Iteration  11: 5,144 us/op
Iteration  12: 5,142 us/op
Iteration  13: 5,151 us/op
Iteration  14: 5,144 us/op
Iteration  15: 5,135 us/op

As you can see, the per-iteration results are very stable and the standard deviation is low. However, sometimes (roughly once in several dozen forks) I see a fork like this:

# Run progress: 26,00% complete, ETA 00:12:31
# Fork: 14 of 50
# Warmup Iteration   1: 13,482 us/op
# Warmup Iteration   2: 12,800 us/op
# Warmup Iteration   3: 12,140 us/op
# Warmup Iteration   4: 12,102 us/op
# Warmup Iteration   5: 12,094 us/op
Iteration   1: 12,114 us/op
Iteration   2: 12,164 us/op
Iteration   3: 12,263 us/op
Iteration   4: 12,271 us/op
Iteration   5: 12,319 us/op
Iteration   6: 12,309 us/op
Iteration   7: 12,305 us/op
Iteration   8: 12,308 us/op
Iteration   9: 12,257 us/op
Iteration  10: 12,267 us/op
Iteration  11: 12,270 us/op
Iteration  12: 12,285 us/op
Iteration  13: 12,292 us/op
Iteration  14: 12,242 us/op
Iteration  15: 12,253 us/op

The results are also quite stable, but more than 2x slower than in a typical fork.

Here's the per-fork summary (fork number, mean time, and standard deviation in microseconds, sorted by mean time):

Fork#  Mean       SD
37    5.142    0.006 
23    5.142    0.007 
46    5.143    0.014 
 5    5.145    0.006 
15    5.145    0.007 
17    5.146    0.011 
 9    5.147    0.024 
47    5.148    0.006 
 7    5.149    0.005 
44    5.149    0.004 
33    5.150    0.010 
18    5.151    0.006 
26    5.151    0.008 
11    5.153    0.007 
22    5.153    0.005 
 6    5.154    0.006 
12    5.155    0.008 
50    5.156    0.006 
20    5.157    0.009 
45    5.157    0.006 
49    5.157    0.010 
25    5.160    0.009 
34    5.160    0.006 
21    5.163    0.009 
27    5.163    0.018 
16    5.163    0.010 
31    5.163    0.014 
 3    5.165    0.006 
29    5.167    0.008 
30    5.170    0.033 
48    5.174    0.008 
_____________________
38    5.210    0.020 
 8    5.219    0.008 
24    5.220    0.005 
 4    5.224    0.007 
39    5.225    0.007 
35    5.227    0.006 
10    5.229    0.007 
13    5.229    0.007 
41    5.232    0.005 
42    5.232    0.007 
40    5.249    0.008 
_____________________
36    5.625    0.028 
 1    5.653    0.032 
32    5.669    0.029 
19    5.706    0.035 
28    5.772    0.051 
 2    5.858    0.032 
_____________________
43    8.948    0.010 
14   12.261    0.055 

So as you can see, for most forks the average value falls into the 5.142..5.174 us interval, then there is a small jump to the 5.210..5.249 us interval, then a larger jump to 5.625..5.858 us, and finally two outliers. The raw results are available in this gist.

So what are these jumps and outliers? Is this a glitch of the benchmarking procedure, or can such an effect appear in production as well, so that my program will occasionally become 2.5x slower? Is this a hardware- or JVM-related issue? Can I predict at the beginning of an execution whether I'm in a fast fork or a slow one?

Measurements were performed on a Windows 7 64-bit, Intel i5 quad-core system using Oracle JDK 1.8.0_45 and JMH 1.10.3.

Gennygeno answered 17/8, 2015 at 9:53 Comment(6)
I've also noticed a similar problem a couple of times. Due to timing issues, some method is JIT-compiled differently in each fork, but once it has been compiled, the performance is stable - that's why there is no deviation within one fork. You probably won't see this effect if you turn off background compilation with -XX:-BackgroundCompilation.Dint
-XX:-TieredCompilation will also make benchmark results more predictable (and better!)Dint
@apangin, thank you for the hints. I will rerun the benchmark with -XX:-BackgroundCompilation later today. To me it looks like the type profiles should be quite clean, so things like inlining should not vary greatly. I will also check -XX:+PrintInlining, though it would probably be quite hard to analyze...Gennygeno
@apangin, with -XX:-TieredCompilation it works faster! But there's also a puzzle: in most forks it's about 4.1us, but in rare cases (3 of 50) it's about 2.7us. With -XX:-BackgroundCompilation it's slower (around 8.9us), but stable (no outliers in 50 forks). I will continue digging tomorrow.Gennygeno
I've discovered one more situation in which JMH yields different scores between forks. But that is apparently not your case. In your case the variance seems to be caused by a race between the application thread and the background compiler threads.Dint
@apangin, it seems so. I haven't had time yet to check the compilation results in detail. Anyway, thank you for the help!Gennygeno
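
For reference, the flags suggested in the comments above can be passed to the forked JVMs either via the @Fork annotation or via the runner options (a sketch, assuming JMH's jvmArgsAppend support; only the appended flag differs between the two variants):

@Fork(value = 50, jvmArgsAppend = "-XX:-TieredCompilation")
@State(Scope.Benchmark)
public class AvgTest {
    // ... benchmark body as above ...
}

or programmatically:

Options opt = new OptionsBuilder()
        .include(AvgTest.class.getSimpleName())
        .jvmArgsAppend("-XX:-BackgroundCompilation")  // or "-XX:-TieredCompilation"
        .build();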
