Why is JMH saying that returning 1 is faster than returning 0

Can someone explain why JMH says that returning 1 is faster than returning 0?

Here is the benchmark code.

import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 3, jvmArgsAppend = {"-server", "-disablesystemassertions"})
public class ZeroVsOneBenchmark {

    @Benchmark
    @Warmup(iterations = 3, time = 2, timeUnit = TimeUnit.SECONDS)
    public int zero() {
        return 0;
    }

    @Benchmark
    @Warmup(iterations = 3, time = 2, timeUnit = TimeUnit.SECONDS)
    public int one() {
        return 1;
    }
}

Here is the result:

# Run complete. Total time: 00:03:05

Benchmark                       Mode   Samples        Score  Score error    Units
c.m.ZeroVsOneBenchmark.one     thrpt        60  1680674.502    24113.014   ops/ms
c.m.ZeroVsOneBenchmark.zero    thrpt        60   735975.568    14779.380   ops/ms

The same behaviour holds for one, two and zero:

# Run complete. Total time: 01:01:56

Benchmark                       Mode   Samples        Score  Score error    Units
c.m.ZeroVsOneBenchmark.one     thrpt        90  1762956.470     7554.807   ops/ms
c.m.ZeroVsOneBenchmark.two     thrpt        90  1764642.299     9277.673   ops/ms
c.m.ZeroVsOneBenchmark.zero    thrpt        90   773010.467     5031.920   ops/ms
Manslaughter answered 22/7, 2014 at 10:29 Comment(2)
Hey, I was creating a baseline and saw this behaviour; it's not that I'm spending my time measuring just this example. It's a simplified version for this thread.Manslaughter
Then create a non-simplified version showing what you see. It needs to be complex enough that all the HotSpot work is insignificant relative to what you want to measure.Viole

JMH is a good tool but still not perfect.

Certainly, there is no speed difference between returning 0, 1, or any other integer. However, it makes a difference how the value is consumed by JMH and how that code is compiled by the HotSpot JIT.

To prevent the JIT from optimizing out calculations, JMH uses the special Blackhole class to consume values returned from a benchmark. Here is the one for integer values:

public final void consume(int i) {
    if (i == i1 & i == i2) {
        // SHOULD NEVER HAPPEN
        nullBait.i1 = i; // implicit null pointer exception
    }
}
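To see why this condition "should never happen", here is a minimal, self-contained sketch of the guard (the field values 1 and 2 are an assumption standing in for Blackhole's internal state; the point is only that they are distinct, so no single i can equal both):

```java
public class BlackholeGuardDemo {
    // Stand-ins for Blackhole's fields; distinct values are what matters.
    static int i1 = 1;
    static int i2 = 2;

    // Mirrors the condition in consume(int): note the non-short-circuit &,
    // which forces both comparisons to be evaluated.
    static boolean guard(int i) {
        return i == i1 & i == i2;
    }

    public static void main(String[] args) {
        // While i1 != i2, the guard is false for every input.
        for (int i = -2; i <= 2; i++) {
            System.out.println(i + " -> " + guard(i));
        }
    }
}
```

Because the guard can never fire, the JIT cannot eliminate the returned value as dead code, yet the benchmark never pays the cost of the body.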

In consume(), i is the value returned from a benchmark; in your case it is either 0 or 1. When i == 1, the never-happens condition looks like if (1 == i1 & 1 == i2), which is compiled as follows:

0x0000000002b4ffe5: mov    0xb0(%r13),%r10d   ;*getfield i1
0x0000000002b4ffec: mov    0xb4(%r13),%r8d    ;*getfield i2
0x0000000002b4fff3: cmp    $0x1,%r8d
0x0000000002b4fff7: je     0x0000000002b50091  ;*return

But when i == 0, the JIT tries to "optimize" the two comparisons with 0 using setne instructions. However, the resulting code is considerably longer:

0x0000000002a40b28: mov    0xb0(%rdi),%r10d   ;*getfield i1
0x0000000002a40b2f: mov    0xb4(%rdi),%r8d    ;*getfield i2
0x0000000002a40b36: test   %r10d,%r10d
0x0000000002a40b39: setne  %r10b
0x0000000002a40b3d: movzbl %r10b,%r10d
0x0000000002a40b41: test   %r8d,%r8d
0x0000000002a40b44: setne  %r11b
0x0000000002a40b48: movzbl %r11b,%r11d
0x0000000002a40b4c: xor    $0x1,%r10d
0x0000000002a40b50: xor    $0x1,%r11d
0x0000000002a40b54: and    %r11d,%r10d
0x0000000002a40b57: test   %r10d,%r10d
0x0000000002a40b5a: jne    0x0000000002a40c15  ;*return

That is, the slower return 0 is explained by the larger number of CPU instructions executed in Blackhole.consume().

Note to JMH developers: I would suggest rewriting Blackhole.consume like this:

if (i == l1) {
    // SHOULD NEVER HAPPEN
    nullBait.i1 = i; // implicit null pointer exception
}

where volatile long l1 = Long.MIN_VALUE. In this case the condition is still always false, but it compiles identically for all return values.
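The invariant behind this suggestion can be checked with a short sketch (l1 here is a stand-in for the proposed field): comparing an int against a long widens the int to long first, and no int value can ever equal Long.MIN_VALUE, which lies far below Integer.MIN_VALUE.

```java
public class WideningGuardDemo {
    // Stand-in for the proposed field in the suggested rewrite.
    static volatile long l1 = Long.MIN_VALUE;

    static boolean guard(int i) {
        // i undergoes widening conversion to long before the comparison,
        // so this can never be true for any int input.
        return i == l1;
    }

    public static void main(String[] args) {
        int[] probes = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int i : probes) {
            System.out.println(i + " -> " + guard(i));
        }
    }
}
```

Since the comparison is against a constant-like volatile far outside the int range, the generated code should not depend on whether the benchmark returns 0, 1, or any other value.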

Harwell answered 22/7, 2014 at 22:35 Comment(5)
@AlekseyShipilev must be interested in that :)Harwell
this explains a lot. Thank you!Manslaughter
@apangin: That's an interesting idea, however: a) it does not scale to other data types, and we'd like to keep the consume-s consistent across the consumed types; b) it goes for widening conversion, which is bad for 32-bit platforms (think ARM).Limonene
The real takeaway from this example is that nanobenchmarks require validation on assembly level, which is convenient now with JMH's -prof perfasm :)Limonene
@apangin: if JMH isn't perfect, then which one is?Vivyan
