Unexpected VarHandle performance (4X slower than alternatives)

In JEP 193, one of the specific goals of VarHandles is to provide an alternative to using FieldUpdaters and AtomicIntegers (and to avoid some of the overhead associated with them).

AtomicIntegers can be particularly wasteful in terms of memory, since each one is a separate object (around 36 bytes each, depending on a few factors such as whether compressed OOPs are enabled).

If you have many integers that may need to be updated atomically (in many small objects), there are essentially three options if you want to reduce the waste:

  • Use an AtomicIntegerFieldUpdater
  • Use a VarHandle
  • Rearrange the code to use an AtomicIntegerArray instead of fields in objects
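
For reference, here is a stdlib-only sketch of the three options side by side (not a benchmark, and the class and method names are hypothetical). Each counter is incremented from several threads to confirm that all three are atomic; option 3 is shown as a single AtomicIntegerArray slot standing in for a field. It only illustrates the usage patterns and says nothing about speed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class ThreeOptions {

    // Option 1: a volatile int field updated through an AtomicIntegerFieldUpdater.
    static final class UpdaterCounter {
        private static final AtomicIntegerFieldUpdater<UpdaterCounter> COUNT =
                AtomicIntegerFieldUpdater.newUpdater(UpdaterCounter.class, "count");
        private volatile int count;

        int increment() { return COUNT.getAndAdd(this, 1); }
        int get() { return count; }
    }

    // Option 2: a volatile int field updated through a VarHandle.
    static final class VarHandleCounter {
        private static final VarHandle COUNT;
        static {
            try {
                COUNT = MethodHandles.lookup()
                        .findVarHandle(VarHandleCounter.class, "count", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
        private volatile int count;

        // The (int) cast keeps the call site type at (VarHandleCounter,int)int.
        int increment() { return (int) COUNT.getAndAdd(this, 1); }
        int get() { return count; }
    }

    // Option 3 lives in run(): one AtomicIntegerArray slot per logical counter.
    // Starts `threads` workers that each bump all three counters `perThread` times.
    static int[] run(int threads, int perThread) {
        UpdaterCounter viaUpdater = new UpdaterCounter();
        VarHandleCounter viaVarHandle = new VarHandleCounter();
        AtomicIntegerArray slots = new AtomicIntegerArray(1); // slot 0 is the counter
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    viaUpdater.increment();
                    viaVarHandle.increment();
                    slots.getAndAdd(0, 1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { viaUpdater.get(), viaVarHandle.get(), slots.get(0) };
    }

    public static void main(String[] args) {
        // Every increment is atomic, so each total is threads * perThread.
        System.out.println(Arrays.toString(run(4, 10_000))); // [40000, 40000, 40000]
    }
}
```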

So I decided to test the alternatives and get an idea of the performance implications of each.

Using an atomic (volatile mode) increment of an integer field as a proxy, I'm getting the following results on a Mid 2014 MacBook Pro:

Benchmark                         Mode  Cnt          Score          Error  Units
VarHandleBenchmark.atomic        thrpt    5  448041037.223 ± 36448840.301  ops/s
VarHandleBenchmark.atomicArray   thrpt    5  453785339.203 ± 64528885.282  ops/s
VarHandleBenchmark.fieldUpdater  thrpt    5  459802512.169 ± 52293792.737  ops/s
VarHandleBenchmark.varhandle     thrpt    5  136482396.440 ±  9439041.030  ops/s

In this benchmark, the VarHandle version is more than three times slower (about 136M ops/s versus roughly 450M ops/s for the alternatives).

What I'm trying to understand is where the overhead comes from.

Is this due to the signature-polymorphic access methods? Or am I making a mistake in the microbenchmark?

Benchmark details follow.


I ran the benchmark with the following JVM on a Mid 2014 MacBook Pro:

> java -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.2+9, mixed mode)

Source code of the benchmark:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

@State(Scope.Thread)
@Fork(value = 1, jvmArgs = {"-Xms256m", "-Xmx256m", "-XX:+UseG1GC"})
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 5, time = 5)
@Threads(4)
public class VarHandleBenchmark {

    // array option
    private final AtomicIntegerArray array = new AtomicIntegerArray(1);

    // vanilla AtomicInteger
    private final AtomicInteger counter = new AtomicInteger();

    // count field and its VarHandle
    private volatile int count;
    private static final VarHandle COUNT;

    // count2 field and its field updater
    private volatile int count2;
    private static final AtomicIntegerFieldUpdater<VarHandleBenchmark> COUNT2;

    static {
        try {

            COUNT = MethodHandles.lookup()
                    .findVarHandle(VarHandleBenchmark.class, "count", Integer.TYPE);
            COUNT2 = AtomicIntegerFieldUpdater.newUpdater(VarHandleBenchmark.class, "count2");
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    @Benchmark
    public void atomic(Blackhole bh) {
        bh.consume(counter.getAndAdd(1));
    }

    @Benchmark
    public void atomicArray(Blackhole bh) {
        bh.consume(array.getAndAdd(0, 1));
    }

    @Benchmark
    public void varhandle(Blackhole bh) {
        bh.consume(COUNT.getAndAdd(this, 1));
    }

    @Benchmark
    public void fieldUpdater(Blackhole bh) {
        bh.consume(COUNT2.getAndAdd(this, 1));
    }
}

UPDATE: after applying apangin's solution, these are the results of the benchmark:

Benchmark                         Mode  Cnt          Score          Error  Units
VarHandleBenchmark.atomic        thrpt    5  464045527.470 ± 42337922.645  ops/s
VarHandleBenchmark.atomicArray   thrpt    5  465700610.882 ± 18116770.557  ops/s
VarHandleBenchmark.fieldUpdater  thrpt    5  473968453.591 ± 49859839.498  ops/s
VarHandleBenchmark.varhandle     thrpt    5  429737922.796 ± 41629104.677  ops/s

The difference disappears: varhandle is now within the error margins of the other three.

Mackie answered 14/11, 2019 at 17:59

VarHandle.getAndAdd is a signature polymorphic method. That is, the types of its arguments and of its return value are derived from the call site in the actual source code.
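
To make "derived from the call site" concrete, here is a small stdlib-only sketch (the CallSiteTypes and Holder names are hypothetical). It prints the exact type the handle defines for GET_AND_ADD and then calls getAndAdd in three source shapes: as the operand of an (int) cast, as a plain value (compile-time result Object), and as an expression statement (result discarded). Only the first matches the handle's type exactly; the uncast shape is the one the benchmark's varhandle method uses.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

public class CallSiteTypes {
    // Hypothetical holder with a single volatile int field.
    static final class Holder {
        volatile int count;
    }

    static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup().findVarHandle(Holder.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // The type the handle's GET_AND_ADD mode is defined with: (Holder,int)int.
    static MethodType exactType() {
        return COUNT.accessModeType(VarHandle.AccessMode.GET_AND_ADD);
    }

    // Operand of an (int) cast: call site type (Holder,int)int, an exact match.
    static int castCall(Holder h) {
        return (int) COUNT.getAndAdd(h, 1);
    }

    // Not a cast operand: call site type (Holder,int)Object, so the int result
    // has to be adapted to Object, i.e. boxed into an Integer.
    static Object uncastCall(Holder h) {
        return COUNT.getAndAdd(h, 1);
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println(exactType());                        // (Holder,int)int
        System.out.println(castCall(h));                        // 0
        System.out.println(uncastCall(h).getClass().getName()); // java.lang.Integer
        // Expression statement: call site type (Holder,int)void, result discarded.
        COUNT.getAndAdd(h, 1);
        System.out.println(h.count);                            // 3
    }
}
```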

Blackhole.consume is an overloaded method. There are multiple variants of this method:

  • consume(int)
  • consume(Object)
  • etc.

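In your code, according to the language rules, the consume(Object) method is used: without a cast, the compile-time return type of getAndAdd is Object, and consume(int) is not applicable to an Object argument. Therefore, the VarHandle is invoked with an Object return type and has to return a boxed Integer.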
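
The overload selection can be reproduced without JMH. In this sketch, two static consume methods are hypothetical stand-ins for Blackhole.consume(int) and Blackhole.consume(Object); everything else is plain java.lang.invoke.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class OverloadDemo {
    // Hypothetical stand-ins for Blackhole.consume(int) and Blackhole.consume(Object).
    static String consume(int v) { return "consume(int) got " + v; }
    static String consume(Object v) { return "consume(Object) got " + v.getClass().getSimpleName(); }

    volatile int count;
    static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup().findVarHandle(OverloadDemo.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    String withoutCast() {
        // No cast: getAndAdd's compile-time result is Object, so only
        // consume(Object) is applicable and the int result arrives boxed.
        return consume(COUNT.getAndAdd(this, 1));
    }

    String withCast() {
        // The cast makes the call site (OverloadDemo,int)int and selects consume(int).
        return consume((int) COUNT.getAndAdd(this, 1));
    }

    public static void main(String[] args) {
        OverloadDemo d = new OverloadDemo();
        System.out.println(d.withoutCast()); // consume(Object) got Integer
        System.out.println(d.withCast());    // consume(int) got 1
    }
}
```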

In order to use the correct method, you'll need to rewrite the varhandle benchmark as follows:

bh.consume((int) COUNT.getAndAdd(this, 1));

Now varhandle will run with the same performance as the other benchmarks.
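
The same cast applies to other VarHandle access modes whose declared return type is Object, such as getVolatile. Modes declared to return boolean or void, such as compareAndSet and setVolatile, already have a non-Object compile-time result, so they need no cast. A minimal sketch of a bounded counter built from a CAS loop (BoundedCounter and incrementIfBelow are hypothetical names):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class BoundedCounter {
    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup().findVarHandle(BoundedCounter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile int count;

    // Hypothetical helper: increments only while the value is below limit.
    // getVolatile is declared to return Object, so it is cast to int;
    // compareAndSet is declared to return boolean, so it needs no cast.
    boolean incrementIfBelow(int limit) {
        while (true) {
            int current = (int) COUNT.getVolatile(this);
            if (current >= limit) {
                return false;
            }
            if (COUNT.compareAndSet(this, current, current + 1)) {
                return true;
            }
            // Another thread won the race; re-read and retry.
        }
    }

    int get() {
        return (int) COUNT.getVolatile(this);
    }

    public static void main(String[] args) {
        BoundedCounter c = new BoundedCounter();
        for (int i = 0; i < 5; i++) {
            c.incrementIfBelow(3);
        }
        System.out.println(c.get()); // 3
    }
}
```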

Embouchure answered 14/11, 2019 at 21:18