Is it possible to make java.lang.invoke.MethodHandle as fast as direct invocation?

I'm comparing the performance of MethodHandle::invoke and a direct static method invocation. Here is the static method:

public class IntSum {
    public static int sum(int a, int b){
        return a + b;
    }
}

And here is my benchmark:

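import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)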
public class MyBenchmark {

    public int first;
    public int second;
    public final MethodHandle mhh;

    @Benchmark
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int directMethodCall() {
        return IntSum.sum(first, second);
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int finalMethodHandle() throws Throwable {
        return (int) mhh.invoke(first, second);
    }

    public MyBenchmark() {
        MethodHandle mhhh = null;

        try {
            mhhh = MethodHandles.lookup().findStatic(IntSum.class, "sum", MethodType.methodType(int.class, int.class, int.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            e.printStackTrace();
        }

        mhh = mhhh;
    }

    @Setup
    public void setup() throws Exception {
        first = 9857893;
        second = 893274;
    }
}

I got the following result:

Benchmark                      Mode  Cnt  Score   Error  Units
MyBenchmark.directMethodCall   avgt    5  3.069 ± 0.077  ns/op
MyBenchmark.finalMethodHandle  avgt    5  6.234 ± 0.150  ns/op

The MethodHandle call takes roughly twice as long as the direct call.

Running it with -prof perfasm shows this:

....[Hottest Regions]...............................................................................
 31.21%   31.98%         C2, level 4  java.lang.invoke.LambdaForm$DMH::invokeStatic_II_I, version 490 (27 bytes) 
 26.57%   28.02%         C2, level 4  org.sample.generated.MyBenchmark_finalMethodHandle_jmhTest::finalMethodHandle_avgt_jmhStub, version 514 (84 bytes) 
 20.98%   28.15%         C2, level 4  org.openjdk.jmh.infra.Blackhole::consume, version 497 (44 bytes) 

As far as I can tell, the reason for this result is that Hottest Region 2, org.sample.generated.MyBenchmark_finalMethodHandle_jmhTest::finalMethodHandle_avgt_jmhStub, contains all the type checks performed by MethodHandle::invoke inside the JMH loop. Assembly output fragment (some code omitted):

....[Hottest Region 2]..............................................................................
C2, level 4, org.sample.generated.MyBenchmark_finalMethodHandle_jmhTest::finalMethodHandle_avgt_jmhStub, version 519 (84 bytes) 
;...
0x00007fa2112119b0: mov     0x60(%rsp),%r10
;...
0x00007fa2112119d4: mov     0x14(%r12,%r11,8),%r8d  ;*getfield form
0x00007fa2112119d9: mov     0x1c(%r12,%r8,8),%r10d  ;*getfield customized
0x00007fa2112119de: test    %r10d,%r10d
0x00007fa2112119e1: je      0x7fa211211a65    ;*ifnonnull
0x00007fa2112119e7: lea     (%r12,%r11,8),%rsi
0x00007fa2112119eb: callq   0x7fa211046020    ;*invokevirtual invokeBasic
;...
0x00007fa211211a01: movzbl  0x94(%r10),%r10d  ;*getfield isDone
;...
0x00007fa211211a13: test    %r10d,%r10d
;jump back to the beginning of the jmh loop if not done
0x00007fa211211a16: je      0x7fa2112119b0    ;*aload_1 
;...

Before calling invokeBasic, we perform the type checks (inside the JMH loop), which affects the reported average time.

QUESTION: Why aren't all the type checks moved out of the loop? I declared public final MethodHandle mhh; in the benchmark, so I expected the compiler to figure this out and eliminate the repeated type checks. How can these repeated type checks be eliminated? Is it possible?

Armpit answered 15/3, 2018 at 5:33 Comment(11)
The method has signature MethodHandle.invoke(Object... args). Is it possible that the int values are also being auto-boxed/unboxed? Looks like there's a lot of black magic in this class. – Valerivaleria
@Valerivaleria This is a signature-polymorphic method and gets special treatment from javac. You can look at the compiled bytecode. The signature of the compiled method is MethodHandle.invoke(II)I – Armpit
Ah, that's a new concept for me. Wild! – Valerivaleria
@Valerivaleria Btw, @PolymorphicSignature is not public. We cannot create methods like this ourselves :). – Armpit
But why don’t you use invokeExact? And which Java version did you use? When using Java 8 and having an interface with a matching signature, you can convert direct method handles to interface implementations via LambdaMetafactory, as shown in this answer. – Counterblow
@Counterblow I benchmarked invokeExact, but I did not get any performance improvement. The compiled code was also the same (the same type checks). Anyway, invoke works the same as invokeExact if the MethodType matches, doesn't it? – Armpit
It depends on the JRE version; there were implementations where using invoke was significantly slower than invokeExact, so if you have a choice, prefer invokeExact. If it doesn’t help in your Java version, it doesn’t hurt either. By the way, how many warmup iterations did you have? In my experience, method handles need a lot of warmup… – Counterblow
@Counterblow I benchmarked with 5 warmup iterations and 5 measurement iterations. It seemed to be enough to reach a steady state... No? – Armpit
@Armpit @PolymorphicSignature - compiler overloads... :) of course we are not supposed to get a handle of those. btw @ForceInline is private also, but JMH somehow has @CompilerControl(CompilerControl.Mode.INLINE) (even if stated that this could be ignored) – Bergerac
@Bergerac I just thought it might be convenient to have a polymorphic signature, so I could avoid unnecessary boxing conversion when returning a value... – Armpit
I’ve encountered a threshold in the order of twenty with method handles, though it was with composed handles, and in the case of multiple transformations each step seemed to have its own counter. So when dealing with method handles, I’d always run a test with a really large number of warmup iterations, just to be sure. The other conclusion is to use LambdaMetafactory for direct handles whenever possible. – Counterblow

You are invoking the MethodHandle reflectively. This works roughly like Method.invoke, but with fewer run-time checks and without boxing/unboxing. Since this MethodHandle is not static final, the JVM does not treat it as a constant; that is, the MethodHandle's target is a black box and cannot be inlined.

Even though mhh is final, it contains instance fields such as MethodType type and LambdaForm form that are reloaded on each iteration. These loads are not hoisted out of the loop because of the black-box call inside it (see above). Furthermore, the LambdaForm of a MethodHandle can be changed (customized) at run time between calls, so it needs to be reloaded.

How to make the call faster?

  1. Use a static final MethodHandle. The JIT will know the target of such a MethodHandle and thus may inline it at the call site.

  2. Even if you have a non-static MethodHandle, you may bind it to a static CallSite and invoke it as fast as a direct method call. This is similar to how lambdas are called.

    private static final MutableCallSite callSite = new MutableCallSite(
            MethodType.methodType(int.class, int.class, int.class));
    private static final MethodHandle invoker = callSite.dynamicInvoker();
    
    public MethodHandle mh;
    
    public MyBenchmark() {
        mh = ...;
        callSite.setTarget(mh);
    }
    
    @Benchmark
    public int boundMethodHandle() throws Throwable {
        return (int) invoker.invokeExact(first, second);
    }
    
  3. Use a regular invokeinterface call instead of MethodHandle.invoke, as @Holger suggested. An instance of an interface for calling a given MethodHandle can be generated with LambdaMetafactory.metafactory().
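
The LambdaMetafactory option from the list above could be sketched roughly as follows. This is a minimal illustration outside JMH, not code from the answer: the class name LambdaFactoryDemo and the helper sumOperator are hypothetical, the target method sum is a local copy of IntSum.sum from the question, and java.util.function.IntBinaryOperator is assumed as the interface because its applyAsInt(int, int) signature matches the target.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntBinaryOperator;

final class LambdaFactoryDemo {
    // Local copy of IntSum.sum from the question, so the sketch is self-contained.
    static int sum(int a, int b) {
        return a + b;
    }

    // Hypothetical helper: wraps the direct method handle in an IntBinaryOperator.
    static IntBinaryOperator sumOperator() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType sig = MethodType.methodType(int.class, int.class, int.class);
            MethodHandle target = lookup.findStatic(LambdaFactoryDemo.class, "sum", sig);

            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "applyAsInt",                                   // name of the interface method
                    MethodType.methodType(IntBinaryOperator.class), // factory type: no captured arguments
                    sig,                                            // signature of the interface method
                    target,                                         // direct method handle to delegate to
                    sig);                                           // signature enforced at invocation time

            // The factory handle takes no arguments and returns the interface instance.
            return (IntBinaryOperator) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        IntBinaryOperator op = sumOperator();
        // From here on, calls go through a regular interface method.
        System.out.println(op.applyAsInt(9857893, 893274));
    }
}
```

The generated object is called through an ordinary interface method, so the call site no longer depends on the JIT treating a MethodHandle field as a constant. Note that the target has to be a direct method handle that the given Lookup is allowed to access.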
Cantaloupe answered 15/3, 2018 at 15:39 Comment(8)
Pardon my ignorance, but if the OP knows that the CallSite will not change, can this code be made to use a ConstantCallSite instead? If so, since it is a constant CallSite, would it also need to be static? – Bergerac
@Bergerac ConstantCallSite requires the target method to be specified in its constructor. In this sense ConstantCallSite is useless here: it would be the same as creating a static MethodHandle directly. MutableCallSite, on the other hand, lets you delay the decision about the target until later at run time. – Cantaloupe
Ahhh... It means constant folding is applied only to static final fields. For some reason I thought that if we declare an immutable field as final, the compiler would know that it is immutable and hoist some bound checking out of the loop (in my case). Maybe you know where to read about JIT hoisting/constant folding? I looked at the opto package, but it seems scattered across it... – Armpit
@Armpit Final non-static fields are not considered constants by default, unless -XX:+TrustFinalNonStaticFields is set. See ciField::initialize_from. – Cantaloupe
@Bergerac BTW, signature-polymorphic methods have a strict definition in the JVMS: docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.9 So no methods other than the ones in java.lang.invoke can be signature polymorphic. – Armpit
I don't understand your second example. Can you please replace mh = ... with a concrete statement? It would help make the example more understandable. – Shaped
@Shaped mh points to the target method to be called. There is an example in the original question. – Cantaloupe
@Cantaloupe The original question only mentions mhh and mhhh, both of which are MethodHandles. I hope you understand how this can get confusing. – Shaped
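
For readers with the same question as in the comments above, one possible concrete reading of the MutableCallSite example is sketched here. It assumes mh is obtained with the same findStatic lookup the question uses for mhh; the class name BoundCallSiteDemo, the method call, and the local sum method are hypothetical and stand in for the JMH benchmark class.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class BoundCallSiteDemo {
    // Local copy of IntSum.sum from the question, so the sketch is self-contained.
    static int sum(int a, int b) {
        return a + b;
    }

    private static final MethodType SUM_TYPE =
            MethodType.methodType(int.class, int.class, int.class);

    // The call site and its invoker are static final, so the JIT can treat them as constants.
    private static final MutableCallSite CALL_SITE = new MutableCallSite(SUM_TYPE);
    private static final MethodHandle INVOKER = CALL_SITE.dynamicInvoker();

    // Non-static handle, as in the question's benchmark.
    final MethodHandle mh;

    BoundCallSiteDemo() {
        try {
            // Assumption: same lookup as in the question, pointed at the local sum method.
            mh = MethodHandles.lookup().findStatic(BoundCallSiteDemo.class, "sum", SUM_TYPE);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        // Bind the instance handle to the static call site.
        CALL_SITE.setTarget(mh);
    }

    int call(int a, int b) {
        try {
            // Calls go through the static invoker, not through the instance field.
            return (int) INVOKER.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(new BoundCallSiteDemo().call(9857893, 893274));
    }
}
```

Because the call site is shared and static, every instance that calls setTarget rebinds it for all callers; that is acceptable for a single benchmark instance but is a design constraint worth keeping in mind elsewhere.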

Make MethodHandle mhh static:

Benchmark            Mode  Samples  Score   Error  Units
directMethodCall     avgt        5  0,942 ± 0,095  ns/op
finalMethodHandle    avgt        5  0,906 ± 0,078  ns/op

Non-static:

Benchmark            Mode  Samples  Score   Error  Units
directMethodCall     avgt        5  0,897 ± 0,059  ns/op
finalMethodHandle    avgt        5  4,041 ± 0,463  ns/op
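
A minimal sketch of the static variant measured above, written outside JMH so it runs on its own. The names StaticHandleDemo and viaHandle are hypothetical, and sum is a local copy of IntSum.sum from the question; in the benchmark itself the corresponding change would be declaring mhh as static final and initializing it in a static initializer.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class StaticHandleDemo {
    // Local copy of IntSum.sum from the question, so the sketch is self-contained.
    static int sum(int a, int b) {
        return a + b;
    }

    // static final: the JIT can treat the handle as a constant and may inline its target.
    private static final MethodHandle SUM;

    static {
        try {
            SUM = MethodHandles.lookup().findStatic(
                    StaticHandleDemo.class, "sum",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            // Fail fast instead of leaving the handle null.
            throw new ExceptionInInitializerError(e);
        }
    }

    static int viaHandle(int a, int b) {
        try {
            // invokeExact requires the call-site types (int, int) -> int to match exactly.
            return (int) SUM.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaHandle(9857893, 893274));
    }
}
```

Failing in the static initializer, rather than printing the stack trace and continuing, avoids a null handle that would only surface later as a NullPointerException at the call site.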
Weisler answered 15/3, 2018 at 5:44 Comment(2)
Cool, it really works. Now MethodHandle::invoke and the actual IntSum::sum it invokes are simply inlined into the JMH loop. Why? What happened? Is it possible to do this in the non-static case? – Armpit
@Armpit I agree, why would adding static work? :| I thought that this was just a problem with the set-up here, so I coded my own version of this (with a setup class), but the results are the same as in your case, twice the difference... – Bergerac
