weakCompareAndSwap vs compareAndSwap

This question is not about the difference between them. I know what a spurious failure is and why it happens on LL/SC. My question is: on Intel x86 with Java 9 (build 149), why is there a difference between their assembly code?

public class WeakVsNonWeak {

    static jdk.internal.misc.Unsafe UNSAFE = jdk.internal.misc.Unsafe.getUnsafe();

    public static void main(String[] args) throws NoSuchFieldException, SecurityException {

        Holder h = new Holder();
        h.setValue(33);
        Class<?> holderClass = Holder.class;
        long valueOffset = UNSAFE.objectFieldOffset(holderClass.getDeclaredField("value"));

        int result = 0;
        for (int i = 0; i < 30_000; ++i) {
            result = strong(h, valueOffset);
        }
        System.out.println(result);

    }

    private static int strong(Holder h, long offset) {
        int sum = 0;
        for (int i = 33; i < 11_000; ++i) {
            boolean result = UNSAFE.weakCompareAndSwapInt(h, offset, i, i + 1);
            if (!result) {
                sum++;
            }
        }
        return sum;

    }

    public static class Holder {

        private int value;

        public int getValue() {
            return value;
        }

        public void setValue(int value) {
            this.value = value;
        }
    }
}
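For anyone who cannot open jdk.internal.misc, roughly the same loop can be sketched against the public VarHandle API (Java 9+). This is only an approximation of the test above: the class name WeakVsNonWeakVarHandle and the helper countFailures are hypothetical, and the holder is reset before every run so each call starts from 33.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class WeakVsNonWeakVarHandle {

    // Hypothetical stand-in for the Holder class above; the field is
    // package-private so the lookup below can reach it.
    static class Holder {
        int value;
    }

    static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(Holder.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Runs the same CAS sequence as strong(...) and counts failed attempts.
    static int countFailures(Holder h, boolean weak) {
        int failures = 0;
        for (int i = 33; i < 11_000; ++i) {
            boolean swapped = weak
                    ? VALUE.weakCompareAndSet(h, i, i + 1)
                    : VALUE.compareAndSet(h, i, i + 1);
            if (!swapped) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        for (boolean weak : new boolean[] {false, true}) {
            int result = 0;
            // Repeat enough times for the JIT to compile countFailures.
            for (int i = 0; i < 30_000; ++i) {
                h.value = 33;
                result = countFailures(h, weak);
            }
            System.out.println((weak ? "weak: " : "strong: ") + result);
        }
    }
}
```

Single-threaded, the strong variant can never fail here. The weak variant is allowed to fail spuriously on some platforms, so its count is not guaranteed.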

Running with:

 java -XX:-TieredCompilation 
      -XX:CICompilerCount=1 
      -XX:+UnlockDiagnosticVMOptions  
      -XX:+PrintIntrinsics 
      -XX:+PrintAssembly 
      --add-opens java.base/jdk.internal.misc=ALL-UNNAMED
      WeakVsNonWeak

Output of compareAndSwapInt (relevant parts):

  0x0000000109f0f4b8: movabs $0x111927c18,%rsi  ;   {metadata({method} {0x0000000111927c18} 'compareAndSwapInt' '(Ljava/lang/Object;JII)Z' in 'jdk/internal/misc/Unsafe')}
  0x0000000109f0f4c2: mov    %r15,%rdi
  0x0000000109f0f4c5: test   $0xf,%esp
  0x0000000109f0f4cb: je     0x0000000109f0f4e3
  0x0000000109f0f4d1: sub    $0x8,%rsp
  0x0000000109f0f4d5: callq  0x00000001098569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x0000000109f0f4da: add    $0x8,%rsp
  0x0000000109f0f4de: jmpq   0x0000000109f0f4e8
  0x0000000109f0f4e3: callq  0x00000001098569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x0000000109f0f4e8: pop    %r9
  0x0000000109f0f4ea: pop    %r8
  0x0000000109f0f4ec: pop    %rcx
  0x0000000109f0f4ed: pop    %rdx
  0x0000000109f0f4ee: pop    %rsi
  0x0000000109f0f4ef: lea    0x210(%r15),%rdi
  0x0000000109f0f4f6: movl   $0x4,0x288(%r15)
  0x0000000109f0f501: callq  0x00000001098fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
  0x0000000109f0f506: vzeroupper 
  0x0000000109f0f509: and    $0xff,%eax
  0x0000000109f0f50f: setne  %al
  0x0000000109f0f512: movl   $0x5,0x288(%r15)
  0x0000000109f0f51d: lock addl $0x0,-0x40(%rsp)
  0x0000000109f0f523: cmpl   $0x0,-0x3f04dd(%rip)        # 0x0000000109b1f050

Output of weakCompareAndSwapInt:

  0x000000010b698840: sub    $0x18,%rsp
  0x000000010b698847: mov    %rbp,0x10(%rsp)
  0x000000010b69884c: mov    %r8d,%eax
  0x000000010b69884f: lock cmpxchg %r9d,(%rdx,%rcx,1)
  0x000000010b698855: sete   %r11b
  0x000000010b698859: movzbl %r11b,%r11d        ;*invokevirtual compareAndSwapInt {reexecute=0 rethrow=0 return_oop=0}
                                                ; - jdk.internal.misc.Unsafe::weakCompareAndSwapInt@7 (line 1369)

I am far from experienced enough to understand the entire output, but I can definitely see the difference between lock addl and lock cmpxchg.

EDIT Peter's answer got me thinking. Let's see if compareAndSwap will be an intrinsic call:

-XX:+PrintIntrinsics -XX:-PrintAssembly

 @ 7   jdk.internal.misc.Unsafe::compareAndSwapInt (0 bytes)   (intrinsic)
 @ 20      jdk.internal.misc.Unsafe::weakCompareAndSwapInt (11 bytes)   (intrinsic).

And then run the example twice with/without:

-XX:DisableIntrinsic=_compareAndSwapInt

This is sort of weird: the output is exactly the same (the same exact instructions), with the only difference being that with the intrinsic enabled I get calls like this:

  0x000000010c23e355: callq  0x00000001016569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x000000010c23e381: callq  0x00000001016fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}

And disabled:

  0x00000001109322d5: callq  0x0000000105c569d2  ;   {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}
  0x00000001109322e3: callq  0x0000000105c569d2  ;   {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}

This is rather intriguing. Shouldn't the intrinsic code be different?

EDIT-2 the8472 makes sense too.

As far as I know, lock addl is a substitute for mfence that flushes the store buffer on x86, so it is indeed about visibility and not atomicity. Right before this instruction there is:

 0x00000001133db6f6: movl   $0x4,0x288(%r15)
 0x00000001133db701: callq  0x00000001060fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
 0x00000001133db706: vzeroupper 
 0x00000001133db709: and    $0xff,%eax
 0x00000001133db70f: setne  %al
 0x00000001133db712: movl   $0x5,0x288(%r15)
 0x00000001133db71d: lock addl $0x0,-0x40(%rsp)
 0x00000001133db723: cmpl   $0x0,-0xd0bc6dd(%rip)        #     0x000000010631f050
                                            ;   {external_word}

If you look here, it delegates to another native call, Atomic::cmpxchg, which seems to do the swap atomically.

Why that is not replaced by a direct lock cmpxchg is a mystery to me.
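The visibility-versus-atomicity role of lock addl from EDIT-2 can be sketched in Java with VarHandle.fullFence(). My understanding is that HotSpot on x86 typically emits a lock-prefixed add of zero to a stack slot for that fence, but treat that mapping as an assumption rather than something this listing proves. The class name StoreLoadFenceSketch and its method are hypothetical. Each thread stores to one field, fences, then loads the other field; with the fences in place the two threads cannot both read 0.

```java
import java.lang.invoke.VarHandle;

public class StoreLoadFenceSketch {

    // Plain (non-volatile) fields: ordering comes only from the fences.
    static int x, y;
    static int r1, r2;

    // Returns true if some round observed r1 == 0 and r2 == 0, which the
    // store-load fences are meant to rule out.
    static boolean bothZeroObserved(int rounds) {
        try {
            for (int i = 0; i < rounds; i++) {
                x = 0;
                y = 0;
                Thread a = new Thread(() -> {
                    x = 1;
                    VarHandle.fullFence(); // store to x ordered before load of y
                    r1 = y;
                });
                Thread b = new Thread(() -> {
                    y = 1;
                    VarHandle.fullFence(); // store to y ordered before load of x
                    r2 = x;
                });
                a.start();
                b.start();
                a.join();
                b.join();
                if (r1 == 0 && r2 == 0) {
                    return true;
                }
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both zero observed: " + bothZeroObserved(1_000));
    }
}
```

Nothing in this sketch is atomic in the read-modify-write sense. The fence only orders each thread's store before its later load, which is the job lock addl does after the native call returns.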

Atalee answered 29/12, 2016 at 14:46 Comment(9)
with your edits and numerous assembly samples from different optimization levels it's not quite clear what you're actually asking.Coverall
So sun.misc.Unsafe still hasn’t gone, but moved to a different package, jdk.internal.misc, proving that it’s actually not a compatibility issue, that keeps that class alive?Langelo
@Langelo It has not moved, there are two versions now. As Shipilev says, sun.misc.Unsafe will be deleted - this time for sure. There are multiple enhancements in other places where sun.misc.Unsafe used to be useful, which make those uses obsolete (like AtomicFieldUpdater). They have even added release/acquire semantics directly into the Unsafe!Atalee
I just thought that VarHandle is supposed to handle all this stuff officially and now I see an Unsafe class that apparently is even extended, compared to the Java 8 version. This doesn’t look like getting rid of it…Langelo
@Langelo sun.misc.Unsafe is going to be deleted, not the second one. They still need a way to expose Unsafe and make it, well, safe. VarHandle is the safe PUBLIC API that jdk.internal.misc.Unsafe exposes.Atalee
Well, every time there is a non-public API wrapped by a public one, people are going to bypass the official API, thinking there was some benefit in using an unofficial API directly. I don’t see any reason why there has to be a class named Unsafe beneath the official API. It doesn’t contain any implementation anyway, it’s the JVM treating the calls as intrinsics or native methods handling the invocation, so there is no reason not to do that directly for the methods of the official API. In fact, that happens for a lot of API methods, but the existence of Unsafe creates a wrong impression.Langelo
It speaks a lot that methods like Unsafe.storeFence() are not even used in Java 8 internally; this method is only used by 3rd party libraries…Langelo
@Langelo this slightly goes out of the scope of the question. btw there is a need for relaxed semantics, there has always been; how otherwise will they be exposed for people who actually need them? I mean we have lazySet for quite a while now and it has not killed anyone (yet). This would be a good question for someone who actually had taken these decisions.Atalee
I have no problem with additional semantics, though I have some doubts about methods offering semantics which don’t even exist within the memory model specification, but I really hope that, now that they are becoming officially supported operations, someone is finally going to document them. But VarHandle is an abstraction that allows alternative JVM implementations, while the Unsafe class is tied to several assumptions about it, i.e. that there is always a never-changing field offset. So offering these unofficial operations to 3rd party developers is hindering alternative implementations.Langelo
A
8

TL;DR You're looking at the wrong place in the assembly output.

Both compareAndSwapInt and weakCompareAndSwapInt calls are compiled to exactly the same ASM sequence on x86-64. However, the methods themselves are compiled differently (but it does not usually matter).

  1. The definitions of compareAndSwapInt and weakCompareAndSwapInt in the source code are different. The former is a native method, while the latter is a Java method.

    @HotSpotIntrinsicCandidate
    public final native boolean compareAndSwapInt(Object o, long offset,
                                                  int expected,
                                                  int x);
    
    @HotSpotIntrinsicCandidate
    public final boolean weakCompareAndSwapInt(Object o, long offset,
                                                      int expected,
                                                      int x) {
        return compareAndSwapInt(o, offset, expected, x);
    }
    
  2. What you've seen is how these standalone methods are compiled. A native method compiles to a stub that calls a corresponding C function. But this is not what runs in the fast path.

  3. Intrinsic methods are those whose calls are replaced with a HotSpot-specific inline implementation. Note: the calls are replaced, but not the methods themselves.

  4. If you look at the assembly output of your WeakVsNonWeak.strong method, you'll see that it contains a lock cmpxchg instruction, whether it calls UNSAFE.compareAndSwapInt or UNSAFE.weakCompareAndSwapInt.

    0x000001bd76170c44: lock cmpxchg %ecx,(%r11)
    0x000001bd76170c49: sete   %r10b
    0x000001bd76170c4d: movzbl %r10b,%r10d        ;*invokevirtual compareAndSwapInt
                                                  ; - WeakVsNonWeak::strong@25 (line 23)
                                                  ; - WeakVsNonWeak::main@46 (line 14)
    
    0x0000024f56af1097: lock cmpxchg %r11d,(%r8)
    0x0000024f56af109c: sete   %r10b
    0x0000024f56af10a0: movzbl %r10b,%r10d        ;*invokevirtual weakCompareAndSwapInt
                                                  ; - WeakVsNonWeak::strong@25 (line 23)
                                                  ; - WeakVsNonWeak::main@46 (line 14)
    

    Once the main method is JIT-compiled, the standalone versions of the Unsafe.* methods will not be called directly.
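Since both calls end up as the same lock cmpxchg on x86-64, the weak/strong distinction only shows up in how portable calling code is written. A minimal sketch of that, using the public AtomicInteger API (Java 9+) instead of Unsafe; the class and method names are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WeakCasLoopSketch {

    // A weak CAS may fail spuriously on some platforms, so it belongs in a
    // retry loop that re-reads the current value on every attempt.
    static int incrementWithWeakCas(AtomicInteger counter) {
        int current;
        do {
            current = counter.get();
        } while (!counter.weakCompareAndSetVolatile(current, current + 1));
        return current + 1;
    }

    // A strong CAS fails only when the value differs from the expected one,
    // so a single attempt gives a meaningful answer.
    static boolean tryIncrementOnce(AtomicInteger counter, int expected) {
        return counter.compareAndSet(expected, expected + 1);
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(33);
        System.out.println(incrementWithWeakCas(counter));
        System.out.println(tryIncrementOnce(counter, 34));
        System.out.println(tryIncrementOnce(counter, 34));
    }
}
```

On x86-64 HotSpot I would expect both CAS calls in this sketch to compile to a lock cmpxchg when inlined into a hot caller, as in the listings above. The loop matters on platforms where the weak form can fail spuriously.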

Aftermost answered 16/1, 2017 at 21:51 Comment(2)
You are right: it's hard to read the output in its entire glory without some proper experience (which I lack). Your explanations are fantastic! What I've seen and shown in the question is the output for the individual methods from the C2 compilation, which != intrinsic code; once the strong method is compiled, using UNSAFE.compareAndSwapInt or UNSAFE.weakCompareAndSwapInt yields the same output, meaning their intrinsic code is the same.Atalee
Thank you for the response @apangin. Please correct me if I am wrong, but LL/SC is NOT what makes a CAS a weak CAS. What makes a weak CAS is the lack of a retry attempt at acquiring cache exclusivity. This means that on non-x86 processors, a strong CAS can very well be built from an LL/SC instruction. Both the LOCK prefix and LL/SC return with a failure under contention (LOCK assert in the case of the LOCK prefix, OR failure at either LL or SC, OR context switching in the case of LL/SC)... This means that the DEFAULT behavior of both LOCK and LL/SC IS being weak... Am I correct?Verst
N
5

In the first case, a native method is being used. Either the code hasn't been optimised or it's not an intrinsic.

In the second case an intrinsic has been used to inline the assembly required, rather than call a JNI method. I would have thought both cases would do this, but I guess not.

Newhall answered 29/12, 2016 at 14:56 Comment(4)
indeed you are probably right, but I am not sure why. See the editAtalee
@Atalee I agree it appears backwards. The intrinsic should have the mov and the non-intrinsic should have the callqNewhall
That's not the point. The compareAndSwap intrinsic and non-intrinsic versions differ only in the functions called by callq. I was expecting a lot more.Atalee
@Atalee I am pretty sure you can ignore the dtrace method entry calls. These shouldn't do anything (except help with tracing)Newhall
C
4

I believe the lock addl is not the atomic op itself but a store-load barrier implementation. The atomic operation happens in the callq.

Since you're already logging with PrintIntrinsics you should check if it actually gets intrinsified.

Coverall answered 29/12, 2016 at 15:33 Comment(1)
indeed you are right also (see EDIT-2), but it does not answer the main question. nevertheless thank you for your input.Atalee
