The example implementation Wikipedia provides for a spinlock using the x86 XCHG instruction is:
; Intel syntax

locked:                      ; The lock variable. 1 = locked, 0 = unlocked.
    dd      0

spin_lock:
    mov     eax, 1           ; Set the EAX register to 1.
    xchg    eax, [locked]    ; Atomically swap the EAX register with
                             ; the lock variable.
                             ; This will always store 1 to the lock, leaving
                             ; the previous value in the EAX register.
    test    eax, eax         ; Test EAX with itself. Among other things, this will
                             ; set the processor's Zero Flag if EAX is 0.
                             ; If EAX is 0, then the lock was unlocked and
                             ; we just locked it.
                             ; Otherwise, EAX is 1 and we didn't acquire the lock.
    jnz     spin_lock        ; Jump back to the MOV instruction if the Zero Flag is
                             ; not set; the lock was previously locked, and so
                             ; we need to spin until it becomes unlocked.
    ret                      ; The lock has been acquired, return to the calling
                             ; function.

spin_unlock:
    mov     eax, 0           ; Set the EAX register to 0.
    xchg    eax, [locked]    ; Atomically swap the EAX register with
                             ; the lock variable.
    ret                      ; The lock has been released.
(from https://en.wikipedia.org/wiki/Spinlock#Example_implementation)
What I don't understand is why the unlock would need to be atomic. What's wrong with:

spin_unlock:
    mov     dword [locked], 0
    ret

A plain mov should work, especially given that only the least significant bit of the variable is used. – Gravitate
Maybe to give spin_unlock a return value, 1 for success and 0 for an error because the lock wasn't held. – Trample
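For what it's worth, the xchg-based unlock gets that almost for free, since the swapped-out old value is left in EAX. A minimal sketch of what such a checked unlock might look like (the label is mine, not from the thread):

spin_unlock_checked:
    xor     eax, eax         ; Value to store: 0 = unlocked.
    xchg    eax, [locked]    ; Atomically release the lock; EAX now holds
                             ; the previous value of the lock variable.
    ret                      ; EAX = 1: the lock was held, unlock succeeded.
                             ; EAX = 0: the lock was not held (caller error).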
Locked atomics (including the implicitly locked xchg) have total order on x86, while ordinary stores only have release consistency. Of course, release semantics are enough for a spinlock, provided the acquire is done with a locked atomic. – Rosalbarosalee
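To make that concrete, here is a minimal sketch (mine, not from the thread) of the variant that comment describes: acquire with the implicitly locked xchg, release with an ordinary store, which for an aligned dword on x86 is atomic and release-ordered. The locked2/spin_lock2/spin_unlock2 names are only there to keep it separate from the Wikipedia listing above.

locked2:                     ; Assumed dword-aligned; 1 = locked, 0 = unlocked.
    dd      0

spin_lock2:
    mov     eax, 1
    xchg    eax, [locked2]   ; Locked read-modify-write: acquires the lock and
                             ; acts as a full barrier on x86.
    test    eax, eax
    jnz     spin_lock2       ; Previous value was 1: someone else holds it.
    ret

spin_unlock2:
    mov     dword [locked2], 0   ; Plain aligned store: atomic on x86 and
                                 ; release-ordered, so no lock prefix is needed.
    ret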
Is xchg ideal, though? With contended locks it's much better to spin on just a load, and only try taking the lock if you see it become unlocked. Spinning on xchg will potentially delay the unlocker's xchg from happening. If you don't write to the lock at all while it's locked, the core that owns the lock will still own the cache line when it tries to unlock, right? – Cyaneous
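A sketch of that test-and-test-and-set idea (the label names and the pause hint are mine, not from the comment): waiters spin on a read-only check and only go back to the locked xchg once the lock looks free, so they don't keep pulling the cache line away from the owner.

spin_lock_ttas:
    mov     eax, 1
    xchg    eax, [locked]        ; One locked attempt up front.
    test    eax, eax
    jz      ttas_acquired        ; Previous value was 0: we got the lock.
ttas_wait:
    pause                        ; Spin-wait hint for the CPU.
    cmp     dword [locked], 0    ; Read-only check: no locked write while waiting,
    jne     ttas_wait            ; so the owner keeps the line when it unlocks.
    jmp     spin_lock_ttas       ; Looks free: retry the locked xchg.
ttas_acquired:
    ret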
A locked load is always followed by a locked store as far as externally-visible behaviour. But I think that optimization would be possible, as long as the memory-barrier effect still happened (e.g. MFENCE does it without a locked bus cycle). Might be worth testing with an experiment if you have the time. Can two threads run at un-contended speed running xchg [mem],eax when [mem]=eax? – Cyaneous
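A possible per-thread inner loop for that experiment is sketched below; the routine name, iteration count, and register choices are mine, and thread creation, core pinning, and timing would have to come from whatever harness calls it. Each thread hammers xchg on the already-set lock word with the same value, so the stored value never changes; timing one thread alone versus two threads in parallel shows whether the locked operations still slow each other down.

xchg_same_value:                 ; Assumes [locked] has already been set to 1.
    mov     eax, 1               ; Same value the location already holds.
    mov     ecx, 100000000       ; Arbitrary iteration count for timing.
xchg_same_value_loop:
    xchg    eax, [locked]        ; Locked RMW that never changes the stored value.
    dec     ecx
    jnz     xchg_same_value_loop
    ret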