atomic_compare_exchange vs mutex
Asked Answered
B

2

8

What is the point in replacing a mutex lock with block like this

void stack_push(stack* s, node* n)
{
    node* head;
    do
    {
        head = s->head;
        n->next = head;
    }
    while ( ! atomic_compare_exchange(s->head, head, n));
} 

Can't understand what benefit we can get by replacing mutex with this atomic excange?

Blenny answered 20/9, 2012 at 15:25 Comment(11)
I added a language tag. Please correct if not right.Marcy
The point is that the present code doesn't have locks.Sammysamoan
@KerrekSB: ... but it has a loop instead? Why is a lock generally assumed to be slower?Milden
@Mehrdad: So what? At least one thread is going to break out of the loop, guaranteed.Sammysamoan
@KerrekSB: And the same is not true with a lock?Milden
You might want a read memory barrier before the load of s->head. Not strictly necessary but I suspect it improves performance.Wilbertwilborn
@Mehrdad: No. Everyone has to wait for the thread that holds the lock, which may be delayed indefinitely (or have died).Sammysamoan
@KerrekSB: A thread having died is a moot point -- correctly written code shouldn't do that. And regarding everyone waiting for a lock: is there really an upper time bound for how long a CAS can take? Can't that also take a relatively long time in certain cases?Milden
CAS latency is long as instructions go, but it's peanuts compared to the work the OS has to do to run a mutex. We're talking a single very very slow instruction, verses tons of code and functions calls and a context switch into the kernel.Wilbertwilborn
You probably really need: atomic_compare_exchange(&s->head, &head, n)Austin
just read Anthony Williams "C++ concurrency in action". a mutex is usually implemented by an atomic boolean flag, but with other atomic types you can do more and avoid the locking. In high-contention situations, mutexes can be critically slow.Pitchblack
R
7

It is typically faster than a mutex. That being said, you cannot just simply replace all mutexes with a CAS. A single CAS will swap one reference with another safely among many threads.

If you have a compound function in which one write depends on another read (for example), you would need a mutex to ensure atomicity.

Rosabella answered 20/9, 2012 at 15:37 Comment(1)
Couldn't you use a CAS with memory_order_acq_rel to emulate a lock & have the correct memory ordering?Ranjiv
W
15

There are a number of advantages;

  1. it's a lot quicker (on Windows, like 10x or 100x - not so much on Linux, like 10% better)
  2. it scales MUCH better (although still not enough - only to about 100 logical cores)
  3. it's MUCH cooler and you seem far more intelligent and capable
  4. where no waits or sleeps are required, this code can be used in places where waits or sleeps are forbidden, e.g. interrupt handlers, certain parts of the Windows (DISPATCH_LEVEL) and Linux kernels, etc
Wilbertwilborn answered 20/9, 2012 at 15:42 Comment(9)
Btw why is it slower on Linux?Milden
It isn't. What differs is how quick the mutex is. On Windows, mutexes are slooooow. On Linux, they're damn quick (I think they must be user-mode rather than kernel-mode). So on Windows, the lock-free method is a LOT quicker. On Linux, it's only a bit quicker.Wilbertwilborn
Ahh interesting. Kinda curious how you could have a user-mode-only mutex though... seems like you'd need to cross process boundaries.Milden
@mehrdad: I believe linux's mutexes are built upon a futex, which is an atomic integer shared between processes, and a wait queue in the kernel. As implemented, it only has to go into the kernel when collisions occur. en.wikipedia.org/wiki/FutexCinchonism
@DaveS: Oh, but that's only when there's no contention! Isn't that what Windows does too? This page says, "EnterCriticalSection will not enter the kernel unless there is contention on the lock. If there is no contention on the lock, then the API will obtain the lock in the user space and return without entering privileged mode. If there is contention, then it will follow very similar paths as WaitForSingleObject within the kernel." How is Linux faster?Milden
@Mehrdad: I would recommend testing both implementations then. I'm wondering if the concept that 'Windows mutexes are slow' are based on an earlier implementation, or one of those things that might have been true before, but aren't true now.Cinchonism
Windows has something which really is called a mutex and I meant that. A critical section in Windows is a different entity.Wilbertwilborn
You forgot the negative points: mutexes are a lot simpler to code and to get right. Lock-free coding is a mine field and so easy to get wrong (in particular if you want to really squeeze the best performance out of it by not using sequentially consistent memory order). What's most annoying in multi-threading: that you cannot validate the code by running simple tests, because a multi-threaded code is not deterministic (even serial code isn't, but it's much closer).Pitchblack
@Walter, That's why you have to test your code. The only issue might be if it's incorrect, but the failure happens intermittently with a very low likelihood, such that you need lots of repeats of the tests to catch it. But the same issue can happen with mutexes in that there is a deadlock scenario with low likelihood but that can still happen.Washing
R
7

It is typically faster than a mutex. That being said, you cannot just simply replace all mutexes with a CAS. A single CAS will swap one reference with another safely among many threads.

If you have a compound function in which one write depends on another read (for example), you would need a mutex to ensure atomicity.

Rosabella answered 20/9, 2012 at 15:37 Comment(1)
Couldn't you use a CAS with memory_order_acq_rel to emulate a lock & have the correct memory ordering?Ranjiv

© 2022 - 2024 — McMap. All rights reserved.