Understanding std::atomic::compare_exchange_weak() in C++11
bool compare_exchange_weak (T& expected, T val, ..);

compare_exchange_weak() is one of the compare-exchange primitives provided in C++11. It's weak in the sense that it may return false even if the value of the object is equal to expected. This is due to spurious failure on platforms where a sequence of instructions (instead of a single one, as on x86) is used to implement it. On such platforms, a context switch, a reload of the same address (or cache line) by another thread, etc. can make the primitive fail. It's spurious because it's not the value of the object (being unequal to expected) that fails the operation; instead, it's a kind of timing issue.

But what puzzles me is what's said in C++11 Standard (ISO/IEC 14882),

29.6.5 .. A consequence of spurious failure is that nearly all uses of weak compare-and-exchange will be in a loop.

Why does it have to be in a loop in nearly all uses? Does that mean we shall loop when it fails because of spurious failures? If that's the case, why do we bother using compare_exchange_weak() and writing the loop ourselves? We can just use compare_exchange_strong(), which I think should get rid of spurious failures for us. What are the common use cases of compare_exchange_weak()?

A related question. In his book "C++ Concurrency In Action", Anthony says,

//Because compare_exchange_weak() can fail spuriously, it must typically
//be used in a loop:

bool expected=false;
extern atomic<bool> b; // set somewhere else
while(!b.compare_exchange_weak(expected,true) && !expected);

//In this case, you keep looping as long as expected is still false,
//indicating that the compare_exchange_weak() call failed spuriously.

Why is !expected there in the loop condition? Is it there to prevent all threads from starving and making no progress for some time?

One last question

On platforms where no single hardware CAS instruction exists (like ARM, PowerPC, etc.), both the weak and the strong version are implemented using LL/SC. So is there any difference between the following two loops? Why, if any? (To me, they should have similar performance.)

// use LL/SC (or CAS on x86) and ignore/loop on spurious failures
while (!compare_exchange_weak(..))
{ .. }

// use LL/SC (or CAS on x86) and ignore/loop on spurious failures
while (!compare_exchange_strong(..)) 
{ .. }

I came up with this last question because you all mention that there may be a performance difference inside a loop. It's also mentioned by the C++11 Standard (ISO/IEC 14882):

When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms.

But as analyzed above, the two versions in a loop should give the same/similar performance. What's the thing I'm missing?

Feverish answered 8/8, 2014 at 9:11 Comment(5)
W/r/t the first question, in many cases you need to loop anyways (whether you use the strong or the weak version), and the weak version may have better performance than the strong one.Garmon
Both weak and strong CAS are implemented "using LL/SC", in the same way that both bubble sort and quicksort are implemented "using swap"; that is, in the sense that that is the primitive operation used to get the task done. What they wrap around LL/SC is very different. Weak CAS is just LL/SC. Strong CAS is LL/SC with a bunch of other stuff.Wilkerson
forums.manning.com/posts/list/33062.page does it help?Deen
@TuXiaomi with the answer in that link, I can't see why "the weak version will yield better performance on some platforms" as stated in the Standard.Googins
@Googins On others, compare_exchange_weak can fail spuriously, due to interrupts or actions of other processors or threads. On those platforms, compare_exchange_strong is effectively a loop on compare_exchange_weak - if it failed spuriously then it loops again. Does it help? Maybe I am wrongDeen

I'm trying to answer this myself, after going through various online resources (e.g., this one and this one), the C++11 Standard, as well as the answers given here.

The related questions are merged (e.g., "why !expected?" is merged with "why put compare_exchange_weak() in a loop?") and answered accordingly.


Why does compare_exchange_weak() have to be in a loop in nearly all uses?

Typical Pattern A

You need to achieve an atomic update based on the value in the atomic variable. A failure indicates that the variable was not updated with our desired value, and we want to retry. Note that we don't really care whether it fails due to a concurrent write or a spurious failure. But we do care that it is us who makes this change.

expected = current.load();
do {
    desired = function(expected);
} while (!current.compare_exchange_weak(expected, desired));

A real-world example is several threads adding an element to a singly linked list concurrently. Each thread first loads the head pointer, allocates a new node, and sets the new node's next pointer to the current head. Finally, it tries to install the new node as the head.
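That linked-list push can be sketched as follows. This is a minimal, illustrative sketch: Node, list_head, and push are hypothetical names, and memory reclamation and the ABA problem are deliberately ignored.

```cpp
#include <atomic>

// Illustrative sketch of the concurrent list push described above.
// Node, list_head, and push are hypothetical names; memory reclamation
// and the ABA problem are deliberately ignored here.
struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> list_head{nullptr};

void push(int value)
{
    Node* new_node = new Node{value, list_head.load()};
    // On failure (concurrent push or spurious failure), the expected
    // value, i.e. new_node->next, is refreshed with the current head,
    // so the node is automatically re-linked before the retry.
    while (!list_head.compare_exchange_weak(new_node->next, new_node))
        ; // retry until our node becomes the head
}
```

Note how the CAS failure path does the re-linking for free: a failed compare_exchange_weak writes the freshly observed head into new_node->next.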

Another example is implementing a mutex using std::atomic<bool>. At most one thread can enter the critical section at a time, depending on which thread first sets current to true and exits the loop.
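A minimal sketch of that mutex idea (SpinLock is a hypothetical name; a production lock would also want back-off and tuned memory orders, which are omitted here):

```cpp
#include <atomic>

// Sketch of a spinlock built on Pattern A; SpinLock is a hypothetical
// name and memory orders are left at the default seq_cst for brevity.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        bool expected = false;
        // A failed CAS (spurious or not) writes the current value (true)
        // into expected, so it must be reset before each retry.
        while (!locked.compare_exchange_weak(expected, true))
            expected = false;
    }
    void unlock() { locked.store(false); }
};
```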

Typical Pattern B

This is actually the pattern mentioned in Anthony's book. In contrast to pattern A, you want the atomic variable to be updated once, but you don't care who does it. As long as it's not updated, you try again. This is typically used with boolean variables. E.g., you need to implement a trigger for a state machine to move on; which thread pulls the trigger is irrelevant.

expected = false;
// !expected: if expected is set to true by another thread, it's done!
// Otherwise, it fails spuriously and we should try again.
while (!current.compare_exchange_weak(expected, true) && !expected);

Note that we generally cannot use this pattern to implement a mutex. Otherwise, multiple threads may be inside the critical section at the same time.

That said, it should be rare to use compare_exchange_weak() outside a loop. Conversely, there are cases where the strong version is used outside one. E.g.,

bool criticalSection_tryEnter(lock)
{
  bool flag = false;
  return lock.compare_exchange_strong(flag, true);
}

compare_exchange_weak is not appropriate here, because when it returns false due to a spurious failure, it's likely that no one occupies the critical section yet.

Starving Thread?

One point worth mentioning is: what happens if spurious failures continue to happen, thus starving the thread? Theoretically it could happen on platforms where compare_exchange_XXX() is implemented as a sequence of instructions (e.g., LL/SC). Frequent access to the same cache line between the LL and the SC will produce continuous spurious failures. A more realistic example is a pathological scheduling where all concurrent threads are interleaved in the following way.

Time
 |  thread 1 (LL)
 |  thread 2 (LL)
 |  thread 1 (compare, SC), fails spuriously due to thread 2's LL
 |  thread 1 (LL)
 |  thread 2 (compare, SC), fails spuriously due to thread 1's LL
 |  thread 2 (LL)
 v  ..

Can it happen?

Fortunately, it won't happen forever, thanks to what C++11 requires:

Implementations should ensure that weak compare-and-exchange operations do not consistently return false unless either the atomic object has value different from expected or there are concurrent modifications to the atomic object.

Why do we bother using compare_exchange_weak() and writing the loop ourselves? We can just use compare_exchange_strong().

It depends.

Case 1: When both need to be used inside a loop. C++11 says:

When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms.

On x86 (at least currently; maybe it'll resort to a scheme similar to LL/SC one day for performance when more cores are introduced), the weak and strong versions are essentially the same, because they both boil down to the single instruction cmpxchg. On some other platforms where compare_exchange_XXX() isn't implemented atomically (here meaning no single hardware primitive exists), the weak version inside the loop may win the battle, because the strong one has to handle the spurious failures and retry accordingly.

But,

rarely, we may prefer compare_exchange_strong() over compare_exchange_weak() even in a loop, e.g., when there is a lot of work to do between the load of the atomic variable and the exchange of the calculated new value (see function() above). If the atomic variable itself doesn't change frequently, we don't need to repeat the costly calculation for every spurious failure. Instead, we may hope that compare_exchange_strong() "absorbs" such failures, and we only repeat the calculation when it fails due to a real value change.
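A concrete sketch of this case, where expensive_function and the int payload are illustrative stand-ins for the costly function() above:

```cpp
#include <atomic>

std::atomic<int> current{10};

// Stand-in for an expensive computation on the loaded value.
int expensive_function(int x) { return x * 3 + 1; }

void update()
{
    int expected = current.load();
    int desired;
    do {
        // With the strong version, this costly step is repeated only
        // when another thread really changed the value, never because
        // of a spurious failure.
        desired = expensive_function(expected);
    } while (!current.compare_exchange_strong(expected, desired));
}
```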

Case 2: When only compare_exchange_weak() needs to be used inside a loop. C++11 also says:

When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable.

This is typically the case when you loop just to eliminate spurious failures from the weak version. You retry until exchange is either successful or failed because of concurrent write.

expected = false;
// !expected: if it fails spuriously, we should try again.
while (!current.compare_exchange_weak(expected, true) && !expected);

At best, it's reinventing the wheel and performing the same as compare_exchange_strong(). At worst, this approach fails to take full advantage of machines that provide non-spurious compare-and-exchange in hardware.

Last, if you loop for other reasons (e.g., see "Typical Pattern A" above), then there is a good chance that compare_exchange_strong() would also be put in a loop, which brings us back to the previous case.

Feverish answered 9/8, 2014 at 9:51 Comment(1)
On this point: "It won't happen forever, fortunately, thanks to what C++11 requires: Implementations should ensure ..." My understanding is that the word "should" here does not mean it's a requirement (if it was a requirement, it would say "must" instead). So, this clause actually doesn't guarantee that it won't loop forever. Right?Pacheco

Why doing exchange in a loop?

Usually, you want your work to be done before you move on, thus, you put compare_exchange_weak into a loop so that it tries to exchange until it succeeds (i.e., returns true).

Note that compare_exchange_strong is also often used in a loop. It does not fail due to spurious failure, but it does fail due to concurrent writes.

Why to use weak instead of strong?

Quite easy: spurious failure does not happen often, so it is no big performance hit. In contrast, tolerating such a failure allows for a much more efficient implementation of the weak version (in comparison to strong) on some platforms: strong must always check for spurious failure and mask it. This is expensive.

Thus, weak is used because it is a lot faster than strong on some platforms.

When should you use weak and when strong?

The reference states hints when to use weak and when to use strong:

When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms. When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable.

So the answer seems to be quite simple to remember: If you would have to introduce a loop only because of spurious failure, don't do it; use strong. If you have a loop anyway, then use weak.
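For instance, claiming a one-shot flag needs no loop at all, so by that rule the strong version is the natural fit (initialized and try_claim_init are illustrative names, not from the original text):

```cpp
#include <atomic>

std::atomic<bool> initialized{false};

// Exactly one caller wins. With the strong version, a false return
// reliably means another thread already claimed the flag, so no
// retry loop is required.
bool try_claim_init()
{
    bool expected = false;
    return initialized.compare_exchange_strong(expected, true);
}
```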

Why is !expected in the example

It depends on the situation and its desired semantics, but usually it is not needed for correctness. Omitting it would yield very similar semantics; only in a case where another thread might reset the value to false could the semantics become slightly different (yet I cannot find a meaningful example where you would want that). See Tony D.'s comment for a detailed explanation.

It is simply a fast path for when another thread writes true: then we abort instead of trying to write true again.

About your last question

But as analyzed above, two versions in a loop should give the same/similar performance. What's the thing I miss?

From Wikipedia:

Real implementations of LL/SC do not always succeed if there are no concurrent updates to the memory location in question. Any exceptional events between the two operations, such as a context switch, another load-link, or even (on many platforms) another load or store operation, will cause the store-conditional to spuriously fail. Older implementations will fail if there are any updates broadcast over the memory bus.

So, LL/SC will fail spuriously on context switch, for example. Now, the strong version would bring its "own small loop" to detect that spurious failure and mask it by trying again. Note that this own loop is also more complicated than a usual CAS loop, since it must distinguish between spurious failure (and mask it) and failure due to concurrent access (which results in a return with value false). The weak version does not have such own loop.
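That "own small loop" can be sketched roughly like this. This is a conceptual illustration only, not how any real library implements it (a raw LL/SC implementation would retry at the instruction level instead); strong_from_weak is a hypothetical name.

```cpp
#include <atomic>

// Conceptual sketch: how a strong CAS could mask spurious failures of
// a weak one. Not any real library's implementation.
template <typename T>
bool strong_from_weak(std::atomic<T>& obj, T& expected, T desired)
{
    const T saved = expected;
    while (!obj.compare_exchange_weak(expected, desired)) {
        if (expected != saved)
            return false; // genuine failure: the value really differs
        expected = saved; // spurious failure: restore and retry
    }
    return true;
}
```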

Since you provide an explicit loop in both examples, it is simply not necessary to have the small loop for the strong version. Consequently, in the example with the strong version, the check for failure is done twice: once by compare_exchange_strong (which is more complicated, since it must distinguish spurious failure from failure due to concurrent access) and once by your loop. This expensive check is unnecessary and is the reason why weak will be faster here.

Also note that your argument (LL/SC) is just one possibility to implement this. There are more platforms that have even different instruction sets. In addition (and more importantly), note that std::atomic must support all operations for all possible data types, so even if you declare a ten million byte struct, you can use compare_exchange on this. Even when on a CPU that does have CAS, you cannot CAS ten million bytes, so the compiler will generate other instructions (probably lock acquire, followed by a non-atomic compare and swap, followed by a lock release). Now, think of how many things can happen while swapping ten million bytes. So while a spurious error may be very rare for 8 byte exchanges, it might be more common in this case.

So in a nutshell, C++ gives you two semantics, a "best effort" one (weak) and a "I will do it for sure, no matter how many bad things might happen inbetween" one (strong). How these are implemented on various data types and platforms is a totally different topic. Don't tie your mental model to the implementation on your specific platform; the standard library is designed to work with more architectures than you might be aware of. The only general conclusion we can draw is that guaranteeing success is usually more difficult (and thus may require additional work) than just trying and leaving room for possible failure.

Phagy answered 8/8, 2014 at 9:21 Comment(9)
"Only use strong if by no means you can tolerate spurious failure." - is there really an algorithm that distinguishes between failures due to concurrent writes and spurious failure? All the one's I can think of either allow us to miss updates sometimes or don't in which case we need a loop anyhow.Organo
@Voo: Updated answer. Now hints from the reference are included. There may be an algorithm that does make a distinction. For example, consider a "one must update it" semantics: updating something must be done exactly once, so once we fail due to a concurrent write, we know someone else did it and we can abort. If we fail due to spurious failure, then nobody has updated it, so we must retry.Phagy
"Why is !expected in the example? It is not needed for correctness. Omitting it would yield the same semantics." - not so... if say the first exchange fails because it finds b is already true, then - with expected now true - without && !expected it loops and tries another (silly) exchange of true and true which may well "succeed" trivially breaking from the while loop, but could exhibit meaningfully different behaviour if b had meanwhile changed back to false, in which case the loop would continue and may ultimately set b true yet again before breaking.Atreus
@TonyD: Right I should clarify that.Phagy
Sorry guys, I added one more last question;)Feverish
@EricZ: Added a paragraph for itPhagy
My brain is going to melt or fall out or something but omg at least I understand it...Boarer
Do you have a reference for the fact that compare_exchange_strong() can fail due to concurrent writes? This seems basic, but I can't find confirmation on cppreference.com.Doiron
Why even use a compare_exchange operation with !expected? Couldn't you do the same thing with just a store()?Galipot

Why does it have to be in a loop in nearly all uses ?

Because if you don't loop and it fails spuriously your program hasn't done anything useful - you didn't update the atomic object and you don't know what its current value is (Correction: see comment below from Cameron). If the call doesn't do anything useful what's the point of doing it?

Does that mean we shall loop when it fails because of spurious failures?

Yes.

If that's the case, why do we bother use compare_exchange_weak() and write the loop ourselves? We can just use compare_exchange_strong() which I think should get rid of spurious failures for us. What are the common use cases of compare_exchange_weak()?

On some architectures compare_exchange_weak is more efficient, and spurious failures should be fairly uncommon, so it might be possible to write more efficient algorithms using the weak form and a loop.

In general it is probably better to use the strong version instead if your algorithm doesn't need to loop, as you don't need to worry about spurious failures. If it needs to loop anyway even for the strong version (and many algorithms do need to loop anyway), then using the weak form might be more efficient on some platforms.
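A classic example of the "needs to loop anyway" case is building a read-modify-write operation the standard library doesn't provide out of a weak CAS loop; fetch_multiply below is a hypothetical operation, not a std::atomic member:

```cpp
#include <atomic>

// std::atomic has no fetch_multiply, so we build one from a CAS loop.
// Spurious and genuine failures are handled identically: old_val is
// refreshed by the failed CAS and we simply retry.
int fetch_multiply(std::atomic<int>& obj, int factor)
{
    int old_val = obj.load();
    while (!obj.compare_exchange_weak(old_val, old_val * factor))
        ; // old_val now holds the freshly observed value
    return old_val; // value before the successful multiplication
}
```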

Why is !expected there in the loop condition?

The value could have got set to true by another thread, so you don't want to keep looping trying to set it.

Edit:

But as analyzed above, two versions in a loop should give the same/similar performance. What's the thing I miss?

Surely it's obvious that on platforms where spurious failure is possible the implementation of compare_exchange_strong has to be more complicated, to check for spurious failure and retry.

The weak form just returns on spurious failure, it doesn't retry.

Taradiddle answered 8/8, 2014 at 9:22 Comment(6)
+1 Factually accurate on all counts (which the Q desperately needs).Atreus
About you don't know what its current value is in the 1st point, when a spurious failure takes place, shouldn't the current value equal the expected value at that instant? Otherwise, it'd be a real failure.Feverish
IMO, both the weak and strong version are implemented using LL/SC on platforms that no single CAS hardware primitive exists. So to me why is there any performance difference between while(!compare_exchange_weak(..)) and while(!compare_exchange_strong(..))?Feverish
Sorry guys, I added one more last question.Feverish
A spurious failure can still happen even if the current value equals the expected one. Sneftel's answer explains why the strong form does extra work, but surely it's obvious that if it has to check for spurious failure and retry that is more work than just returning false?Taradiddle
@Jonathan: Just a nitpick, but you do know the current value if it fails spuriously (of course, whether that's still the current value by the time you read the variable is another issue entirely, but that's regardless of weak/strong). I've used this to, for example, attempt to set a variable assuming its value is null, and if fails (spuriously or not) keep trying but only depending on what the actual value is.Isosceles

Alright, so I need a function which performs atomic left-shifting. My processor doesn't have a native operation for this, and the standard library doesn't have a function for it, so it looks like I'm writing my own. Here goes:

void atomicLeftShift(std::atomic<int>* var, int shiftBy)
{
    int oldVal = var->load();
    int newVal;
    do {
        newVal = oldVal << shiftBy;
        // on failure, compare_exchange_weak reloads the current value into oldVal
    } while (!var->compare_exchange_weak(oldVal, newVal));
}

Now, there's two reasons that loop might be executed more than once.

  1. Someone else changed the variable while I was doing my left shift. The results of my computation should not be applied to the atomic variable, because it would effectively erase that someone else's write.
  2. My CPU burped and the weak CAS spuriously failed.

I honestly don't care which one. Left shifting is fast enough that I may as well just do it again, even if the failure was spurious.

What's less fast, though, is the extra code that strong CAS needs to wrap around weak CAS in order to be strong. That code doesn't do much when the weak CAS succeeds... but when it fails, strong CAS needs to do some detective work to determine whether it was Case 1 or Case 2. That detective work takes the form of a second loop, effectively inside my own loop. Two nested loops. Imagine your algorithms teacher glaring at you right now.

And as I previously mentioned, I don't care about the result of that detective work! Either way I'm going to be redoing the CAS. So using strong CAS gains me precisely nothing, and loses me a small but measurable amount of efficiency.

In other words, weak CAS is used to implement atomic update operations. Strong CAS is used when you care about the result of CAS.

Wilkerson answered 8/8, 2014 at 15:53 Comment(1)
Is your example code right? On the face of it, it looks like this code never does anything that's observable by the caller of the function. In particular, it looks like the old value of *var gets read, but no new value ever gets written to *var, or to anything else the caller can see afterwards, which means the function is a no-op (and thus may be elided by the compiler by the as-if rule). OTOH I'm looking for a reference to std::compare_exchange_weak and finding nothing, so I'm thinking maybe you meant to say var->compare_exchange_weak instead?Pacheco

I think most of the answers above treat "spurious failure" as some kind of problem, a performance-vs-correctness tradeoff.

It can be seen like this: the weak version is faster most of the time, but becomes slower when a spurious failure occurs; the strong version has no possibility of spurious failure, but is almost always slower.

For me, the main difference is how these two versions handle the ABA problem:

The weak version will succeed only if no one has touched the cache line between the load and the store, so it will 100% detect the ABA problem.

The strong version will fail only if the comparison fails, so it will not detect the ABA problem without extra measures.

So, in theory, if you use the weak version on a weakly ordered architecture, you don't need an ABA detection mechanism, and the implementation will be much simpler, giving better performance.

But on x86 (a strongly ordered architecture), the weak version and the strong version are the same, and both suffer from the ABA problem.

So if you write a completely cross-platform algorithm, you need to address the ABA problem anyway, so there is no performance benefit from using the weak version, but there is a performance penalty for handling spurious failures.

In conclusion: for portability and performance reasons, the strong version is always a better-or-equal option.

The weak version can be a better option only if it lets you skip ABA countermeasures completely, or if your algorithm doesn't care about ABA.

Lipfert answered 6/2, 2020 at 15:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.