Consider this example:
    #include <iostream>
    #include <atomic>
    #include <thread>
    #include <chrono>
    #include <cassert>

    int main() {
        std::atomic<int> v = 0;
        std::atomic<bool> flag = false;
        std::thread t1([&]() {
            while (!flag.load(std::memory_order::relaxed)) {}            // #1
            assert(v.exchange(2, std::memory_order::relaxed) == 1);      // #2
        });
        std::thread t2([&]() {
            if (v.exchange(1, std::memory_order::relaxed) == 0) {        // #3
                flag.store(true, std::memory_order::relaxed);            // #4
            }
        });
        t1.join();
        t2.join();
    }
In this example, the loop at #1 exits only after #4 sets flag to true, and #4 is executed only if the read part of the RMW operation at #3 reads 0. The requirement on the read part of an RMW operation is stated in [atomics.order] p10:
Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
This implies that if the RMW operation at #3 reads 0, no other RMW operation can read that same 0. In other words, if #2 could read 0 and write 2, then #3 would not read 0 and #4 would not be executed. Put differently, the value read by a read-modify-write operation is uniquely owned by that operation, provided that all other operations on the object are RMW operations too (this is the essence of how a spin-lock works).
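A minimal sketch of that ownership property, under my own assumptions: the helper `count_zero_readers` is hypothetical and not part of the example above. Several threads each perform a relaxed exchange on the same atomic object. Since each RMW must read the value that immediately precedes its own write in the modification order, exactly one of them should observe the initial 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original example): starts n threads that
// each exchange a distinct non-zero value into v with relaxed ordering, and
// returns how many of them read the initial value 0.
int count_zero_readers(int n) {
    assert(n > 0);
    std::atomic<int> v{0};
    std::atomic<int> zero_readers{0};
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(n));
    for (int id = 1; id <= n; ++id) {
        threads.emplace_back([&v, &zero_readers, id] {
            // The read part of the exchange takes the value immediately
            // preceding this write in the modification order of v.
            if (v.exchange(id, std::memory_order_relaxed) == 0) {
                zero_readers.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return zero_readers.load(std::memory_order_relaxed);
}
```

Calling `count_zero_readers(8)` should return 1 regardless of how the threads are scheduled, which is the same reasoning that rules out both #2 and #3 reading 0.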
So, Q1 is: the assertion at #2 will never fail, right?
However, suppose #2 is changed to a pure load, like this:
    assert(v.load(std::memory_order::relaxed) == 1); // #2'
According to [intro.races] p18:
If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B takes its value from X or from a side effect Y that follows X in the modification order of M.
The only side effect that happens before #2' is the initialization of v to 0. Even though the side effect 1 stored at #3 follows 0 in the modification order, the pure load can still read 0, because [intro.races] p18 says "or". This is also implied by [atomics.order] p11:
Recommended practice: The implementation should make atomic stores visible to atomic loads, and atomic loads should observe atomic stores, within a reasonable amount of time.
From the perspective of implementations, there can be a time lag during which the store at #3 is not yet visible to #2', as long as it becomes visible within a reasonable amount of time. From the perspective of the C++ standard, the same outcome is permitted by the "or" in [intro.races] p18.
Q2: If the RMW operation at #2 is changed to a pure load like #2', the assertion can fail, right?
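To compare the two variants empirically, here is a rough harness under my own assumptions: the function `count_reads_of_zero` is hypothetical and only wraps the example so that it can be run repeatedly with either #2 or #2'. It records the value that t1 reads instead of asserting on it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical harness: runs the example `runs` times and returns how often
// the operation at #2 (use_rmw == true) or #2' (use_rmw == false) read 0.
int count_reads_of_zero(bool use_rmw, int runs) {
    assert(runs >= 0);
    int zeros = 0;
    for (int i = 0; i < runs; ++i) {
        std::atomic<int> v{0};
        std::atomic<bool> flag{false};
        int seen = -1;  // written only by t1, read only after t1.join()
        std::thread t1([&] {
            while (!flag.load(std::memory_order_relaxed)) {}       // #1
            seen = use_rmw
                ? v.exchange(2, std::memory_order_relaxed)         // #2
                : v.load(std::memory_order_relaxed);               // #2'
        });
        std::thread t2([&] {
            if (v.exchange(1, std::memory_order_relaxed) == 0) {   // #3
                flag.store(true, std::memory_order_relaxed);       // #4
            }
        });
        t1.join();
        t2.join();
        if (seen == 0) {
            ++zeros;
        }
    }
    return zeros;
}
```

By the argument above, `count_reads_of_zero(true, runs)` should always be 0. A result of 0 for the load variant would not show that reading 0 is forbidden, since a test like this can only demonstrate that an outcome is possible, and strongly ordered hardware may never exhibit it.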
Q3: If #2' can fail and #2 never fails, does that mean an RMW is less prone to reading stale values than a non-RMW read, at least in this example? That is, is an RMW more likely than a non-RMW read to read the later modification in the modification order, at least in this example?
Addition:
I don't think the compiler can reorder #2 with #1, since #2 is similar to a failed CAS in a spin-lock (i.e., one that is effectively a pure load with relaxed memory order); if that reordering were allowed, a spin-lock would not work either. Moreover, any reordering by the compiler in this example is observable because of the assertion. However, from the perspective of memory order, #3 does not happen before #2, nor vice versa, which theoretically suggests that the assertion may fail. I am not sure. Still, this example depends on the logical order of execution, so any disruption of that order is observable.
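For contrast, here is a sketch of a variation of my own (not part of the example above) in which a happens-before relationship does exist. The store to flag uses release and the load of flag uses acquire, so #4 synchronizes with the load at #1 that reads true. Because #3 is sequenced before #4, #3 then happens before #2', and [intro.races] p18 leaves the pure load no choice but to read 1. The function name `load_after_acquire` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical variation: same shape as the example, but with release/acquire
// on flag. Returns the value observed by the pure load at #2'.
int load_after_acquire() {
    std::atomic<int> v{0};
    std::atomic<bool> flag{false};
    int observed = -1;  // written only by t1, read only after t1.join()
    std::thread t1([&] {
        while (!flag.load(std::memory_order_acquire)) {}       // #1 (acquire)
        observed = v.load(std::memory_order_relaxed);          // #2'
    });
    std::thread t2([&] {
        if (v.exchange(1, std::memory_order_relaxed) == 0) {   // #3
            flag.store(true, std::memory_order_release);       // #4 (release)
        }
    });
    t1.join();
    t2.join();
    assert(observed != -1);  // t1 always reaches #2' before it finishes
    return observed;
}
```

Under these assumptions `load_after_acquire()` should always return 1. With relaxed ordering on flag, as in the question, that guarantee for a pure load is exactly what is in doubt.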
Note:
This is a follow-up to the question "Is the load part of a read-modify-write operation of atomic object guaranteed to read the last value in modification order compared to load operation?", which had an unclear example and an incomprehensible assumption; both are improved and clarified in this question.
If #2 fails, that means the value it read is 0, which means #3 cannot read 0, so the flag is not set to true, the loop at #1 cannot exit, and #2 is not executed. Hmm, there is a paradox here. – Eijkman

If #2 reads 0 and writes 2, then 0 immediately precedes 2 in the modification order; likewise, if #3 reads 0 and writes 1, then 0 immediately precedes 1. Because the modification order is a total order, either 1 precedes 2 or 2 precedes 1, and whichever operation wrote the later value violates that rule. – Eijkman

#2 or #2' is executed only when #3 reads 0 and writes 1, so it is known that 0 precedes 1 in the modification order. #2 is forced to read 1, while a pure load #2' may read 0. With this comparison, is the load that reads 0 considered to be reading a "stale" value? – Eijkman

... flag but not view change of v. Q3: no. A load can simply read the value from the CPU cache, whereas an RMW needs to do more work and synchronization. – Cleanser

... v. So we can say that the operation in one thread completed before the one in another thread began. There is a global total order. This is exactly the point of synchronization. – Cleanser

We can say that one completed before the second began. If the RMW in t1 completed before the one in t2 began, then it also completed before the write to flag. But on the other hand, the RMW in t1 begins only after t2 writes to flag. So that is a contradiction. As a result, we can say that the RMW in t2 completed before the one in t1, and the assert is OK. Memory order plays no role here. The RMW on v is exactly the point of synchronization between t1 and t2. – Cleanser

... #3 and a read to the same object after #2. By your logic, since the RMW on v is exactly the point of synchronization between t1 and t2, two such operations on the same non-atomic object would not have a data race. However, that's not true: #3 does not synchronize with #2 according to the formal wording. – Eijkman

... v - it is impossible to say that the one in some thread was before or after another. If they only read the same, before/after is undefined. But in the case of 2 RMWs on v, one RMW completed before the other began. In this sense it is a point of synchronization. And again, this says nothing about whether changes to other memory/variables are visible. – Cleanser

If #2 and #3 had a happens-before relationship, then any non-atomic operations based on them would have that relationship too, but that's not true even if both operations are RMWs: as long as their memory order is relaxed, they cannot form a synchronization relationship. – Eijkman

#2 is coherence-ordered before #3 or vice versa; however, coherence-ordered-before is unrelated to happens-before. – Eijkman

flag = false; int i = 0; // thread 1: i = 1; flag = true (relaxed); // thread 2: while(!flag.load(relaxed)){} assert(i==1). By your logic, the loop won't exit unless it sees flag set to true, so the store to i is completed before the read of i; however, that's not true because the memory order is relaxed, regardless of whether the load of flag read the value written by the store. – Eijkman

... i and flag. That a modification of flag is visible in another thread does not mean that the modification of i will also be visible. I am saying something entirely different: that 2 RMWs on the same v cannot overlap. That #2 is before #3 or #3 is before #2 is not related to any memory order (which always involves at least 2 memory locations). And then I say: if A is before B, A cannot view the effect of B (note: I do not say that if A is before B, B will view the effect of A; I say that A does not view the effect of B). – Cleanser

If A is coherence-ordered before B, A certainly cannot view the side effect produced by B. However, I don't know how you relate "coherence-ordered before" to "happens-before" or "synchronization"; they are not directly related. – Eijkman