The load from v
synchronizes with whichever of the two stores wrote the value that v.load()
returns.
The standard itself makes this more explicit. See n3337 atomics.order p2: "An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A."
To illustrate this, here's an example:
#include <atomic>
#include <iostream>

int a, b;
std::atomic<int> v = 0;

void thread_A() {
    a = 42;
    v.store(10, std::memory_order_release);
}

void thread_B() {
    b = 17;
    v.store(20, std::memory_order_release);
}

void thread_C() {
    switch (v.load(std::memory_order_acquire)) {
    case 10:
        // thread A must have done this store
        std::cout << a; // ok, prints 42
        std::cout << b; // UB, data race
        break;
    case 20:
        // thread B must have done this store
        std::cout << a; // UB, data race
        std::cout << b; // ok, prints 17
        break;
    case 0:
        // neither A nor B has done its store
        std::cout << a; // UB, data race
        std::cout << b; // UB, data race
        break;
    }
}
So if v.load()
in thread C returns 10, we know from our program's logic that this value must have been stored by the v.store()
in thread A; no other store in our program could have produced that value. Because of the release ordering on that store, all writes that thread A made before it are also visible. We can safely read from the non-atomic variable a
, and we are guaranteed to get the value 42
.
More formally, the v.store(10)
synchronizes with the v.load()
that returns 10, and the v.load()
is sequenced before the cout << a
, so v.store(10)
inter-thread happens before cout << a
(intro.multithread p11). And a = 42
is sequenced before v.store(10)
, which as we said inter-thread happens before cout << a
, so a = 42
inter-thread happens before cout << a
; in particular a = 42
happens before cout << a
(p12) and so there is no data race (p21). Moreover a = 42
is now a visible side effect with respect to cout << a
(p13), and there are no other side effects on a
to be seen, so the value of the evaluation of a
in cout << a
shall be the value stored by a = 42
, namely 42
.
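To make the chain easier to follow, here is a reduced, self-contained sketch of just the two threads involved (thread B omitted), with the evaluations from the argument above numbered in comments:

#include <atomic>
#include <iostream>

int a;
std::atomic<int> v = 0;

void thread_A() {
    a = 42;                                  // (1)
    v.store(10, std::memory_order_release);  // (2); (1) is sequenced before (2)
}

void thread_C() {
    if (v.load(std::memory_order_acquire) == 10) {  // (3); if 10 is read,
                                                    //      (2) synchronizes with (3)
        std::cout << a;  // (4); (3) is sequenced before (4), so (1) happens
                         //      before (4): no data race, and this prints 42
    }
}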
But in this case, since v.load()
returned 10 and not 20, we don't know whether the v.store()
in thread B has happened yet. Perhaps it did and has since been overwritten by the store in thread A. Or perhaps it didn't happen at all. So we can't prove that b = 17
happens before cout << b
, nor vice versa, so accessing b is a data race, which causes undefined behavior.
The case where v.load()
returns 20 is similar, but reversed. If v.load()
returns 0, then it did not take its value from either store, so it synchronizes with neither of them, and it is a data race to access either a
or b
.
As you can see, this is only useful if threads A and B store different values. If we change the program so that A and B both do v.store(10, std::memory_order_release)
, then having thread C observe that v.load() == 10
tells us nothing about which of the two threads did the store. The load synchronizes with one of them, but we don't know which. Therefore, in this case, thread C cannot safely access either a
or b
, because either could be the one that is in a data race.
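As a sketch of that modified program (reusing the declarations of a, b, and v from the example above):

void thread_A() {
    a = 42;
    v.store(10, std::memory_order_release);
}

void thread_B() {
    b = 17;
    v.store(10, std::memory_order_release);  // now the same value as thread A
}

void thread_C() {
    if (v.load(std::memory_order_acquire) == 10) {
        // The load synchronizes with whichever store it took its value from,
        // but the program has no way to tell which one that was, so neither
        // of the following accesses can be shown to be race-free:
        // std::cout << a;  // possible data race with thread A
        // std::cout << b;  // possible data race with thread B
    }
}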
The cppreference text, taken out of context, could make it sound like the mere act of doing v.load(std::memory_order_acquire)
will cause the thread to actually wait for some or all other stores in other threads to complete, sort of like a mutex or a std::latch
. You would not be the first to have misread it that way. But that wouldn't make sense - a load is just a load after all. It returns the value that v
happens to have at that particular instant in time, without blocking or waiting for any event from any other thread.
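If you actually want a thread to wait until one of the stores has happened, you have to write that waiting yourself, for example by spinning on the load. A sketch, reusing the declarations above (the std::atomic::wait alternative requires C++20):

void thread_C_waiting() {
    int observed;
    // Each iteration is still just an ordinary acquire load that returns
    // whatever value v happens to hold at that moment; the waiting comes
    // from the loop, not from the memory order.
    while ((observed = v.load(std::memory_order_acquire)) == 0) {
        // In C++20 you could block instead of spinning:
        // v.wait(0, std::memory_order_acquire);
    }
    if (observed == 10) std::cout << a;  // synchronized with thread A's store
    if (observed == 20) std::cout << b;  // synchronized with thread B's store
}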
See also Why does this cppreference excerpt seem to wrongly suggest that atomics can protect critical sections?
To summarize with a second example (from a comment by Schopenhauerism): thread A does a store of 17 to w and then a release store of 42 to v (whose previous value was, say, 0). B does an acquire load of v and then a load of w. If the value that B loaded from v is equal to 42, then the value it loaded from w is guaranteed to be equal to 17.
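A self-contained sketch of that scenario (the declarations are mine):

#include <atomic>
#include <iostream>

int w = 0;
std::atomic<int> v = 0;

void thread_A() {
    w = 17;                                  // plain store
    v.store(42, std::memory_order_release);  // release store
}

void thread_B() {
    if (v.load(std::memory_order_acquire) == 42)
        std::cout << w;  // guaranteed to print 17
}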