I'm writing some lock-free code, and I came up with an interesting pattern, but I'm not sure if it will behave as expected under relaxed memory ordering.
The simplest way to explain it is using an example:
```cpp
std::atomic<int> a, b, c;

auto a_local = a.load(std::memory_order_relaxed);
auto b_local = b.load(std::memory_order_relaxed);
if (a_local < b_local) {
    auto c_local = c.fetch_add(1, std::memory_order_relaxed);
}
```
Note that all operations use `std::memory_order_relaxed`.

Obviously, on the thread this is executed on, the loads of `a` and `b` must be done before the `if` condition is evaluated. Similarly, the read-modify-write (RMW) operation on `c` must be done after the condition is evaluated (because it's conditional on that... condition).

What I want to know is: does this code guarantee that the value of `c_local` is at least as up-to-date as the values of `a_local` and `b_local`? If so, how is this possible given the relaxed memory ordering? Is the control dependency, together with the RMW operation, acting as some sort of acquire fence? (Note that there's no corresponding release anywhere.)
If the above holds true, I believe this example should also work (assuming no overflow) -- am I right?
```cpp
std::atomic<int> a(0), b(0);

// Thread 1
while (true) {
    auto a_local = a.fetch_add(1, std::memory_order_relaxed);
    if (a_local >= 0) {    // Always true at runtime
        b.fetch_add(1, std::memory_order_relaxed);
    }
}

// Thread 2
auto b_local = b.load(std::memory_order_relaxed);
if (b_local < 777) {
    // Note that fetch_add returns the pre-increment value
    auto a_local = a.fetch_add(1, std::memory_order_relaxed);
    assert(b_local <= a_local);    // Is this guaranteed?
}
```
On thread 1, there is a control dependency which I suspect guarantees that `a` is always incremented before `b` is incremented (but they each keep being incremented neck-and-neck). On thread 2, there is another control dependency which I suspect guarantees that `b` is loaded into `b_local` before `a` is incremented. I also think that the value returned from `fetch_add` will be at least as recent as any observed value in `b_local`, and the `assert` should therefore hold. But I'm not sure, since this departs significantly from the usual memory-ordering examples, and my understanding of the C++11 memory model is not perfect (I have trouble reasoning about these memory-ordering effects with any degree of certainty). Any insights would be appreciated!
Update: As bames53 has helpfully pointed out in the comments, given a sufficiently smart compiler, it's possible that an `if` could be optimised out entirely under the right circumstances, in which case the relaxed loads could be reordered to occur after the RMW, causing their values to be more up-to-date than the `fetch_add` return value (the `assert` could fire in my second example). However, what if instead of an `if`, an `atomic_signal_fence` (not `atomic_thread_fence`) is inserted? That certainly can't be ignored by the compiler no matter what optimizations are done, but does it ensure that the code behaves as expected? Is the CPU allowed to do any reordering in such a case?
The second example then becomes:
```cpp
std::atomic<int> a(0), b(0);

// Thread 1
while (true) {
    auto a_local = a.fetch_add(1, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_acq_rel);
    b.fetch_add(1, std::memory_order_relaxed);
}

// Thread 2
auto b_local = b.load(std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acq_rel);
// Note that fetch_add returns the pre-increment value
auto a_local = a.fetch_add(1, std::memory_order_relaxed);
assert(b_local <= a_local);    // Is this guaranteed?
```
Another update: After reading all the responses so far and combing through the standard myself, I don't think it can be shown that the code is correct using only the standard. So, can anyone come up with a counter-example: a theoretical system that complies with the standard and also fires the `assert`?
Comments:

- …the `fetch_add` operation, and not the control dependency, that causes the assertion to be true. I can't find anything that would indicate that a control dependency would cause any additional synchronization beyond what a sequenced-before relationship would. – Postliminy
- …`fetch_add` -- so they're synchronizing, I just can't figure out quite in what capacity. Perhaps all sequenced-before relationships would have the same effect here? – Fania
- …the `if` is limiting what kind of reordering the compiler can do in practice, although maybe not in theory. On top of that, it would seem that the `fetch_add` has to be limiting what values can be seen from the modification order. I think that when you `fetch_add`, the value that you get back can't be before any modification that happened-before the `fetch_add`. Which wouldn't be the case with a regular `load`. – Postliminy
- …the `if` not being optimized away, though -- so it may end up being implementation-defined (though I doubt any compilers will actually be able to optimize away an `if` that uses an atomic variable, since the value could be changed by any thread -- it would have to analyze the behaviour of the entire program to determine the possible values of the atomic var). – Fania
- …`a` and `b` such that a compiler smart enough to prove `a_local >= 0` is always true would be prevented from reordering the fetch-add on `b` to before the fetch-add on `a`, for example. – Packthread
- …`std::atomic_signal_fence(std::memory_order_seq_cst);` was inserted at the top of each `if`? This would stop the compiler from reordering even if it could optimize out the `if`, but it would not tell the CPU any additional information -- can the CPU reorder anything in an unexpected way in that case? Could the `assert` fire in that case? – Fania