I'm trying to understand the purpose of std::atomic_thread_fence(std::memory_order_seq_cst); fences, and how they're different from acq_rel fences.
So far my understanding is that the only difference is that seq-cst fences affect the global order of seq-cst operations ([atomics.order]/4). And said order can only be observed if you actually perform seq-cst loads.
So I'm thinking that if I have no seq-cst loads, then I can replace all my seq-cst fences with acq-rel fences without changing the behavior. Is that correct?
And if that's correct, why am I seeing code like this implementation of Dekker's algorithm with fences, which uses seq-cst fences while keeping all atomic reads/writes relaxed? Here's the code from that blog post:
std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section

    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section

    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}
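
The entry sequence of p0 and p1 (store own flag, fence, load the other flag) is the classic store-buffering pattern, so the hypothesis can be reduced to a small litmus test. Below is a minimal sketch of that pattern on its own; count_both_zero and its FenceOrder template parameter are hypothetical names introduced here for illustration, not something from the blog post. With seq_cst fences, the fence rules in [atomics.order] require at least one thread to observe the other's store, so the function should return 0. With acq_rel fences the standard permits both loads to return 0, although a run that never shows it proves nothing, since thread start-up timing makes the window very hard to hit.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the blog post): runs the store-buffering
// pattern `iterations` times with the given fence order and counts the
// runs in which BOTH threads read 0 from the other thread's variable.
template <std::memory_order FenceOrder>
int count_both_zero(int iterations)
{
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i)
    {
        std::atomic<int> x(0), y(0);
        int r0 = -1, r1 = -1;

        std::thread t0([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(FenceOrder);
            r0 = y.load(std::memory_order_relaxed);
        });
        std::thread t1([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(FenceOrder);
            r1 = x.load(std::memory_order_relaxed);
        });

        // join() synchronizes with thread completion, so reading r0/r1
        // afterwards is race-free.
        t0.join();
        t1.join();

        if (r0 == 0 && r1 == 0)
            ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero<std::memory_order_seq_cst>(n) is the case that corresponds to the Dekker code above; instantiating it with std::memory_order_acq_rel gives the variant the question asks about. In Dekker's algorithm, the "both read 0" outcome corresponds to both threads entering the critical section at once.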

… std::atomic_thread_fence(std::memory_order_acq_rel); to zero instructions for x86, rather than an mfence or equivalent, so it's not preventing StoreLoad reordering. So if your hypothesis is right, that would mean nothing (?) in a program would need to block StoreLoad reordering across acq_rel fences to satisfy any of the ISO C++ requirements that apply to programs without SC loads. – Heise

… compare_exchange_weak with seq_cst fail order, might be relevant, or might just be equivalent to fetch_add(0), I forget. – Heise

… seq_cst C++ fence). I forget the details of how ISO C++ rules connect SC fences to "happens-before"s that govern what other threads are allowed to observe :/ – Heise

… seq_cst only just as strong as ISO C++ requires, not a lot stronger like most other ISAs, thus more efficient. See "The strong-ness of x86 store instruction wrt. SC-DRF?" – Heise