Why isn't a C++11 acquire_release fence enough for Dekker synchronization?

Asked 2/12, 2014 at 11:57 Answered 3/12, 2014 at 10:52

Solved multithreading c++11 synchronization atomic memory-fences

The failure of Dekker-style synchronization is typically explained with reordering of instructions. I.e., if we write

atomic_int X;
atomic_int Y;
int r1, r2;
static void t1() { 
    X.store(1, std::memory_order_relaxed)
    r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
    Y.store(1, std::memory_order_relaxed)
    r2 = X.load(std::memory_order_relaxed);
}

Then the loads can be reordered with the stores, leading to r1==r2==0.

I was expecting an acquire_release fence to prevent this kind of reordering:

static void t1() {
    X.store(1, std::memory_order_relaxed);
    atomic_thread_fence(std::memory_order_acq_rel);
    r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
    Y.store(1, std::memory_order_relaxed);
    atomic_thread_fence(std::memory_order_acq_rel);
    r2 = X.load(std::memory_order_relaxed);
}

The load cannot be moved above the fence and the store cannot be moved below the fence, and so the bad result should be prevented.

However, experiments show r1==r2==0 can still occur. Is there a reordering-based explanation for this? Where's the flaw in my reasoning?

Lukelukens answered 2/12, 2014 at 11:57 Comment(0)

As I understand it (mainly from reading Jeff Preshings blog), an atomic_thread_fence(std::memory_order_acq_rel) prevents any reorderings except for StoreLoad, i.e., it still allows to reorder a Store with a subsequent Load. However, this is exactly the reordering that has to be prevented in your example.

More precisely, an atomic_thread_fence(std::memory_order_acquire) prevents the reordering of any previous Load with any subsequent Store and any subsequent Load, i.e., it prevents LoadLoad and LoadStore reorderings across the fence.

An atomic_thread_fence(std::memory_order_release) prevents the reordering of any subsequent Store with any preceding Store and any preceding Load, i.e., it prevents LoadStore and StoreStore reorderings across the fence.

An atomic_thread_fence(std::memory_order_acq_rel) then prevents the union, i.e., it prevents LoadLoad, LoadStore, and StoreStore, which means that only StoreLoad may still happen.

Adventitia answered 3/12, 2014 at 10:52 Comment(0)

memory_order_acq_rel actually behaves just as acquire and release fence at the same place. But the issue is that they do not prevent all the possible reordering, they prevent consequent loads or prior stores from being reordered around the fence. So, prior loads and consequent stores can still pass through the fence.

In Dekker synchronization, it is important to prevent e.g. the load from being reordered before the store in another thread, i.e. before the fence. Now, unroll your loops where this synchronization occurs and you'll get that the load from previous iteration can fall through the fence on the current iteration.

memory_order_seq_cst works fine for Dekker's synchronization because it prevents any reordering across this point. For example, tbb uses it Dekker's algorithm and mfence for the work-stealing.

For better understanding, see great animation from Herb Sutter lecture "Atomic<> weapons 1/2", at 0:43.

Horace answered 2/12, 2014 at 15:50 Comment(0)

Recommended topics

Hot tags