In C++, is there any effective difference between an acquire/release atomic access and a relaxed access combined with a fence?

Specifically, is there any effective difference between:

i = a.load(memory_order_acquire);

or

a.store(5, memory_order_release);

and

atomic_thread_fence(memory_order_acquire);
i = a.load(memory_order_relaxed);

or

a.store(5, memory_order_relaxed);
atomic_thread_fence(memory_order_release);

respectively?

Do non-relaxed atomic accesses provide signal fences as well as thread fences?

Exudate answered 11/2, 2017 at 14:51 Comment(0)

You need

atomic_thread_fence(memory_order_release);
a.store(5, memory_order_relaxed);

and

i = a.load(memory_order_relaxed);
atomic_thread_fence(memory_order_acquire);

to replace

a.store(5, memory_order_release);

and

i = a.load(memory_order_acquire);

Non-relaxed atomic accesses do provide signal fences as well as thread fences.

Knowhow answered 11/2, 2017 at 15:47 Comment(0)

In your code, for both the load and the store, the order of the fence and the atomic operation should be reversed. With that change the fence versions are similar to the standalone operations, but there are differences.

Acquire and release operations on atomic variables act as one-way barriers, but in opposite directions. That is, a store/release operation prevents memory operations that precede it (in the program source) from being reordered after it, while a load/acquire operation prevents memory operations that follow it from being reordered before it.

// thread 1
// shared memory operations A
a.store(5, std::memory_order_release);

x = 42; // regular int


// thread 2
while (a.load(std::memory_order_acquire) != 5);
// shared memory operations B

Memory operations A cannot move down below the store/release, while memory operations B cannot move up above the load/acquire. As soon as thread 2 reads 5, memory operations A are visible to B and synchronization is complete.
Because the store/release is only a one-way barrier, the write to x can join, or even precede, memory operations A. But since x is not part of the acquire/release relationship, it cannot be reliably accessed by thread 2.
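To make the comment placeholders concrete, here is a minimal sketch of the same pattern. The names payload, observed and run_acquire_release_demo are hypothetical and only for illustration; "operations A" is reduced to a single write to a plain int.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo function: returns what the consumer read from payload.
int run_acquire_release_demo() {
    int payload = 0;           // plain, non-atomic shared data
    std::atomic<int> a{0};
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                            // operations A
        a.store(5, std::memory_order_release);   // A cannot move below this store
    });
    std::thread consumer([&] {
        // Spin until the release store becomes visible.
        while (a.load(std::memory_order_acquire) != 5) {
            std::this_thread::yield();
        }
        observed = payload;                      // operations B, cannot move above the load
    });

    producer.join();
    consumer.join();
    assert(observed == 42);   // guaranteed by the release/acquire pairing
    return observed;
}
```

Once the load returns 5, the write to payload happens before the read, so the function returns 42.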

Replacing the atomic operations with standalone thread fences and relaxed operations is similar:

// thread 1
// shared memory operations A
std::atomic_thread_fence(std::memory_order_release);
a.store(5, std::memory_order_relaxed);


// thread 2
while (a.load(std::memory_order_relaxed) != 5);
std::atomic_thread_fence(std::memory_order_acquire);
// shared memory operations B

This achieves the same result, but an important difference is that neither fence acts as a one-way barrier. If they did, the atomic store to a could be reordered before the release fence, and the atomic load from a could be reordered after the acquire fence, and that would break the synchronization relationship.
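The fence-based variant can be sketched the same way. Again, payload, observed and run_fence_demo are made-up names for illustration, and "operations A" is just one write to a plain int.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo function: returns what the consumer read from payload.
int run_fence_demo() {
    int payload = 0;           // plain, non-atomic shared data
    std::atomic<int> a{0};
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                          // operations A
        std::atomic_thread_fence(std::memory_order_release);   // fence first...
        a.store(5, std::memory_order_relaxed);                 // ...then the relaxed store
    });
    std::thread consumer([&] {
        while (a.load(std::memory_order_relaxed) != 5) {       // relaxed load first...
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);   // ...then the fence
        observed = payload;                                    // operations B
    });

    producer.join();
    consumer.join();
    assert(observed == 42);   // the two fences synchronize via the store/load on a
    return observed;
}
```

Note the ordering: the release fence comes before the relaxed store and the acquire fence comes after the relaxed load. With the order used in the question, there would be no such guarantee.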

In general:

A standalone release fence prevents preceding operations from being reordered with (atomic) stores that follow it.
A standalone acquire fence prevents following operations from being reordered with (atomic) loads that precede it.

The standard allows acquire/release fences to be mixed with acquire/release operations.
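As a sketch of such mixing, the following pairs a release fence plus relaxed store in the writer with a plain load/acquire in the reader. The names payload, observed and run_mixed_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo function: release fence on one side, acquire operation on the other.
int run_mixed_demo() {
    int payload = 0;           // plain, non-atomic shared data
    std::atomic<int> a{0};
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);   // standalone release fence
        a.store(5, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        // Acquire operation; no standalone fence on this side.
        while (a.load(std::memory_order_acquire) != 5) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    producer.join();
    consumer.join();
    assert(observed == 42);   // the fence synchronizes with the acquire load
    return observed;
}
```

The reverse combination (store/release in the writer, relaxed load plus acquire fence in the reader) works the same way.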

Do non-relaxed atomic accesses provide signal fences as well as thread fences?

It is not fully clear to me what you are asking here, because thread fences are normally used with relaxed atomic operations. That said, std::atomic_signal_fence is similar to std::atomic_thread_fence, except that it only orders operations between a thread and a signal handler executed on that same thread, and therefore the compiler does not generate CPU instructions for inter-thread synchronization. It basically acts as a compiler-only barrier.
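A minimal sketch of a signal fence in use, assuming std::atomic<int> is lock-free on the target and using std::raise to deliver the signal synchronously so the result is deterministic. The names payload, ready, seen, on_signal and run_signal_fence_demo are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

int payload = 0;                       // plain data published to the handler
std::atomic<int> ready{0};             // assumed lock-free, so usable in a handler
volatile std::sig_atomic_t seen = -1;  // what the handler observed

extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed) == 1) {
        // Compiler-only barrier: keeps the read of payload after the load of ready.
        std::atomic_signal_fence(std::memory_order_acquire);
        seen = payload;
    }
}

// Hypothetical demo function: returns the value the handler saw.
int run_signal_fence_demo() {
    std::signal(SIGTERM, on_signal);
    payload = 42;
    // Compiler-only barrier: keeps the write to payload before the store to ready.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
    std::raise(SIGTERM);               // handler runs on this thread before raise returns
    std::signal(SIGTERM, SIG_DFL);     // restore default handling
    return static_cast<int>(seen);
}
```

The fences here only constrain compiler reordering. They would not be enough if ready and payload were read from a different thread.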

Phenacite answered 11/2, 2017 at 21:59 Comment(5)

// shared memory operations A should this be // shared memory operations B instead? Or am I wrong? it seems operations on B makes the answer more sensible to me at least :) +1'ed anyway – Cortex 3/11, 2019 at 15:41

@Cortex It looks correct to me.. Operations A are sequenced before the release operation/fence (i.e. they come earlier in program order) whereas operations B are sequenced after the acquire. With the given relationship (i.e. a.load returns 5), operations A 'happen before' operations B. For example, if A (thread 1) includes a store to a non-atomic integer: var = 42, a load from var in B (thread 2) is guaranteed to return 42 if a.load has returned 5 – Phenacite 5/11, 2019 at 11:9

I see your point now. Thanks! I'd really appreciate if you could recommend some resources to learn this stuff :) – Cortex 5/11, 2019 at 12:42

@Cortex Jeff Preshing has written a number of great articles on his blog – Phenacite 6/11, 2019 at 1:7

@PYA: preshing.com/20131125/… is specifically about the difference between operations (1-way reordering) vs. fences (2-way LoadLoad + LoadStore barrier for example) for acquire and release. – Uella 20/9, 2023 at 17:39
