In your code, for both load and store, the order between the fence and the atomic operation should be reversed. It is then similar to the standalone operations, but there are differences.
Acquire and release operations on atomic variables act as one-way barriers, but in opposite directions.
That is, a store/release operation prevents memory operations that precede it (in the program source) from being reordered after it,
while a load/acquire operation prevents memory operations that follow it from being reordered before it.
// thread 1
// shared memory operations A
a.store(5, std::memory_order_release);
x = 42; // regular int
// thread 2
while (a.load(std::memory_order_acquire) != 5);
// shared memory operations B
Memory operations A cannot move down below the store/release, while memory operations B cannot move up above the load/acquire.
As soon as thread 2 reads 5, memory operations A are visible to B and synchronization is complete.
Because the store/release is only a one-way barrier, the write to x can move up to join, or even precede, memory operations A. But since it is not part of the acquire/release relationship, x cannot be reliably accessed by thread 2.
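The acquire/release handoff described above can be sketched as a small runnable program. This is only an illustration: the function name run_release_acquire_handoff and the variable payload are hypothetical, with payload standing in for "shared memory operations A".

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value thread 2 observes in `payload`.
int run_release_acquire_handoff() {
    int payload = 0;          // plain, non-atomic data ("operations A")
    std::atomic<int> a{0};    // the flag used for synchronization
    int seen = -1;            // written by thread 2, read after join()

    std::thread t1([&] {
        payload = 42;                           // operations A
        a.store(5, std::memory_order_release);  // A cannot sink below this store
    });
    std::thread t2([&] {
        // Operations B cannot be hoisted above this load.
        while (a.load(std::memory_order_acquire) != 5) {
            std::this_thread::yield();
        }
        seen = payload;                         // operations B
    });

    t1.join();
    t2.join();
    return seen;
}
```

Calling run_release_acquire_handoff() from main should return 42, because the write to payload happens before the read once the acquire load has observed 5.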
Replacing the atomic operations with standalone thread fences and relaxed operations is similar:
// thread 1
// shared memory operations A
std::atomic_thread_fence(std::memory_order_release);
a.store(5, std::memory_order_relaxed);
// thread 2
while (a.load(std::memory_order_relaxed) != 5);
std::atomic_thread_fence(std::memory_order_acquire);
// shared memory operations B
This achieves the same result, but an important difference is that neither fence acts as a one-way barrier.
If they did, the atomic store to a could be reordered before the release fence and the atomic load from a could be reordered after the acquire fence, and that would break the synchronization relationship.
In general:
- A standalone release fence prevents preceding operations from being reordered with (atomic) stores that follow it.
- A standalone acquire fence prevents following operations from being reordered with (atomic) loads that precede it.
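A complete version of the fence-based variant might look like the sketch below. The names run_fence_handoff and payload are made up for the example. Note the placement: the release fence comes before the relaxed store, and the acquire fence comes after the relaxed load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: same handoff, but with standalone fences
// and relaxed atomic operations on the flag.
int run_fence_handoff() {
    int payload = 0;          // plain, non-atomic data ("operations A")
    std::atomic<int> a{0};
    int seen = -1;

    std::thread t1([&] {
        payload = 42;                                         // operations A
        std::atomic_thread_fence(std::memory_order_release);  // fence first
        a.store(5, std::memory_order_relaxed);                // then the store
    });
    std::thread t2([&] {
        while (a.load(std::memory_order_relaxed) != 5) {      // load first
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // then the fence
        seen = payload;                                       // operations B
    });

    t1.join();
    t2.join();
    return seen;
}
```

As with the acquire/release version, run_fence_handoff() should return 42. Swapping either fence with its neighbouring atomic operation would remove the guarantee.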
The standard allows acquire/release fences to be mixed with acquire/release operations.
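As a sketch of such mixing, the writer below uses a release fence followed by a relaxed store, while the reader uses a plain acquire load with no fence. The names run_mixed_handoff and payload are hypothetical and exist only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: release *fence* on the writer side,
// acquire *operation* on the reader side.
int run_mixed_handoff() {
    int payload = 0;          // plain, non-atomic data
    std::atomic<int> a{0};
    int seen = -1;

    std::thread writer([&] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);
        a.store(5, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        // The acquire load synchronizes with the release fence once it
        // reads the value written by the relaxed store that follows the fence.
        while (a.load(std::memory_order_acquire) != 5) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

The opposite combination (a store/release on the writer side and a relaxed load followed by an acquire fence on the reader side) works the same way.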
Do non-relaxed atomic accesses provide signal fences as well as thread fences?
It is not fully clear to me what you are asking here, because thread fences are normally used with relaxed atomic operations, but std::atomic_signal_fence is similar to std::atomic_thread_fence, except that it only orders operations between a thread and a signal handler executed on that same thread, and therefore the compiler does not generate CPU instructions for inter-thread synchronization.
It basically acts as a compiler-only barrier.
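The standard library spells the signal fence std::atomic_signal_fence. A minimal sketch of its intended use, between a thread and a signal handler running on that same thread, could look like this. The names on_signal, run_signal_fence_demo, payload, ready and seen_in_handler are invented for the example. SIGTERM is chosen only because it is available in standard C++, and std::raise is used so that the handler runs synchronously.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

static int payload = 0;                              // plain data
static std::atomic<int> ready{0};                    // flag, relaxed accesses only
static volatile std::sig_atomic_t seen_in_handler = -1;

// Hypothetical handler: reads the payload only if the flag is set.
extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed) == 1) {
        // Compiler-only barrier: keeps the payload read after the flag read.
        std::atomic_signal_fence(std::memory_order_acquire);
        seen_in_handler = payload;
    }
}

// Hypothetical driver: publishes the payload, then raises the signal
// on the current thread.
int run_signal_fence_demo() {
    auto previous = std::signal(SIGTERM, on_signal);
    payload = 42;
    // Compiler-only barrier: keeps the payload write before the flag write.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
    std::raise(SIGTERM);              // the handler runs on this thread
    std::signal(SIGTERM, previous);   // restore the earlier disposition
    return seen_in_handler;
}
```

Because no other thread is involved, the fences here only need to restrain the compiler. They would not be sufficient if payload were read from a different thread.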
"// shared memory operations A": should this be "// shared memory operations B" instead? Or am I wrong? It seems operations on B makes the answer more sensible, to me at least :) +1'ed anyway – Cortex