Memory fences: acquire/load and release/store

Asked 24/4, 2016 at 15:1 Answered 2/12, 2021 at 2:10

Tags: c++ memory-barriers lock-free stdatomic memory-model

My understanding of std::memory_order_acquire and std::memory_order_release is as follows:

Acquire means that no memory accesses which appear after the acquire fence can be reordered to before the fence.

Release means that no memory accesses which appear before the release fence can be reordered to after the fence.

What I don't understand is why with the C++11 atomics library in particular, the acquire fence is associated with load operations, while the release fence is associated with store operations.

To clarify, the C++11 <atomic> library enables you to specify memory fences in two ways: either you can specify a fence as an extra argument to an atomic operation, like:

x.load(std::memory_order_acquire);

Or you can use std::memory_order_relaxed and specify the fence separately, like:

x.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire);
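A minimal sketch of both spellings on the consumer side, assuming a producer that publishes a plain `int` with a release store. The names `payload`, `ready`, `producer`, `consume_with_acquire_load`, `consume_with_acquire_fence` and `run_once` are hypothetical, chosen only for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain (non-atomic) data being published
std::atomic<bool> ready{false};  // flag guarding payload

void producer() {
    payload = 42;                                  // ordinary write
    ready.store(true, std::memory_order_release);  // release store publishes payload
}

// Form 1: the ordering is an argument of the atomic load itself.
int consume_with_acquire_load() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer's release store is observed
    }
    return payload;  // sees 42: the acquire load synchronizes with the release store
}

// Form 2: relaxed load followed by a standalone acquire fence.
int consume_with_acquire_fence() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin; the relaxed load alone gives no ordering
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // fence synchronizes with the release store
    return payload;  // also sees 42
}

// Runs one producer/consumer round; resets state while no threads are running.
int run_once(bool use_fence) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&] { seen = use_fence ? consume_with_acquire_fence() : consume_with_acquire_load(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Either `run_once(false)` or `run_once(true)` should return 42; the fence form just moves the acquire out of the load and into a separate statement.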

What I don't understand is, given the above definitions of acquire and release, why does C++11 specifically associate acquire with load, and release with store? Yes, I've seen many of the examples that show how you can use an acquire/load with a release/store to synchronize between threads, but in general it seems that the idea of acquire fences (prevent memory reordering after statement) and release fences (prevent memory reordering before statement) is orthogonal to the idea of loads and stores.

So, why, for example, won't the compiler let me say:

x.store(10, std::memory_order_acquire);

I realize I can accomplish the above by using memory_order_relaxed, and then a separate atomic_thread_fence(memory_order_acquire) statement, but again, why can't I use store directly with memory_order_acquire?

A possible use case for this might be if I want to ensure that some store, say x = 10, happens before some other statement executes that might affect other threads.

Sequestered asked 24/4, 2016 at 15:1

In a typical lock-free algorithm, you read an atomic to see if a shared resource is ready for consumption (ready to be acquired), and you write an atomic to indicate that a shared resource is ready to be used (to release the resource). You don't want reads of the shared resource to move before the atomic guarding it is checked; and you don't want initialization of the to-be-shared resource to move after the atomic is written to, indicating release. – Designed 24/4, 2016 at 15:8
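The pattern described in this comment can be sketched as publishing a pointer to a fully initialized object. This is a minimal illustration under assumed names (`Resource`, `shared`, `publish`, `wait_for_resource` are hypothetical): initialization happens before the release store, and the consumer only dereferences after its acquire load has seen a non-null pointer.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

struct Resource {  // hypothetical shared resource
    std::string name;
    int value;
};

std::atomic<Resource*> shared{nullptr};  // the atomic guarding the resource

void publish() {
    Resource* r = new Resource{"config", 7};     // initialize first...
    shared.store(r, std::memory_order_release);  // ...then release it
}

Resource* wait_for_resource() {
    Resource* r;
    while ((r = shared.load(std::memory_order_acquire)) == nullptr) {
        // not ready yet; keep checking the guard
    }
    return r;  // reads of *r cannot be moved before the acquire load
}
```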

In the example only atomic_thread_fence(std::memory_order_acquire) is a true fence. See 1.10:5 Multi-threaded executions and data races [intro.multithread] in the standard, which says (quoting the draft n3797) "A synchronization operation without an associated memory location is a fence and can be either an acquire fence, a release fence, or both an acquire and release fence." In contrast, x.load(std::memory_order_acquire) is an atomic operation that performs an acquire operation on x; it synchronizes with a release store to x if it reads the value written by that store. – Apiary 4/5, 2016 at 19:30

In the introduction the standard (draft n3797) doesn't restrict acquire operations to loads and release operations to stores. That is unfortunate. You have to go to clause 29.3:1 Order and consistency [atomics.order] to find "memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst: a load operation performs an acquire operation on the affected memory location" and "memory_order_release, memory_order_acq_rel, and memory_order_seq_cst: a store operation performs a release operation on the affected memory location" – Apiary 4/5, 2016 at 19:52

@Apiary But even a "true fence" doesn't have to produce a CPU fence at all; it interacts with preceding or subsequent atomic operations to produce some effect. Only very naive compilers will associate a given CPU instruction with each source code occurrence of a "true fence". – Mylesmylitta 29/5, 2019 at 23:19

"is orthogonal to the idea of loads and stores" Under atomic semantics, reads aren't even events in the modification order. You need a write to get a place in that order; even if you always write the exact same value, those writes are still ordered. Then you can speak of what comes after that write event in the modification order. (Physically, that means a cache has taken the cache line.) But a release read would be ambiguous, since other reads of the same write event aren't ordered. Would you change the semantics to include reads in the modification order? – Mylesmylitta 2/6, 2019 at 0:21

OTOH, adding acquire writes to the semantics seems simpler, as writes are ordered but just don't observe a write. Just pretend the previous write was observed and its value was ignored, like what I call the throw-away-acq: (void)x.load(mo_acquire); (which is an operation that is seldom used). – Mylesmylitta 2/6, 2019 at 0:26
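Within today's standard, one way to get a write that is also an acquire operation is a read-modify-write such as `exchange`: it writes a value (so it has a place in the modification order) and reads the previous one, so `memory_order_acquire` is a valid order for it. By contrast, `x.store(10, std::memory_order_acquire)` violates the precondition of `store()`, which excludes acquire, consume and acq_rel. A minimal single-threaded sketch; `x` and `acquire_write` are hypothetical names:

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};

// Writes v and returns the previous value. Because exchange is an RMW,
// memory_order_acquire is allowed here, unlike for a plain store().
int acquire_write(int v) {
    return x.exchange(v, std::memory_order_acquire);
}
```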

I just found this article, which describes the concept of acquire/release synchronization with a good example; it may be helpful for anyone who is confused about it. – Shillelagh 24/1, 2022 at 5:36

Say I write some data, and then I write an indication that the data is now ready. It's imperative that any other thread that sees the indication that the data is ready also sees the write of the data itself. So prior writes cannot move past that write.

Say I read that some data is ready. It's imperative that any reads I issue after seeing that take place after the read that saw that the data was ready. So subsequent reads cannot be moved before that read.

So when you do a synchronized write, you typically need to make sure that all writes you did before that are visible to anyone who sees the synchronized write. And when you do a synchronized read, it's typically imperative that any reads you do after that take place after the synchronized read.

Or, to put it another way, an acquire is typically reading that you can take or access the resource, and subsequent reads and writes must not be moved before it. A release is typically writing that you are done with the resource, and preceding writes must not be moved to after it.
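A spinlock is a compact illustration of this "acquire = take the resource, release = done with it" view. The sketch below is a hypothetical minimal lock for illustration only (the names `SpinLock` and `counter_with_lock` are assumptions, not a production lock): `lock()` uses an acquire RMW, and `unlock()` uses a release store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {  // hypothetical minimal lock, for illustration only
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Acquire: taking the lock must happen before any access
        // to the protected data that follows.
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }
    void unlock() {
        // Release: all accesses made while holding the lock must be
        // visible before another thread can see the lock as free.
        locked_.store(false, std::memory_order_release);
    }
};

// Increments a plain counter from several threads, protected only by the lock.
long counter_with_lock(int threads, int iters) {
    SpinLock lock;
    long counter = 0;  // non-atomic: correctness relies on acquire/release above
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```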

Malchy answered 29/4, 2016 at 20:37

(Partial answer correcting a mistake in the early part of the question. David Schwartz's answer already nicely covers the main question you're asking. Jeff Preshing's article on acquire / release is also good reading for another take on it.)

The definitions you gave for acquire / release are wrong for fences; they only apply to acquire operations and release operations, like x.store(val, mo_release), not std::atomic_thread_fence(mo_release).

Acquire means that no memory accesses which appear after the acquire fence can be reordered to before the fence. [wrong, would be correct for acquire operation]

Release means that no memory accesses which appear before the release fence can be reordered to after the fence. [wrong, would be correct for release operation]

They're insufficient for fences, which is why ISO C++ has stronger ordering rules for acquire fences (blocking LoadStore / LoadLoad reordering) and release fences (LoadStore / StoreStore).

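Of course, ISO C++ doesn't define "reordering"; that would imply there is some global coherent state that you're accessing. ISO C++ instead defines ordering in terms of relations such as sequenced-before, synchronizes-with and happens-before, and which values a load is allowed to observe.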

Jeff Preshing's articles are relevant here:

Acquire and Release Semantics (acquire / release operations such as loads, stores, and RMWs)
Acquire and Release Fences Don't Work the Way You'd Expect explains why those one-way barrier definitions are incorrect and insufficient for fences, unlike for operations. (Because a fence isn't tied to an operation itself, those definitions would let it reorder all the way to one end of your program and leave all the operations unordered with respect to each other.)
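Because a fence is not tied to one operation, it orders accesses relative to whichever relaxed atomic actually carries the signal. A minimal sketch of fence-to-fence synchronization, with hypothetical names (`data_a`, `data_b`, `flag`, `writer`, `reader`): the release fence comes before a relaxed store, and the acquire fence comes after a relaxed load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data_a = 0, data_b = 0;  // plain payload
std::atomic<int> flag{0};

void writer() {
    data_a = 1;
    data_b = 2;
    // Release fence: no earlier load or store may be reordered after
    // any later store, including the relaxed store below.
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(1, std::memory_order_relaxed);
}

int reader() {
    while (flag.load(std::memory_order_relaxed) != 1) {
    }
    // Acquire fence: the earlier relaxed load may not be reordered after
    // any later load or store, so the reads below see the writer's data.
    std::atomic_thread_fence(std::memory_order_acquire);
    return data_a + data_b;
}
```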

A possible use case for this might be if I want to ensure that some store, say x = 10, happens before some other statement executes that might affect other threads.

If that "other statement" is a load from an atomic shared variable, you actually need std::memory_order_seq_cst to avoid StoreLoad reordering. acquire / release / acq_rel won't block that.
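The classic store-buffer litmus test shows the StoreLoad case. In this sketch (hypothetical names `x`, `y`, `both_saw_zero`), each thread stores to one variable and then loads the other. With `seq_cst` on all four operations, the outcome where both loads return 0 is forbidden; with release stores and acquire loads, that outcome would be allowed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

// Returns true only if both threads missed each other's store,
// which seq_cst rules out.
bool both_saw_zero() {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Repeating the test many times never observes `true` under `seq_cst`. A single run could not prove the weaker orders unsafe, since the forbidden outcome is only possible, not guaranteed.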

If you mean make sure the atomic store is visible before some other atomic store, the normal way is to make the 2nd atomic store use mo_release.
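A minimal sketch of that normal approach, with hypothetical names (`first`, `second`, `write_both`, `read_first`): the first atomic store can be relaxed, because the release on the second store keeps it ordered. A reader that acquires the second store is then guaranteed to see the first.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> first{0};
std::atomic<bool> second{false};

void write_both() {
    first.store(10, std::memory_order_relaxed);     // the store we want seen first
    second.store(true, std::memory_order_release);  // release keeps the store above before it
}

int read_first() {
    while (!second.load(std::memory_order_acquire)) {
    }
    return first.load(std::memory_order_relaxed);   // guaranteed to be 10
}
```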

If the 2nd store isn't atomic, it's unlikely any reader could safely sync with anything in a way that it could observe the value without data-race UB.

(Although you do run into a use case for a release fence when hacking up a SeqLock that uses plain non-atomic objects for the payload, to allow a compiler to optimize. But that's an implementation-specific behaviour that depends on knowing how std::atomic stuff compiles for real CPUs. See Implementing 64 bit atomic counter with 32 bit atomics for example.)

Navy answered 2/12, 2021 at 2:10

I should've provided Jeff's post as a comment rather than a pure link-only answer. But it's even better to have this nice answer. So thank you, and my answer can stay hidden :) – Toombs 2/12, 2021 at 4:22


A std::memory_order_acquire fence only ensures that no load operation after the fence is reordered before any load operation before the fence. Thus memory_order_acquire cannot ensure the store is visible to other threads by the time later loads are executed. This is why memory_order_acquire is not supported for store operations; you may need memory_order_seq_cst to get "acquire" behavior for a store.

As an alternative, you may say

x.store(10, std::memory_order_relaxed);
x.load(std::memory_order_acquire);  // this introduces a data dependency

to ensure that later loads are not reordered before the store. Again, a fence does not work here.

Besides, a memory order on an atomic operation can be cheaper than a memory fence, because it only ensures ordering relative to that atomic operation, not relative to all instructions before and after the fence.

See also the formal description and explanation for details.

Sextet answered 4/5, 2016 at 16:17

The first sentence is not quite right (-1). Actually, any memory access that follows an acquire fence cannot be reordered with any load operation that precedes that fence. (Conversely, any memory access that precedes a release fence cannot be reordered with any store operation that follows that fence.) – Estellaestelle 1/8, 2016 at 8:37

@JohnWickerson Actually, memory_order_acquire only ensures that loads after the fence happen after any atomic operation or fence with memory_order_release. It does not provide any ordering for stores after the fence. See the atomic-fence synchronization section in atomic_thread_fence. – Sextet 2/8, 2016 at 11:53

Interesting! I believe that the cppreference.com website that you refer to is actually wrong here. According to the official C11 standard, release and acquire fences behave in the way I described. – Estellaestelle 11/8, 2016 at 9:32

If you're interested, I have written a little more about the issue on my blog: johnwickerson.wordpress.com/2016/08/11/… – Estellaestelle 17/8, 2016 at 13:32

I have a question regarding the code mentioned in this answer. I think the code here does not make any sense, since the 'x.store' operation itself can be reordered after the acquire fence. So, even if the loads after the acquire may not be reordered before the acquire fence, the store itself can go after the acquire, right? – Lamas 24/10, 2016 at 21:34

@Aditya stores and loads to the same atomic variable (in the same thread) cannot be reordered. – Sextet 25/10, 2016 at 2:26
