C++ memory model: do seq_cst loads synchronize with seq_cst stores?
In the C++ memory model, there is a total order on all loads and stores of all sequentially consistent operations. I'm wondering how this interacts with operations that have other memory orderings that are sequenced before/after sequentially consistent loads.

For example, consider two threads:

std::atomic<int> a(0);
std::atomic<int> b(0);
std::atomic<int> c(0);

//////////////
// Thread T1
//////////////

// Signal that we've started running.
a.store(1, std::memory_order_relaxed);

// If T2's store to b occurs before our load below in the total
// order on sequentially consistent operations, set flag c.
if (b.load(std::memory_order_seq_cst) == 1) {
  c.store(1, std::memory_order_relaxed);
}


//////////////
// Thread T2
//////////////

// Blindly write to b.
b.store(1, std::memory_order_seq_cst);

// Has T1 set c? If so, then we know our store to b occurred before T1's load
// in the total order on sequentially consistent operations.
if (c.load(std::memory_order_relaxed) == 1) {
  // But is this guaranteed to be visible yet?
  assert(a.load(std::memory_order_relaxed) == 1);
}

Is it guaranteed that the assertion in T2 cannot fire?

I'm looking for detailed citations of the standard here. In particular, I think this would require showing that the load from b in T1 synchronizes with the store to b in T2 in order to establish that the store to a inter-thread happens before the load from a. But as far as I can tell, the standard says that memory_order_seq_cst stores synchronize with loads, not the other way around.

Seisin answered 27/11, 2017 at 22:15 Comment(0)

Do seq_cst loads synchronize with seq_cst stores?

They do if all necessary requirements are met; in your example code, the assert can fire.

§29.3.3
There shall be a single total order S on all memory_order_seq_cst operations

This total order applies to the seq_cst operations themselves. In isolation, a store(seq_cst) has release semantics, whereas a load(seq_cst) has acquire semantics.

§29.3.1-2 [atomics.order]
memory_order_release, memory_order_acq_rel, and memory_order_seq_cst:
a store operation performs a release operation on the affected memory location.
.....
§29.3.1-4 [atomics.order]
memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst:
a load operation performs an acquire operation on the affected memory location.

Therefore, atomic operations with non-seq_cst ordering (or non-atomic operations) are ordered with respect to seq_cst operations per the acquire/release ordering rules:

  • a store(seq_cst) operation cannot be reordered with any memory operation that is sequenced before it (i.e. comes earlier in program order).
  • a load(seq_cst) operation cannot be reordered with any memory operation that is sequenced after it.

In your example, although c.store(relaxed) in T1 is ordered (inter-thread) after b.load(seq_cst) (the load is an acquire operation), c.load(relaxed) in T2 is unordered with respect to b.store(seq_cst): a release operation only prevents earlier operations from moving past it, so it does not keep the later c.load from being reordered before it.

You can also look at the operations on a. Since those are not ordered with respect to anything, a.load(relaxed) can return 0, causing the assert to fire.

Site answered 28/11, 2017 at 1:53 Comment(19)
Thanks, I agree and this was my suspicion. Part of the reason I asked is that cppreference implies that the guarantees are stronger: "Any operation with this memory order is both an acquire operation and a release operation"; i.e. even a load would be a release operation. It seems like this is simply incorrect?Seisin
"both an acquire operation and a release operation".. cppreference can be confusing sometimes; that would only apply to read-modify-write operations.Site
Okay yeah, agreed. "Any operation" is too strong here according to the standard.Seisin
(the load sets an acquire barrier): That's potentially confusing terminology, because atomic_thread_fence(mo_acquire) is not just "an Acquire Operation", it's a 2-way barrier, unlike the 1-way barrier of an acquire load. So yes there's a "barrier", but I don't like your description of it "setting an acquire barrier" because that sounds like load + fence.Brinkley
@PeterCordes Standalone fences are not involved, I was careful to avoid any references (and neither does the question mention them). The remark about X86 isn't relevant in the context of the question which was about guarantees provided by the standard, not a particular implementation.Site
I know there aren't standalone fences but the phrase "acquire barrier" made me wonder for a minute if that's what you meant, then worry that some other readers would think that.Brinkley
@PeterCordes Your comment made me realize (again) how easy it is to confuse things by using specific terminology. I updated the answer in an attempt to use 'standardese'.Site
Looks good. I don't think anything is lost in this case by speaking "standardese", but another approach could have been to say "the acquire load sets a 1-way barrier". Then maybe say "unlike 2-way std::atomic fences" and link Jeff Preshing's excellent barriers vs. loads/stores article.Brinkley
@PeterCordes For a dogmatic language lawyer (which I am not), the problem here might be that the standard does not define an acquire operation to act as a one-way barrier. All it says is that a release operation synchronizes with an acquire operation, and it defines the guarantees that result from that synchronization. Of course, the only sensible implementation is to use barriers (and it is also easier to reason about), but AFAIK that is not formally required.Site
I think the guarantees provided in the standard are exactly equivalent to the load-acquire itself being a 1-way barrier that keeps it from being delayed past any later loads/stores (but not the reverse), so nothing is lost or added in describing it that way (like this post from Jeff Preshing). Plain x86 loads, and ARM ldar, implement 1-way barrier semantics in HW. When you say "use barriers", I hope you don't mean a separate asm instruction, because those are invariably 2-way barriers.Brinkley
@PeterCordes No, I was not referring to asm instructionsSite
Ok good, sorry I keep bringing up hardware and asm :) I have looked at how the standard describes the requirements, but I wasn't looking for differences between what it implies / requires and what a 1-way-barrier model would guarantee. Like I said, I think it's actually the same, so it is safe to describe the guarantee in the standard as a 1-way barrier, so long as everyone is clear on exactly what kind of barrier we're talking about, because the term is used in many contexts. :P Of course the as-if rule means an implementation can work however it wants.Brinkley
@PeterCordes I totally agree.. It is definitely safe to use the 1-way barrier approach to describe how things work. It is also (at least for me) the easiest way to visualize things.Site
Ah, so you're saying only a dogmatic language lawyer would have objections to that. Got it :PBrinkley
@PeterCordes I wouldn't say hardware barriers are "invariably 2-way" although it depends on exactly what you mean by "two way". The canonical "SPARC-style" barriers like StoreLoad are certainly asymmetric in that they apply different rules depending on the direction an operation would move across the barrier. So a load-acquire might be implemented with a plain load followed by LoadLoad and LoadStore barriers. This makes a more or less one-way barrier (earlier stores can freely cross and earlier loads can cross up until the next load).Cachinnate
@BeeOnRope: good point, I hadn't considered just LoadStore before a store. That is sort of 1-way. And also interesting point about earlier loads crossing until the next load, with load + LoadLoad barrier. But it's still a 2-way load barrier, it just isn't tied to the acquire-load.Brinkley
But what do you mean by "2-way barrier"? My interpretation of "2-way" would be that the same rules apply in both directions, e.g., if stores can't cross the barrier one way (i.e., older stores migrating before the barrier) then they can't cross the other way either. So an asymmetric barrier (e.g., one that lets stores migrate one way, but not the other) can't qualify as a 2-way barrier. That doesn't mean it is a 1-way barrier though: it is an asymmetric barrier that restricts movement both ways, but in opposite ways. Power lwsync is such an asymmetric barrier, for example.Cachinnate
@Seisin You have to understand that in a MT program a read is really not a write. It took me a while to realize that. You can't view reads as writes. Writes are modifications that are the modification order; reads use the modification order and don't contribute to it. Reads depend on writes, not other reads. If reads were writes you would get a completely different model. In fact if you replace each read with a RMW you get a very strong model. (With a mutex protected value this is essentially what you have.)Mur
