How are memory_order_seq_cst fences useful anymore in C++20?
Asked Answered
S

1

7

Consider this code:

std::atomic<int> x{ 0 };
std::atomic<int> y{ 0 };
int a;
int b;

void thread1()
{
    //atomic op A
    x.store(1, std::memory_order_relaxed);

    //fence X
    std::atomic_thread_fence(std::memory_order_seq_cst);
    //sequenced-before P, thus in SC order X=>P

    //atomic op P
    a = y.load(std::memory_order_seq_cst);//0
    //reads-before(from-read) Q, thus in SC order P=>Q
}

void thread2()
{
    //atomic op Q
    y.store(1, std::memory_order_seq_cst);
    //sequenced-before B, thus in SC order Q=>B

    //atomic op B
    b = x.load(std::memory_order_seq_cst);
}

int main()
{
    std::thread t2(thread2);
    std::thread t1(thread1);
    t1.join();
    t2.join();
    assert(a == 1 || b == 1);//true?
    return 0;
}

The question is: is the assertion a == 1 || b == 1 always true in C++20?

I think it's true in C++17. The idea is this: first assume a get 0, then prove b must get 1. Let's split it into two parts:

1. Fence X precedes atomic op B in the SC total order

  • X is sequenced-before P, thus X=>P
  • P reads 0 and Q writes 1, thus P reads-before(from-read) Q, thus P=>Q
    • I don't find the clause in the standard that guarantees this, but this paper said so. Please point out if I get it wrong. Edit: Now I know. It's the same rule about coherence-ordered before, as in Nate's answer.
  • Q is sequenced-before B, thus Q=>B

2. Atomic op B reads the value(1) written by atomic op A

C++17 has this in the standard:

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there is a memory_order::seq_cst fence X such that A is sequenced before X and B follows X in S, then B observes either the effects of A or a later modification of M in its modification order.

It's exactly the case here. Op A and B are on atomic object x, where A stores 1 in x and B reads x's value; and there is the seq_cst fence X; A is sequenced before X and B follows X in S(the above); and there's no modification of M later than A. So B observes the value written by A. So b gets 1.

Is the above reasoning correct?

But in C++20, the above sited text about fences is removed (by P0668), and upgraded to "strengthened" seq_cst fences, which reads:

An atomic operation A on some atomic object M is coherence-ordered before another atomic operation B on M if

  • A is a modification, and B reads the value stored by A, or
  • A precedes B in the modification order of M, or
  • A and B are not the same atomic read-modify-write operation, and there exists an atomic modification X of M such that A reads the value stored by X and X precedes B in the modification order of M, or
  • there exists an atomic modification X of M such that A is coherence-ordered before X and X is coherence-ordered before B.

There is a single total order S on all memory_order::seq_cst operations, including fences, that satisfies the following constraints. First, if A and B are memory_order::seq_cst operations and A strongly happens before B, then A precedes B in S. Second, for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B, the following four conditions are required to be satisfied by S:

  • if A and B are both memory_order::seq_cst operations, then A precedes B in S; and
  • if A is a memory_order::seq_cst operation and B happens before a memory_order::seq_cst fence Y , then A precedes Y in S; and
  • if a memory_order::seq_cst fence X happens before A and B is a memory_order::seq_cst operation, then X precedes B in S; and
  • if a memory_order::seq_cst fence X happens before A and B happens before a memory_order::seq_- cst fence Y , then X precedes Y in S.

I don't see how a==1 || b==1 is guaranteed any more. It's only about how SC fences are ordered with regard to other SC operations, or operations that are coherence-ordered with each other.

I can't find anything else about how SC fences interact with non-SC atomic operations in the C++20 standard, and a load that does not see a value from a store doesn't mean it's coherence-ordered before the store. (Or does it? If so that would fix the problem; see comments) 31.4 : 3.2 applies when A precedes B in the modification order of M, but reads aren't part of the modification order of an object, are they?

Is it I didn't look into the standard hard enough, or the code simply doesn't work any more in C++20? If it's the later, is the regression intended? (In P0668, they claim to "strengthen" fences).

Sturdy answered 29/10, 2021 at 18:53 Comment(6)
Yes, in C++17 a == 1 || b == 1 is guaranteed. Relaxed store + seq_cst barrier is at least as strong as a seq_cst store, and any interleaving of source order has at least one load after the store in the other thread. I don't know about C++20 yet, haven't read the details. I expect it's still something the committee wants to be true, so if there is a change that might be a defect. (But I don't expect real-world implementations would change how they compile it, so in practice everything's fine on normal CPUs, of course.)Jubbah
It's only about how SC fences are ordered with regard to other SC operations. - not exactly. The "Second," clause is about any pair of operations on the same object M that are coherence-ordered, e.g. including an mo_relaxed load that happens to see a value from an mo_relaxed store. (eel.is/c++draft/atomics#order-3.1). But I'm not sure it applies for a load that doesn't see a stored value. 3.2 "A precedes B in the modification order of M" wouldn't apply if A is a read, not a modification, I think? If it does apply, then so would the SC fence bullet list.Jubbah
@PeterCordes Thank you! I see your point. If you count reads the value of M before a write to M (aka "reads-before") as coherence-ordered before, you can assume a==0 && b==0 then say B(the load of x) is coherence-ordered before A(the store of x). Then applying the second bullet list : "- if A is a memory_order::seq_cst operation and B happens before a memory_order::seq_cst fence Y , then A precedes Y in S;", you get B=>X in S. That's a circle in S (X=>P=>Q=>B=>X), which is prohibited. So a==0 && b==0 cannot be true.Sturdy
The standard text does not seem to say that "reads-before" is "coherence-ordered before". Since the word "coherence" in the name, maybe it should be?Sturdy
I'm pretty sure that as-written, reading the old value does not make a load coherence-ordered before a store. (But yes, if it did, that would engage the SC fence rules to do what they should). It's possible for one thread to see a store to object M while another thread still sees the old value (e.g. on real HW via store-forwarding between SMT threads), and IDK what other parts of the standard depend on coherence-ordered ops, so IDK if this could repair the standard without breaking anything else.Jubbah
If a read happens-before a write, it's guaranteed not to see the value (by the read-write coherence rule, 6.9.2.1 :: 17). But again, I don't think not seeing the value implies anything else.Jubbah
A
3

Yes, I think we can prove that a == 1 || b == 1 is always true. Most of the ideas here were worked out in comments by zwhconst and Peter Cordes, so I just thought I would write it up as an exercise.

(Note that X, Y, A, B below are used as the dummy variables in the standard's axioms, and may change from line to line. They do not coincide with the labels in your code.)

Suppose b = x.load() in thread2 yields 0.

We do have the coherence ordering that you asked about. Specifically, if b = x.load yields 0, then I claim that x.load() in thread2 is coherence ordered before x.store(1) in thread1, thanks to the third bullet in the definition of coherence ordering. For let A be x.load(), B be x.store(1), and X be the initialization x{0} (see below for quibble). Clearly X precedes B in the modification order of x, since X happens-before B (synchronization occurs when the thread is started), and if b == 0 then A has read the value stored by X.

(There is possibly a gap here: initialization of an atomic object is not an atomic operation (3.18.1p3), so as worded, the coherence ordering does not apply to it. I have to believe it was intended to apply here, though. Anyway, we could dodge the issue by putting x.store(0, std::memory_order_relaxed); in main before starting the threads, which would still address the spirit of your question.)

Now in the definition of the ordering S, apply the second bullet with A = x.load() and B = x.store(1) as before, and Y being the atomic_thread_fence in thread1. Then A is coherence-ordered before B, as we just showed; A is seq_cst; and B happens-before Y by sequencing. So therefore A = x.load() precedes Y = fence in the order S.

Now suppose a = y.load() in thread1 also yields 0.

By a similar argument to before, y.load() is coherence ordered before y.store(1), and they are both seq_cst, so y.load() precedes y.store(1) in S. Also, y.store(1) precedes x.load() in S by sequencing, and likewise atomic_thread_fence precedes y.load() in S. We therefore have in S:

  • x.load precedes fence precedes y.load precedes y.store precedes x.load

which is a cycle, contradicting the strict ordering of S.

Androsterone answered 1/11, 2021 at 21:47 Comment(2)
Ah, yes, I hadn't grokked that 3rd bullet point before. I think I saw RMW and stopped looking, wrongly assuming what it was going to be about. So the load is coherence-ordered before the store if it doesn't see the value. (Agreed about static initialization counting as the X operation to match the intent of that wording). And yes, then all the other pieces fall into place once we can justify calling that coherence ordering.Jubbah
Nice! Now I know what "not the same" means in English :PSturdy

© 2022 - 2024 — McMap. All rights reserved.