Transitivity of release-acquire

Just when I thought I had gotten a grip on atomics, I see another article. This is an excerpt from the GCC wiki, under "Overall Summary":

 -Thread 1-       -Thread 2-                   -Thread 3-
 y.store (20);    if (x.load() == 10) {        if (y.load() == 10)
 x.store (10);      assert (y.load() == 20)      assert (x.load() == 10)
                    y.store (10)
                  }

Release/acquire mode only requires the two threads involved to be synchronized. This means that synchronized values are not commutative to other threads. The assert in thread 2 must still be true since thread 1 and 2 synchronize with x.load(). Thread 3 is not involved in this synchronization, so when thread 2 and 3 synchronize with y.load(), thread 3's assert can fail. There has been no synchronization between threads 1 and 3, so no value can be assumed for 'x' there.
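For concreteness, here is one way the example could be written as standalone C++ (my own sketch; I'm assuming every load is acquire and every store is release, since that seems to be what the wiki intends, even though plain load()/store() would default to seq_cst):

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x{0}, y{0};

    int main() {
        std::thread t1([] {                                      // -Thread 1-
            y.store(20, std::memory_order_release);
            x.store(10, std::memory_order_release);
        });
        std::thread t2([] {                                      // -Thread 2-
            if (x.load(std::memory_order_acquire) == 10) {
                assert(y.load(std::memory_order_acquire) == 20);
                y.store(10, std::memory_order_release);
            }
        });
        std::thread t3([] {                                      // -Thread 3-
            if (y.load(std::memory_order_acquire) == 10)
                assert(x.load(std::memory_order_acquire) == 10);
        });
        t1.join(); t2.join(); t3.join();
    }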

The article is saying that the assert in thread 2 won't fail, but the one in thread 3 might.

I find that surprising. Here's my chain of reasoning that the thread 3 assert won't fail—perhaps someone can tell me where I'm wrong.

  1. Thread 3 observes y == 10 only if thread 2 wrote 10.
  2. Thread 2 writes 10 only if it saw x == 10.
  3. Thread 2 (or any thread) sees x == 10 only if thread 1 wrote 10. There are no further updates to x from any thread.
  4. Since thread 2 observed x == 10, thread 3, having synchronized with thread 2, should observe x == 10 as well.

Release/acquire mode only requires the two threads involved to be synchronized.

Can someone point to a source for this 2-party-only requirement, please? My understanding (granted, perhaps wrong) is that the producer has no knowledge of with whom it's synchronizing. I.e., thread 1 can't say, "my updates are only for thread 2". Likewise, thread 2 can't say, "give me the updates from thread 1". Instead, a release of x = 10 by thread 1 is for anyone to observe, if they so choose.

Thus, x = 10 being the last update (by thread 1), any acquire anywhere in the system that happens after it (ensured here by transitive synchronization) is guaranteed to observe that write, isn't it?
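To illustrate what I mean by transitive synchronization, here is the classic chained-publication pattern (a sketch of my own; the names producer/relay/consumer and the flags are just illustrative). If release/acquire chains didn't carry across threads, the final assert here could not be guaranteed either:

    #include <atomic>
    #include <cassert>
    #include <thread>

    int data = 0;                                    // plain, non-atomic payload
    std::atomic<bool> ready1{false}, ready2{false};

    int main() {
        std::thread producer([] {            // like thread 1
            data = 42;
            ready1.store(true, std::memory_order_release);
        });
        std::thread relay([] {               // like thread 2: never touches data
            while (!ready1.load(std::memory_order_acquire)) {}
            ready2.store(true, std::memory_order_release);
        });
        std::thread consumer([] {            // like thread 3: never reads ready1
            while (!ready2.load(std::memory_order_acquire)) {}
            assert(data == 42);              // visible via the release/acquire chain through 'relay'
        });
        producer.join(); relay.join(); consumer.join();
    }

If synchronization really were limited to the two threads directly involved, the consumer here could legally read 0 from data, which would break this very common idiom.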

This means that synchronized values are not commutative to other threads.

Regardless of whether it's true, the author perhaps meant transitive, not commutative, right?

Lastly, if I'm wrong above, I'm curious to know what synchronization operation(s) would guarantee that thread 3's assert won't fail.

Reeducate answered 11/2, 2023 at 18:16 Comment(12)
Does this answer your question? Will two atomic writes to different locations in different threads always be seen in the same order by other threads? – Reflexive
You assume that because T2 can "see" a new value, T3 must also "see" that value. That assumption is wrong unless you make the stores sequentially consistent or both threads synchronize with T1. Re "commutative": They mean the x and y stores within the threads cannot be reordered (x * y != y * x). – Reflexive
@Homer512: This doesn't seem like an exact duplicate. Memory-ordering is subtle enough that this case is worth discussing separately from the IRIW litmus test. I think it might be impossible to observe assert failures on real hardware, even though the ISO C++ memory model doesn't formally guarantee it. In terms of what happens in a system like POWER with coherent shared cache and store-forwarding between SMT threads sharing a physical core (which is what the highest-voted answer on that linked Q&A is about), if T3 sees T2's stores early, it also sees loads that T2 saw. – Alloy
So yes, ISO C++ doesn't guarantee that other threads agree about the order of two independent stores, but these stores aren't fully independent. I'm not an expert on PowerPC's memory model, but if I had to guess, I'd say the assert in T3 can't fail in practice on real PPC hardware, or any other ISA running threads across coherent shared memory, without really weird mechanisms that make stores visible to other cores early, like before they're known to be non-speculative. (That's a big no-no, it means you'd have to roll back more than one core on mis-speculation.) Or maybe I'm wrong. – Alloy
@PeterCordes After reading the linked paper in that answer, I agree that the code should be fine in practice even if it violates C++'s memory model. If thread 2 removes the assert, it may even use relaxed memory operations without affecting the outcome of thread 3 (see section 4.4), unless I'm misreading that part. – Reflexive
Just to be clear, we are assuming that in the example, all loads are acquire and all stores are release (but not seq_cst, which is normally what you get if you call load() and store() without an ordering parameter)? – Laryssa
I agree with you that I don't understand the reasoning here either. Happens-before is transitive and one can verify from the [intro.races] rules that in fact x.store(10) happens before the x.load() of thread 3. So by write-read coherence, the x.load() in thread 3 must take the value 10 (or some value later in the modification order of x, but here there aren't any). – Laryssa
The article is dated 2012, so it is presumably working from the C++11 memory model instead of C++17 or C++20. There are some differences, if I recall correctly, but I don't think they would be relevant here. – Laryssa
I should correct my previous comment: happens-before is not necessarily transitive when consume operations are in play. But that is not applicable here. The note in [intro.races p11] points out that without consume operations, "happens before" is equivalent to "simply happens before", and the "simply happens before" relation is transitive by definition (11.3). And in this program, we can use the various parts of p9 repeatedly to verify explicitly that the store in thread 1 inter-thread happens before the load in thread 3. – Laryssa
That the doc is awful isn't anything new... – Astute
@NateEldredge "we are assuming that in the example" To even make sense of the very bad doc the OP quoted, we need assumptions indeed: the example code must not use SEQ_CST, commutative must mean transitive...Astute
@NateEldredge: I think the GCC wiki is just plain wrong; happens-before from release/acquire surely must always have been able to connect across threads, unlike consume. A sign of wrong thinking that might have led to their conclusion is the earlier paragraph "When 2 threads synchronize in sequentially consistent mode, all the visible variables must be flushed through the system." All atomic stores eventually commit to cache-coherent shared memory (if you're talking in those terms). The question is whether this thread waits for that to happen or not. – Alloy

Seems like you've found a mistake in the GCC wiki.
The assert in T3 shouldn't fail in C++.
Here is the reasoning, with the relevant quotes from the C++20 standard (a code sketch tying the chain back to the example follows the quotes):

  1. x.store (10) in T1 happens before assert (x.load() == 10) in T3, because:
    • statements within every thread are ordered with sequenced before

    9 Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

    • x.store (10) synchronizes with if (x.load() == 10) and y.store (10) synchronizes with if (y.load() == 10)

    2 An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

    • as a result x.store (10) inter-thread happens before assert (x.load() == 10)

    9 An evaluation A inter-thread happens before an evaluation B if
    (9.1)   — A synchronizes with B, or
    (9.2)   — A is dependency-ordered before B, or
    (9.3)   — for some evaluation X
    (9.3.1)    — A synchronizes with X and X is sequenced before B, or
    (9.3.2)    — A is sequenced before X and X inter-thread happens before B, or
    (9.3.3)    — A inter-thread happens before X and X inter-thread happens before B.

    • this also means x.store (10) happens before assert (x.load() == 10)

    10 An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if:
    (10.1)   — A is sequenced before B, or
    (10.2)   — A inter-thread happens before B.

  2. The above means that the x.load() in assert (x.load() == 10) must return the value 10 written by x.store (10).
    (We assume here that x was published correctly and therefore the initial value of x comes before x.store (10) in the modification order of x).

    18 If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M.
    [Note 18: This requirement is known as write-read coherence. — end note]
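To tie this chain back to the code, here is a sketch of the example with the relations spelled out in comments (the loads spin so that the asserts are always reached; orderings are the explicit release/acquire assumed throughout):

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x{0}, y{0};

    int main() {
        std::thread t1([] {
            y.store(20, std::memory_order_release);
            x.store(10, std::memory_order_release);             // A: release store to x
        });
        std::thread t2([] {
            while (x.load(std::memory_order_acquire) != 10) {}  // B: acquire load; A synchronizes with B
            assert(y.load(std::memory_order_acquire) == 20);
            y.store(10, std::memory_order_release);             // C: B is sequenced before C
        });
        std::thread t3([] {
            while (y.load(std::memory_order_acquire) != 10) {}  // D: acquire load; C synchronizes with D
            assert(x.load(std::memory_order_acquire) == 10);    // E: D is sequenced before E, so A
                                                                // happens before E; by write-read
                                                                // coherence E must read 10
        });
        t1.join(); t2.join(); t3.join();
    }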

Plainsman answered 13/2, 2023 at 20:18 Comment(0)
