Does a single load synchronize with multiple stores?

The following is a quote from C++ Standard - Memory Order:

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

Consider an atomic variable v and the following steps:

  1. Thread A stores in v using memory_order_release
  2. Thread B stores in v using memory_order_release
  3. Thread C loads from v using memory_order_acquire

Is the following statement true: "thread C is guaranteed to see everything thread A or B wrote to memory."

EDIT: I am moving my comment here to make the point more clear.

The C++ quote that I have up there does not say anything about B must read what is written by A. All it says is that A and B release/acquire on the same variable. That is exactly, what I am doing in those 3 steps: A and B release something, and C acquires something. Where does it say in the spec that acquire matches with the last release and not necessarily anything before that?
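
For concreteness, here is a minimal sketch of the three steps (the variable names and stored values are just for illustration):

#include <atomic>

std::atomic<int> v{0};
int x = 0, y = 0;

void thread_A() {
    x = 1;                                   // some earlier write by A
    v.store(1, std::memory_order_release);   // step 1
}

void thread_B() {
    y = 2;                                   // some earlier write by B
    v.store(2, std::memory_order_release);   // step 2
}

void thread_C() {
    int r = v.load(std::memory_order_acquire); // step 3
    // Question: after this load, is C guaranteed to see both x == 1 and y == 2?
    (void)r;
}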

Sewell answered 29/9, 2021 at 19:54 Comment(8)
So far there is nothing to say that any of these operations happens-before any of the others, so no way to guarantee anything.Schopenhauerism
If all threads are just loading or storing to the same memory location, the memory order has no effect whatsoever. The only thing that matters is that they use atomic operations, and then C either sees the value A stored, the value B stored, or the value that was in there before A and B stored anything, just not anything else.Noni
I think you are missing the point of the paragraph. It's meant for a situation like: A does a store of 17 to w and then a release store of 42 to v (whose previous value was, say, 0). B does an acquire load of v and then a load of w. If the value that B loaded from v is equal to 42, then the value it loaded from w is guaranteed to be equal to 17.Schopenhauerism
What you can say in this situation is that if thread C's load gets a value that could only have been put there by A's store, then C sees everything that A wrote prior to the load. And the same for B's store. But if it does see the value which A stored, then unless there is more logic to the program, it has no way of knowing whether or not B did its store before that, so there cannot be any guarantee of C seeing what B previously wrote.Schopenhauerism
@NateEldredge Not sure what exactly you mean by operations. I said "an atomic variable v and the following steps". So steps 1 through 3 happen on an atomic variable in order (no race condition there).Sewell
By "operations" I mean the loads and stores. If you have some other synchronization (mutex, other atomics, etc) to ensure that those loads and stores are observed by all threads in that order, then typically it will be that other synchronization which also ensures the appropriate ordering on the preceding loads and stores.Schopenhauerism
Note cppreference.com is not the C++ standard, but rather an independent community project that attempts to provide more accessible information about the language. In particular the text you cite does not appear in the standard itself. But the standard does say things like "For example, an atomic store-release synchronizes with a load-acquire that takes its value from the store".Schopenhauerism
Maybe what you are looking for is atomics.order p2 in n3337: "An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A." The only way you learn anything about synchronization is if the value returned by the load matches a value known to be stored by a particular store. The cppreference text takes that as a given.Schopenhauerism

The load from v synchronizes with whichever of the two stores wrote the value that v.load() returns.

The standard itself makes this more explicit. See n3337 atomics.order p2: "An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A."

To illustrate this, here's an example:

#include <atomic>
#include <iostream>

int a, b;
std::atomic<int> v{0};

void thread_A() {
    a = 42;
    v.store(10, std::memory_order_release);
}

void thread_B() {
    b = 17;
    v.store(20, std::memory_order_release);
}

void thread_C() {
    switch (v.load(std::memory_order_acquire)) {
    case 10:
        // thread A must have done this store
        std::cout << a; // ok, prints 42
        std::cout << b; // UB, data race
        break;
    case 20:
        // thread B must have done this store
        std::cout << a; // UB, data race
        std::cout << b; // ok, prints 17
        break;
    case 0:
        // neither A nor B has done its store
        std::cout << a; // UB, data race
        std::cout << b; // UB, data race
        break;
    }
}

So if v.load() in thread C returns 10, we know from our program's logic that this value must have been stored by the v.store() in thread A; nothing else in our program stores that value. Because of the release ordering on that store, all writes made by thread A before it are also visible. We can safely read the non-atomic variable a, and we are guaranteed to get the value 42.

More formally, the v.store(10) synchronizes with the v.load() that returns 10, and the v.load() is sequenced before the cout << a, so v.store(10) inter-thread happens before cout << a (intro.multithread p11). And a = 42 is sequenced before v.store(10), which as we said inter-thread happens before cout << a, so a = 42 inter-thread happens before cout << a; in particular a = 42 happens before cout << a (p12) and so there is no data race (p21). Moreover a = 42 is now a visible side effect with respect to cout << a (p13), and there are no other side effects on a to be seen, so the value of the evaluation of a in cout << a shall be the value stored by a = 42, namely 42.

But in this case, since v.load() returned 10 and not 20, we don't know whether the v.store() in thread B has happened yet. Perhaps it did and has since been overwritten by the store in thread A. Or perhaps it didn't happen at all. So we can't prove that b = 17 happens before cout << b, nor vice versa, and thus this is a data race which causes undefined behavior.

The case where v.load() returns 20 is similar, but reversed. If v.load() returns 0, then neither of the two stores has occurred, and it is a data race to access either a or b.

As you can see, this is only useful if threads A and B store different values. If we change the program so that A and B both do v.store(10, std::memory_order_release), then having thread C observe that v.load() == 10 tells us nothing about which of the two threads did the store. The load synchronizes with one of them, but we don't know which. Therefore, in this case, thread C cannot safely access either a or b, because either could be the one that is in a data race.
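
Here is a sketch of that modified program (the thread_A2/thread_B2/thread_C2 names are mine; the globals are the same as above):

void thread_A2() {
    a = 42;
    v.store(10, std::memory_order_release);
}

void thread_B2() {
    b = 17;
    v.store(10, std::memory_order_release);
}

void thread_C2() {
    if (v.load(std::memory_order_acquire) == 10) {
        // This load synchronizes with A2's store or with B2's store, but the
        // program has no way to tell which, so reading either a or b here
        // risks a data race.
    }
}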

The cppreference text, taken out of context, could make it sound like the mere act of doing v.load(std::memory_order_acquire) will cause the thread to actually wait for some or all other stores in other threads to complete, sort of like a mutex or a std::latch. You would not be the first to have misread it that way. But that wouldn't make sense - a load is just a load after all. It returns the value that v happens to have at that particular instant in time, without blocking or waiting for any event from any other thread.
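
For contrast, here is a sketch of my own (not part of the example above, and reusing its globals) of how a thread would actually wait for A's store, supposing only threads A and C existed: the waiting comes from a loop around the load, not from the acquire load itself.

void thread_C_wait_for_A() {
    // Spin until A's store becomes visible (C++20 also offers v.wait()).
    while (v.load(std::memory_order_acquire) != 10) {
    }
    std::cout << a; // ok, prints 42: the load that saw 10 synchronizes with A's store
}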

See also Why does this cppreference excerpt seem to wrongly suggest that atomics can protect critical sections?

Schopenhauerism answered 1/10, 2021 at 13:55 Comment(3)
If C uses memory_order_relaxed instead of acquire and v returns 10, is reading from a UB too? Does this differ between strong and weak memory models? Thank you!Hypochondriac
@Viatorus: Yes, I think it is UB. Practically speaking, if v.load() is changed to relaxed, then the compiler or CPU may choose to reorder the load of a ahead of the load of v. Thus if v.load() returns 10, we know by that time thread_A has finished storing to a - but a might have been loaded earlier, conflicting with the store. More formally, in that case you will not be able to prove that the load of a happens-before the store to a, nor vice versa, and that is the definition of a data race. You don't have a synchronizes-with relation at all (see the sketch after these comments).Schopenhauerism
@Viatorus: I guess it depends what you mean by "strong" or "weak" memory model - at the level of C++, there is only one memory model. And the memory model depends on both the hardware and compiler. For instance, on x86, all load instructions are acquire, but we still have a race in your example, because the compiler may reorder the instructions to load from a before v.Schopenhauerism
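
A sketch of the relaxed variant discussed in these comments (reusing the globals from the answer's example; thread_C_relaxed is my own name for it):

void thread_C_relaxed() {
    // Even if this relaxed load returns 10, there is no synchronizes-with
    // edge to thread A's release store, so the read of a below is not
    // ordered after A's a = 42 and the program has a data race.
    if (v.load(std::memory_order_relaxed) == 10) {
        std::cout << a; // UB: data race
    }
}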

Yes, if cppreference is not wrong.

The answer is in the paragraph on cppreference that you skipped in your quote (pay attention to the last sentence):

All memory writes (including non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory. This promise only holds if B actually returns the value that A stored, or a value from later in the release sequence.

So if you are sure, or can prove, that thread C's acquire load happened after both thread A's and thread B's release stores, then your statement "thread C is guaranteed to see everything thread A or B wrote to memory" is true.

Stearoptene answered 23/4, 2023 at 18:12 Comment(2)
You only get a release sequence using atomic RMW instructions, so there aren't any release sequences in OP's example. It's true that if you could prove that A's store happens before B's store, and then C observes the value stored by B, C will then observe earlier actions by thread A. But in the example, there is no way to prove that, and the stores done by A and B are just unordered with respect to happens-before (keep in mind it is only a partial ordering; see the sketch after these comments).Schopenhauerism
@NateEldredge according to OP's edit and his comment "So steps 1 through 3 happen on an atomic variable in order (no race condition there)" I supposed the question was abstract and the OP wants to know whether thread C synchronizes with all the threads (A & B) that released values to the atomic v, or just with the last thread (B) whose value C acquired from v.Stearoptene
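
To illustrate the release-sequence point from the comment above, here is a minimal sketch of my own (not from the thread): thread B continues the release sequence with a relaxed fetch_add, which is enough for C to synchronize with A, though not with B.

#include <atomic>
#include <cassert>

int a;
std::atomic<int> v{0};

void thread_A() {
    a = 42;
    v.store(1, std::memory_order_release);   // heads a release sequence
}

void thread_B() {
    // An atomic read-modify-write continues the release sequence headed by
    // A's store even though it is relaxed; a plain relaxed store would not.
    v.fetch_add(1, std::memory_order_relaxed);
}

void thread_C() {
    if (v.load(std::memory_order_acquire) == 2) {
        // The value 2 can only come from B's fetch_add applied on top of A's
        // store, so this load takes its value from the release sequence headed
        // by A's store and synchronizes with A (but not with B).
        assert(a == 42); // ok: no data race with thread A
    }
}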
