Does the standard impose more requirement on the load part of RMW than a pure load
Asked Answered
E

2

6

Consider this example:

#include <iostream>
#include <atomic>
#include <thread>
#include <chrono>
#include <cassert>
int main(){
    std::atomic<int> v = 0;
    std::atomic<bool> flag = false;
    std::thread t1([&](){
        while(!flag.load(std::memory_order::relaxed)){}  // #1
        assert(v.exchange(2,std::memory_order::relaxed) == 1);  // #2
    });
    std::thread t2([&](){
        if(v.exchange(1,std::memory_order::relaxed) == 0){ // #3
            flag.store(true, std::memory_order::relaxed); // #4
        }
    });
    t1.join();
    t2.join();
}

In this example, the loop at #1 exits only when #4 sets the flag to be true, which the flag is set to be true only when the reading part of the RMW operation at #3 reads 0. Since the requirement for the reading part of the RMW operation is [atomics.order] p10

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

This implies that no other RMW operations can read the value 0 if the RMW operation at #3 reads 0. In other words, if #2 could read 0 and write 2, #3 wouldn't read 0 and #4 wouldn't be executed. In other words, the reading value of the read-modify-write operation is uniquely owned by that operation if all other operations are RMW operations too(this is the essence of how a spin-lock can work).

So, the Q1 is: the assertion at #2 will never fail, right?

However, if #2 is changed to a pure load, something like this:

assert(v.load(std::memory_order::relaxed) == 1); // #2'

According to [intro.races] p18

If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B takes its value from X or from a side effect Y that follows X in the modification order of M.

The side effect that happens before #2' is only the initial value 0, even though the side effect 1 stored at #3 follows 0 in the modification order, the pure load can still read 0 since [intro.races] p18 uses "or", which is also implied by [atomics.order] p11

Recommended practice: The implementation should make atomic stores visible to atomic loads, and atomic loads should observe atomic stores, within a reasonable amount of time.

From the perspective of implementations, there exists a time lag such that the store at #3 is invisible to #2' within a reasonable time. This result is also implied by the "or" in [intro.races] p18 from the perspective of C++ standard.

Q2: If the RMW operation at #2 is changed to the pure load like #2', the assertion can fail, right?

Q3: if #2' can fail and #2 never fails does it mean the RMW is less prone to reading stale values than non-RMW reads, at least in this example?, Does it mean the RMW is more prone to reading the latter modification in the modification order than non-RMW reads, at least in this example?

addition:

I don't think #2 can be reordered with #1 by the compiler, since #2 is similar to a failed CAS in the spin-lock(i.e. which is a pure load with relaxed memory order), if that reorder exists, the spin-lock won't work as well. Moreover, any reordering in this example by the compiler is observable due to the assertion. However, from the perspective of memory order, #3 does not happen before #2 and vice versa, which is theoretically that #3 may fail. I am not sure. However, this example depends on the logical order of execution, any destruction to the order is observable.

Note:

This is a subsequent question of Is the load part of a read-modify-write operation of atomic object guaranteed to read the last value in modification order compared to load operation?, which has an unclear example and an incomprehensible assumption, which are improved and clear in this question.

Eijkman answered 23/10, 2024 at 13:20 Comment(28)
I don't know what the standard exaclty says but I found this video interesting : C++ Memory Model: from C++11 to C++23 - Alex Dathskovsky - CppCon 2023Tilghman
@HolyBlackCat If the assertion at #2 fails, that means its reading value is 0, which means #3 cannot read 0, the flag is not set to be true, the loop at #1 cannot exist, the #2 is not executed. Hmmm, some paradox here.Eijkman
@HolyBlackCat Hmmm, isn't [atomics.order] p10 is that requirement? If #2 reads 0 and writes 2, it means 0 is immediately precede 2, as well as, if #3 reads 0 and writes 1, then 0 is immediately precede 1, because the mod order is a total order, either 1 precede 2 or 2 precde 1, the one whose written value in the latter violates that rule.Eijkman
Yep, you're right. I agree the assert always passes.Supplant
does it mean the RMW is less prone to reading stale values than non-RMW reads, - How many times do I have to say that "stale" isn't a useful concept. Especially with that definition for general cases. So any variable that's never been modified is automatically stale? Or if there's one one modification that happens-before the load, it's stale? But as long as there are multiple such modifications, reading any other than the first is non-stale?Clement
@PeterCordes But as long as there are multiple such modifications, reading any other than the first is non-stale? In this example, I think "stale" is in this way. In other words, #2 or #2' is executed only when #1 reads 0 and writes 1, so it is known that 0 precedes 1 in the modification order, #2 is forced to read 1 while a pure load #2' may read 0, with this comparison, is the load reading 0 considered as reading a "stale" value?Eijkman
@PeterCordes Anyway, the "stale" wording can always be associated with "timeline", the modification order or a loading invisible to a store does not matter with the timeline. Maybe, I can ask like this: Is the load of RMW more prone to reading the latter modification in the modification order than the pure load?Eijkman
v.exchange created point of synchronization between 2 threads (All modifications to any particular atomic variable occur in a total order) and because this Q1 never fail. but in case Q2 was no any synchronization between 2 threads, and possible t1 view change of flag but not view change of v. Q3 - no.. load can simply read value from CPU cache, when RMW need do more work and synchronizationCleanser
@Cleanser v.exchange created point of synchronization between 2 threads. No, the memory order is relaxed, there is no synchronization relationship between them even though one reads the value written by another.Eijkman
@Eijkman no, here memory order have no any role. This is atomic modification of the same variable - v. So we can say that operation in one thread completed before in another begin. Global total order. This is exactly point of synchronizationCleanser
when we have 2 atomic rmw on v we can say - one completed before second begin. if in t1 rmw completed before in t2 begin, then this also before write to flag. but from another side t1 begin only after t2 write to flag. so contradiction. as result we can say that rmw t2 completed before t1 and assert ok. and memory order here not play any role. rmw on v is exactly point of synchronization between t1 and t2.Cleanser
@Cleanser Assuming there is a write to a non-atomic object before #3 and a read to the same object after #2, by your logic, since rmw on v is exactly point of synchronization between t1 and t2, so such two operations on the same non-atomic object would not have data race. However, that's not true, #3 does not synchronize with #2 by the definition of formal wording.Eijkman
@Eijkman no of course. Synchronisation in sense that one rmw was before and another after. This nothing say about any another variables. If we say only write to v - impossible say that in some thread in was before after another. If only read the same - before-after undefined. But in case 2 rmw on v - one rmw completed before another begin. In this sense point of synchronization. And again from this nothing say about another memory/variable changes is visibleCleanser
#1 before #2 and #3 before #4 . because #2 and #3 is rmw on same v - or #2 before #3 or #3 before #2. if #2 before #3 - was #1 before #4. so #1 can not view in flag result of #4, which is yet not begin at this point. here nowhere used, that write in one thread will be visible in another. absolutely another logicCleanser
If we have only read ( or write) to v in #2 and the same - only write or read in v in #3 - we can not say that one operation was before after another. Relationship will be not determinedCleanser
but from the fact that #3 was before #2 - it does not follow that some write before #3 in t2 will be visible in t1 after #2 - here it is necessary that t2 writes in #3 with release and t1 reads #2 with acquire. but here this is not used in any way. what is used is that the result of an operation in t2 that has not yet begun cannot be seen in t1Cleanser
@Cleanser I don't know how you define the wording "synchronize", anyway, the standard says: an atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M ... Or, Is the "synchronization" by your meaning just the requirement to the load part of RMW operations? Namely, [atomics.order] p10?Eijkman
@Eijkman an atomic operation A on some atomic object M is coherence-ordered-before another atomic operation B on M if any of the following is true: 1) A is a modification, and B reads the value stored by A - let A is rmw on v in t1 and B rmw on v in t2. Or A reads result stored by B or B read results stored by A. So or A happens before B or B happens before A. #2 happens before #3 or #3 happens before #2. This i mean under synchronization .Cleanser
If #2 and #3 will be only atomic write, or read, we can not say that one operation happens before another. But in case both is rmw we can say that one happens before another ( which read result of first operation)Cleanser
q1 is based on several transitive happens-before and if #1 happens before #4, #1 can not view effect of #4. q1 not based on any memory order . Synchronisation i mean in generic sense, that #2 and #3 point of synchronization between t1 and t2: they may be performed in any order but may not overlap: either #2 will be complete before #3, or #3 will be complete before #2.Cleanser
@Cleanser coherence-ordered before does not mean "happens before", happens-before is defined in [intro.races] p10. It is irrelevant whether A and B are both RMW. Again, if #2 and #3 had a happens-before relationship, then any non-atomic operations based on them would have that relationship, but it's not true even if both operations are RMW, as long as their memory order are relaxed, they cannot form synchronization relationship.Eijkman
@Eijkman sense of this - #2 and #3 may not overlap: either #2 will be complete before #3, or #3 will be complete before #2 - you agree with this ? and if A(#1) was before B(#4) , A can not view any effect of BCleanser
@Cleanser "complete before" is not what the standard defines synchronization or happen-before, you can just say either #2 is coherence-ordered before #3 or vice versa, however, coherence-ordered is irrelevant to happen-before.Eijkman
@Eijkman this already play of words. if #2 coherence-ordered before #3, then #1 coherence-ordered before #4 and #1 can not view side effect (write) of #4 ( not view, but can not view)Cleanser
@Cleanser this already play of words. This is the definition of memory order. For example: flag = false; int i = 0; //thread 1: i = 1; falg = true(relaxed); // thread2: while(!flag.load(relaxed)){} assert(i==1) . By your logic, the loop won't exit except it sees flag is set to be true, so the store to i is completed before read i, however, it's not true because the memory order is relaxed, regardless whether the load of flag read the value written by the store.Eijkman
@Eijkman - no, your example have no any ralationship to my logic. here 2 different memory location - i and flag . that modification of flag is visible in another thread, not mean that modification is i will be also visible. i say absolute another things. that 2 rmw on same v can not overlap. that #2 before #3 or #3 before #2 - this is not related to any memory order (which is always use how minimum 2 memory location). and then i say if A before B - A can not view effect of B (note - i not say that if A before B - B will be view effect of A. i say that A not view effect of B)Cleanser
if A before B - not mean that B will be saw side effect of A (and i not say this). but this mean that A will be not view effect of B.Cleanser
that 2 rmw on same v can not overlap. any two atomic operations on the same atomic object do not overlap if your "before" means "coherence-order before". if A is coherence-ordered before B, A certainly cannot view the side effect produced by B. However, I don't know how you relate "coherence-order before" with "happen-before" or "synchronization", they are not directly relevant.Eijkman
D
2

It is correct that if two exchange operations occur in this program then one of them must return 0 and the other 1. This implies that the assert can never fail.

does it mean the RMW is less prone to reading stale values than non-RMW reads, at least in this example?

There's no such thing as a stale value. In order to explain what a stale value is supposed to be, you have to say something like "a value corresponding to an entry in the modification order such that a later entry already exists in the modification order", but in order to explain what you mean by "already", you have to assume some kind of linear ordering of time that comprises all threads, which is not how atomic variables work.

The way the standard expresses it: namely that the value read by an atomic RMW operation is always the value taken from the entry in the modification order immediately before the one that the RMW operation creates: is the right way. An atomic load operation can't possibly have this property, because it doesn't create any entry in the modification order.

Disadvantage answered 24/10, 2024 at 0:36 Comment(4)
I gave "stale" values meaning that a latter side effect in the modification order is newer than the former. In this example, #2 or #2' is executed only when #1 reads 0 and creates 1, so it is known that 0 precedes 1 in the modification order, in which case the load read a "stale" value if the value is 0.Eijkman
Since "stale" is an ambiguous wording in this context, I deleted that question and changed it to: Does it mean the RMW is more prone to reading the latter modification in the modification order than non-RMW reads, at least in this example?Eijkman
@Eijkman In this example, the RMW can only read 1, while in a different program in which you instead have a pure load, the load can read 0. But you are comparing two different program executions with each other, so it's meaningless to compare their modification orders.Disadvantage
Do you mean assert(v.exchange(0,std::memory_order::relaxed) == 1); is a equivalent transition to a pure load?Eijkman
F
0

Deleted wrong answer but I think the comments might still be helpful to other people.

Flow answered 23/10, 2024 at 16:24 Comment(7)
This question isn't about how to write good or sensible code, it's about a tricky detail of the memory model. You're right that #2 could potentially read a 0, e.g. with compile-time reordering so v.exchange was done before the potentially-infinite loop, but in that case t2 couldn't read a 0 so would never store flag=true, so the spin-wait loop would never terminate, never reaching the actual assert.Clement
WG21/P0062R1 points out some currently-allowed bad optimizations of atomics, but even it doesn't suggest that compilers should be allowed to reorder across potentially-infinite loops. (In this case the spin-loop includes and atomic operation so it's allowed to be infinite without UB.) But anyway, even if there's some other way for #2 to potentially execute and make its value visible to the other thread while exit from the spin-loop is still speculative, that doesn't lead to assert failure.Clement
Note that atomic RMWs still make the whole RMW atomic, not just tear-free load and separate tear-free store. If that's what your claim is based on, have a look at [atomics.order] p10 which the question quotes (or see it on eel.is/c++draft/atomics.order#10). It applies to all atomic RMWs like v.exchange, even with relaxed.Clement
Thanks for the last reference, that does mean the assert cannot fail - I will update my answer. I believe it's still permitted (but not recommended) for the assert never to be evaluated.Flow
Yes, a technically conforming implementation could ignore the "should" requirements that stores be visible to loads in a "reasonable" and "finite" time, so the spin loop could never exit. Or it might be possible to justify letting the #2 exchange happen before #3 even though it's after an infinite loop. Your answer still doesn't seem to answer the language-lawyer question at all.Clement
relaxed stores to a variable are not required to be made available to other threads in a timely manner (although it is recommended)) - why single out relaxed stores? It's ordering that matters, not timeliness. The standard doesn't guarantee anything about how soon a seq_cst store is seen by other threads, just the order of things when it eventually is seen.Clement
@PeterCordes I can't thank you enough - this discussion has corrected a major misconception that I had. I will delete the content of my answer as it's entirely wrong and the reality is correctly within the comments above. An OS kernel I worked on back in the day had a ordering model which was basically "we'll do things in any order we find convenient as long as you can't prove you asked for them in some other order" and when I came to C++ I think I just carried that over rather than recognizing there's no overlap between the single total order and real-world time ordering.Flow

© 2022 - 2025 — McMap. All rights reserved.