Is the read-modify-write operation a single indivisible operation?

Asked 23/10, 2023 at 7:59 Answered 23/10, 2023 at 7:59

In this comment under a CWG issue, Jens Maurer says

The read-compare-write is a single, indivisible operation ("atomically").

However, as discussed in

For purposes of ordering, is atomic read-modify-write one operation or two?

Can the read operations in `compare_exchange_strong` in different two thread read the same value?

Even though we haven't had any 100% confirmed answers, we seem to consider the read-modify-write operation to comprise two operations, a read and a write, which is also implied by this formal wording in [atomics.order] p10:

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

The wording mentions the "read" and "write" associated with a read-modify-write operation. However, Jens thinks a read-modify-write operation is a single indivisible operation.
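As a minimal sketch of what [atomics.order] p10 guarantees in practice (not taken from the standard or from the linked questions): if every `fetch_add` reads the last value in the modification order before its own write, no increment can be lost, even with relaxed ordering. The helper name `count_with_rmw` and its parameters are hypothetical, and the pre-C++20 spelling `std::memory_order_relaxed` is used only so the sketch also compiles before C++20.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one atomic counter with
// relaxed RMWs, and the final value is returned after all threads joined.
int count_with_rmw(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Each RMW reads the value immediately preceding its own
                // write in the modification order of counter.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() makes the workers' writes visible to this thread
    }
    return counter.load(std::memory_order_relaxed);
}
```

Calling `count_with_rmw(4, 10000)` should always return 40000. With a separate relaxed load followed by a relaxed store instead of `fetch_add`, that would not be guaranteed.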

What is the intended meaning in the standard here?

Update

A relevant example is:

std::atomic<int> x{0};

// Thread 1:
int expected = 0;
x.compare_exchange_strong(expected, 1, std::memory_order::release, std::memory_order::relaxed); // #1

// Thread 2:
int expected = 1;
while (
    /* #2 */
    x.compare_exchange_strong(expected, 2, std::memory_order::acquire, std::memory_order::relaxed)
) {}

Given the assumption that the RMW operation in thread 2 reads the value written by the RMW operation in thread 1 (i.e. #2 only runs one iteration):

[atomics.order] p2 says:

An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

[atomics.order] p1 says:

memory_order::release, memory_order::acq_rel, and memory_order::seq_cst: a store operation performs a release operation on the affected memory location.

memory_order::acquire, memory_order::acq_rel, and memory_order::seq_cst: a load operation performs an acquire operation on the affected memory location.

That means, if A is a store operation and B is a load operation, and they satisfy [atomics.order] p2, then they synchronize.

IIUC, #1 synchronizes with #2. However, #1 and #2 are both RMW operations. Is #1 a store operation that performs a release operation, and is #2 a load operation that performs an acquire operation?
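To make the practical consequence concrete, here is a hedged sketch of what depends on the answer. It assumes the reading that a release RMW counts as a release operation and an acquire RMW that takes its value counts as an acquire operation. The function `pass_payload_through_cas`, the plain variable `payload`, and the retry loop that resets `expected` are all illustrative additions, not part of the example above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: thread 1 publishes a plain int through a release CAS,
// thread 2 consumes it after a successful acquire CAS on the same atomic.
int pass_payload_through_cas() {
    std::atomic<int> x{0};
    int payload = 0;    // deliberately non-atomic
    int observed = -1;

    std::thread t1([&] {
        payload = 42;
        int expected = 0;
        x.compare_exchange_strong(expected, 1,
                                  std::memory_order_release,
                                  std::memory_order_relaxed);  // plays the role of #1
    });
    std::thread t2([&] {
        int expected = 1;
        // Retry until the CAS reads the 1 written by t1. A failed CAS stores
        // the observed value into expected, so it is reset on each retry.
        while (!x.compare_exchange_strong(expected, 2,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed)) {
            expected = 1;
        }
        // If #1 synchronizes with the successful CAS, this read is race-free.
        observed = payload;
    });
    t1.join();
    t2.join();
    return observed;
}
```

If #1 did not synchronize with the successful CAS in thread 2, the read of `payload` would be a data race. Under the interpretation above it is well defined and the function returns 42.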

Sands answered 23/10, 2023 at 7:59 Comment(31)

I'm convinced it's a single operation. "We seem to consider the read-modify-write operation to comprise two operations" I haven't seen this to be the consensus. – Tersanctus 23/10, 2023 at 8:2

It's not divisible by other operations on the object being RMWed. That's what makes it atomic. In practice on real CPUs we can observe the store side of an acq_rel or seq_cst RMW reordering with a later load (as Nate's answer on your first link shows), but the load side of the RMW has acquire semantics so isn't allowed to reorder that way. So we can't explain the observed behaviour in terms of an atomic RMW being one pair of operations that stay glued together in the total order of events. – Wonderful 23/10, 2023 at 8:3

@Tersanctus See https://mcmap.net/q/20417/-for-purposes-of-ordering-is-atomic-read-modify-write-one-operation-or-two – Sands 23/10, 2023 at 8:4

@PeterCordes Yes, but how about the RMW operation itself? As you said in this comment #77126545, The current wording avoids treating the atomic RMW as a single operation – Sands 23/10, 2023 at 8:4

Considering just the object being operated on, it's a useful simplification to say an atomic RMW is indivisible. That's what "atomic" literally means. That's a correct answer to your question about Can the read operations in `compare_exchange_strong` in different two thread read the same value? where the ops are on the same atomic object. As Jens said, the fact that CAS_strong is an atomic RMW (if it succeeds) removes any room for doubt about that. – Wonderful 23/10, 2023 at 8:11

@PeterCordes That means, from the perspective of the C++ standard, an RMW is a single indivisible operation, and two RMWs can never read the value written by the same modification, right? – Sands 23/10, 2023 at 8:15

Yes, exactly, that's what I already explained in my answer to that question, as did others in comments under that answer replying to you. – Wonderful 23/10, 2023 at 8:16
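A small sketch of that claim, under the assumption that all threads race on one object with a single `compare_exchange_strong` each (the helper name `count_cas_winners` is hypothetical): because two successful RMWs cannot both read the initial 0, exactly one CAS from 0 to 1 can succeed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: every thread tries once to change x from 0 to 1.
// Returns how many of those attempts succeeded.
int count_cas_winners(int num_threads) {
    std::atomic<int> x{0};
    std::atomic<int> winners{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&x, &winners] {
            int expected = 0;
            // compare_exchange_strong does not fail spuriously, so it fails
            // only if x no longer holds 0 when this RMW executes.
            if (x.compare_exchange_strong(expected, 1,
                                          std::memory_order_relaxed,
                                          std::memory_order_relaxed)) {
                winners.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return winners.load(std::memory_order_relaxed);
}
```

Whatever the thread count, the result should be 1. Relaxed ordering is enough here because the guarantee comes from the atomicity of the RMW on `x`, not from ordering with other objects.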

@Sands "See ..." I think that answer doesn't prove otherwise. From a formal point of view, they do at most one non-relaxed operation per atomic, hence no synchronization ever happens, and they might as well use relaxed everywhere (I'm ignoring the fence, since it can't retroactively affect whether we enter the if or not). – Tersanctus 23/10, 2023 at 8:17

@HolyBlackCat: Nate's test-case uses me.exchange(true, std::memory_order_seq_cst); so both the load and store sides have seq_cst semantics (which is why it compiles to ldaxrb / stlxrb in the retry-loop, note the a and l in the mnemonics for Acquire and reLease). I haven't re-read and thought through exactly what it's testing so maybe I'm missing the point you're making, but I did upvote it at the time after reading it carefully then and finding it convincing. – Wonderful 23/10, 2023 at 8:22

@PeterCordes I'm not too good at this (don't really know how atomics work on CPU level), so I'm approaching this purely from the point of view of the standard formalism. From that point of view, seq_cst only differs from relaxed in its ability to create "synchronized with" relationship (doesn't happen here, as it requires >1 non-relaxed operation on a variable), and in the seq-cst order affecting the values returned from loads (which that code doesn't examine, hence they too don't matter). – Tersanctus 23/10, 2023 at 8:33

@HolyBlackCat: Oh, you're talking about different ops on the same object, counting an RMW as one. Indeed, ISO C++ doesn't guarantee correctness of Nate's code, that's why it's allowed to break. The argument is that if an atomic RMW was truly indivisible even wrt. ops on other objects, the store side as well as load side of it would have to become visible before a later relaxed load, effectively promoting that relaxed load to seq_cst for the purposes of Peterson's algorithm. The mechanism for that not happening on AArch64 is presumed to be reordering of the store side with the later load. – Wonderful 23/10, 2023 at 9:3

Maybe the bigger question is "does it matter"? Can you identify a particular piece of code, together with a particular passage in the standard which would appear to dictate two different behaviors for the code, depending on which interpretation is adopted? – Crispi 23/10, 2023 at 14:43

In the question of mine that you cite, I think it ended up being clear to everyone that the standard's memory model permits the 0,1 behavior. I find it easier to understand that behavior by thinking of atomic RMW as two operations. But you're also welcome to think of it as one operation for which the standard does not guarantee a particular ordering behavior that a person might naively think it would. – Crispi 23/10, 2023 at 14:49

In other words, the "two operations" question is relevant if your mental model involves there being a global order on operations (like order of commit to coherent L1 cache), and the architecture allowing "reorderings" that cause the global order to become inconsistent with program order. But of course that is not the way the C++ standard defines it at all. The cppreference text I was asking about was written from the former perspective, and I think this issue demonstrates the risks of doing so. – Crispi 23/10, 2023 at 14:55

@NateEldredge In your example, it's simply not determined whether x.exchange(1, std::memory_order_acq_rel); happens before or after int xx = x.load(std::memory_order_acquire);, so it can read the initial 0. It does not matter whether the operation is an RMW or just a store. – Sands 24/10, 2023 at 3:34

@xmh0511: I don't follow you. There is nothing in the example which unconditionally determines any operation from thread A to happen-before any from thread B, nor vice versa - there are no branches anywhere in the code. If xx == 1 then we are assured that the exchange happens-before the load, because a release store synchronized with an acquire load. That is all we can say. But it has no bearing to the question of which outputs the program is permitted to generate. – Crispi 24/10, 2023 at 15:3

@xmh0511: To rule out the possibility of 0,1, we would have to be able to show that whenever yy == 1, then x.exchange happens-before x.load. I claim there is no way to conclude that from the standard's ordering rules, regardless of whether you interpret the exchange as one operation or two. – Crispi 24/10, 2023 at 15:7

@xmh0511: Oh, maybe we are both saying the same thing. But then you still haven't given an example of where the "one/two operation" interpretation would make a concrete difference. – Crispi 24/10, 2023 at 15:22

@NateEldredge Maybe there are other rules in the standard that could be impacted by the subtle read, I'm not sure where they are. – Sands 25/10, 2023 at 2:53

@PeterCordes See my update part. – Sands 30/10, 2023 at 13:53

@NateEldredge See my update part, this is why I am concerned about whether an RMW operation comprises read and write operations. – Sands 30/10, 2023 at 13:54

@xmh0511: "I am concerned about whether an RMW operation comprises read and write operations." What does that mean? What does it mean for anything to be an "operation"? You're using non-standard terminology while expecting the standard to make sense with it. – Burro 30/10, 2023 at 14:6

If #2 runs two iterations before #1 executes, it will succeed and cause #1 to fail. (compare_exchange_strong/weak updates its first arg by reference). Otherwise, with #1 running first, #1 and the iteration of #2 that succeeds are both RMW operations. #1 is an RMW with release semantics, #2 is an RMW with acquire semantics. (Assuming that #2 is supposed to be the CAS inside the while loop; there is no #2 comment in the code block.) – Wonderful 30/10, 2023 at 14:28
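The by-reference update mentioned here can be traced single-threaded. This is only a sketch and assumes a retry loop of the form `while (!x.compare_exchange_strong(...))` that does not reset `expected`, which is not the loop as written in the question. The struct `CasTrace` and the function `trace_two_attempts` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical record of two back-to-back CAS attempts on x == 0.
struct CasTrace {
    bool first_succeeded;
    int expected_after_first;
    bool second_succeeded;
    int final_value;
};

CasTrace trace_two_attempts() {
    std::atomic<int> x{0};  // thread 1 has not run yet
    int expected = 1;

    // First attempt: x is 0, not 1, so the CAS fails and writes 0 to expected.
    bool first = x.compare_exchange_strong(expected, 2,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed);
    int seen = expected;

    // Second attempt: expected is now 0, which matches x, so the CAS succeeds.
    bool second = x.compare_exchange_strong(expected, 2,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);

    return {first, seen, second, x.load(std::memory_order_relaxed)};
}
```

After the second attempt `x` holds 2, so a later CAS from 0 to 1 in thread 1 would fail, which matches the scenario described in the comment.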

An RMW can have relaxed, acquire, release, acq_rel, or seq_cst ordering. It includes a load and a store, effectively stuck together into one atomic operation, at least wrt. other reads and writes of this object. That seems to me like an obvious interpretation of the language used in the standard, perhaps because I understand that's what an RMW is in real CPUs. Interesting point that [atomics.order] p1 just says "load" and "store" without mentioning RMW, but it would be helpful for your question to link them so I could more easily see what else it said around that. – Wonderful 30/10, 2023 at 14:34

eel.is/c++draft/atomics.order#3 talks about a read (load) and a modification (store), and the qualification that "A and B are not the same atomic read-modify-write operation", which is an example of the standard potentially talking about the two sides of an RMW. – Wonderful 30/10, 2023 at 14:38

@PeterCordes However, a release operation must first be a store operation, while an acquire operation must first be a load operation. The standard does not clearly define which operations are store operations and which are load operations. Is an RMW operation a load operation, a store operation, or both? – Sands 31/10, 2023 at 2:25

@PeterCordes To clarify what I meant in the updated example, I have added the assumption, that is, #2 only runs one iteration, which reads the value written by the RMW operation in thread 1. – Sands 31/10, 2023 at 2:29

The fact that it could be implemented by some CPUs as two operations (ex: in MIPS or Armv7 that don't provide CAS) is irrelevant to the point being made by the standard. In those CPUs that use Load/Store the two operations used are special and changes to the memory address that could affect the integrity of the atomic operation are tracked in hardware. – Lactose 31/10, 2023 at 2:42

An RMW operation is both a load and a store. It's right there in the name: read modify write. Names aren't meaningless labels; their normal English meaning is relevant here. In terms of the formalism, that's necessary from the fact that release and acquire are both usable with an RMW. (Or does the standard not explicitly say that?) I don't think there's a real question here, just perhaps a desire for more precision in the formalism to avoid reliance on what basic things are considered "obvious". – Wonderful 31/10, 2023 at 5:2

Or I guess the only real question surrounding whether an atomic RMW is truly a single operation vs. having load and store components (like your title phrasing) is For purposes of ordering, is atomic read-modify-write one operation or two? which pretty definitively shows that on some real implementations, it's two, in the sense that an operation on a different location can happen between the acquire-load and relaxed-store sides. – Wonderful 31/10, 2023 at 5:24

That's either compatible with the standard or a defect in the standard. Probably already compatible with it, since the standard doesn't guarantee the ordering we'd get from RMWs being truly indivisible. – Wonderful 31/10, 2023 at 5:25
