reordering atomic operations in C++
Asked Answered
A

5

9

Suppose I have 2 threads:

int value = 0;
std::atomic<bool> ready = false;

thread 1:
value = 1
ready = true;

thread 2:
while (!ready);
std::cout << value;

Is this program able to output 0?

I read about the C++ memory model - specifically, sequential consistency, which I believe is the default, and it wasn't particularly clear. Is the compiler only required to put atomic operations in the correct order relative to each other, or is it required to put atomic operations in the right order relative to all other operations?

Aldon answered 29/10, 2016 at 14:46 Comment(5)
It's a bit more complex than that: There's in-thread and between-threads ordering to consider. Basically, the rules work "as expected" so that the code is correct and does what you think it should do.Airbrush
For sequential consistency, it's much like a barrier; stuff above it cannot be rearranged to go after, and stuff below cannot be rearranged to go before it. Doesn't matter if the other values are atomic or not. In this case, you're fine.Chert
@Chert There is no concept of "rearranged" in C/C++.Unclear
@curiousguy: I apologize for using synonyms? I don't know what you're looking for here.Chert
@Chert Synonym of what? What std term is syn of "rearrange" and is used to define C/C++ MT semantics? I'm simply saying that you can't reason in term of rearrange-able code. C/C++ don't work that way.Unclear
D
9

By default operations on atomic variables are done using the memory_order_seq_cst semantics, which guarantees that no reordering will be done.

Thus the line: value = 1 cannot be reordered below the atomic assignment: value = 1, so the line std::cout << value; will always print 1.

By the same rules, the line: std::cout << value; cannot be reordered
above the line: while (!ready);.

Dehart answered 31/10, 2016 at 8:38 Comment(1)
"which guarantees that no reordering will be done." That is not quite correct. Reordering IS allowed but all operations that in the code happen-before must be visible before the memory-barrier. If there is an assignment to another variable before the atomic, but it is not shared with any other thread, the compiler is free to move it past the atomic operation.Foretopsail
P
3

Is this program able to output 0?

No.

Your reasoning is correct. In ISO C++ seq_cst and acq_rel atomics can create happens-before / after relationships between threads, making it safe for one thread to write a non-atomic variable and then another thread to read it without data-race UB.

And you've correctly done so in this case: the spin-wait loop is a seq-cst load from the flag that only exits the loop upon seeing a true value. The evaluation of non-atomic value happens-after the load that sees the true. And in the writer, the sequential-release store makes sure that the flag store is not seen before the value store.


When compiling to asm for a normal ISA, compilers have to respect ordering of non-atomic stores before release-stores, and of non-atomic loads after acquire loads. Unless it can somehow prove that there would still be UB for any other possible thread observing this.

Polack answered 21/9, 2019 at 21:35 Comment(0)
A
1

It acts like a memory barrier, as per ShadowRanger's response. However, for more details on why it does that, I suggest looking at Herb Sutter's talk on atomic weapons. He goes into great detail about how and why atomics work.

Aldon answered 30/10, 2016 at 0:53 Comment(0)
A
-4

Yes, but only because you used a default.

Cst suffers because it uses a global scope for re-ordering; which is an artifact of old architectures. The newer architectures have a more granular scope for sequencing, so you might expect a standards update will invalidate your code in the near future.

Write queues can end up with many dozens of entries each with an egregious bus latency to resolution. Partitioning these into the ones that matter vs those that don't is an obvious step, one that new architectures already embody.

The C++ standards creation committee are clearly out of their depth, and should stop inventing useless crap.

Atropos answered 18/9, 2019 at 2:54 Comment(6)
1) Not sure what "artifact of old architectures" you are talking about. 2) The code in Q only relies on write is release and read is acquire guarantees. 3) Which std change could break that code? Why would the std committee make that change?Unclear
Take a look at something like risc-v. The ordering / coherency guarantees can be on a per word (really cache line) basis, and based on the instructions themselves. A barrier on a modern x86 could cause a wait of 64 cache lines, none of which are relevant to the synchronization itself. A whole pile of stack ops, for example, which are almost certainly local to the thread, then an atomic incr of some unrelated var, will stall the incr until all of those stack writes are globally visible. And for the cases where you really want global, there are the normal global barriers.Atropos
(2, 3) The C++ remark is probably a bit snide, but the standards invention committee doesn't seem to put much stock in compatibility; which is odd given the standards thing.Atropos
I agree that recently the committee has shown a strange lack of respect for not only source code compat but also spirit of C++ compat, such as allowing "information losing implicit conversion" in all contexts (in many case there is clearly no loss even though the compiler doesn't know that). This is clearly unconscionable.Unclear
Your comment about the C++ committee is strongly worded but I thing mine is stronger and also concerns both C and C++ committees.Unclear
ISO C++ does have mo_consume for dependency ordering between a load and another dependent load. The problem is that compilers gave up (for now) on implementing it after problems were found, and currently strengthen it to acquire. I'm not familiar with RISC-V specifically; does it have things other than the usual dependency-ordering that most weakly-ordered ISAs provide? Like maybe a StoreLoad barrier that only cares about certain stores, not all queued stores?Polack
U
-4

First note: it's absolutely impossible to reason about any C or C++ program in any version of C and C++ that supports thread, because of the possibility of UB (undefined behavior), because there is no well defined abstract thread semantics, or any semantic at all defined. This is yet another major theoretical as well as practical flaw in C and C++ semantics (on top of many other crippling flaws).

But you can reason in practical terms: compilers are very predictable in their implementation of threading primitives (that may not be the case in the future as compiler writers get fluent in thread semantics and begin breaking things using claims of UB).

When using thread communication primitives the compiler does the right thing to guarantee flow of information. while (!ready); guarantees that a thread exits the loop after the atomic objects is set: there is a well defined "past".

As a practical real world example of an ill defined "past", remember the Apollo audio exchanges with astronauts talking over Houston: there was no well defined notion of who began talking first as the astronauts were very far away, and the only recording we have (from Houston) shows an order but an hypothetical recording from the spaceship would show another, and neither order is the correct one. Astronauts and Houston began talking without order, neither were in the past of the others. Until you see that, you can't claim to understand relativity.

With multithreading, you can have memory operations that are not in a the past of others, and can't know what they will observe if they attempt to use objects manipulated in the non-past.

Unclear answered 21/9, 2019 at 14:32 Comment(17)
There's no UB here: simultaneous read/write of the non-atomic variable value is ruled out by the spin-wait loop.Polack
@PeterCordes yes, as usual, starting with the conclusion (no UB occurs) you can apply rules that show there is no UB (note that it isn't the trivial logical operation (hypothesis=>hypothesis), as you must apply "single thread" rules). But that's logically invalid. You can't exclude UB in any program that has MT semantics.Unclear
Your Apollo analogy is like IRIW reordering (where threads can disagree on the global order of stores by 2 other independent threads). Or more simply seeing your own stores first because of store forwarding. But that's only possible with orderings weaker than seq_cst: seq cst stores have to flush the store buffer, creating a global order that all threads agree on.Polack
Anyway, back to your claim that you can't exclude UB in the OP's program: please give an example of an ordering of things allowed by the C++ abstract machine that results in UB. Or if you need to consider some implementation details, then an example of exactly how UB can occur here. It seems obvious to me that UB can be ruled out, assuming the threads are started with std::thread and so on. Have you forgotten that C++11 introduced threads and a multi-threaded memory model into the language standard, no longer leaving those things as unstandardized extensions?Polack
@PeterCordes Any program has UB (potentially). By introducing MT, the std self destroyed. The example is what you want it to be. I know it's annoying to have to say that but there is no way out. You can't reason about a MT program, ever, as you can only make proof of correctness if you rule out UB, which you can't, because of UB. Well maybe a program with only atomic operations that is no non atomic non mutex objects and no destruction of any object might be provably correct. If any obj is destroyed, you might use that object destroyed, causing UB. Why isn't everybody screaming about that!Unclear
Any program has UB (potentially). This is nonsense. In real life most large program probably have at least some UB that happens to work, but we're talking about toy examples. It's easy to accidentally include UB for some inputs (especially integer overflow), but it is very possible to write programs that avoid any UB. Especially tiny trivial programs that don't take user input. Many potholes exist but it is possible to avoid them. e.g. your destroying-an-object example: it is possible to write correct programs that destroy objects and don't reference them afterwards.Polack
Why isn't everybody screaming about that? For the same reason people aren't screaming about trees at the sides of roads: cars usually drive right by them without hitting them, even though they're right there and a mistake could lead to hitting them.Polack
@PeterCordes Trees don't have UB. They follow laws. UB does not. UB wrecks everything. If you destroy any object, you can't prove correctness w/o assuming that you don't have UB from the fact you are using the object after (or during) destruction. (This is also the case in C which of course doesn't have dtors but has lifetime.) This conclusion is reinforced by this Q whose answers immediately imply you can't even have two distinct single linked lists in MT programs, unless you use atomic for next ptr.Unclear
UB only wrecks anything if it occurs along an actually-taken path of execution in any thread. For example, if(y) x=/y; is safe because divide-by-zero UB is guarded by a check. For destroying an object, it's totally normal to ensure that all uses of the object happen-before the destructor. Then no problem exists unless your program has use-after-free bugs. The fact that the language doesn't protect you from that bug doesn't make it impossible to avoid.Polack
IDK what you're trying to prove by linking a question about obvious UB from shared access to non-atomic vars without synchronization via mutexes or (like in this question) atomic variables. Of course you can't have a lockless singly-linked list without atomic members, unless all the accesses to it are from one thread. Then it's fine. (Or otherwise synchronized to make sure a node isn't being read and written at the same time). Or each thread can have its own non-atomic linked list if you want.Polack
@PeterCordes No, I said you can't have thread 1 manipulating linked list A and thread 2 ... list B not interacting at all and keeping the linked list distinct. You can't compose functions in diff threads. Composition is gone. Threads all can interact in huge memory orgy. Unless all your code uses only atomics. As the answers in the linked Q says (the Q is about flags but indexed accesses are no different, except it's data dep instead of contr dep; and linked list manips are essentially indexed accesses, again data dep). Either you agree with me, or you revisit that Q.Unclear
@PeterCordes "it's totally normal to ensure that all uses of the object happen-before the destructor" How do you do that? With mutexes and atomics? That's fine, until you have UB and then you can't rely on atomics establishing H-B so... you have UB.Unclear
That's incorrect. If only one thread ever reads/writes a data structure, it can be normal non-atomic without any problems. You only need atomics or mutexes to create synchronization (or make all the data atomic) if you want writes from one thread to be published to another thread (like in your linked question). You still haven't shown a specific ordering of C++ abstract machine operations for the code in the question that leads to UB, and that's allowed by logic and the standard. I think you're missing something fundamental about C++ memory ordering and UB rules.Polack
@PeterCordes How did that work in the linked Q? I don't have to show ordering or anything: I (I = the implementation) have UB, which is the ultimate joker to prove anything. You lose because you are playing chess w/ someone w/ teleportation! Ppl who try to formalize C/C++ MT and prove desirable properties carefully avoid the subject. They know they can't model opportunistic UB (UB you can't exclude). They manage to model necessary UB (UB that's certain for a program trace). C++ is at least a quadruple semantic failure (unneeded "launder", lifetime/lvalues/unions, base subobjects, MT).Unclear
Now I see ppl saying "it's so unfair, why do you use the UB card?" Well because I can. I don't have to avoid "silly" moves that kill the game. If you play chess w/ someone who can teleport stuff, why wouldn't he use that? It's illogical to assume the impl would not use any tool to beat the user. We know GCC did not use to do that and recently does more and more, whenever UB is detected. UB is just the ridiculously strong move I can do to win, but excluding UB as such is not possible, as UB is not a distinct semantics, it's a ultimate degeneracy. Any less significant degeneracy would do.Unclear
I had a more careful look at What formally guarantees that reads don't see a write (with race condition) in other threads? which you linked earlier. I had earlier in this thread missed the fact that both stores were behind false conditions so there's no obvious UB. The answers there were misleading. In fact I think there's no UB there at all. I posted an answer. If you were basing your reasoning on the previous answers to that Q, no wonder your claims here are so wrong and disconnected from reality. Yes you do need to show an order of operations!Polack
But also, that question is pretty different from this one. Here, atomics are used to create a happens-before / after relationship between the accesses to the non-atomic variable. Nothing like the circular weirdness in that other Q is possible even if those answers were right. Same for 2 threads each having their own private linked lists.Polack

© 2022 - 2024 — McMap. All rights reserved.