Is a synchronization relationship necessary to avoid duplicate invocation of a function?
Asked Answered

Consider this example:

#include <iostream>
#include <atomic>
#include <random>
#include <thread>
int need_close(){
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> distribute(0, 2);
    return distribute(gen);
}
void invoke(std::atomic<bool>& is_close){
    if(is_close.load(std::memory_order::relaxed)){  // #1
        return;
    }
    auto r = need_close();
    if(r==0){
        is_close.store(true,std::memory_order::relaxed);
    }
}
int main(){
    std::atomic<bool> is_close{false};
    std::thread t1([&](){
        for(auto i = 0; i<100000;i++)
        invoke(is_close);
    });
    std::thread t2([&](){
        for(auto i = 0; i<100000;i++)
        invoke(is_close);
    });
    t1.join();
    t2.join();
}

In this example, is relaxed ordering sufficient to prevent the call to need_close once a thread sees that is_close == true (not immediately, but at some point after the thread can read is_close == true)? It seems that I don't need synchronization to avoid a data race, because there are no conflicting actions in this example. However, on the implementation side, the compiler may reorder #1 to any place following it because relaxed is used here; for example, if #1 were moved to some point after the call to need_close, then need_close would always be called again even though is_close is set to true, which is not expected. So, I wonder whether acquire/release ordering is necessary to prevent the compiler from reordering the code, so that the logic behaves as expected?

Ghostwrite answered 27/8, 2024 at 8:24 Comment(19)
No memory_order would prevent this code from calling need_close() more than once. Also I suggest replacing rand() with something else (something thread-safe), if your question is not specifically about it.Moriyama
"avoid duplicate call" do you mean both threads entering the function at the same time, or simply "calling the function twice" ? For the latter there is std::call_once (with the necessary synchronisation baked in)Hardship
@Moriyama Do you mean the compiler can reorder the code such that the #1 is useless?Ghostwrite
It's possible even without reordering. Imagine one thread calling need_close() and pausing before setting the flag, then the second thread entering invoke() and also calling need_close().Moriyama
IOW, what @Moriyama is saying, this is broken even with memory_order_seq_cst (sequential consistency)Rugging
@Moriyama If #1 is not reordered, need_close won't be called once the thread reads close==true. Instead, if the compiler reorders #1, need_close is still called even after the thread reads close==true; that's the difference here.Ghostwrite
@Rugging If there is no reordering, the thread eventually sees the value close==true and won't call the need_close.Ghostwrite
I don't understand what you're trying to say. It's easy to demonstrate that this can call need_close() twice by adding a delay after it: gcc.godbolt.org/z/68Kn9Khv8Moriyama
@Moriyama I meant, is relaxed ordering sufficient to make need_close not be called once the thread sees is_close == true? If the compiler reorders the code, need_close can always be called.Ghostwrite
@Moriyama gcc.godbolt.org/z/h6a7Ysfa9, in this example, the print is not called once the thread sees is_close == true.Ghostwrite
if is_close.load() is true, the early-out return runs, so nothing else in the function happens. This is sequenced before the call to need_close. Compilers can't just arbitrarily shuffle source lines (which isn't how they optimize anyway), they have to respect sequencing to not break single-threaded programs. The CPU might fetch and decode the machine code for need_close() in the shadow of a mispredicted branch just like in single-threaded code, but the end result has to be as if the code ran in program order with is_close.load() having produced whatever value.Xerosis
"once the thread sees is_close == true" But this isn't a useful question, is it? By definition when it sees it being true, it won't pass the first if. The question is when it sees it.Moriyama
@Moriyama Peter spelled out my confusion. If the first if is reordered to some place after the call to need_close() by the compiler, need_close will keep being invoked regardless of when the thread sees is_close==trueGhostwrite
@PeterCordes Oh, the compiler must also obey the sequenced-before rule; the first if cannot be reordered to some place after the call site of need_close() even though relaxed ordering would seem to permit the compiler to do so, right?Ghostwrite
Fairly sure this reordering is illegal regardless of the memory order, the compiler won't do it. The standard isn't worded in terms of reordering, but rather in terms of when a value becomes visible to a thread. When people talk about reordering, they usually mean situations where you do >1 reads/writes in a row, and values become visible to another thread in a weird order, and then they say that reads and/or writes were "reordered" relative to each other. But that's not what happens*, it's different values becoming visible in a weird order. [1/2]Moriyama
[2/2] (*At least in the standard's abstract machine. Maybe the compiler DOES reorder on the assembly level, but due to the as-if rule you don't need to worry about it, as it won't happen if side effects are involved.)Moriyama
Forget about threads for a moment, and let is_close be a plain bool. Would you now think that the compiler can move the check if (is_close) after need_close()? You certainly wouldn't. That in the posted case the type is different and the procedure runs in a thread does not change anything with regards to this kind of (forbidden) reordering.Reflect
@xmh0511: Right, the CPU can start running need_close() while still waiting for a value for is_close.load(relaxed), and as long as it eventually sees false it can confirm this path of execution as the correct one, retiring those instructions. But if not, that speculatively executed work is actually a mis-speculation. The compiler could only reorder things by similarly speculating, like unconditionally doing some computation into a temporary but only actually committing visible side-effects after checking the is_close.load() result.Xerosis
Multi-threading doesn't affect single-threaded coherence. In this case neither thread can store true and then read false on the next iteration. But either thread could read false many times after (in the real-world sense) the other thread has stored true. There are some (vague) constraints that each thread has to eventually see a store of true by the other thread, but don't make any assumptions about how quickly that will happen, particularly when using the relaxed memory order.Balladmonger
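The std::call_once suggestion from the comments can be sketched as follows. This is a minimal illustration, not code from the question: the flag, counter, and function names here are made up, and the callable is a stand-in for whatever close-decision logic you want to run exactly once across all threads.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// std::call_once guarantees the callable runs exactly once, no matter
// how many threads race into close_once() concurrently.
std::once_flag close_flag;
int close_calls = 0;  // only ever written inside call_once, so no data race

void close_once() {
    std::call_once(close_flag, [] { ++close_calls; });
}

int run_demo() {
    std::thread a([] { for (int i = 0; i < 1000; ++i) close_once(); });
    std::thread b([] { for (int i = 0; i < 1000; ++i) close_once(); });
    a.join();
    b.join();
    return close_calls;  // 1, despite 2000 racing calls
}
```

Note this solves the "called twice" reading of the question, but only fits if the close decision really should happen once unconditionally; the question's need_close() only sets the flag when it returns 0.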

Apparently what you really meant to ask is whether need_close can be called even if is_close.load(relaxed) returned true, due to the compiler moving the check later. Despite your edit, the title still talks about "duplicate" invocation, as if something in this code would otherwise prevent it from being called twice. (As the comments and other answers discuss, there isn't; you can get multiple invocations even with seq_cst.)


The cardinal rule of out-of-order exec by CPUs, and of compile-time reordering, is that it must not break single-threaded code.

If is_close.load() is true, the early-out return runs, so nothing else in the function happens. This is sequenced before the call to need_close.

Compilers can't just arbitrarily shuffle source lines (which isn't how they optimize anyway), they have to respect sequencing to not break single-threaded programs, and the sequenced-before rules for multi-threaded programs.

The CPU might fetch and decode the machine code for need_close() in the shadow of a mispredicted branch (which can happen in a single-threaded program), but the end result has to be as if the code ran in program order with is_close.load() having produced whatever value.

relaxed means that value didn't have to be ready until after we've read and/or written some other things, but practically that will happen via out-of-order speculative execution by the CPU (which can discard mis-speculated work when it rolls back to a consistent state after a mispredict), not by the compiler statically reordering things in ways such that visible side-effects happen differently from the abstract machine if there are no other threads running.

Since your need_close() has visible side-effects (I/O in the form of constructing and reading from a std::random_device object), in this case there's not anything the compiler could usefully do unconditionally.


Who's afraid of a big bad optimizing compiler? on LWN talks about some practical examples of compile-time reordering, including some subtle and interesting ones, but also some of the limitations of what compilers must not do.

See also https://preshing.com/20120625/memory-ordering-at-compile-time/ and Preshing's other articles to get a better understanding of how to think about this stuff.
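As noted above, nothing here prevents two invocations even with seq_cst. A minimal sketch in the spirit of the godbolt demo from the comments (variable names are mine, and the sleep is only there to widen the race window so the effect is easy to observe):

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

std::atomic<bool> is_close{false};
std::atomic<int> body_runs{0};

// Even with seq_cst, both threads can pass the check before either
// one stores true: the check and the store are separate operations.
void invoke() {
    if (is_close.load(std::memory_order_seq_cst)) return;
    body_runs.fetch_add(1);  // stands in for need_close()'s side effects
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    is_close.store(true, std::memory_order_seq_cst);
}

int run_demo() {
    std::thread a(invoke), b(invoke);
    a.join();
    b.join();
    return body_runs.load();  // almost always 2 on a real machine
}
```

This is the TOCTOU window the other answers describe; no memory_order closes it, because ordering only constrains visibility, not interleaving.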

Xerosis answered 27/8, 2024 at 20:24 Comment(16)
Ha! I've just added a comment about how multi-threading can't break single-threaded code.Balladmonger
@PeterCordes Please look at this example godbolt.org/z/s9G5WzW4K, Is it possible the need_close is still called several times in the first thread after is_close.store(true,std::memory_order::release) is set to be true in the second thread, even with the strongest memory_order seq_cst?Ghostwrite
@xmh0511: What do you mean by "after"? After the store becomes globally visible? No, because that would mean invoke sees it as true.Xerosis
@PeterCordes "After the store becomes globally visible" No, I meant after is_close.store(true,std::memory_order::release) is executed.Ghostwrite
@xmh0511: "executed" isn't a meaningful point in time as far as memory ordering and visibility is concerned. If you mean "executed" as is the store instruction writes the address and data to the store buffer on a typical modern CPU, then no, other cores can still have the cache line in MESI Shared state and are able to read the old value. Only after the invalidate their copy (in response to a Read For Ownership (RFO) from the core trying to commit the store-buffer entry to its cache after the store retires from the Reorder Buffer) will their next read need to check with other cores.Xerosis
@xmh0511: seq_cst allows any interleaving of program order between threads. If your program doesn't contain data-races, a delay in visibility due to a store buffer or whatever is indistinguishable from simply not happening until later. memory_order is about the order in which things become visible to other threads, and that's how you should think about it. Internal details of a CPU are only relevant if you're trying to understand how they work, and some of the mechanisms that can introduce reordering.Xerosis
@petercordes so, it is possible that these loads still read the old value (i.e. false) after is_close.store(true,std::memory_order::release) is executed (i.e. the store instruction writes the address and data to the store buffer on a typical modern CPU), right?Ghostwrite
@xmh0511: At that point it's still the current value of is_close. The value of is_close doesn't change until a store to it commits to L1d cache. The timing of a write to the store buffer is not important, and can even happen as a result of a mis-speculation that will be rolled back, never becoming globally visible. That core might not even send an RFO until the store instruction retired from the ROB. (Stores in the store buffer after the associated store instruction retires are called "graduated" stores, and are ready to commit to cache once we have exclusive ownership of the line)Xerosis
@PeterCordes From the standard perspective, saying we have two threads t1 and t2, the modification order is v0, v1 where v0 is produced by the initialization of the atomic object M and v1 is produced by the store of t2, the store operation in t2 does not happen-before the load in t1, t1 loads M for many times, it is possible that the load in t1 can always read v0 without violating any rules defined in [intro.races], is my understanding right?Ghostwrite
@Ghostwrite The thing to understand is the classical Von Neumann Architecture Model (VNAM - en.wikipedia.org/wiki/Von_Neumann_architecture) is a long way past broken on any modern multi-core platform. Multi-level caching on multi-core CPUs along with speculative execution and hyper-threading mean that reasoning based on VNAM is often flawed and sometimes outright meaningless. The C++ Standard is very clear that relaxed memory operations are not synchronisation events.Balladmonger
@xmh0511: the ISO C++ standard has two guarantees of eventual visibility of stores to loads in other threads, but neither are in [intro.races], so yes that's correct. (The guarantees are actually phrased as "should", no formal requirement, because it's hard to formally express best-effort on systems without hard real-time guarantees. Why set the stop flag using `memory_order_seq_cst`, if you check it with `memory_order_relaxed`? quotes them from [intro.multithread] and [atomics.order]Xerosis
@PeterCordes So, this case can also happen even though we used seq_cst memory order for the load and store operations, right?Ghostwrite
@xmh0511: Yes, of course. seq_cst doesn't make stores visible to loads any faster, just controls ordering of a thread's accesses to shared memory (or coherent cache).Xerosis
@PeterCordes Thanks. I read your linked answers; iiuc, from the implementation perspective, a load not seeing a store that is already in the modification order (a C++ standard concept) is due to possible latency, and the standard uses the modification order, plus the absence of any rule violation in [intro.races], to permit or express this situation.Ghostwrite
@PeterCordes For the above question, compare a load operation with the load part of an RMW operation, saying we have a modification order {a0, a1} of an atomic object M, a load operation can either read a0, a1 or always read a0, all three cases are valid while M.exchange(2) must read a1 such that its write will make the modification order be {a0, a1, a2}, right?Ghostwrite
@PeterCordes paging for this question #78952875Ghostwrite

How do you know whether you can take the branch that calls need_close()? Only by reading the value of is_close. It is impossible to make that decision without knowing the value, so the compiler cannot reorder #1 after the call to need_close(), even with relaxed ordering constraints. But this doesn't solve your problem.

Ordering won't prevent the function from being called twice by different threads: between the read and need_close(), another thread can perform both operations, so need_close() will be called twice.

You simply cannot prevent need_close() from being called twice. It has to be safe for concurrent calls, either by making its logic concurrency-friendly or by using a mutex.

Epicure answered 27/8, 2024 at 10:2 Comment(2)
@DevilishSprits I didn't prevent need_close from being called twice, I want to prevent need_close from being called once is_close==true. My confusion here is whether the compiler can reorder the first if branch after some place following need_close()Ghostwrite
The compiler will not reorder that, but there is a TOCTOU issue with your current code, as is_close can become true between those lines due to the action of another thread.Epicure

In comments by the OP:

I want to prevent need_close from being called once is_close==true.

There's nothing in this code to prevent that, with any memory order, with or without execution reordering. That's all a red herring.

If that's the meaning of the original question, the answer is yes: some form of synchronisation is necessary.

It is perfectly possible for both threads to load() a value of false and proceed to the execution of need_close() before either one makes a store() of true, without anything being reordered.

It doesn't matter how long need_close() takes to execute, but in this example it's likely both threads spend most of their time "chasing each other inside" need_close(), because the work of need_close() is many times the work of checking an atomic flag and looping.

NB: Never argue that code X will always (or never) execute faster than code Y, because you should normally assume any thread can be suspended at any time by way of pre-emptive multi-tasking. Such arguments are the very essence of a race condition.

The simple solution is to introduce a mutex.
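A minimal sketch of that, using the names from the question. The call counter and the deterministic stand-in for need_close() are mine, added only so the effect is observable; the point is that holding the mutex across both the check and the call closes the TOCTOU window.

```cpp
#include <cassert>
#include <mutex>

std::mutex m;
bool is_close = false;  // a plain bool is fine: the mutex provides all ordering
int close_calls = 0;    // illustration only: counts how often need_close runs

// Deterministic stand-in for the question's need_close(): always
// returns 0, so the flag flips on the first call.
int need_close() {
    ++close_calls;
    return 0;
}

// The check of is_close and the call to need_close() now happen
// atomically with respect to other threads running invoke().
void invoke() {
    std::lock_guard<std::mutex> lock(m);
    if (is_close) return;
    if (need_close() == 0) is_close = true;
}
```

With this, once any thread has set is_close, no later invoke() can reach need_close(), and two threads can never be inside it at the same time.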

Here's some discussion of relaxed memory order and the potentially counter-intuitive behaviours it admits: https://en.cppreference.com/w/cpp/atomic/memory_order https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering

Balladmonger answered 27/8, 2024 at 16:38 Comment(3)
However, each thread eventually can see the value (true) stored by the other thread, can't it? At that point, need_close won't be called, I think.Ghostwrite
see godbolt.org/z/YWeq36TbK, need_close in the thread won't be called after some time.Ghostwrite
@Ghostwrite Be very careful instrumenting with std::cout<<: the library is almost certainly performing internal synchronisation operations that are likely to result in the true becoming visible to the other thread. There's no guarantee the behaviour of your program will be the same when you take them out! Valid uses of relaxed memory order are quite limited, because you usually need to ensure a synchronisation takes place by other means, or you're coding a race condition with no defined conclusion.Balladmonger

© 2022 - 2025 — McMap. All rights reserved.