#include <iostream>
#include <thread>
#include <mutex>
#include <atomic>
using namespace std;

const int FLAG1 = 1, FLAG2 = 2, FLAG3 = 3;
int res = 0;
atomic<int> flagger;

void func1()
{
    for (int i=1; i<=1000000; i++) {
        while (flagger.load(std::memory_order_relaxed) != FLAG1) {}
        res++; // maybe a bunch of other code here that don't modify flagger
        // code here must not be moved outside the load/store (like mutex lock/unlock)
        flagger.store(FLAG2, std::memory_order_relaxed);
    }
    cout << "Func1 finished\n";
}

void func2()
{
    for (int i=1; i<=1000000; i++) {
        while (flagger.load(std::memory_order_relaxed) != FLAG2) {}
        res++; // same
        flagger.store(FLAG3, std::memory_order_relaxed);
    }
    cout << "Func2 finished\n";
}

void func3() {
    for (int i=1; i<=1000000; i++) {
        while (flagger.load(std::memory_order_relaxed) != FLAG3) {}
        res++; // same
        flagger.store(FLAG1, std::memory_order_relaxed);
    }
    cout << "Func3 finished\n";
}

int main()
{
    flagger = FLAG1;
    std::thread first(func1);
    std::thread second(func2);
    std::thread third(func3);
    first.join();
    second.join();
    third.join();
    cout << "res = " << res << "\n";
    return 0;
}
My program has a segment similar to this example. There are three threads: an input thread, a processing thread, and an output thread. I found that busy-waiting on an atomic is faster than putting the threads to sleep with a condition variable, such as:
std::mutex commonMtx;
std::condition_variable waiter1, waiter2, waiter3;
// then in func1()
unique_lock<std::mutex> uniquer1(commonMtx);
while (flagger != FLAG1) waiter1.wait(uniquer1);
However, is the code in the example safe? When I run it, it gives correct results (compiled with the -std=c++17 -O3 flags). However, I don't know whether the compiler can reorder my instructions to outside the atomic check/set block, especially with std::memory_order_relaxed. If it is unsafe, is there any way to make it safe while still being faster than a mutex?
Edit: it's guaranteed that the number of threads is less than the number of CPU cores.
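For comparison, here is a minimal sketch of the same round-robin hand-off using acquire loads and release stores instead of relaxed ones. It is only an illustration, not the original program: the names `turn`, `shared_counter`, `run_stage`, and `run_round_robin` are hypothetical stand-ins for `flagger`, `res`, and `func1`..`func3`, and the `yield()` call in the spin loop is an added assumption that can be dropped when every thread has its own core.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names mirroring FLAG1..FLAG3, flagger and res from the question.
constexpr int kTurn1 = 1, kTurn2 = 2, kTurn3 = 3;

std::atomic<int> turn{kTurn1};
int shared_counter = 0;  // plain int, protected only by the hand-off protocol

// One stage: wait for my turn, do the work, pass the turn to the next stage.
void run_stage(int my_turn, int next_turn, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        // Acquire load: once it observes my_turn, everything the previous
        // stage wrote before its release store is visible here, and the
        // increment below cannot be moved above this load.
        while (turn.load(std::memory_order_acquire) != my_turn) {
            std::this_thread::yield();  // optional; an assumption for machines with few cores
        }
        ++shared_counter;
        // Release store: the increment above cannot be moved below this store.
        turn.store(next_turn, std::memory_order_release);
    }
}

// Runs the three stages and returns the final counter value.
int run_round_robin(int iterations) {
    turn.store(kTurn1, std::memory_order_relaxed);  // threads are not started yet
    shared_counter = 0;
    std::thread first(run_stage, kTurn1, kTurn2, iterations);
    std::thread second(run_stage, kTurn2, kTurn3, iterations);
    std::thread third(run_stage, kTurn3, kTurn1, iterations);
    first.join();
    second.join();
    third.join();
    return shared_counter;  // join() orders this read after all increments
}
```

Calling `run_round_robin(n)` should return `3 * n`, since each stage increments the counter once per iteration. Whether this is still faster than the condition-variable version depends on the hardware and workload and would need to be measured.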
… atomic_thread_fences?). – Wagtail
res++ isn't ordered with respect to relaxed atomic stores. Your code is unsafe. Use acquire/release memory ordering. See Herb's excellent atomic<> weapons talk for more details. – Backspace
… std::vector?), processor thread reads from there and writes to output queue, output thread reads from output queue. In the processing thread, lock mutex, std::move the entire queue to a local variable (this should be quick), unlock mutex, then start processing data until you need more. In the output thread, do the same. This way none of the threads have to stop to wait for the following thread to process the data. – Conceivable
… -fsanitize=thread, and it says that it's a data race (and UB). If I use 'acquire' loads and 'release' stores, it stops complaining. I then expected to be able to replace the 'acquire' load with a 'relaxed' load followed by an 'acquire' fence, which didn't work for some reason. I asked a new question about it here. – Wagtail
relaxed atomic operations can be freely reordered, either at the code generation or machine level, with unrelated loads and stores. So for instance, in func1, the machine could load the value of res before the test of flagger, and only store back the incremented value after getting the correct value loaded from flagger. But some other thread could increment res in between, which would then be lost when func1 stores back a stale value. – Precinct
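The queue-swapping idea from Conceivable's comment could look roughly like the sketch below. This is one possible reading of that comment, not code from the thread: `BatchQueue`, `take_all`, `close`, and `run_pipeline` are hypothetical names, the doubling step stands in for real processing, and waiting on a condition variable when a queue is empty is an added assumption (the comment only mentions the mutex).

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: a mutex-protected buffer that is drained in whole batches.
class BatchQueue {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            items_.push_back(value);
        }
        cv_.notify_one();
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until data is available or the queue is closed, then moves out
    // everything at once. An empty result means "closed and fully drained".
    std::vector<int> take_all() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        std::vector<int> batch = std::move(items_);
        items_.clear();  // a moved-from vector is valid but unspecified; make it empty
        return batch;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::vector<int> items_;
    bool closed_ = false;
};

// Input thread -> processing thread -> output thread; returns the output sum.
long long run_pipeline(int count) {
    BatchQueue input, output;
    long long total = 0;

    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) input.push(i);
        input.close();
    });
    std::thread processor([&] {
        for (;;) {
            std::vector<int> batch = input.take_all();  // lock held only for the move
            if (batch.empty()) break;
            for (int v : batch) output.push(v * 2);     // processed without the input lock
        }
        output.close();
    });
    std::thread consumer([&] {
        for (;;) {
            std::vector<int> batch = output.take_all();
            if (batch.empty()) break;
            for (int v : batch) total += v;
        }
    });

    producer.join();
    processor.join();
    consumer.join();
    return total;  // only the consumer wrote total, and it has been joined
}
```

With this layout each thread holds a lock only long enough to push one item or move out a batch, so a slow stage delays the stages after it but does not force the earlier ones to wait in lock step. For example, `run_pipeline(1000)` sums the doubled values 2, 4, ..., 2000.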