I suppose that std::atomic can sometimes replace uses of std::mutex. But is it always safe to use an atomic instead of a mutex? Example code:
    std::atomic_flag f, ready; // shared

    // ..... Thread 1 (and others) ....
    while (true) {
        // ... Do some stuff in the beginning ...
        while (f.test_and_set()); // spin, acquire system lock
        if (ready.test()) {
            UseSystem(); // .... use our system for 50-200 nanoseconds ....
        }
        f.clear(); // release lock
        // ... Do some stuff at the end ...
    }

    // ...... Thread 2 .....
    while (true) {
        // ... Do some stuff in the beginning ...
        InitSystem();
        ready.test_and_set(); // signify system ready
        // .... sleep for 10-30 milli-seconds ....
        while (f.test_and_set()); // acquire system lock
        ready.clear(); // signify system shutdown
        f.clear(); // release lock
        DeInitSystem(); // finalize/destroy system
        // ... Do some stuff at the end ...
    }
Here I use std::atomic_flag to protect use of my system (some complex library). But is this code safe? I assume that if ready is false, the system is not available and I can't use it, and if it is true, the system is available and I can use it. For simplicity, suppose that the code above doesn't throw exceptions.
Of course I can use std::mutex to protect reads and modifications of my system. But right now I need very high performance code in Thread-1, which synchronizes very often and so should use atomics instead of mutexes (Thread-2 can be slow and use mutexes if needed).
In Thread-1 the system-usage code (inside the while loop) runs very often, each iteration taking around 50-200 nanoseconds. So using extra mutexes would be too heavy. But Thread-2 iterations are quite long: as you can see, in each iteration of its while loop, once the system is ready it sleeps for 10-30 milliseconds, so using mutexes only in Thread-2 is quite alright.
Thread-1 is just one example thread; in my real project several threads run the same (or very similar) code as Thread-1.
I'm concerned about memory operation ordering: it could sometimes happen that the system is not yet in a fully consistent state (not yet fully initialized) when ready becomes true in Thread-1. It may also happen that ready becomes false in Thread-1 too late, when the system has already performed some destroying (deinit) operations. Also, as you can see, the system can be initialized and destroyed many times in the loop of Thread-2 and used many times in Thread-1 whenever it is ready.
Can my task be solved somehow without std::mutex and other heavy stuff in Thread-1, using only std::atomic (or std::atomic_flag)? Thread-2 can use heavy synchronization (mutexes etc.) if needed.
Basically, Thread-2 should somehow propagate the whole initialized state of the system to all cores and other threads before ready becomes true, and Thread-2 should also propagate ready equal to false before any single small operation of system destruction (deinit) is done. By propagating state I mean that all of the system's initialized data should be 100% consistently written to global memory and to the caches of other cores, so that other threads see a fully consistent system whenever ready is true.
It is even acceptable to make a small (milliseconds) pause after system init and before ready is set to true, if that improves the situation and the guarantees. It is likewise acceptable to pause after ready is set to false and before starting system destruction (deinit). Doing some expensive CPU operations in Thread-2 is also alright, if there exist operations like "propagate all Thread-2 writes to global memory and to the caches of all other CPU cores and threads".
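To make the ordering concern concrete, here is a minimal sketch of the same protocol, assuming the "system" is just two plain integers that must be equal and non-zero while it is initialized. The names System, g_sys, run_demo and the parameters are hypothetical, and std::atomic<bool> stands in for the second std::atomic_flag so that the C++20 test() member is not required. As far as I understand the memory model, the release store to the ready flag paired with the acquire load in the user threads is what makes the init writes visible, and clearing the flag while holding the lock is what keeps deinit from overlapping a use.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real library: valid only while a == b != 0.
struct System { int a = 0; int b = 0; };

static System g_sys;                                // plain, non-atomic data
static std::atomic_flag g_lock = ATOMIC_FLAG_INIT;  // "f" in the question
static std::atomic<bool> g_ready{false};            // "ready" in the question

static void lock() {
    while (g_lock.test_and_set(std::memory_order_acquire))
        std::this_thread::yield();
}
static void unlock() { g_lock.clear(std::memory_order_release); }

// Returns how many times a user thread saw an inconsistent system
// while ready was true. The protocol is expected to keep this at 0.
long run_demo(int cycles, int users) {
    assert(cycles > 0 && users > 0);
    std::atomic<long> bad{0};
    std::atomic<bool> stop{false};
    g_ready.store(false);
    g_sys = System{};

    std::vector<std::thread> pool;
    for (int i = 0; i < users; ++i) {
        pool.emplace_back([&] {                      // plays the role of Thread 1
            while (!stop.load(std::memory_order_relaxed)) {
                lock();
                // Acquire load: pairs with the release store after init.
                if (g_ready.load(std::memory_order_acquire)) {
                    if (g_sys.a == 0 || g_sys.a != g_sys.b) ++bad;
                }
                unlock();
            }
        });
    }

    for (int c = 1; c <= cycles; ++c) {              // plays the role of Thread 2
        g_sys.a = c;                                 // "InitSystem()"
        g_sys.b = c;
        g_ready.store(true, std::memory_order_release);  // publish init writes
        std::this_thread::yield();                   // stands in for the sleep
        lock();
        g_ready.store(false, std::memory_order_relaxed); // ordered by the lock
        unlock();
        g_sys.a = 0;                                 // "DeInitSystem()"
        g_sys.b = 0;
    }
    stop.store(true);
    for (auto& t : pool) t.join();
    return bad.load();
}
```

Calling run_demo(200, 2) runs 200 init/deinit cycles against two user threads. Weakening the acquire load to relaxed would, as far as I can tell, turn the reads of g_sys into a data race even though the lock is held, because the lock itself does not order the init writes made outside of it.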
Update: As a solution to my question above, I have decided for now to use the following code with std::atomic_flag to replace std::mutex in my project:
    std::atomic_flag f = ATOMIC_FLAG_INIT; // shared

    // .... Later in all threads ....
    while (f.test_and_set(std::memory_order_acquire)) // try acquiring
        std::this_thread::yield();
    shared_value += 5; // Any code, it is lock-protected.
    f.clear(std::memory_order_release); // release
The solution above takes 9 nanoseconds on average (measured over 2^25 operations) in a single thread (release build) on my Windows 10 64-bit 2 GHz 2-core laptop, while using std::unique_lock<std::mutex> lock(mux); for the same protection takes 100-120 nanoseconds on the same Windows PC. If the threads need to spin instead of sleeping while waiting, then instead of std::this_thread::yield(); in the code above I just use a semicolon ;. Full online example of usage and time measurements.
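The acquire/release pair above can also be wrapped so that the release cannot be forgotten. This is only a sketch: the SpinLock class and the add_fives helper are hypothetical names, not part of the project. It relies on std::lock_guard accepting any type that provides lock() and unlock().

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper around the atomic_flag spin lock shown above.
// Provides lock()/unlock()/try_lock(), so std::lock_guard can manage it.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();   // replace with an empty body to spin
    }
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: each thread adds 5 to a plain integer per_thread times.
long add_fives(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    SpinLock lock;
    long shared_value = 0;               // protected by lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // released at scope exit
                shared_value += 5;
            }
        });
    }
    for (auto& th : pool) th.join();
    return shared_value;
}
```

With 4 threads and 10000 additions each, add_fives(4, 10000) should return 200000 if the lock really serializes the increments.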
The std::unique_lock<std::mutex> lock(mux); line takes around 100-120 nanoseconds. On Linux it is much faster, around 20-30 nanoseconds. And atomic<size_t>/atomic_flag both take 17 nanoseconds on both Windows and Linux. All of these were tested with Clang, release -O3. For me 17 nanoseconds is much preferable to 100-120 nanoseconds. – Bo
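Timings like the ones quoted above could be collected roughly as follows. This is a single-threaded, uncontended sketch, not the linked measurement code; ns_per_op, Timings and measure are hypothetical names, and the numbers it produces will depend on compiler, flags and machine.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>

// Average nanoseconds per call of op(), measured over n calls.
template <class Op>
double ns_per_op(long n, Op op) {
    assert(n > 0);
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) op();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
}

struct Timings {
    double flag_ns;   // atomic_flag lock/unlock, per operation
    double mutex_ns;  // std::unique_lock<std::mutex>, per operation
    long total;       // final value of the protected counter
};

Timings measure(long n) {
    std::atomic_flag f = ATOMIC_FLAG_INIT;
    std::mutex mux;
    volatile long shared_value = 0;  // volatile so the adds are not optimized away
    Timings r{};
    r.flag_ns = ns_per_op(n, [&] {
        while (f.test_and_set(std::memory_order_acquire)) {}
        shared_value = shared_value + 5;
        f.clear(std::memory_order_release);
    });
    r.mutex_ns = ns_per_op(n, [&] {
        std::unique_lock<std::mutex> lock(mux);
        shared_value = shared_value + 5;
    });
    r.total = shared_value;
    return r;
}
```

Because no other thread competes for either lock here, this only measures the uncontended fast path; behaviour under contention from several Thread-1 style users may look quite different.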