std::atomic has the methods is_lock_free (non-static) and is_always_lock_free (static).
When is_lock_free returns true, the atomic does not use a lock and is expected to perform better than equivalent lock-based code. When is_lock_free returns false, the atomic contains a lock, and its performance is roughly the same as that of lock-based code.
This does not mean you should always use atomic instead of a mutex-based approach. On the contrary, if you expect is_lock_free to always be false, you should not use atomic:
- Using atomic in such cases would be misleading in the first place.
- The lock inside a lock-based atomic may be suboptimal: it may be shared with another atomic because both hash to the same table entry (libc++), or it may not use an OS wait (MSVC STL).
- The underlying type of an atomic has restrictions (it must be trivially copyable) that do not apply when you use a mutex on your own.
CPUs that support multithreading provide atomic instructions of CPU register size, and often also of double the register size. Currently most programs run in 64-bit mode on a 64-bit CPU, but some still run in 32-bit mode or on a 32-bit CPU. An STL implementation can implement a smaller-size atomic type using larger-size CPU instructions if there are no exact-size instructions. Only a restricted set of operations can be performed on an arbitrary underlying type; there are more operations for integers (and floats). Taken together, this means that a type of 64-bit size or less is likely to be a good candidate for atomic, provided it is either a native integer/floating-point type or a small structure that is updated as a whole with a new version.
You should look at the bigger picture. Acquiring a mutex in the ideal (and normally frequent) scenario is nothing more than an atomic operation, and the same goes for releasing it. On the other hand, atomic operations are often an order of magnitude slower than their non-atomic equivalents; e.g., i++ is way slower for atomic<int> than for int. So, while updating a single counter with an atomic is likely a win, doing something bigger with atomics, such as updating five counters, may be more complex and slower than protecting the whole operation with a single mutex.
x is an instance variable. You can get fine-grained locking by making the mutex a class member instead of having one big lock for all threads modifying all instances of class A. (That of course increases the size of each A object.) – Wiley

std::shared_mutex or equivalent in that case. That way, multiple threads can read at the same time, but any thread that wants to write must get exclusive access. – Mineralize

shared_mutex or any other readers/writers lock still needs to do an atomic RMW on the cache line holding the lock. (And probably also to unlock, vs. just a release store which is sufficient for some locks. Although maybe just for spinlocks; a normal mutex with fallback to futex or other OS-assisted sleep/wake may need to exchange to unlock to avoid racing with threads putting themselves to sleep.) Anyway, atomic<int> can be read in parallel by any number of cores at once, vs. shared_mutex bouncing a cache line around, still serializing (just avoiding wasted attempts). – Wiley

atomic<> where the readers are truly read-only, so all cache lines they touch can stay hot in MESI Shared state. (Or a SeqLock, because that also makes the readers truly read-only, but you'd have to roll your own with std::atomic, so it only makes sense for rarely-modified objects a bit too large to be lock-free themselves, like a 64-bit counter on some 32-bit systems which can't do 64-bit atomic load/store. Implementing 64 bit atomic counter with 32 bit atomics) – Wiley