When should you use std::atomic instead of std::mutex?
In the question How to use std::atomic<>, we can obviously just use std::mutex to ensure thread safety. I want to know when to use which one.

struct A {
    std::atomic<int> x{0};
    void Add() {
        x++;
    }
    void Sub() {
        x--;
    }
};

vs.

std::mutex mtx;
struct A {
    int x = 0;
    void Add() {
        std::lock_guard<std::mutex> guard(mtx);
        x++;
    }
    void Sub() {
        std::lock_guard<std::mutex> guard(mtx);
        x--;
    }
};
Wooden answered 21/9, 2016 at 12:56 Comment(6)
x is an instance variable. You can get fine-grained locking by making the mutex a class-member instead of having one big lock for all threads modifying all instances of class A. (That of course increases the size of each A object.)Wiley
Don't forget that even a read-only accessor function also needs to take the lock, at least in theory to avoid C++ UB. (This is a huge advantage for std::atomic: read-only access is much cheaper).Wiley
@PeterCordes You could use both: a mutex for accessing all components of an object in a well defined state and atomic subparts for each property of the object whose value make sense alone, so accessing a single component doesn't go through the mutex (but updates and accessing all parts do).Piranesi
@PeterCordes you can use std::shared_mutex or equivalent in that case. That way, multiple threads can read at the same time, but any thread that wants to write must get exclusive access.Mineralize
@RemyLebeau: A shared_mutex or any other readers/writers lock still needs to do an atomic RMW on the cache line holding the lock. (And probably also to unlock, vs. just a release store which is sufficient for some locks. Although maybe just for spinlocks; a normal mutex with fallback to futex or other OS-assisted sleep/wake may need to exchange to unlock to avoid racing with threads putting themselves to sleep). Anyway, atomic<int> can be read in parallel by any number of cores at once, vs. shared_mutex bouncing a cache line around, still serializing (just avoiding wasted attempts).Wiley
@RemyLebeau: No locking scheme can match the read-side scaling of a lock-free atomic<> where the readers are truly read-only, so all cache lines they touch can stay hot in MESI Shared state. (Except a SeqLock, because that also makes the readers truly read-only, but you'd have to roll your own with std::atomic, so it only makes sense for rarely-modified objects a bit too large to be lock-free themselves, like a 64-bit counter on some 32-bit systems which can't do 64-bit atomic load / store. Implementing 64 bit atomic counter with 32 bit atomics)Wiley
As a rule of thumb, use std::atomic for POD types where the underlying specialisation will be able to use something clever like a bus lock on the CPU (which will give you no more overhead than a pipeline dump), or even a spin lock. On some systems, an int might already be atomic, so std::atomic<int> will specialise out effectively to an int.

Use std::mutex for non-POD types, bearing in mind that acquiring a mutex is at least an order of magnitude slower than a bus lock.

If you're still unsure, measure the performance.

Jennifferjennilee answered 21/9, 2016 at 12:59 Comment(4)
int loads and int stores are usually atomic (e.g. they are on x86), but my_int++ is never atomic on multi-core systems. I'd agree with your overall point that std::atomic primitive types are probably useful, and anything else is likely to just do less efficient locking behind the scenes.Wiley
std::atomic<small_struct> may be useful for objects that fit in 16 bytes, but only if you know exactly what you're doing, and are targeting a platform that you know has something like x86-64 lock cmpxchg16b, and you build with -mcx16 (since cmpxchg16b is an extension, unfortunately, not part of baseline x86-64 because it was missing from the first gen AMD64 CPUs.) See my answer here about compare-and-swap on an object the size of two pointers.Wiley
Just to be clear, even if int is narrow enough that the compiler doesn't have to do any extra work to get atomicity for atomic<int>, you still need atomic<int> for thread-safety. My previous comment may have given a false impression. You can use std::atomic<int> with std::memory_order_relaxed if you don't want any extra ordering, just forcing access to cache-coherent memory (rather than holding a variable's value in a register): see MCU programming - C++ O2 optimization breaks while loopWiley
A bus lock would be very expensive, blocking memory access from all cores even to unrelated cache lines. But you only get that from misaligned atomic RMWs on x86. Compilers don't do that, they use alignas(sizeof(T)) for atomic<T>, so CPUs can just use a cache lock. And so pure-load and pure-store can be atomic as well, not just atomic RMWs. A cache lock doesn't even block out-of-order exec of ALU instructions on that core, although on x86 it is a full barrier, blocking later loads until the store buffer is drained.Wiley
std::atomic has methods is_lock_free (non-static) and is_always_lock_free (static).

When is_lock_free returns true, the atomic does not use a lock and can be expected to perform better than equivalent code with locks. When it returns false, the atomic uses a lock internally, and performs comparably to lock-based code.

This does not mean that you should always use atomic instead of a mutex-based approach. On the contrary, if you expect is_lock_free to always be false, you should not use atomic:

  • Using atomic in such cases would be misleading in the first place.
  • The lock inside a lock-based atomic may be suboptimal: it may be shared with another atomic because they hash to the same table entry (libc++), or it may not use an OS wait (MSVC STL).
  • The underlying type of an atomic has restrictions (it must be trivially copyable) that you don't face when using a mutex of your own.

CPUs that support multithreading provide atomic instructions of CPU register size, and often also of double the register size.

Currently most programs run in 64-bit mode on a 64-bit CPU, but some still run in 32-bit mode or on a 32-bit CPU.

An STL implementation can provide a smaller-sized atomic type using larger-sized CPU instructions if there are no exact-sized ones.

Only a restricted set of operations can be performed on an arbitrary type used as the underlying atomic type; integers (and floats) support more operations.

Together it means that a type of 64-bit size or less is likely to be a good candidate for atomic, provided that it is either a native integer/floating type, or a small structure that is updated as a whole with a new version.


You should also look at the bigger picture.

Acquiring a mutex in the ideal (and normally frequent) uncontended scenario is nothing more than an atomic operation; the same goes for releasing it.

On the other hand, atomic operations are often an order of magnitude slower than their non-atomic equivalents. For example, i++ is much slower for atomic<int> than for int.

So, although updating one counter with an atomic is likely to be good, doing something more with atomics, like updating five counters, may be more complex and slower than protecting the whole operation with a single mutex.

Aerospace answered 13/10, 2023 at 8:45 Comment(0)