Almost a duplicate: How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release? - that's using hand-rolled std::atomic
spinlocks, but the same reasoning applies:
The compiler can't compile-time reorder mutex acquire and release in ways that could introduce a deadlock where the C++ abstract machine doesn't have one. That would violate the as-if rule.
It would effectively be introducing an infinite loop in a place the source doesn't have one, violating this rule:
ISO C++ current draft, section 6.9.2.3 Forward progress
18. An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
The ISO C++ standard doesn't distinguish compile-time vs. run-time reordering. In fact it doesn't say anything about reordering. It only says things about when you're guaranteed to see something because of synchronizes-with effects, and the existence of a modification order for each atomic object, and the total order of seq_cst operations. It's a misreading of the standard to take it as permission to nail things down into asm in a way that requires mutexes to be taken in a different order than source order.
Taking a mutex is essentially equivalent to an atomic RMW with memory_order_acquire on the mutex object. (And in fact the ISO C++ standard even groups them together in 6.9.2.3 :: 18, quoted above.)
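As a minimal sketch of what that looks like (an illustration only, not a production lock; the class and member names here are my own):

    #include <atomic>

    // Minimal spinlock sketch: lock() is an atomic RMW with acquire ordering,
    // unlock() is a release store -- the same memory-model shape as a mutex.
    class SpinLock {
        std::atomic<bool> locked{false};
    public:
        void lock() {
            // Atomic RMW: keep retrying until we flip false -> true.
            // Acquire ordering keeps later operations in this thread from
            // (observably) moving before the successful RMW.
            while (locked.exchange(true, std::memory_order_acquire)) {
                // spin; a real lock would pause/yield or use a futex here
            }
        }
        void unlock() {
            // Release store: earlier operations can't move after it, and it
            // must become visible to other threads in a finite time.
            locked.store(false, std::memory_order_release);
        }
    };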
You're allowed to see an earlier release or relaxed store or even RMW appear inside a mutex lock/unlock critical section instead of before it. But the standard requires an atomic store (or sync operation) to be visible to other threads promptly, so compile-time reordering that forces it to wait until after a lock had been acquired could violate that promptness guarantee. So even a relaxed store can't reorder with a mutex.lock() at compile time / source level, only as a run-time effect.
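A rough illustration of that point (the variable names are invented, not from the question):

    #include <atomic>
    #include <mutex>

    std::atomic<bool> ready{false};
    std::mutex m;

    void producer() {
        ready.store(true, std::memory_order_relaxed);
        m.lock();
        // At run time another thread may happen to observe ready == true only
        // after this thread already holds the lock; that's fine. But the
        // compiler can't sink the relaxed store below m.lock(): the lock might
        // block for an unbounded time, and the store still has to become
        // visible to other threads promptly.
        // ... critical section ...
        m.unlock();
    }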
This same reasoning applies to mutex2.lock(). You're allowed to see reordering, but the compiler can't create a situation where the code requires that reordering to always happen, if that makes execution differ from the C++ abstract machine in any important / long-term-observable way (e.g. reordering around an unbounded wait). Creating a deadlock counts as one of those ways, whether for this reason or another. (Every sane compiler developer would agree on that, even if C++ didn't have formal language to forbid it.)
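For example, assuming the code under discussion has roughly this hand-over-hand shape (a guess at the pattern, not the asker's exact code):

    #include <mutex>

    std::mutex mutex1, mutex2;

    void thread_a() {
        mutex1.lock();
        // ... work under mutex1 ...
        mutex1.unlock();
        mutex2.lock();      // in source order, taken only after mutex1 is released
        // ... work under mutex2 ...
        mutex2.unlock();
    }

    void thread_b() {       // takes the locks in the opposite sequence
        mutex2.lock();
        // ...
        mutex2.unlock();
        mutex1.lock();
        // ...
        mutex1.unlock();
    }

    // In the abstract machine neither thread ever holds both mutexes at once,
    // so this can't deadlock. If a compiler hoisted mutex2.lock() above
    // mutex1.unlock() in thread_a (and/or made the analogous move in thread_b),
    // it would create a deadlock the source doesn't have, violating the as-if rule.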
Note that mutex unlock can't block, so compile-time reordering of two unlocks isn't forbidden for that reason (if there are no slow or potentially blocking operations in between). But mutex unlock is a "release" operation, so that rules it out anyway: two release stores can't reorder with each other.
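A tiny analogy with plain atomics (invented variables) shows why two release operations keep their order:

    #include <atomic>

    std::atomic<int> a{0}, b{0};

    void two_releases() {
        a.store(1, std::memory_order_release);   // like mutex1.unlock()
        b.store(1, std::memory_order_release);   // like mutex2.unlock()
        // A thread whose acquire load sees b == 1 is guaranteed to also see
        // a == 1, so the two release stores can't be observed out of order.
    }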
And BTW, the practical mechanism for preventing compile-time reordering of mutex.lock() operations is just to make them regular function calls that the compiler doesn't know how to inline. It has to assume that functions aren't "pure", i.e. that they have side effects on global state, and thus the order might be important. That's the same mechanism that keeps operations inside the critical section: How does a mutex lock and unlock functions prevents CPU reordering?
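A rough sketch of that mechanism (the function and variable names are invented):

    #include <mutex>

    extern std::mutex m;     // hypothetical mutex defined in another TU
    extern int shared_data;  // hypothetical data protected by m

    void update() {
        m.lock();        // ultimately an opaque call into the library / OS; the
                         // optimizer must assume it may read or write any global
                         // state, so the shared_data access can't be hoisted above it
        shared_data += 1;
        m.unlock();      // same reasoning keeps the store from sinking below this call
    }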
An inlinable std::mutex written with std::atomic would end up depending on the compiler actually applying the rules about making operations visible promptly and not introducing deadlocks by reordering things at compile-time. As described in How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release?
Comments:

std::lock operates using an (unspecified) "deadlock avoidance algorithm". So maybe if you use mutex::lock and mutex::unlock directly your feared reordering might occur, but it won't if you use the std::lock family of objects to do the locking? (I'm not a standards-document maven, so I hope someone will point out where in the standard this "deadlock avoidance" property of std::lock is spelled out, thanks in advance!) – Reaganreagen

mutex2.lock() involves an acquire load (or RMW) in a loop until the lock is available, so it's okay if one or several of those loads is reordered before the release store of mutex1.unlock(), just so long as that store completes eventually, which it should do on any usable machine. Whether the standard's language formally guarantees this is another question, but I am sure they intended this code to work and not deadlock. – Janeanjaneczka
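For what it's worth, the std::lock family mentioned in the first comment is normally used like this (ordinary C++17, shown only as a usage sketch, separate from the compile-time-reordering question):

    #include <mutex>

    std::mutex mutex1, mutex2;

    void work_under_both() {
        // std::scoped_lock (C++17) locks multiple mutexes using the same
        // deadlock-avoidance algorithm as std::lock, so the order in which the
        // mutexes are named doesn't matter for deadlock safety.
        std::scoped_lock lk(mutex1, mutex2);
        // ... work while holding both mutexes ...
    }   // both mutexes released here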