What are the exact inter-thread reordering constraints on mutex.lock() and .unlock() in C++11 and up?
According to https://en.cppreference.com/w/cpp/atomic/memory_order, mutex.lock() and mutex.unlock() are acquire and release operations respectively. An acquire operation prevents later instructions from being reordered before it, and a release operation prevents earlier instructions from being reordered after it. Under those constraints alone, the following code:

[Thread 1]
mutex1.lock();
mutex1.unlock();
mutex2.lock();
mutex2.unlock();
[Thread 2]
mutex2.lock();
mutex2.unlock();
mutex1.lock();
mutex1.unlock();

could be reordered into the following (possibly deadlocking) code:

[Thread 1]
mutex1.lock();
mutex2.lock();
mutex1.unlock();
mutex2.unlock();
[Thread 2]
mutex2.lock();
mutex1.lock();
mutex2.unlock();
mutex1.unlock();

Is it possible for this reordering to occur, or is there a rule preventing it?
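For concreteness, the source-order version can be written as a runnable program (a minimal sketch; run_in_source_order is just an illustrative name). Each thread releases one mutex before taking the other, so no thread ever holds two locks at once and the abstract machine has no deadlock:

```cpp
#include <mutex>
#include <thread>

// Minimal sketch of the scenario in the question: each thread
// locks and unlocks strictly in source order, never holding more
// than one mutex at a time.
bool run_in_source_order() {
    std::mutex mutex1, mutex2;
    std::thread t1([&] {
        mutex1.lock(); mutex1.unlock();
        mutex2.lock(); mutex2.unlock();
    });
    std::thread t2([&] {
        mutex2.lock(); mutex2.unlock();
        mutex1.lock(); mutex1.unlock();
    });
    t1.join();
    t2.join();
    return true; // reached only if neither thread deadlocked
}
```

The question is whether an implementation is allowed to transform this into the interleaved-lock version above.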

Mysterious answered 6/4, 2021 at 14:26 Comment(9)
Does this answer your question? At what point is code reordering in C++ optimization stopped? – Alti
@Alti This unfortunately does not answer the question. Assuming the C++ standard is correctly defined, the "as-if" rule would of course hold. The question is whether the implied acquire/release reordering constraints on mutex operations are sufficient for the "as-if" rule to actually hold. From what I can find, they are not. – Mysterious
This might be an artefact of the simplification that cppreference applies to the ordering guarantees. The Standard mandates a synchronizes-with relation from unlock->lock on the same mutex, and from that you can derive an inter-thread happens-before relation to other operations. – Windshield
Thanks for clarifying, retracted close vote. I agree "just the acquire and release" would not be enough to guarantee as-if behaviour, but it's pretty hard to prove either way. To me that's a case of "trust the compilers' engineers to have done their job well". – Alti
I think the normal sequencing rules apply here (mutex1.unlock() would be sequenced before mutex2.lock()); isn't memory_order mostly relevant between threads? – Garold
But, also according to cppreference.com on std::lock, std::lock operates using an (unspecified) "deadlock avoidance algorithm". So maybe if you use mutex::lock and mutex::unlock directly your feared reordering might occur, but it won't if you use the std::lock family of functions to do the locking? (I'm not a standards-document maven, so I hope someone will point out where in the standard this "deadlock avoidance" property of std::lock is spelled out, thanks in advance!) – Reaganreagen
If we unwind the loads and stores that these operations are doing in real life, where mutex2.lock() involves an acquire load (or RMW) in a loop until the lock is available, then we see it's okay if one or several of those loads is reordered before the release store of mutex1.unlock(), just so long as that store completes eventually. Which it should do on any usable machine. Whether the standard's language formally guarantees this is another question, but I am sure they intended this code to work and not deadlock. – Janeanjaneczka
The only significant difference between acquire/release atomics and mutexes is that mutexes guarantee "The lock and unlock operations on a single mutex appear to occur in a single total order." – Windshield
"Assuming the C++ standard is correctly defined" is a huuuuge assumption... There is nothing well defined in C/C++. – Slumlord

Almost a duplicate: How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release? - that's using hand-rolled std::atomic spinlocks, but the same reasoning applies:

The compiler can't compile-time reorder mutex acquire and release in ways that could introduce a deadlock where the C++ abstract machine doesn't have one. That would violate the as-if rule.
It would effectively be introducing an infinite loop in a place the source doesn't have one, violating this rule:

ISO C++ current draft, section 6.9.2.3 Forward progress

18. An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.


The ISO C++ standard doesn't distinguish compile-time vs. run-time reordering. In fact it doesn't say anything about reordering. It only says things about when you're guaranteed to see something because of synchronizes-with effects, and the existence of a modification order for each atomic object, and the total order of seq_cst operations. It's a misreading of the standard to take it as permission to nail things down into asm in a way that requires mutexes to be taken in a different order than source order.

Taking a mutex is essentially equivalent to an atomic RMW with memory_order_acquire on the mutex object. (And in fact the ISO C++ standard even groups them together in 6.9.2.3 :: 18 quoted above.)

You're allowed to see an earlier release or relaxed store or even RMW appear inside a mutex lock/unlock critical section instead of before it. But the standard requires an atomic store (or sync operation) to be visible to other threads promptly, so compile-time reordering to force it to wait until after a lock had been acquired could violate that promptness guarantee. So even a relaxed store can't compile-time / source-level reorder with a mutex.lock(), only as a run-time effect.
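A minimal sketch of that promptness argument (the names flag, m, and observe_relaxed_store are illustrative): a relaxed store sequenced before a lock() must still become visible to other threads in a finite period of time, so a reader spinning on the flag is guaranteed to eventually see it, even though the store has no acquire/release ordering of its own:

```cpp
#include <atomic>
#include <mutex>
#include <thread>

std::atomic<bool> flag{false};
std::mutex m;

// Illustrative sketch: the relaxed store is sequenced before m.lock(),
// and the promptness guarantee (6.9.2.3) means its visibility cannot be
// delayed indefinitely behind the lock acquisition.
int observe_relaxed_store() {
    std::thread writer([] {
        flag.store(true, std::memory_order_relaxed); // before the lock in source order
        m.lock();   // may be observed to complete before the store...
        m.unlock(); // ...but the store must still become visible in finite time
    });
    std::thread reader([] {
        while (!flag.load(std::memory_order_relaxed))
            std::this_thread::yield(); // finite wait, per the promptness guarantee
    });
    writer.join();
    reader.join();
    return flag.load(std::memory_order_relaxed) ? 1 : 0;
}
```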

This same reasoning applies to mutex2.lock(). You're allowed to see reordering, but the compiler can't create a situation where the code requires that reordering to always happen, if that makes execution different from the C++ abstract machine in any important / long-term observable ways. (e.g. reordering around an unbounded wait). Creating a deadlock counts as one of those ways, whether for this reason or another. (Every sane compiler developer would agree on that, even if C++ didn't have formal language to forbid it.)

Note that mutex unlock can't block, so compile-time reordering of two unlocks isn't forbidden for that reason. (If there are no slow or potentially blocking operations in between). But mutex unlock is a "release" operation, so that's ruled out: two release stores can't reorder with each other.


And BTW, the practical mechanism for preventing compile-time reordering of mutex.lock() operations is just to make them regular function calls that the compiler doesn't know how to inline. It has to assume that functions aren't "pure", i.e. that they have side effects on global state, and thus the order might be important. That's the same mechanism that keeps operations inside the critical section: How does a mutex lock and unlock functions prevents CPU reordering?
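A sketch of that mechanism, simulating the out-of-line calls with a noinline attribute (the attribute is a GCC/Clang assumption, not standard C++; real mutex implementations simply define lock() in another translation unit or in libpthread, which has the same effect):

```cpp
#include <mutex>

std::mutex m;
int shared_counter = 0;

// Hypothetical stand-ins for opaque library calls: the compiler cannot
// see through these, so it must assume they may read or write any
// global state, including shared_counter.
[[gnu::noinline]] void opaque_lock()   { m.lock(); }
[[gnu::noinline]] void opaque_unlock() { m.unlock(); }

int bump() {
    opaque_lock();
    // This access cannot be hoisted above opaque_lock() or sunk below
    // opaque_unlock(): both calls might touch shared_counter.
    int v = ++shared_counter;
    opaque_unlock();
    return v;
}
```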

An inlinable std::mutex written with std::atomic would end up depending on the compiler actually applying the rules about making operations visible promptly and not introducing deadlocks by reordering things at compile-time. As described in How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release?
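Such a lock might look like the following minimal spinlock sketch (illustrative only: no backoff, no fairness), with an acquire RMW in lock() and a release store in unlock(), so the compiler must keep the critical section between them and may not sink the acquire loop's effects in a way that defeats progress:

```cpp
#include <atomic>
#include <thread>

// Minimal spinlock sketch in the spirit of the linked question:
// acquire on lock, release on unlock. Not a production mutex.
class Spinlock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire RMW in a loop until the previous holder's release is seen.
        while (locked.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};

int count_with_spinlock() {
    Spinlock s;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 1000; ++i) {
            s.lock();
            ++counter; // critical section: kept inside by acquire/release
            s.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```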

Diarmid answered 6/4, 2021 at 16:57 Comment(5)
Even if it did know how to inline them, and even if the lock was something like a spinlock that the compiler could comprehend all of (without opaque system calls), it would see that lock() involves a loop that retries until the lock becomes available. That loop may iterate an unbounded number of times, and the promptness guarantee should forbid the compiler from reordering a store to the other side of such a loop. – Janeanjaneczka
@NateEldredge: Indeed, then it would be an exact duplicate of How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release?, and would depend on the compiler knowing about atomics and having sane optimization rules, rather than the usual mutex mechanism of just calling an opaque function. – Diarmid
"so compile-time reordering to force it to wait until after a lock had been acquired could violate that promptness guarantee." When we replace the second lock on each thread with a try_lock, we avoid the deadlock/infinite loop. That seems to imply that we can observe the try_lock being reordered into the critical section... – Windshield
Thank you. The rule "An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time", as described in P0062R1, should indeed prevent this perceived reordering from happening. – Mysterious
@Flipvs: Ah, thanks, I hadn't realized synchronization ops were included in the exact same point I was thinking of for atomics. Updated to link and quote it. – Diarmid

An acquire operation makes it impossible to reorder later instructions in front of it. And release operations make it impossible to reorder earlier instructions after it.

Locking a mutex is not a memory read, not a memory write, and not a single instruction. It's an algorithm with numerous internal ordering requirements. An operation with ordering requirements uses some mechanism to ensure those requirements are met regardless of what reordering is allowed by the operations that occur before or after it.

When considering whether two operations can be re-ordered, you have to obey the ordering constraints of both of the two operations. Mutex lock and unlock operations contain numerous internal operations that have their own ordering constraints. You can't just move the block of operations and assume you aren't violating the internal constraints of those operations.

If the mutex lock and unlock implementations on your platform don't have sufficient ordering constraints to ensure they work correctly when used as intended, those implementations are broken.

Pumice answered 6/4, 2021 at 16:45 Comment(3)
Right, but in the case of a release store followed (in program order) by an acquire load, neither has ordering constraints that would prevent swapping their order. But it seems to me the issue here is that in order to have a deadlock, we actually have to reorder infinitely many acquire loads before the release store, so that the release store is delayed indefinitely, and that's not supposed to happen. – Janeanjaneczka
@NateEldredge You're not just reordering loads and stores, though; you're reordering an entire algorithm that necessarily has sufficient ordering constraints internally, in the operations it itself uses, to ensure that it works for its intended purpose (otherwise it's broken). – Pumice
I think Peter's answer makes the point I was trying to make better than I did: it doesn't really matter how the various internal steps of mutex1.unlock() and mutex2.lock() are reordered with one another, so long as mutex1.unlock() completes "promptly", and that's a separate guarantee. – Janeanjaneczka
