Do the release-acquire visibility guarantees of std::mutex apply to only the critical section?
I'm trying to understand these sections under the heading Release-Acquire ordering: https://en.cppreference.com/w/cpp/atomic/memory_order

They say regarding atomic load and stores:

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

Then regarding mutexes:

Mutual exclusion locks, such as std::mutex or atomic spinlock, are an example of release-acquire synchronization: when the lock is released by thread A and acquired by thread B, everything that took place in the critical section (before the release) in the context of thread A has to be visible to thread B (after the acquire) which is executing the same critical section.

The first paragraph seems to say that with an atomic store/load pair (tagged memory_order_release and memory_order_acquire), thread B is guaranteed to see everything thread A wrote, including non-atomic writes.
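
For concreteness, I believe that paragraph is describing the classic message-passing pattern, something like this (variable and function names are my own, not from the page):

```cpp
#include <atomic>
#include <thread>

int payload = 0;                   // non-atomic shared data
std::atomic<bool> ready{false};

int run_release_acquire_demo() {
    std::thread a([] {
        payload = 42;                                  // plain write, sequenced before the store
        ready.store(true, std::memory_order_release);  // release store
    });
    int seen = 0;
    std::thread b([&] {
        while (!ready.load(std::memory_order_acquire)) // acquire load
            ;                                          // spin until the flag becomes visible
        seen = payload;  // guaranteed to read 42, never 0
    });
    a.join();
    b.join();
    return seen;
}
```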

The second paragraph seems to suggest that a mutex works the same way, except that the scope of what is visible to B is limited to whatever was wrapped in the critical section. Is that an accurate interpretation, or would every write, even those before the critical section, be visible to B?
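
In code, the situation I'm asking about looks roughly like this (all names are mine; note that the join below also synchronizes, so this particular demo can't distinguish the two cases — it only shows the shape of the question):

```cpp
#include <mutex>
#include <thread>
#include <utility>

int before = 0;   // written *before* the critical section
int inside = 0;   // written *inside* the critical section
std::mutex m;

std::pair<int, int> run_question_demo() {
    std::thread a([] {
        before = 1;                        // not wrapped by the mutex
        std::lock_guard<std::mutex> lk(m);
        inside = 2;                        // wrapped by the mutex
    });
    a.join();                              // joining synchronizes everything anyway
    std::lock_guard<std::mutex> lk(m);
    // `inside` is clearly covered by the quote; the question is whether
    // `before` is also guaranteed visible to a thread that locks after A unlocks.
    return {before, inside};
}
```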

Thereabouts answered 20/9, 2019 at 3:13 Comment(3)
Congratulations, you have found your way to one of the darkest corners of C++11! I recommend reading kernel.org/doc/Documentation/memory-barriers.txt (I didn't finish it myself, though)Gnash
While I am curious about how this is handled at the OS and CPU level, I think the whole point of the C++ memory model is that we shouldn't have to understand those underlying implementations in order to write software that is correct. Understanding those details should only really be necessary when implementing optimizations. I'm trying to get a better grasp of this at the C++ level before I dig any deeper.Thereabouts
@Thereabouts Not only that, but an advanced compiler could compile a multithreaded program in a much more subtle way than just emitting fences while avoiding the obviously redundant ones, as current compilers do.Pyoid

I think the cppreference quote about mutexes is written that way because, if you're using mutexes for synchronization, all shared variables used for communication should always be accessed inside the critical section.

The 2017 standard says in 4.7.1:

a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A.

So everything before the unlock in the previous lock-holding thread happens-before everything after the lock in the next thread to take the lock.

This chains across threads: each thread that takes the lock makes the previous lock-holders' operations visible to later lock-takers, as well as its own.
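
A runnable sketch of that chaining (my own example, not from the standard): each unlock synchronizes-with the next lock, so updates accumulate visibly from lock-holder to lock-holder and no increment is lost.

```cpp
#include <mutex>
#include <thread>
#include <vector>

int counter = 0;   // plain int: the mutex alone orders access to it
std::mutex m;

int run_chain_demo() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([] {
            std::lock_guard<std::mutex> lk(m);
            ++counter;   // sees every previous lock-holder's increment
        });
    for (auto& t : workers) t.join();
    return counter;      // 4: the chain of unlock/lock pairs preserved all updates
}
```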


Update: I want to make sure this post is solid, because this information is surprisingly hard to find on the web. Thanks to @Davis Herring for pointing me in the right direction.

The standard says

in 33.4.3.2.11 and 33.4.3.2.25:

mutex unlock synchronizes with subsequent lock operations that obtain ownership on the same object

(https://en.cppreference.com/w/cpp/thread/mutex/lock, https://en.cppreference.com/w/cpp/thread/mutex/unlock)

in 4.6.16:

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

https://en.cppreference.com/w/cpp/language/eval_order

in 4.7.1.9:

An evaluation A inter-thread happens before evaluation B if

4.7.1.9.1) -- A synchronizes-with B, or

4.7.1.9.2) -- A is dependency-ordered before B, or

4.7.1.9.3) -- for some evaluation X

4.7.1.9.3.1) ------ A synchronizes with X and X is sequenced before B, or

4.7.1.9.3.2) ------ A is sequenced before X and X inter-thread happens before B, or

4.7.1.9.3.3) ------ A inter-thread happens before X and X inter-thread happens before B.

https://en.cppreference.com/w/cpp/atomic/memory_order

  • So a mutex unlock B inter-thread happens before a subsequent lock C, by 4.7.1.9.1.
  • Any evaluation A that is sequenced before the mutex unlock B (in program order) also inter-thread happens before C, by 4.7.1.9.3.2.
  • Therefore an unlock() guarantees that all previous writes, even those outside the critical section, must be visible to a matching lock().
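
A runnable sketch of that conclusion (helper names are mine): the write to `data` happens entirely outside the critical section, yet the unlock/lock pair alone is enough to make it visible to the reader.

```cpp
#include <mutex>
#include <thread>

int data = 0;        // written before the writer ever takes the lock
bool done = false;   // only touched while holding the mutex
std::mutex m;

int run_outside_write_demo() {
    std::thread writer([] {
        data = 7;                          // outside the critical section
        std::lock_guard<std::mutex> lk(m);
        done = true;                       // inside it
    });
    int seen = 0;
    for (;;) {                             // reader: poll under the lock
        std::lock_guard<std::mutex> lk(m);
        if (done) {
            seen = data;  // unlock->lock ordering makes the earlier write visible
            break;
        }
    }
    writer.join();
    return seen;
}
```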

This conclusion is consistent with the way mutexes are implemented today (and were in the past) in that all program-order previous loads and stores are completed before unlocking. (More accurately, the stores have to be visible before the unlock is visible when observed by a matching lock operation in any thread.) There's no question that this is the accepted definition of release in theory and in practice. (For example https://preshing.com/20120913/acquire-and-release-semantics/). In fact that's why acquire and release have those names when generalized to lock-free atomics, from their origins in creating locks.
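
To illustrate that origin, here is a minimal spinlock built from an atomic_flag with exactly those orderings (a sketch only; a production lock would back off rather than spin hot):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// A minimal mutex built directly from acquire/release primitives.
struct Spinlock {
    std::atomic_flag f = ATOMIC_FLAG_INIT;
    void lock()   { while (f.test_and_set(std::memory_order_acquire)) {} }
    void unlock() { f.clear(std::memory_order_release); }
};

int run_spinlock_demo() {
    Spinlock s;
    int shared = 0;  // plain int; the spinlock alone orders access to it
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i)
        ts.emplace_back([&] {
            for (int j = 0; j < 1000; ++j) {
                s.lock();
                ++shared;   // acquire/release ordering prevents lost updates
                s.unlock();
            }
        });
    for (auto& t : ts) t.join();
    return shared;  // 4 threads x 1000 increments = 4000
}
```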

Mccarron answered 21/9, 2019 at 1:48 Comment(14)
The informal description is just a summary of the actual rules, which are binding.Cynthea
Hmm. I think the sequenced before relation just clicked for me.Mccarron
Gotcha, I guess if you’re using a mutex it’s assumed that you’re worried about more than just visibility (you’re worried about concurrent reads/writes). Otherwise an atomic would be a better fit, even though they can both cause a synchronizes-with, happens-before relation.Thereabouts
Actually looks like it can be proven that all previous writes (even outside the critical section) must be made visible. Also, I would generally prefer mutexes over atomics. IMO this stuff is easier to learn by looking at the implementation (hardware) first, since the C++ model is pretty thin on details and it is designed with existing hardware technology in mind anyway.Mccarron
Just watched a video by Herb Sutter where he clarifies this too: youtu.be/A8eCGOqgvH4?t=2286 There might be more context starting from the beginning of that slide: youtu.be/A8eCGOqgvH4?t=2286Thereabouts
It's not clear from his talk how memory visibility is guaranteed by the standard, but in general, Sutter does a good job of explaining the concept of acquire/release (one way barriers).Mccarron
What do you mean with "shared variables used for communication"? What defines that set of variables?Pyoid
@HumphreyWinnebago It isn't clear that the standard defines "visibility" either, or anything. I think people should follow patterns and not try to understand the semantics (which are probably unsound).Pyoid
@HumphreyWinnebago "shared variables used for communication" Almost no HW maps to C/C++ MT primitives. What is a release operation? Few CPU have such thing.Pyoid
@Pyoid When using mutexes, state shared between threads should only be read or written while holding the given state's associated mutex. The "state" might be a number of variables that must be updated together to remain valid (atomic transitions). This is the basic rule of locks: to work, every thread must use them correctly.Mccarron
@Pyoid "Used for communication": in the sense that the information is disseminated and used by other threads. "Visibility": a modification done by one CPU (or thread) can be observed/detected by another. "Release": typically refers to guarantees of the order of visibility with regard to modifications on two or more objects. This maps to memory fences (commonly over-restrictively).Mccarron
@HumphreyWinnebago "state shared between threads should be" That presupposes that these objects exist, which is only true in the history where objects were created and not yet destructed. The mutex protects access to something but it isn't sufficient.Pyoid
@Pyoid I'm afraid I don't understand what you're getting at. Shared-memory multithreading is the paradigm we're discussing. If objects don't exist, there is nothing to share. That's still a valid state, though. Valid initialization is implied as well as valid destruction. There's always a syncing acquire/release when threads launch/join.Mccarron
Let us continue this discussion in chat.Pyoid

There’s no magic here: the mutex section is merely describing the common case, where (because every visit to the critical section might write the shared data) the writer in question protects all its access with the mutex. (Other, earlier writes are visible and might be relevant: consider creating and initializing an object without synchronization and then storing its address in a shared variable in the critical section.)
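
A sketch of that pointer-publication pattern (names are illustrative): the object is created and initialized with no synchronization at all; only the store of its address happens in the critical section, yet the reader sees a fully initialized object.

```cpp
#include <mutex>
#include <thread>

struct Widget { int value; };

Widget* slot = nullptr;   // protected by m
std::mutex m;

int run_publish_demo() {
    std::thread writer([] {
        Widget* w = new Widget{99};        // built before, and outside, the critical section
        std::lock_guard<std::mutex> lk(m);
        slot = w;                          // only the publication is locked
    });
    int seen = 0;
    for (;;) {
        std::lock_guard<std::mutex> lk(m);
        if (slot) {                        // once the pointer is visible...
            seen = slot->value;            // ...the earlier initialization is too
            break;
        }
    }
    writer.join();
    delete slot;
    return seen;
}
```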

Cynthea answered 20/9, 2019 at 23:41 Comment(0)

The first paragraph seems to say that with an atomic store/load pair (tagged memory_order_release and memory_order_acquire), thread B is guaranteed to see everything thread A wrote, including non-atomic writes.

Not just writes; all memory operations are ordered. You can see that reads are constrained too: although a read doesn't produce a side effect, a read sequenced before the release can never observe a value written after the acquire.

All of https://en.cppreference.com/ insists on writes (which are easy to explain) and completely ignores the issue of reads being ordered.
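
My own example of that guarantee about reads: A's read is sequenced before its release store, which synchronizes-with B's acquire load, which is sequenced before B's write. The read therefore happens-before the write (so there is no data race), and it can never observe the later value.

```cpp
#include <atomic>
#include <thread>

int x = 0;
std::atomic<bool> handoff{false};

int run_read_demo() {
    int seen_by_a = -1;
    std::thread a([&] {
        seen_by_a = x;                                   // read *before* the release
        handoff.store(true, std::memory_order_release);
    });
    std::thread b([] {
        while (!handoff.load(std::memory_order_acquire))
            ;
        x = 1;                                           // write *after* the acquire
    });
    a.join();
    b.join();
    return seen_by_a;   // always 0: A's read cannot see B's later write
}
```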

The second paragraph seems to suggest that a mutex works the same way, except that the scope of what is visible to B is limited to whatever was wrapped in the critical section. Is that an accurate interpretation, or would every write, even those before the critical section, be visible to B?

But "in the critical section" isn't even a well-defined thing. Nothing you do can be separated from the memory state in which it's done. When you set an integer object "in the critical section", the object has to exist; it doesn't make sense to take "write to an object" in isolation, as there would be no object to talk about. Interpreted strictly, "the critical section" would cover only objects created inside it. But then none of those objects would be known to other threads, so there would be nothing to protect.

So the result of the "critical section" is in essence the whole history of the program, with some accesses to shared objects starting only after the mutex lock.

Pyoid answered 21/11, 2019 at 17:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.