The C++11 standard defines a memory model (1.7, 1.10) which includes memory orderings: roughly, "sequentially consistent", "acquire", "consume", "release", and "relaxed". Equally roughly, a program is correct only if it is race-free, which happens if all actions can be put in some order in which one action happens-before another. An action X happens-before an action Y if either X is sequenced before Y (within one thread), or X inter-thread-happens-before Y. The latter holds, among other cases, when
- X synchronizes with Y, or
- X is dependency-ordered before Y.
Synchronizing-with happens when X is an atomic store with "release" ordering on some atomic variable, and Y is an atomic load with "acquire" ordering on the same variable. Being dependency-ordered-before happens in the analogous situation where Y is a load with "consume" ordering (and a suitable memory access). The notion of synchronizes-with extends the happens-before relationship transitively across actions being sequenced-before one another within a thread, but being dependency-ordered-before is extended transitively only through a strict subset of sequenced-before called carries-dependency, which follows a largish set of rules and notably can be interrupted with std::kill_dependency.
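For concreteness, here is a minimal sketch of the two pairings described above (the Payload/ptr/reader names are mine, not from the standard): the acquire reader is ordered after everything the writer did before the store, while the consume reader is only ordered with respect to reads that carry a dependency from the loaded pointer.

```cpp
#include <atomic>

struct Payload { int value; };

std::atomic<Payload*> ptr{nullptr};

void writer() {
    Payload* p = new Payload{42};
    ptr.store(p, std::memory_order_release);   // "release" store publishes the payload
}

int reader_acquire() {
    Payload* p = ptr.load(std::memory_order_acquire);  // synchronizes-with the release store
    return p ? p->value : 0;   // this read, and any other data written before the store,
                               // is guaranteed to be visible
}

int reader_consume() {
    Payload* p = ptr.load(std::memory_order_consume);  // dependency-ordered-before dependent reads
    return p ? p->value : 0;   // p->value carries a dependency from the load, so it sees 42;
                               // unrelated data written before the store is not ordered by this
}
```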
Now then, what is the purpose of the notion of "dependency ordering"? What advantage does it provide over the simpler sequenced-before / synchronizes-with ordering? Since the rules for it are stricter, I assume that it can be implemented more efficiently.
Can you give an example of a program where switching from release/acquire to release/consume is both correct and provides a non-trivial advantage? And when would std::kill_dependency provide an improvement? High-level arguments would be nice, but bonus points for hardware-specific differences.
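For reference, this is the kind of call being asked about; a minimal sketch modelled on the usual standard-library usage (table and index are made-up names, and the table is assumed to be synchronized by other means): std::kill_dependency takes a value that carries a dependency and returns it with the dependency chain cut.

```cpp
#include <atomic>

int table[64];              // assumed to be initialized and visible by other means
std::atomic<int> index{0};  // assumed to always hold a valid index into table

int consume_and_drop_dependency() {
    int i = index.load(std::memory_order_consume);
    // i carries a dependency out of the load; wrapping it in kill_dependency
    // tells the implementation the chain ends here, so the indexing load need
    // not be dependency-ordered after the atomic load.
    return table[std::kill_dependency(i)];
}
```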
Comments:

- […] atomic<> Weapons talks, and he said that he won't discuss "consume" because "nobody understands it". – Ender
- […] memory_order_release. During pop you want to have both tail and tail->value, where the load of tail carries-a-dependency-to tail->value, but you don't care about anything else, so you can use memory_order_consume instead of memory_order_acquire. – Entire
- […] memory_order_acquire can be superfluous. – Entire
- […] memory_order. For example, all side effects which happen before an atomic operation with memory_order_release must be visible to any thread which does an atomic operation on the same atomic variable with memory_order_acquire. This can require the CPU to do things like flushing caches, though typically CPU designers try to find more efficient solutions than that. For large supercomputers, it can be worth paying attention to the bandwidth of the connections used in this process. – Ivette
- […] lock prefix to support read-modify-write patterns such as atomic<int>::operator++, which involve both a read and a write. Simple reads or writes are indeed atomic (under reasonable circumstances). Here is someone else's answer going deeper into that. The behavior of architectures other than x86 is, obviously, a different question entirely. – Ivette
- […] volatile being one such feature). This blog post is my personal favorite way of explaining just how much the compiler can do unless you explicitly use these features to limit its optimizations. – Ivette
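The tail/tail->value situation in the comment above could look roughly like the following sketch (the Node/push/pop_value names are mine, and it only illustrates the ordering, not a complete queue): the consumer touches data only through the loaded pointer, so consume suffices where acquire would order more than is needed.

```cpp
#include <atomic>

struct Node { int value; Node* next; };

std::atomic<Node*> tail{nullptr};

void push(int v) {
    Node* n = new Node{v, nullptr};
    tail.store(n, std::memory_order_release);        // publish the node
}

int pop_value() {
    Node* t = tail.load(std::memory_order_consume);  // the loaded pointer t...
    return t ? t->value : 0;                         // ...carries a dependency into t->value,
                                                     // so this read sees the producer's write
}
```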