[[carries_dependency]]: what it means and how to implement it

I was reading about [[carries_dependency]] in this SO post.

But what I could not understand are the following sentences in the accepted answer:

"In particular, if a value read with memory_order_consume is passed in to a function, then without [[carries_dependency]], then the compiler may have to issue a memory fence instruction to guarantee that the appropriate memory ordering semantics are upheld. If the parameter is annotated with [[carries_dependency]] then the compiler can assume that the function body will correctly carry the dependency, and this fence may no longer be necessary.

Similarly, if a function returns a value loaded with memory_order_consume, or derived from such a value, then without [[carries_dependency]] the compiler may be required to insert a fence instruction to guarantee that the appropriate memory ordering semantics are upheld. With the [[carries_dependency]] annotation, this fence may no longer be necessary, as the caller is now responsible for maintaining the dependency tree."

Let's take it step by step:

"if a value read with memory_order_consume is passed in to a function, then without [[carries_dependency]], then the compiler may have to issue a memory fence instruction to guarantee that the appropriate memory ordering semantics are upheld."

So, for an atomic variable in the release-consume memory model, when the atomic variable is passed as a parameter to a function, the compiler will introduce a hardware fence instruction so that the function always has the latest, updated value of the atomic variable.

Next -

"If the parameter is annotated with [[carries_dependency]] then the compiler can assume that the function body will correctly carry the dependency, and this fence may no longer be necessary."

This is confusing me: the atomic variable's value is already consumed, so then what dependency does the function carry?

Similarly -

"if a function returns a value loaded with memory_order_consume, or derived from such a value, then without [[carries_dependency]] the compiler may be required to insert a fence instruction to guarantee that the appropriate memory ordering semantics are upheld. With the [[carries_dependency]] annotation, this fence may no longer be necessary, as the caller is now responsible for maintaining the dependency tree."

From the example, it's not clear what point it is trying to make about carrying the dependency.

Squander answered 29/9, 2020 at 5:09
Is the question C++11-specific, or >C++11? – Harv

Just FYI, memory_order_consume (and [[carries_dependency]]) is essentially deprecated because it's too hard for compilers to efficiently and correctly implement the rules the way C++11 designed them. (And/or because [[carries_dependency]] and/or kill_dependency would end up being needed all over the place.) See P0371R1: Temporarily discourage memory_order_consume.

Current compilers simply treat mo_consume as mo_acquire (and thus, on ISAs that need one, put a barrier right after the consume load). If you want the performance of data-dependency ordering without barriers, you have to trick the compiler by using mo_relaxed and coding carefully to avoid things that would make it likely for the compiler to create asm without an actual dependency (e.g. Linux RCU). See C++11: the difference between memory_order_relaxed and memory_order_consume for more details and links about that, and the asm feature that mo_consume was designed to expose.
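
As a rough illustration of that mo_relaxed trick (a sketch only, with made-up names like Node, g_node and publish; the C++ standard does not formally guarantee the ordering here), the RCU-style pattern looks something like this:

    #include <atomic>

    struct Node { int payload; };

    std::atomic<Node*> g_node{nullptr};   // hypothetical shared pointer

    // Writer: initialize the object, then publish it with a release store.
    void publish(Node* n) {
        n->payload = 42;
        g_node.store(n, std::memory_order_release);
    }

    // Reader: a relaxed load plus a plain dereference. On ISAs that guarantee
    // dependency ordering (e.g. ARM, POWER), the address of n->payload depends
    // on the value just loaded, so the payload load can't be reordered before
    // the pointer load -- no barrier needed. This only works if the compiler
    // doesn't break the data dependency, which is exactly the fragility
    // described above.
    int reader() {
        Node* n = g_node.load(std::memory_order_relaxed);
        return n ? n->payload : -1;
    }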

See also Memory order consume usage in C11. Understanding the concept of dependency ordering (in asm) is basically essential to understanding how this C++ feature is designed.

when the atomic variable is passed as a parameter to a function, the compiler will introduce a hardware fence instruction ...

You don't "pass an atomic variable" to a function in the first place; what would that even mean? If you were passing a pointer or reference to an atomic object, the function would be doing its own load from it, and the source code for that function would use memory_order_consume or not.

The relevant thing is passing a value loaded from an atomic variable with mo_consume. Like this:

    // shared_var is some std::atomic<int> that another thread writes to
    int tmp = shared_var.load(std::memory_order_consume);
    func(tmp);   // the value loaded with consume is passed as a plain int

func may use that arg as an index into an array of atomic<int> to do an mo_relaxed load. For that load to be dependency-ordered after the shared_var.load even without a memory barrier, code-gen for func has to make sure that load has an asm data dependency on the arg, even if the C++ code does something like tmp -= tmp; that compilers would normally just treat the same as tmp = 0; (killing the previous value).

But [[carries_dependency]] would make the compiler still reference that zeroed value with a data dependency in implementing something like array[idx+tmp].
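
To make that concrete, here is a sketch of what func might look like under the original C++11 design (the array, the idx constant, and the func_dep name are made up for illustration):

    #include <atomic>

    extern std::atomic<int> array[256];   // hypothetical array of atomic<int>
    constexpr int idx = 3;                // hypothetical fixed index

    // Without the attribute, the compiler is free to fold idx + tmp down to a
    // constant after tmp -= tmp;, dropping the asm data dependency -- so a
    // caller that produced tmp with a consume load may need a fence before
    // the call.
    int func(int tmp) {
        tmp -= tmp;   // normally optimized as if it were tmp = 0;
        return array[idx + tmp].load(std::memory_order_relaxed);
    }

    // With the attribute, the original design requires the compiler to keep
    // the dependency alive: the address of array[idx + tmp] is still computed
    // from the incoming register, so this relaxed load stays dependency-ordered
    // after the caller's consume load without any fence.
    int func_dep(int tmp [[carries_dependency]]) {
        tmp -= tmp;
        return array[idx + tmp].load(std::memory_order_relaxed);
    }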

the atomic variable's value is already consumed, so then what dependency does the function carry?

"Already consumed" is not a valid concept. The whole point of consume instead of acquire is that later loads are ordered correctly because they have a data dependency on the mo_consume load result, letting you avoid barriers. Every later load needs such a dependency if you want it ordered after the original load; there is no sense in which you can say a value is "already consumed".

If you do end up inserting a barrier to promote consume to acquire because of a missing carries_dependency on one function, later functions wouldn't need another barrier because you could say the value was "already acquired". (Although that's not standard terminology. You'd instead say code after the first barrier was ordered after the load.)
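
Under that original design, the "promote to acquire" fallback at a call boundary would look roughly like the following sketch (func_no_dep and shared_var are hypothetical names; in practice the compiler would emit the barrier itself rather than you writing the fence):

    #include <atomic>

    extern std::atomic<int> shared_var;
    int func_no_dep(int arg);   // declared WITHOUT [[carries_dependency]]

    int caller() {
        int tmp = shared_var.load(std::memory_order_consume);
        // Because func_no_dep doesn't promise to carry the dependency, the
        // compiler would have to strengthen the ordering here, as if an
        // acquire fence were issued after the load:
        std::atomic_thread_fence(std::memory_order_acquire);
        // Everything after this point is ordered after the load, so no
        // further barriers are needed in or after func_no_dep.
        return func_no_dep(tmp);
    }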


It might be useful to understand how the Linux kernel handles this, with their hand-rolled atomics and limited set of compilers they support. Search for "dependency" in https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt, and note the difference between a "control dependency" like if(flag) data.load() vs. a data dependency like data[idx].load.

IIRC, even C++ doesn't guarantee mo_consume dependency ordering when the dependency is a conditional like if(x.load(consume)) tmp=y.load();.
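
A minimal sketch of the two shapes being contrasted (the names are made up, with std::atomic standing in for the kernel's hand-rolled primitives):

    #include <atomic>

    extern std::atomic<bool> flag;
    extern std::atomic<int>  index_src;
    extern std::atomic<int>  data[16];

    // Control dependency: the second load is reached only because the branch
    // was taken; nothing about its address depends on the value of flag, so
    // this is not a dependency in the "carries a dependency" sense -- you'd
    // want acquire (or a barrier) for this shape.
    int via_control_dep() {
        if (flag.load(std::memory_order_consume))
            return data[0].load(std::memory_order_relaxed);
        return -1;
    }

    // Data dependency: the loaded index feeds the address computation of the
    // second load, so it is dependency-ordered after the first load on ISAs
    // that honour dependency ordering.
    int via_data_dep() {
        int i = index_src.load(std::memory_order_consume);
        return data[i].load(std::memory_order_relaxed);
    }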

Note that compilers will sometimes turn a data dependency into a control dependency, for example if there are only 2 possible values. This would break mo_consume, and would be an optimization that isn't allowed if the value came from a mo_consume load or a [[carries_dependency]] function arg. This is part of why it's hard to implement; it would require teaching lots of optimization passes about data-dependency ordering, instead of just expecting users to write code that doesn't do things which will normally optimize away (like tmp -= tmp;).
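
For instance (a hypothetical sketch of the kind of transformation meant here):

    #include <atomic>

    extern std::atomic<int> selector;   // hypothetical: only ever holds 0 or 1
    extern int table[2];

    // As written there is a data dependency: the loaded value feeds the address.
    int pick() {
        int i = selector.load(std::memory_order_relaxed);
        return table[i];
    }

    // A compiler that can prove i is 0 or 1 might instead emit the equivalent of
    //     return i == 0 ? table[0] : table[1];
    // i.e. a branch plus two constant-address loads. That turns the data
    // dependency into a control dependency, which a weakly-ordered CPU can
    // speculate past -- harmless for relaxed code, but it would defeat the
    // ordering that a consume load or a [[carries_dependency]] argument was
    // supposed to provide, so the optimization would have to be suppressed
    // in those cases.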

Expropriate answered 29/9, 2020 at 6:38
This is discussing case 1, func(int arg), vs case 2, func(int arg [[carries_dependency]]). Let's assume that func is defined in func.cpp and already compiled, so its definition is not seen by the user. Do you mean that when the compiler is compiling the user code (the two-line snippet you provided), it will put a barrier for case 1 and not put a barrier for case 2? In case 2 there is no barrier, so how is the memory synced? Does it mean that when the compiler compiles func.obj, it also needs to do something special? – Norbert
@doraemon: Keep in mind that the old consume + [[carries_dependency]] semantics are deprecated; compilers treat consume as acquire. But in that original design (which proved too complicated to implement correctly and efficiently), that's correct: the caller doing func(tmp) will include a barrier if func wasn't declared with [[carries_dependency]] but tmp was the result of a consume load. If the load was weaker, no barrier at all; if the load was stronger, it would already have had sufficient ordering. – Expropriate
@doraemon: And yes, when compiling func(int arg [[carries_dependency]]), the compiler would have to respect the fact that arg is, or could be, the result of a consume load, so things like var[arg-arg] can't be optimized to var[0] when targeting ISAs with weakly-ordered memory models. e.g. ARM would need something like sub r1, r0, r0 / ldr r3, [r2, r1] where r2 holds a pointer. (Compilers for x86 could still optimize away the data dependency because every x86 load is already as strong as acquire.) – Expropriate
Thanks. For the second comment you made: (1) arg-arg looks to me like it should still always be zero, since the two args are not from two different loads. If this is not true, it sounds like arg is something volatile that needs to be reloaded every time it is used. Does [[carries_dependency]] imply volatile-like behavior? (2) Does it mean the compiler should generate a barrier inside func? – Norbert
@doraemon: Exactly, arg-arg is always going to be zero, but in a register that has a data dependency on the consume load, unlike if the assembly just used a constant zero, which would remove the data dependency. That's the entire point of consume and carries_dependency. The data dependency prevents out-of-order exec from starting the var[arg-arg] load until after arg is ready, ordering that one load after the consume load without having to order all later loads. (Using a barrier and forgetting about data dependencies is a correct but slower alternative for a compiler.) – Expropriate
arg doesn't have to be reloaded from memory; that wouldn't help (or hurt). On architectures with weak memory models but which do guarantee dependency ordering, registers carry dependencies. Unlike x86, where xor eax,eax is handled efficiently as a zeroing idiom without a false dependency on the old value (because historically people used that as a code-size optimization in x86's variable-length machine code), an ARM or MIPS CPU, for example, isn't allowed to break dependencies when subtracting or XORing a register with itself.
