What does "strongly happens before" mean?
The phrase "strongly happens before" is used several times in the C++ draft standard.

For example: Termination [basic.start.term]/5

If the completion of the initialization of an object with static storage duration strongly happens before a call to std::atexit (see [support.start.term]), the call to the function passed to std::atexit is sequenced before the call to the destructor for the object. If a call to std::atexit strongly happens before the completion of the initialization of an object with static storage duration, the call to the destructor for the object is sequenced before the call to the function passed to std::atexit. If a call to std::atexit strongly happens before another call to std::atexit, the call to the function passed to the second std::atexit call is sequenced before the call to the function passed to the first std::atexit call.

And defined in Data races [intro.races]/12

An evaluation A strongly happens before an evaluation D if, either

(12.1) A is sequenced before D, or

(12.2) A synchronizes with D, and both A and D are sequentially consistent atomic operations ([atomics.order]), or

(12.3) there are evaluations B and C such that A is sequenced before B, B simply happens before C, and C is sequenced before D, or

(12.4) there is an evaluation B such that A strongly happens before B, and B strongly happens before D.

[ Note: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. Strongly happens before excludes consume operations. — end note ]

Why was "strongly happens before" introduced? Intuitively, what's its difference and relation with "happens before"?

What does the "A appears to be evaluated before B in all contexts" in the note mean?

(Note: the motivation for this question are Peter Cordes's comments under this answer.)

Additional draft standard quote (thanks to Peter Cordes)

Order and consistency [atomics.order]/4

There is a single total order S on all memory_order::seq_cst operations, including fences, that satisfies the following constraints. First, if A and B are memory_order::seq_cst operations and A strongly happens before B, then A precedes B in S. Second, for every pair of atomic operations A and B on an object M, where A is coherence-ordered before B, the following four conditions are required to be satisfied by S:

(4.1) if A and B are both memory_order::seq_cst operations, then A precedes B in S; and

(4.2) if A is a memory_order::seq_cst operation and B happens before a memory_order::seq_cst fence Y, then A precedes Y in S; and

(4.3) if a memory_order::seq_cst fence X happens before A and B is a memory_order::seq_cst operation, then X precedes B in S; and

(4.4) if a memory_order::seq_cst fence X happens before A and B happens before a memory_order::seq_cst fence Y, then X precedes Y in S.

Bonefish answered 22/11, 2019 at 1:23
The current draft standard also references "A strongly happens before B" as a condition for a rule applying to seq_cst, in Atomics 31.4 Order and consistency: 4. That's not in the C++17 n4659 standard, where 32.4 - 3 defines the existence of a single total order of seq_cst ops consistent with the "happens before" order and modification orders for all affected locations; the "strongly" was added in a later draft. — Hubsher

@PeterCordes I think the note excluding consume, stating it's HB "in all contexts"/"strong", and talking about calls to function pointers is something of a dead giveaway. If a multithreaded program calls atexit() in one thread and exit() in another, it's not enough for initializers to carry only a consume-based dependency, because the results would then differ from the case where exit() was invoked by the same thread. An older answer of mine concerned this difference. — Overthrow

open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html — Iconology

@IwillnotexistIdonotexist Can you even exit a multithreaded program? Is it not a fundamentally broken idea? — Bonefish

@Bonefish That's the purpose of exit(). Any thread can kill the entire program by exiting, or the main thread can exit by return-ing. It results in the calling of atexit() handlers and the death of all threads, whatever they were doing. — Overthrow

@IwillnotexistIdonotexist I don't see how that could possibly be done in any way that doesn't randomly break! — Bonefish

See the Linux manpage for exit(3). The C library does some work like flushing buffers and executing atexit handlers. These handlers have to be carefully coded to avoid breakage, but they can accomplish useful things like deleting temporaries and logging. Then, when a process has truly exited, the Linux kernel cleans up its threads and open file descriptors. — Overthrow

@IwillnotexistIdonotexist The problem is concurrently running threads: e.g. if you have a list of temp files to clean up, other threads might be adding temp files concurrently. The clean way would be to stop the threads first. This is getting off topic, but I will probably ask another question about that soon. — Bonefish

Why was "strongly happens before" introduced? Intuitively, what's its difference and relation with "happens before"?

Brace yourself for "simply happens-before" as well! Take a look at the current cppreference page: https://en.cppreference.com/w/cpp/atomic/memory_order

It seems "simply happens-before" was added in C++20.

Simply happens-before

Regardless of threads, evaluation A simply happens-before evaluation B if any of the following is true:

  1. A is sequenced-before B

  2. A synchronizes-with B

  3. A simply happens-before X, and X simply happens-before B

Note: without consume operations, simply happens-before and happens-before relations are the same.

So Simply-HB and HB are the same except for how they handle consume operations. See HB

Happens-before

Regardless of threads, evaluation A happens-before evaluation B if any of the following is true:

  1. A is sequenced-before B

  2. A inter-thread happens before B

The implementation is required to ensure that the happens-before relation is acyclic, by introducing additional synchronization if necessary (it can only be necessary if a consume operation is involved; see Batty et al.).

How do they differ with regard to consume? See Inter-Thread-HB

Inter-thread happens-before

Between threads, evaluation A inter-thread happens before evaluation B if any of the following is true

  1. A synchronizes-with B

  2. A is dependency-ordered before B

  3. ...

...

An operation that is dependency ordered (i.e. uses release/consume) is HB but not necessarily Simply-HB.

Consume is more relaxed than acquire, so, if I understand correctly, HB is the weaker, more inclusive relation: every pair that is Simply-HB is also HB, but a consume-based pair can be HB without being Simply-HB.

Strongly happens-before

Regardless of threads, evaluation A strongly happens-before evaluation B if any of the following is true:

  1. A is sequenced-before B

  2. A synchronizes with B, and both A and B are sequentially consistent atomic operations

  3. A is sequenced-before X, X simply happens-before Y, and Y is sequenced-before B

  4. A strongly happens-before X, and X strongly happens-before B

Note: informally, if A strongly happens-before B, then A appears to be evaluated before B in all contexts.

Note: strongly happens-before excludes consume operations.

So a release/consume operation cannot be Strongly-HB.

Release/acquire can be HB and Simply-HB (because release/acquire synchronizes-with) but is not necessarily Strongly-HB, because the synchronizes-with branch of Strongly-HB requires that both A and B be sequentially consistent atomic operations.

Is happens-before guaranteed?

                 HB    Simply-HB  Strongly-HB
relaxed          no    no         no
release/consume  yes   no         no
release/acquire  yes   yes        no
seq_cst          yes   yes        yes

What does the "A appears to be evaluated before B in all contexts" in the note mean?

All contexts: all threads / all CPUs see (or "will eventually agree on") the same order. This is the guarantee of sequential consistency: a single total order over all seq_cst operations that every thread observes consistently. Acquire/release chains only guarantee the perceived ordering for threads participating in the chain; threads outside the chain are theoretically allowed to see a different order.

I do not know why Strongly-HB and Simply-HB were introduced. Maybe to help clarify how to operate around consume? Strongly-HB has a nice property: if one thread observes A strongly happens-before B, it knows all threads will observe the same thing.

The history of consume:

Paul E. McKenney is responsible for consume being in the C and C++ standards. Consume guarantees ordering between pointer assignment and the memory it points to. It was invented because of the DEC Alpha. The DEC Alpha could speculatively dereference a pointer, thus it also had a memory fence to prevent this. The DEC Alpha is no longer made and no processors today have this behavior. Consume is intended to be very relaxed.

Wilterdink answered 22/11, 2019 at 12:17
Good grief. I almost regret asking that question. I want to go back to tackling the easy C++ issues, like the rules of iterator invalidation, argument-dependent name lookup, template user-defined conversion operators, template argument deduction, when name lookup looks in a base class in a member of a template, and when you can convert to a virtual base at the beginning of construction of an object. — Bonefish

Re: consume. Do you claim that the fate of consume ordering is linked to the fate of DEC Alpha and has no value outside that particular arch? — Bonefish

That's a good question. Looking into it more now, it sounds like consume could theoretically give a performance boost on weakly ordered arches like ARM and PowerPC. Give me some more time to look into it. — Wilterdink

I'd say consume exists because of all weakly-ordered ISAs other than Alpha. In Alpha asm the only options are relaxed and acquire (and seq-cst), not dependency ordering. mo_consume is intended to take advantage of data dependency ordering on real CPUs, and to formalize that the compiler can't break the data dependency via branch prediction. E.g. int *p = load(); tmp = *p; could be broken by the compiler introducing if(p==known_address) tmp = *known_address; else tmp=*p; if it had some reason to expect a certain pointer value to be common. That's legal for relaxed but not consume. — Hubsher

@PeterCordes Right... arches with weak ordering have to emit a memory barrier for acquire, but (theoretically) not for consume. It sounds like you think that if the Alpha had never existed we would still have consume? Also, you're basically saying that consume is a fancy (or "standard") compiler barrier. — Wilterdink

You were thinking that if no real-life HW exists where asm data deps don't give consume ordering, the standard could just guarantee dependency ordering for relaxed? Unfortunately it's not that simple. The as-if rule allows compilers to break data dependencies, e.g. branching on a bool (control dependency) instead of using it as an index, or removing the dependency entirely if the source included x - x to depend on a ready-flag as an always-zero array index. So mo_consume does need special compiler support. It's not a "compiler barrier"; what's needed is blocking those other kinds of optimizations. — Hubsher

We want compilers to be able to fully optimize in general; if any function arg might carry a dependency (i.e. be from a relaxed load) then compilers would not be able to do some optimizations in most code. preshing.com/20141124/… has some details of old problems and attempts before GCC/clang gave up on implementing mo_consume efficiently (they treat it as acquire). The C++ committee considers it broken for now. I haven't watched isocpp.org/blog/2016/08/… — Hubsher

So TL:DR: yes, we'd need mo_consume even if Alpha had never existed. (In my first comment I thought you were arguing that Alpha gave us the idea for mo_consume or something, but I think you were actually arguing that it's only needed because Alpha compilers need an asm barrier for consume while others in theory don't. Yes, that's true, but we still need consume because of other optimizations, and the possibility of future HW that doesn't have dependency ordering for free; e.g. value prediction could break it. Undesirable for most use-cases to make RCU expensive, though.) — Hubsher

I just watched Paul McKenney's talk I linked earlier. He spends a bunch of time on how RCU works, and finds out he only had 40 minutes for the talk, not an hour, so he has to rush through the C++11 mo_consume standardization part. But you can see some of the slides, like youtu.be/ZrNQKpOypqU?t=2470, if you pause and single-step frames (forward/back with , and . keys), where he goes into some details about how hard it is to keep track of dependencies when passing values between functions, and some proposals to fix the standard. — Hubsher

© 2022 - 2024 — McMap. All rights reserved.