How are MT programs proven correct with "non sequential" semantics?

This could be a language-neutral question, but in practice I'm interested in the C++ case: how are multithreaded programs, written in C++ versions that support MT programming (that is, modern C++ with a memory model), ever proven correct?

In old C++, MT programs were simply written in terms of pthread semantics and validated against the pthread rules, which was conceptually easy: use the primitives correctly and avoid data races.

Now the C++ language semantics are defined in terms of a memory model, not in terms of the sequential execution of primitive steps. (The standard also mentions an "abstract machine", but I no longer understand what that means.)

How are C++ programs proven correct with that non-sequential semantics? How can anyone reason about a program that is not doing primitive steps one after the other?

Iver answered 6/11, 2019 at 3:53 Comment(4)
It almost boils down to 3 things. If you use the sequentially consistent semantics the MT primitives give you, then you have a guaranteed total order. Then you have the manual version of that, which, if you do it right, gives you a total order. Then there is the relaxed method, where there is a set of possible outcomes and you'll get one of those, but you don't know which. Here is a good read about the semantics: en.cppreference.com/w/cpp/atomic/memory_orderBibliopegy
@NathanOliver-ReinstateMonica "If you use the sequentially consistent semantics the MT primitives give you then you have a guaranteed total order" You get the total order only on operations on atomics; what gives you a total order mutexes or on anything else?Iver
The standard. Sorry, but I'm not going to hash this out with you again. I've tried a couple/few times and you never believe me so I'm just going to disengage.Bibliopegy
@NathanOliver-ReinstateMonica So now it's a faith issue? Maybe you should "disengage" if you take it as such.Iver

It is "conceptually easier" with the C++ memory model than it was with pthreads prior to the C++ memory model. C++ prior to the memory model interacting with pthreads was loosely specified, and reasonable interpretations of the specification permitted the compiler to "introduce" data races, so it is extremely difficult (if possible at all) to reason about or prove correctness for MT algorithms in the context of older C++ with pthreads.

There seems to be a fundamental misunderstanding in the question in that C++ was never defined as a sequential execution of primitive steps. It has always been the case that there is a partial ordering between expression evaluations. And the compiler is allowed to move such expressions around unless constrained from doing so. This was unchanged by the introduction of the memory model. The memory model introduced a partial order for evaluations between separate threads of execution.

The advice "use the primitives correctly and avoid data races" still applies, but the C++ memory model more strictly and precisely constrains the interaction between the primitives and the rest of the language, allowing more precise reasoning.

In practice, it is not easy to prove correctness in either context. Most programs are not proven to be data-race free. One tries to encapsulate synchronization as much as possible, so as to allow reasoning about smaller components, some of which can be proven correct. And one uses tools such as AddressSanitizer and ThreadSanitizer to catch data races.

On data races, POSIX says:

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads.... Applications may allow more than one thread of control to read a memory location simultaneously.

On data races, C++ says:

The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.

C++ defines more terms and tries to be more precise. The gist is that both forbid data races, which both define as conflicting accesses made without the use of the synchronization primitives.
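
For concreteness, here is a minimal sketch (not part of the original answer; names are illustrative) of what both definitions forbid, and one way to remove the race with a mutex:

```cpp
#include <mutex>
#include <thread>

int counter = 0;   // plain (non-atomic) object shared between threads
std::mutex m;

// Data race: two threads perform conflicting, non-atomic accesses to
// `counter` with no happens-before ordering between them.
// Undefined behavior in C++, forbidden by POSIX.
void racy() { ++counter; }

// Race-free: the mutex orders the conflicting accesses.
void race_free() {
    std::lock_guard<std::mutex> lock(m);
    ++counter;
}

int main() {
    std::thread t1(race_free), t2(race_free);  // substitute racy to see the bug;
    t1.join();                                 // a -fsanitize=thread build (GCC/Clang)
    t2.join();                                 // reports the race at run time
}
```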

POSIX says that the pthread functions synchronize memory with respect to other threads. That is underspecified. One could reasonably interpret that as (1) the compiler cannot move memory accesses across such a function call, and (2) after calling such a function in one thread, the prior actions to memory from that thread will be visible to another thread after it calls such a function. This was a common interpretation, and this is easily accomplished by treating the functions as opaque and potentially clobbering all of memory.

As an example of problems with this loose specification, the compiler is still allowed to introduce or remove memory accesses (e.g. through register promotion and spillage) and to make larger accesses than necessary (e.g. touching adjacent fields in a struct). Therefore the compiler could, entirely correctly, "introduce" data races that weren't written in the source code. The C++11 memory model stops it from doing so.
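
A simplified sketch of the kind of register-promotion hazard discussed in the Boehm paper referenced below (the function and variable names are illustrative, not from the paper):

```cpp
int g;  // shared; the program's locking protocol covers only the iterations
        // in which cond[i] is true, because those are the only written accesses

void count_matches(const bool* cond, int n) {
    for (int i = 0; i < n; ++i)
        if (cond[i])
            ++g;    // the source touches g only under the condition
}

// A pre-memory-model compiler could legally promote g to a register:
//
//     int tmp = g;                    // unconditional load
//     for (int i = 0; i < n; ++i)
//         if (cond[i]) ++tmp;
//     g = tmp;                        // unconditional store
//
// This introduces a load and a store of g even on executions in which no
// cond[i] is true -- a data race the programmer never wrote. The C++11
// memory model forbids inventing such writes to potentially shared objects.
```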

C++ says, with regard to mutex lock:

Synchronization: Prior unlock() operations on the same object shall synchronize with this operation.

So C++ is a little more specific. You have to lock and unlock the same mutex to have synchronization. But given this, C++ says that operations before the unlock are visible to the new locker:

An evaluation A strongly happens before an evaluation D if ... there are evaluations B and C such that A is sequenced before B, B simply happens before C, and C is sequenced before D. [ Note: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. Strongly happens before excludes consume operations. — end note ]

(With B = unlock, C = lock, B simply happens before C because B synchronizes with C. Sequenced before is a concept in a single thread of execution, so for example, one full expression is sequenced before the next.)
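
A small sketch of how that chain is used in practice, with A/B/C/D marked as in the quote (the code and names are illustrative, not from the standard):

```cpp
#include <mutex>
#include <string>
#include <thread>

std::mutex m;
std::string payload;   // plain (non-atomic) objects,
bool ready = false;    // protected by m

void producer() {
    std::lock_guard<std::mutex> lock(m);
    payload = "hello";   // A: sequenced before the unlock at end of scope (B)
    ready = true;
}

void consumer() {
    for (;;) {
        std::lock_guard<std::mutex> lock(m);   // C: synchronizes with a prior unlock (B)
        if (ready) {                           // D: sequenced after the lock (C)
            // A strongly happens before D, so this read is race-free and
            // must observe the value written by the producer.
            std::string seen = payload;
            (void)seen;
            return;
        }
    }
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```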

So, if you restrict yourself to the sorts of primitives (locks, condition variables, ...) that exist in pthread, and to the type of guarantees provided by pthread (sequential consistency), C++ should add no surprises. In fact, it removes some surprises, adds precision, and is more amenable to correctness proofs.
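
Even when one steps beyond the pthread vocabulary, the default atomic ordering is sequentially consistent, so the naive interleaving-of-steps reasoning still applies to data-race-free code. A hedged sketch of the classic demonstration (the outcome claim holds only under seq_cst):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false}, y{false};
std::atomic<int>  z{0};

// All operations below use the default memory_order_seq_cst, so they all
// participate in one global total order. At least one reader must therefore
// see the other flag already set, and z == 0 is impossible. With weaker
// orderings (acquire/release or relaxed), z == 0 becomes a legal outcome.
void write_x()       { x.store(true); }
void write_y()       { y.store(true); }
void read_x_then_y() { while (!x.load()) {} if (y.load()) ++z; }
void read_y_then_x() { while (!y.load()) {} if (x.load()) ++z; }

int main() {
    std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);  // guaranteed under sequential consistency
}
```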

The article Foundations of the C++ Concurrency Memory Model is a great expository read for anyone interested in this topic: it covers the problems with the status quo at the time and the choices made to fix them in the C++11 memory model.


Edited to state more clearly that the premise of the question is flawed, that reasoning is easier with the memory model, and to add a reference to the Boehm paper, which also shaped some of the exposition.

Squib answered 6/11, 2019 at 22:22 Comment(13)
You are missing the point. C and C++, before std threads, were purely sequential. And pthreads is a library built over that sequential behavior. Although a few corner cases of pthreads are not purely sequential, like a failed trylock, that is mostly ignorable (you can claim trylock fails early and randomly). C++ w/ threads is not defined in terms of sequential evaluations. Can you provide an example of a proof of an MT program? How do you even know any variable is not clobbered by another thread?Iver
C++ w/ threads is defined in terms of sequential evaluation. Sequenced-before is the C++ term that refers to the sequencing along a single thread of execution. Using pthreads, one had to rely on POSIX to say how multiple threads are created and interact, and with std::thread, on the C++ standard. How do you know a variable isn't clobbered? By the guarantees provided: with locks, as I mentioned, another thread's modification will be visible, just as with pthread. Can you provide the sort of proof you want for a pthread-using program? Then we can discuss what (if anything) doesn't carry over.Squib
"C++ w/ threads is defined in terms of sequential evaluation" 1) Where do it say that? 2) Then what do you need the so called "memory model"?Iver
1) I'm referring to [intro.execution]/8 in the current draft. Both before C++11 and after C++11, evaluations in a single thread of execution were partially ordered. So within a single thread of execution, evaluation is now and always has been (quasi) sequential. Before the memory model, the C++ standard had no way to say or guarantee anything about multiple threads of execution, and using pthread relied on certain assumptions that each thread would still behave in this (quasi) sequential way, and POSIX says you can synchronize memory with the pthread functions.Squib
2) With the memory model, you don't have to assume/guess. The standard says that each thread still behaves (quasi) sequentially, it says that using mutexes, memory is synchronized (that "strongly happens before" quote above), similar to pthread. And more than that, it provides ordering guarantees between actions on different threads of execution, and weaker consistency options. Atomics were not available in pthreads, and the C++ memory model makes them possible.Squib
But I don't see how the so-called "model" makes anything possible. Either threads are executing sequentially, and the weaker consistency can't exist, or threads don't really execute sequentially, in some unspecified way. Either way, how do you reason about MT programs? You have provided no example of any correct MT program and I don't think there is such a thing. (There was under sequential pthread.)Iver
It makes atomics possible. Please provide an example of a correct pthread program, and the quotes above will help show exactly how that correctness proof applies to C++11 and above primitives.Squib
"the compiler is still allowed to introduce or remove memory accesses (e.g. through register promotion and spillage)" Do you have examples of compilers having done that? "to make larger accesses than necessary (e.g. touching adjacent fields in a struct)." Even when the compiler is not allowed to do that, it might be a good idea to move data used by diff threads farther away in memory, for performance.Iver
"C++ was never defined as a sequential execution of primitive steps" Can you give an example of some non threaded code that wasn't sequential?Iver
Regarding examples of it happening, the paper states that it has and cites its reference [7]. Regarding sequential-ness, stuff like i += ++i; was never "sequential". There was a partial ordering of expression evaluations, and certain evaluations had no guaranteed relative ordering. The C++ memory model extended that partial ordering to expression evaluations on different threads.Squib
"stuff like i += ++i; was never "sequential" no, it was incorrect. "There was a partial ordering of expression evaluations" That has nothing to do with the question.Iver
Fair enough. I took that example too far. I should have said that in f(a, b) there was no ordering for the evaluations of a, b, or f. And the partial ordering has everything to do with the question. Pre-memory model, there was a partial ordering of evaluations within a single thread (certain things were unsequenced) and there were no relative ordering or evaluation guarantees between threads in C++. POSIX provided some guarantees, but not enough. The memory model added this ordering of evaluations between threads. It is exactly what you are asking about.Squib
"there was a partial ordering of evaluations with in a single thread (certain things were unsequenced)" Partial ordering means that some things are non deterministic, not that they don't happen in an order; at the end of the day, functions are called in an order; this was no different than with the order of locking of mutexes. Do you claim that mutexes are locked in a partial order?Iver