Yes, the transformation is allowed, for the reasons you yourself gave.
For any given input, no matter how the rest of the code looks, the set of observable behaviors permitted with the transformed function will always be a subset of the observable behaviors permitted with the original function.
There is no requirement that unlocking the mutex causes another thread blocked on acquiring it to wake up immediately and acquire the lock atomically with the unlock. Therefore, it is perfectly fine to schedule the unlocking thread so that it immediately reacquires the lock, regardless of the actions of any other thread.
Whether the lock was released is not an observable behavior. Observable behavior consists only of accesses to volatile objects, the final state of files written, and interactive I/O in an implementation-defined manner. See [intro.abstract]/6.
So the release/reacquire can be skipped without affecting the observable behavior under this scheduling scheme, which is a permissible scheduling scheme and therefore defines one of the permissible choices of observable behavior. The compiler only has to ensure that some one of the permissible observable behaviors is realized. See [intro.abstract]/5.
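To make the argument concrete, here is a minimal sketch of the kind of transformation being discussed. The `Shared` struct and the function names are hypothetical and are not taken from the question; the point is only that merging the two critical sections yields a result the unmerged version could also have produced.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical shared state, for illustration only.
struct Shared {
    std::mutex m;
    int value = 0;
};

// Original shape: the mutex is released and reacquired between the updates.
void two_sections(Shared& s) {
    {
        std::lock_guard<std::mutex> lock{s.m};
        s.value += 1;
    } // unlock: another thread may acquire s.m here, but it does not have to
    {
        std::lock_guard<std::mutex> lock{s.m};
        s.value += 2;
    }
}

// Transformed shape: the release/reacquire pair has been elided.
void one_section(Shared& s) {
    std::lock_guard<std::mutex> lock{s.m};
    s.value += 1;
    s.value += 2;
}

// Without contention both versions leave the same state behind.
void check_equivalence() {
    Shared a, b;
    two_sections(a);
    one_section(b);
    assert(a.value == b.value);
}
```

With other threads present, `one_section` behaves like the execution of `two_sections` in which no other thread happened to get the mutex in the gap, which is one of the permitted executions.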
On the other hand, it seems plausible this optimisation could slippery-slope to every program containing a mutex being optimised into: lock mutex, do entire program, unlock mutex. Such behaviour would be obviously unhelpful for concurrent applications, so it might be reasonably forbidden.
There is [intro.progress]/8, which recommends that the thread executing main and the threads created with std::thread/std::jthread should provide concurrent forward-progress guarantees, which means that each such thread should, as long as it hasn't terminated, eventually make progress ([intro.progress]/6).
For the purpose of making progress, per [intro.progress]/4, a blocking library call is considered to continuously check the condition causing it to block, each such check "making progress". Other execution steps that cause progress to be made are access through volatiles, completion of a call to a library IO function, of synchronization operations and of atomic operations.
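As a rough illustration of these execution steps, the sketch below has one thread block in a mutex lock while another performs atomic operations and eventually unlocks. The function name and the value 42 are made up for the example. Nothing in it depends on when the waiter wakes up, only on the fact that it eventually does.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Returns the value the waiter observed once it acquired the mutex.
int blocked_waiter_demo() {
    std::mutex m;
    int data = 0;
    int seen = -1;
    std::atomic<bool> started{false};

    m.lock(); // the calling thread holds the mutex first
    std::thread waiter([&] {
        started.store(true); // atomic operation: counts as an execution step
        // Blocks if the calling thread still holds m. While blocked, the
        // call is treated as repeatedly checking whether m is available.
        std::lock_guard<std::mutex> lock{m};
        seen = data;
    });

    while (!started.load()) { // each atomic load is also an execution step
        std::this_thread::yield();
    }
    data = 42;  // written while the calling thread still holds m
    m.unlock(); // from here on the waiter's lock attempt can succeed
    waiter.join();

    assert(seen != -1); // the waiter did run its critical section
    return seen;
}
```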
However, I don't think that this even forbids the transformation of
    while(true)
    {
        std::lock_guard lock{mutex};
        //A
        x++; // (assume x is unsigned here)
    }

into

    std::lock_guard lock{mutex};
    while(true)
    {
        x++;
    }
Even if another thread attempts to lock the mutex, the following scheduling behavior would not violate this forward progress guarantee: repeatedly execute the loop until //A, then switch to the other thread to let it "make progress" by checking the blocking condition, then switch back and repeat.
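The two loop shapes can be compared in a bounded form that actually terminates. The following is a sketch under assumptions: the `Counter` struct, the iteration bound `n`, and the contending thread that adds 1000 are all invented for the example. It only shows that the final state cannot distinguish the two versions; it says nothing about how long the contender waits.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared counter, for illustration only.
struct Counter {
    std::mutex m;
    unsigned x = 0;
};

// Bounded analogue of the original loop: lock and unlock on every iteration.
void per_iteration_lock(Counter& c, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lock{c.m};
        c.x++;
    }
}

// Bounded analogue of the transformed loop: the lock is hoisted out.
void hoisted_lock(Counter& c, unsigned n) {
    std::lock_guard<std::mutex> lock{c.m};
    for (unsigned i = 0; i < n; ++i) {
        c.x++;
    }
}

// Runs `body` while another thread tries to lock once and add 1000.
template <class F>
unsigned run_with_contender(F body, unsigned n) {
    Counter c;
    std::thread other([&] {
        std::lock_guard<std::mutex> lock{c.m};
        c.x += 1000;
    });
    body(c, n);
    other.join();
    assert(c.x == n + 1000u); // same total whichever thread got the lock first
    return c.x;
}
```

With the hoisted lock the contender can only run before or after the whole loop. With per-iteration locking it may also slip in between iterations, but nothing requires that it ever does.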
Forward progress and the scheduling of threads are largely left to the implementation as a quality-of-implementation matter. See also N3209, which discusses this from the time multi-threaded execution was added in C++11.
I do not expect that any compiler will make any attempt at such transformations at the code level.
However, even common OS schedulers won't provide any strict guarantee that the scheduling scheme I described above won't happen. Generally it just becomes probabilistically unlikely over longer times, because threads are preempted at more or less stochastically noisy intervals. If the scheduling happens to be as described above, then even without the compiler's transformation the program will behave as if the transformation had been made.
I could imagine other environments where the described scheduling behavior may occur indefinitely. Suppose for example that the mutex locking is implemented as a simple spin lock on a uniprocessor system and suppose that the scheduler is perfectly fair and deterministic in the number of instructions it will let each thread execute in any time slice. In such an environment you may be unlucky that the locked thread happens to be always able to reacquire the lock in the same time slice that it is released.
Or multithreading may be completely cooperative, in which case there may be an issue if the mutex is not implemented to yield after an unlock.
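For reference, a simple spin lock of the kind imagined here might look like the sketch below. This is an assumed, minimal implementation built on std::atomic_flag, not how any particular standard library implements std::mutex. The `unlock_and_yield` member is a hypothetical addition showing where a yield after unlocking would go.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Minimal spin lock sketch; there is no queue and no hand-off to waiters.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // Busy-wait. Whoever runs test_and_set first after a clear wins,
            // which may well be the thread that just unlocked.
        }
    }

    void unlock() { flag.clear(std::memory_order_release); }

    // Hypothetical variant: give other threads a chance after unlocking.
    void unlock_and_yield() {
        unlock();
        std::this_thread::yield(); // only a hint to the scheduler
    }
};

// Two threads increment a shared counter under the spin lock.
unsigned spin_counter_demo(unsigned per_thread) {
    SpinLock sl;
    unsigned x = 0;
    auto work = [&] {
        for (unsigned i = 0; i < per_thread; ++i) {
            sl.lock();
            ++x;
            sl.unlock_and_yield();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    assert(x == 2 * per_thread); // mutual exclusion held for every increment
    return x;
}
```

Even with the yield, nothing in this sketch guarantees that a waiting thread gets the lock next. std::this_thread::yield is only a hint, which is in line with the point that the standard leaves such fairness to the implementation.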
I think the standard can't make any stronger guarantees than it does, in part because it would make supporting such environments impossible.