Is std::mutex sequentially consistent?
Say I have two threads A and B writing to global Boolean variables fA and fB respectively, which are initially set to false and are protected by std::mutex objects mA and mB respectively:

// Thread A
mA.lock();
assert( fA == false );
fA = true;
mA.unlock();

// Thread B
mB.lock();
assert( fB == false );
fB = true;
mB.unlock();

Is it possible to observe the modifications on fA and fB in different orders in different threads C and D? In other words, can the following program

#include <atomic>
#include <cassert>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;

mutex mA, mB, coutMutex;
bool fA = false, fB = false;

int main()
{
    thread A{ []{
            lock_guard<mutex> lock{mA};
            fA = true;
        } };
    thread B{ [] {
            lock_guard<mutex> lock{mB};
            fB = true;
        } };
    thread C{ [] { // reads fA, then fB
            mA.lock();
            const auto _1 = fA;
            mA.unlock();
            mB.lock();
            const auto _2 = fB;
            mB.unlock();
            lock_guard<mutex> lock{coutMutex};
            cout << "Thread C: fA = " << _1 << ", fB = " << _2 << endl;
        } };
    thread D{ [] { // reads fB, then fA (i.e., vice versa)
            mB.lock();
            const auto _3 = fB;
            mB.unlock();
            mA.lock();
            const auto _4 = fA;
            mA.unlock();
            lock_guard<mutex> lock{coutMutex};
            cout << "Thread D: fA = " << _4 << ", fB = " << _3 << endl;
        } };
    A.join(); B.join(); C.join(); D.join();
}

legally print

Thread C: fA = 1, fB = 0
Thread D: fA = 0, fB = 1

according to the C++ standard?

Note: A spin-lock can be implemented using std::atomic<bool> variables with either sequentially consistent or acquire/release memory order. So the question is whether std::mutex behaves like a sequentially consistent spin-lock or like an acquire/release spin-lock.
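For concreteness, here is a minimal sketch of the acquire/release variant (the class name SpinLock is made up for illustration; a production lock would also back off or yield while spinning):

```cpp
#include <atomic>
#include <cassert>
#include <thread> // for the usage demo

// A minimal acquire/release spinlock sketch. lock() is an acquire
// read-modify-write, unlock() a release store -- exactly the
// synchronizes-with guarantee the standard requires of std::mutex.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // Acquire: we see everything published by the previous unlock().
        while (locked.exchange(true, std::memory_order_acquire)) {
            // busy-wait until the lock is free
        }
    }
    void unlock() {
        // Release: everything we wrote becomes visible to the next lock().
        locked.store(false, std::memory_order_release);
    }
};
```

Replacing both memory orders with std::memory_order_seq_cst gives the sequentially consistent variant the question contrasts this with.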

Metternich answered 25/1, 2017 at 9:12 Comment(1)
Edited my answer (not sure if you get notified for that) and now it's almost the opposite of before: std::mutex is like an acquire/release spinlock, but the output you gave is also impossible. – Beore

That output isn't possible, but std::mutex is not necessarily sequentially consistent: acquire/release is enough to rule out that behaviour.

std::mutex is not defined in the standard to be sequentially consistent, only that

30.4.1.2 Mutex types [thread.mutex.requirements.mutex]

11 Synchronization: Prior unlock() operations on the same object shall synchronize with (1.10) this operation [lock()].

Synchronizes-with seems to be defined in the same way as std::memory_order release/acquire (see this question).
As far as I can see, an acquire/release spinlock would satisfy the standard's requirements for std::mutex.

Big edit:

However, I don't think that means what you think (or what I thought). The output is still not possible, since acquire/release semantics are enough to rule it out. This is a kind of subtle point that is better explained here. It seems obviously impossible at first but I think it's right to be cautious with stuff like this.

From the standard, unlock() synchronises with lock(). That means anything that happens before unlock() is visible after lock(). Happens before (henceforth ->) is a slightly subtle relation, explained better in the link above, but because every access in this example is guarded by a mutex, everything works as you would expect: const auto _1 = fA; happens before const auto _2 = fB;, and any changes visible to a thread when it unlock()s a mutex are visible to the next thread that lock()s it. The relation also has the properties you'd expect: it is transitive (if X -> Y and Y -> Z, then X -> Z) and asymmetric (if X -> Y, then not Y -> X).

From here it's not hard to derive the contradiction that intuition suggests.

In short, there's a well-defined order of lock operations for each mutex: e.g. for mutex mA, threads A, C, and D hold the lock in some sequence. For thread D to print fA = 0, it must lock mA before thread A does, and vice versa for thread C. So the lock sequence for mA is D(mA) -> A(mA) -> C(mA).

For mutex mB the sequence must be C(mB) -> B(mB) -> D(mB).

But from the program we know C(mA) -> C(mB), so that lets us put both together to get D(mA) -> A(mA) -> C(mA) -> C(mB) -> B(mB) -> D(mB), which means D(mA) -> D(mB). But the code also gives us D(mB) -> D(mA), which is a contradiction, meaning your observed output is not possible.

This outcome is no different for an acquire/release spinlock. I think everyone was confusing regular acquire/release access to a variable with access to a variable protected by a spinlock. The difference is that with a spinlock, the reading threads also perform a compare/exchange and a release write, which is a completely different scenario from a single release write and a single acquire read.
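To make that contrast concrete, here is a sketch of the classic IRIW (independent reads of independent writes) pattern with plain acquire/release atomics, where the readers never write the flags. Under acquire/release alone (and in practice on e.g. POWER) the two readers are allowed to disagree on the order of the writes, which the lock-taking readers in the question's program cannot do:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false}, y{false};
int r1, r2, r3, r4;

// One IRIW run: two independent release writers, two acquire readers
// that read the flags in opposite orders and never write them.
std::array<int, 4> run_iriw() {
    x = false; y = false;
    std::thread a([]{ x.store(true, std::memory_order_release); });
    std::thread b([]{ y.store(true, std::memory_order_release); });
    std::thread c([]{
        r1 = x.load(std::memory_order_acquire);
        r2 = y.load(std::memory_order_acquire);
    });
    std::thread d([]{
        r3 = y.load(std::memory_order_acquire);
        r4 = x.load(std::memory_order_acquire);
    });
    a.join(); b.join(); c.join(); d.join();
    // With acquire/release only, {r1,r2,r3,r4} == {1,0,1,0} is a
    // permitted outcome: the readers disagree on the write order.
    // seq_cst on all eight operations (or mutex-guarded reads, which
    // add a release write to every read) forbids it.
    return {r1, r2, r3, r4};
}
```

Whether the disallowed outcome ever shows up depends on the hardware; the point is only that the standard permits it here, while it forbids it for the mutex version.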

If you used a sequentially consistent spinlock then this wouldn't affect the output. The only difference is that you could always categorically answer questions like "mutex A was locked before mutex B" from a separate thread that didn't acquire either lock. But for this example and most others, that kind of statement isn't useful, hence acquire/release being the standard.

Beore answered 25/1, 2017 at 11:50 Comment(9)
It doesn't seem that POSIX mutexes are sequentially consistent "in practice". I have checked the glibc implementation and it uses acquire/release ordering. Afaik musl does the same thing. – Bike
True, afaik ARM64 is the same (release is seq_cst). But 32-bit ARM is different and that might still be relevant to some. – Bike
@curiousguy Looking at this again, I don't think there is one. Each mutex has a defined sequence of lock/unlock operations, so it's impossible for the given sequence to happen. I looked into the specifics of the C++ memory model and, whilst I'm not confident enough to make a logical proof in that context, I'm pretty sure you could. Also, release for x86/64 is NOT seq_cst; I was misremembering the fact that all writes are release and all reads are acquire. The lack of seq_cst just means it's impossible to say whether mutex A or mutex B was locked first, but that doesn't affect the output of this program. – Beore
@JosephIreland These atomic "recipes" ("C/C++11 mappings to processors") and many other resources describe all stores on Intel x86 as release stores: "Store Release" is mapped to "MOV (into memory)" and the release fence is a NOP! – Hague
@JosephIreland I think the question of whether "lack of seq_cst just means it's impossible to say" is a complete distraction (as Pascal would say). Consider the ridiculous case of an MT program where every thread uses only relaxed atomic operations on thread-private data. There is no "global order" defined by the C/C++ MT semantics of the operations, but execution is clearly sequential as there isn't a race condition. What matters is when "reordering" is observable, not whether the standard says "there is a global order". – Hague
@Hague I agree that it's a distraction, since it can't affect the output of this program. I've made a big edit to this answer. – Beore
"x86 memory barriers are sequentially consistent anyway" A rel barrier is a NOP so I don't think it's what ppl call "sequentially consistent".Hague
@Hague "release for x86/64 is NOT seq_cst, I was misremembering the fact that all writes are release and all reads are acquire". I deleted the comment that you quoted in case anyone reads it without reading the rest of this.Beore
I think it would be good to mention "The lock and unlock operations on a single mutex appear to occur in a single total order." in the "From the standard" paragraph. – Eneidaenema

Is it possible to observe the modifications on fA and fB in different orders in different threads C and D?

The basic idea that a lock "acquires" the "released" state (and side-effect history) of an unlock makes that impossible: you promise to only access a shared object while holding the corresponding lock, and that lock "synchronizes with" all past modifications seen by the thread that did the unlock. So only one history can exist, not only of lock/unlock operations, but of accesses to shared objects.

Hague answered 29/12, 2019 at 3:42 Comment(4)
The question is using 2 different locks for the different variables fA and fB, so there are 2 separate histories with no restrictions on how they interleave with each other. The point is that simultaneous observation from 2 different readers is impossible, because the readers also need to take the corresponding locks. – Michail
@PeterCordes The hypothesis is that any accesses to a shared object X are done after lock(M_X). These different, incomplete histories are local to X. Any thread that wants access to another shared object will take that lock and "pull" another local history, creating a global history. – Hague
Ok yes, that seems like a missing piece of the explanation that should be in your answer. BTW, you could use atomic_int with mo_relaxed inside the mutex, and then you could have both readers reading with mo_acquire in opposite orders, allowing both reads to actually happen simultaneously. Then ISO C++ would, I think, allow the readers to disagree on the write order (the IRIW case; possible in practice on POWER). I think POWER only avoids it with seq_cst loads, not with strong barriers on the stores, so I don't think a standard library mutex would ever block it. – Michail
Hmm, on second thought, reading with mo_acquire doesn't prove anything. IRIW reordering would still be allowed (and possible in practice on POWER) if the writers used mo_seq_cst atomic stores instead of a mutex around mo_relaxed. So yes, the necessary lock-taking for reading is the key here, preventing both read orders from happening simultaneously. – Michail

© 2022 - 2024 — McMap. All rights reserved.