Possible race condition in std::condition_variable?

I've looked into the VC++ implementation of std::condition_variable::wait(lock, pred). Basically, it looks like this:

template<class _Predicate>
        void wait(unique_lock<mutex>& _Lck, _Predicate _Pred)
        {   // wait for signal and test predicate
        while (!_Pred())
            wait(_Lck);
        }

The plain wait (the overload without a predicate) calls _Cnd_waitX, which calls _Cnd_wait, which calls do_wait, which calls cond->_get_cv()->wait(cs); (all of these are in the file cond.c).

cond->_get_cv() returns Concurrency::details::stl_condition_variable_interface .

If we go to the file primitives.h, we see that under Windows 7 and above we have the class stl_condition_variable_win7, which contains the good old Win32 CONDITION_VARIABLE, and whose wait calls __crtSleepConditionVariableSRW.

Doing a bit of assembly-level debugging, __crtSleepConditionVariableSRW just extracts the SleepConditionVariableSRW function pointer and calls it.

Here's the thing: as far as I know, the Win32 CONDITION_VARIABLE is not a kernel object but a user-mode one. Therefore, if some thread notifies this variable while no thread is actually sleeping on it, the notification is lost, and a thread that starts waiting afterwards will remain asleep until its timeout expires or some other thread notifies it again. A small program can actually prove it: if you miss the point of notification, your thread will remain sleeping although some other thread notified it.
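The "small program" could look roughly like the sketch below. It is only an illustration: the names Outcome and notify_before_wait are made up for this example, and the 50 ms timeout is an arbitrary choice. It notifies before anyone waits, then waits twice, once without a predicate and once with one.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical result type, for illustration only.
struct Outcome {
    bool bare_wait_timed_out;  // result of waiting without a predicate
    bool predicate_wait_ok;    // result of waiting with a predicate on shared state
};

Outcome notify_before_wait() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;

    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;  // state changed under the mutex
    }
    cv.notify_one();   // nobody is waiting yet, so this notification is not stored

    std::unique_lock<std::mutex> lock(m);

    // Bare wait: no later notification arrives, so this normally ends in a timeout.
    // A spurious wakeup is permitted, so the timeout is likely but not guaranteed.
    bool timed_out =
        cv.wait_for(lock, std::chrono::milliseconds(50)) == std::cv_status::timeout;

    // Predicate wait: the state is tested before blocking, so no notification is needed.
    bool ok = cv.wait_for(lock, std::chrono::milliseconds(50), [&] { return ready; });

    return {timed_out, ok};
}
```

Calling notify_before_wait() from main and printing both fields should show the bare wait running into its timeout in practice, while the predicate version returns true immediately because it looks at the shared state rather than relying on the notification.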

My question goes like this:
One thread waits on a condition variable and the predicate returns false. Then the whole chain of calls explained above takes place. In the meantime, another thread changes the environment so that the predicate would return true, and notifies the condition variable. The original thread has already evaluated the predicate, but it has not yet reached SleepConditionVariableSRW, because the call chain is very long.

So, although we notified the condition variable and the predicate put on the condition variable will definitely return true (because the notifier made so), we are still blocking on the condition variable, possibly forever.

Is this how it should behave? It seems like a big, ugly race condition waiting to happen. If you notify a condition variable and its predicate returns true, the thread should unblock. But if we're in the limbo between checking the predicate and going to sleep, we are blocked forever. std::condition_variable::wait is not an atomic function.

What does the standard say about it, and is it really a race condition?

Submultiple answered 26/2, 2017 at 16:2 Comment(0)

You've violated the contract so all bets are off. See: http://en.cppreference.com/w/cpp/thread/condition_variable

TLDR: It's impossible for the predicate to be changed by someone else while you're holding the mutex.

You're supposed to change the underlying variable of the predicate while holding a mutex and you have to acquire that mutex before calling std::condition_variable::wait (both because wait releases the mutex, and because that's the contract).

In the scenario you described the change happened after the while (!_Pred()) saw that the predicate doesn't hold but before wait(_Lck) had a chance to release the mutex. This means that you changed the thing the predicate checks without holding the mutex. You have violated the rules and a race condition or an infinite wait are still not the worst kinds of UB you can get. At least these are local and related to the rules you violated so you can find the error...

If you play by the rules, either:

  1. The waiter takes hold of the mutex first
  2. The waiter goes into std::condition_variable::wait. (Recall the notifier still waits on the mutex.)
  3. The waiter checks the predicate and sees that it doesn't hold. (Recall the notifier still waits on the mutex.)
  4. The waiter calls some implementation-defined magic to release the mutex and wait atomically, and only now may the notifier proceed.
  5. The notifier finally manages to take the mutex.
  6. The notifier changes whatever needs to change for the predicate to hold true.
  7. The notifier calls std::condition_variable::notify_one.

or:

  1. The notifier acquires the mutex. (Recall that the waiter is blocked on trying to acquire the mutex.)
  2. The notifier changes whatever needs to change for the predicate to hold true. (Recall that the waiter is still blocked.)
  3. The notifier releases the mutex. (Somewhere along the way the notifier will call std::condition_variable::notify_one, but once the mutex is released...)
  4. The waiter acquires the mutex.
  5. The waiter calls std::condition_variable::wait.
  6. The waiter checks while (!_Pred()) and voilà! The predicate is true.
  7. The waiter doesn't even go into the internal wait, so whether or not the notifier managed to call std::condition_variable::notify_one or didn't manage to do that yet is irrelevant.
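Both orderings above can be sketched in code as follows. This is a minimal illustration, not anything taken from the VC++ sources: Mailbox and run_handoff are hypothetical names, and the only point is that the waiter tests the predicate and the notifier modifies the state under the same mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state; the names are illustrative only.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the thing the predicate checks
    int payload = 0;
};

// Runs one waiter and one notifier and returns what the waiter observed.
int run_handoff(int value) {
    Mailbox box;
    int seen = -1;

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(box.m);      // take the mutex first
        box.cv.wait(lock, [&] { return box.ready; });  // predicate is tested under the mutex
        assert(lock.owns_lock());                      // wait returns with the mutex held
        seen = box.payload;
    });

    std::thread notifier([&] {
        {
            std::lock_guard<std::mutex> lock(box.m);   // change the state under the same mutex
            box.payload = value;
            box.ready = true;
        }
        box.cv.notify_one();  // sent after unlocking; the state change is already published
    });

    waiter.join();
    notifier.join();
    return seen;
}
```

Whichever thread wins the mutex first, run_handoff(42) returns 42: either the waiter is already blocked inside wait when the notification arrives, or it finds ready == true on its first predicate check and never blocks at all.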

That's the rationale behind the requirement on cppreference.com:

Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.

Note that this is a general rule for condition variables (including Windows CONDITION_VARIABLEs, POSIX pthread_cond_ts, etc.) rather than a special requirement of std::condition_variable.


Recall that the wait overload that takes a predicate is just a convenience function so that the caller doesn't have to deal with spurious wakeups. The standard (§30.5.1/15) explicitly says that this overload is equivalent to the while loop in Microsoft's implementation:

Effects: Equivalent to:

while (!pred())
    wait(lock);

Does the simple wait work? Do you test the predicate before and after calling wait? Great. You're doing the same. Or are you questioning void std::condition_variable::wait( std::unique_lock<std::mutex>& lock ); too?
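To make the equivalence concrete, here is a sketch of the hand-written loop used in place of the predicate overload. The helper wait_until_true and the driver count_to are hypothetical names invented for this example; the loop body is the same while loop the standard describes.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper with the same effect as cv.wait(lock, pred).
template <class Predicate>
void wait_until_true(std::condition_variable& cv,
                     std::unique_lock<std::mutex>& lock,
                     Predicate pred) {
    while (!pred())     // re-tested after every wakeup, spurious or not
        cv.wait(lock);  // atomically releases the mutex and blocks; reacquires before returning
}

// A producer increments a counter under the mutex; the caller waits for the target.
int count_to(int target) {
    std::mutex m;
    std::condition_variable cv;
    int counter = 0;

    std::thread producer([&] {
        for (int i = 0; i < target; ++i) {
            {
                std::lock_guard<std::mutex> lock(m);  // modify shared state under the mutex
                ++counter;
            }
            cv.notify_one();
        }
    });

    std::unique_lock<std::mutex> lock(m);
    wait_until_true(cv, lock, [&] { return counter == target; });
    int result = counter;
    lock.unlock();

    producer.join();
    return result;
}
```

Because the predicate is evaluated only while the mutex is held, and the counter is only modified while the mutex is held, there is no window in which the final increment can slip in between the test and the wait.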


Windows Critical Sections and Slim Reader/Writer Locks being user-mode facilities rather than kernel objects is immaterial and irrelevant to the question. There are alternative implementations. If you're interested to know how Windows manages to atomically release a CS/SRWL and enter a wait state (what naive pre-Vista user-mode implementations with Mutexes and Events did wrong) that's a different question.

Arathorn answered 26/2, 2017 at 21:12 Comment(2)
Aha, I missed the "Even if the shared variable is atomic..." which is what I had in mind. – Submultiple
@DavidHaim: Note that this is a hint given by cppreference. The standard says nothing of the sort. std::condition_variable (as well as condition variables in general) creates a relationship between waits and notifications and holding a mutex. It is assumed that you have a valid reason to acquire a mutex in that way. Or not. What to do with that mutex is your business. The only thing condition variables really do is release a mutex and enter a wait state atomically. The "condition" part is totally up to you. – Arathorn
