signal on condition variable without holding lock
Asked Answered
P

2

9

So I just found out that it's legal to signal a condition variable if you're not holding the lock in c++11. That seems to open the door to some nasty race condition:

std::mutex m_mutex;
std::condition_variable m_cv;

T1: 
  std::unique_lock<std::mutex> lock(m_mutex);
  m_cv.wait(lock, []{ return !is_empty(); });

T2:
  generate_data();
  m_cv.notify();

Is it guaranteed that T1 will never end up in a situation where we check is_empty() first (it returning true), then getting preempted by T2 which creates some data and signals the condition variable before we can actually wait on it?

If this is guaranteed to work (I'd guess so, otherwise it would seem like an intentionally bad API design), how is this actually implemented for say linux and stdlibc++? Seems we'd need another lock to avoid this situation.

Pompei answered 29/1, 2014 at 18:20 Comment(1)
Yep, that is an exact recipe for a race condition, just like https://mcmap.net/q/195818/-sync-is-unreliable-using-std-atomic-and-std-condition_variable/412080Zacharia
W
13

Checking the predicate and waiting are not performed atomically in std::condition_variable::wait (unlocking the lock and sleeping are performed atomically). If it is possible for another thread to change the value of the predicate while this thread holds the mutex, then it is possible for notifications to occur between the predicate check and going to sleep, and effectively be lost.

In your example, if generate_data() in T2 can alter the result of is_empty() without holding m_mutex, it's possible for a notification to happen between T1 checking is_empty() and sleeping on m_cv. Holding the mutex at any time between the change to the predicate and the notification is sufficient to guarantee the atomicity of the predicate check and wait call in the other thread. That could look like:

{
  std::lock_guard<std::mutex> lk(m_mutex);
  generate_data();
}
m_cv.notify();

or even

generate_data();
std::lock_guard<std::mutex>(m_mutex); // Lock the mutex and drop it immediately
m_cv.notify();
Whereabouts answered 29/1, 2014 at 18:32 Comment(9)
I was a bit sloppy with the formulation here and assumed everybody would be familiar with the general idiom where you lock data publishing and notifying. But you're right that just locking the actual publishing is enough, which I guess is a good reason in some situations (I doubt though that it makes any difference for the shown code whether notify is in the lock too or outside) to allow this scenario.Pompei
@Pompei Notifying inside vs. outside the lock doesn't affect correctness, notifying inside the lock wastes a bit of performance if the waiting thread (or several threads in the case of notify_all) wakes up only to immediately block on the mutex.Whereabouts
@Casey: Why will your first solution not work? i.e. taking a lock before making the call to notify()? If T2 gets the lock, T1 won't enter the wait, and when it does, the predicate is true, and hence it doesn't wait which is the expected behavior.Portraiture
You are correct - taking the lock any time between generate_data and the notify assures that the notify cannot happen between the other threads check of the predicate and wait.Whereabouts
So, is "lock and immediate release" makes sure that T1 (consumer) is waiting (on m_cv)? Also consider, if #0) T2 generates data #1) T2 Locks and releases lock #2) T1 gets spurious wake-up #3) Even though notify is not called by T2 yet, T1 uses generated data and exits #4) T2 calls notify (which will be no-op as T1 is not waiting). And this is a false notify as data is already used (empty again). Is this valid scenario and will there be any issue in your code @casey? (I don't think any issue, just want to verify).Mulholland
Your last scenario seems incorrect. If generate_data() is called without holding the mutex, the changes (presumably what is_empty() is looking at) are not synchronized. That is undefined behavior. Acquiring and releasing the lock after that does not make it valid.Flub
@Casey, I have similar question: #54226220. It seems that the mutex is actually not needed in case you do not need to transfer the data between threads, right?Furnace
@Flub I'm assuming that generate_data handles any necessary synchronization or requires none since the sample code in the question doesn't depict any locking around the call to generate_data in T2.Whereabouts
@Furnace If you simply want to wake up any thread that happens to be waiting, the mutex is unnecessary. I can't tell you if that solves your problem without knowing what problem you're solving.Whereabouts
I
3

It is not guaranteed - if you don't want to miss the signal then you must lock the mutex prior to notifying. Some applications may be agnostic about missing signals.

From man pthread_signal:

The pthread_cond_signal() or pthread_cond_broadcast() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast().

Intoxicative answered 29/1, 2014 at 18:28 Comment(2)
Locking the mutex prior to signaling is not sufficient to guarantee that signals cannot be missed; it's necessary to ensure that the value of the condition (the associated predicate) cannot change while the lock is held.Whereabouts
What is the predicate is based on atomic counters? Doesn't it mean that now we have to take a lock before modifying atomic variables thereby defeating the whole purpose of them being atomic?Portraiture

© 2022 - 2024 — McMap. All rights reserved.