Why do we need an empty std::lock_guard before doing condition variable notify?
I am currently studying Google's Filament job system. You can find the source code here. The part that confuses me is this requestExit() method:

void JobSystem::requestExit() noexcept {
    mExitRequested.store(true);

    { std::lock_guard<Mutex> lock(mLooperLock); }
    mLooperCondition.notify_all();

    { std::lock_guard<Mutex> lock(mWaiterLock); }
    mWaiterCondition.notify_all();
}

I am confused why we need to lock and unlock even though there is no action in between the lock and unlock. Are there any cases where this empty lock and unlock is necessary?

Concavity asked 10/6, 2019 at 4:52

This is a bit of a hack. First, let's look at the code without the empty lock/unlock:

mExitRequested.store(true);
mLooperCondition.notify_all();

There's a possible race condition here. Some other thread might have seen that mExitRequested was false, but not actually started waiting on mLooperCondition until just after we called notify_all.

The race would be:

  1. Other thread checks mExitRequested, it's false.
  2. We set mExitRequested to true.
  3. We call mLooperCondition.notify_all.
  4. Other thread waits for mLooperCondition.
  5. Oops. Waiting for notify that already happened.

But in order to wait on a condition variable, you must hold the associated mutex. So that can only happen if the other thread held the mLooperLock mutex. In fact, step 4 would really be: "Other thread releases mLooperLock and waits for mLooperCondition."
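
To make that concrete, the waiting side has to look roughly like the sketch below (my own illustration using std::mutex and std::condition_variable, not Filament's actual looper code):

#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical waiter-side sketch (not Filament's actual loop). The key
// point: the mutex must already be held before wait() can be called.
struct LooperSketch {
    std::mutex mLooperLock;
    std::condition_variable mLooperCondition;
    std::atomic<bool> mExitRequested{false};

    void loopUntilExit() {
        std::unique_lock<std::mutex> lock(mLooperLock); // hold the mutex
        while (!mExitRequested.load()) {
            // wait() atomically releases mLooperLock and blocks, then
            // re-acquires the lock before returning
            mLooperCondition.wait(lock);
        }
    }
};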

So, for this race to happen, it must happen precisely like this:

  1. Other thread acquires mLooperLock.
  2. Other thread checks mExitRequested, it's false.
  3. We set mExitRequested to true.
  4. We call mLooperCondition.notify_all.
  5. Other thread waits for mLooperCondition, releasing mLooperLock.
  6. Oops. Waiting for notify that already happened.

So, if we change the code to:

mExitRequested.store(true);
{ std::lock_guard<Mutex> lock(mLooperLock); }
mLooperCondition.notify_all();

That ensures that no other thread can check mExitRequested, see false, and then wait on mLooperCondition, because the other thread would have to hold mLooperLock through that entire sequence, which can't happen: we acquire (and release) it in the middle of that sequence.

Trying it again:

  1. Other thread acquires mLooperLock.
  2. Other thread checks mExitRequested, it's false.
  3. We set mExitRequested to true.
  4. By acquiring and releasing mLooperLock, we make no forward progress until the other thread has released mLooperLock.
  5. We call mLooperCondition.notify_all.

Now, either the other thread blocks on the condition variable or it doesn't. If it doesn't, there's no problem. If it does, there's still no problem, because the only way it could have released mLooperLock is the condition variable's atomic "unlock and wait" operation, which guarantees that it sees our notify.
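
Putting both sides together, here is a minimal self-contained demo of the pattern (plain std::mutex and std::condition_variable with a single condition, so a simplification of Filament's code, not a copy of it):

#include <atomic>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex              gLock;
std::condition_variable gCondition;
std::atomic<bool>       gExitRequested{false};

void workerLoop() {
    std::unique_lock<std::mutex> lock(gLock);
    while (!gExitRequested.load()) {
        gCondition.wait(lock);  // atomically unlocks gLock and blocks
    }
    std::cout << "worker: exit requested\n";
}

void requestExit() {
    gExitRequested.store(true);

    // Empty critical section: if the worker is between "checked the flag"
    // and "called wait()", it still holds gLock, so this line blocks until
    // the worker has entered wait(). The notify below therefore can't be
    // missed.
    { std::lock_guard<std::mutex> lock(gLock); }
    gCondition.notify_all();
}

int main() {
    std::thread worker(workerLoop);
    requestExit();
    worker.join();
}

If the line with the empty lock_guard is removed, the race described above comes back: the worker can check the flag, get preempted before it calls wait(), miss the notify, and then block forever.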

Urbanus answered 10/6, 2019 at 4:59
Thanks for the clear answer. Is there any reason to use an atomic variable here? Why not just use a regular variable and change it inside the lock? Is it faster to use an atomic in this case? – Concavity
Regarding my previous comment, I think I understand the reason to use an atomic variable: it lets one variable serve both mLooperLock and mWaiterLock, so we don't need to take both locks before changing mExitRequested to true. – Concavity
There might be other considerations too, but I think that's the most likely explanation. One alternative would be to have two mExitRequested variables, one protected by each lock. Another would be to acquire both locks while changing mExitRequested (which I presume is a rare operation used only on shutdown). I consider the approach used in this code fragile, confusing, and hacky, and would avoid it unless there were significant disadvantages to those other approaches. – Urbanus
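
For illustration, the "acquire both locks" alternative mentioned in the last comment could look roughly like this; it is a hypothetical sketch reusing the member names from the question, with mExitRequested as a plain bool rather than an atomic:

void JobSystem::requestExit() noexcept {
    {
        // Writing the flag while holding both mutexes means any thread that
        // reads it under either mutex sees the update without a data race.
        std::lock_guard<Mutex> looper(mLooperLock);
        std::lock_guard<Mutex> waiter(mWaiterLock);
        mExitRequested = true;  // plain bool in this variant
    }
    mLooperCondition.notify_all();
    mWaiterCondition.notify_all();
}

Each waiter would then read mExitRequested only while holding its own mutex, and any other code that takes both locks would have to take them in the same order to avoid deadlock.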
