C++20 thread possibly waiting on std::atomic forever
Consider the following example code where thread A pushes functions on a queue and thread B executes those when popping from the queue:

std::atomic<uint32_t> itemCount;

// Executed by thread A
void run(std::function<void()> function) {
    if (queue.push(std::move(function))) {
        itemCount.fetch_add(1, std::memory_order_acq_rel);
        itemCount.notify_one();
    }
}

// Executed by thread B
void threadMain() {
    std::function<void()> function;

    while (true) {
        if (queue.pop(function)) {
            itemCount.fetch_sub(1, std::memory_order_acq_rel);
            function();
        } else {
            itemCount.wait(0, std::memory_order_acquire);
        }
    }
}

where queue is a concurrent queue with a push and a pop function, each returning a bool indicating whether the operation succeeded: push returns false if the queue is full, and pop returns false if it is empty.
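For reference, a hypothetical stand-in with the interface described above; the real queue is lock-free (see the EDIT below), so this mutex-based version is only meant to make the example self-contained for experimentation:

#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical stand-in with the same bool-returning contract as the
// question's queue; the actual implementation is lock-free and not shown.
template <typename T>
class ConcurrentQueue {
public:
    explicit ConcurrentQueue(std::size_t capacity) : capacity_(capacity) {}

    bool push(T&& value) {  // false if the queue is full
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) return false;
        items_.push_back(std::move(value));
        return true;
    }

    bool pop(T& out) {      // false if the queue is empty
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
    std::size_t capacity_;
};

ConcurrentQueue<std::function<void()>> queue(1024);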

Now I was wondering whether the code is thread-safe under all circumstances. Suppose thread B's pop fails and it is about to invoke std::atomic<T>::wait. At that very moment, thread A pushes a new element while thread B checks the initial wait condition. Since itemCount hasn't been incremented yet, thread B still reads 0 and decides to block.

Immediately after that, thread A increments the counter and notifies one waiting thread, although thread B isn't blocked inside the OS wait yet. Thread B then finally blocks on the atomic and, due to the lost notification, never wakes up again despite there being an element in the queue. It would only recover once another element is pushed, whose notification lets B continue.
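Spelled out as an interleaving, the schedule I'm worried about looks like this:

// itemCount starts at 0, the queue is empty.
//
//   Thread B: queue.pop(function)              -> false
//   Thread B: wait(0): loads itemCount         -> reads 0, decides to block
//   Thread A: queue.push(std::move(function))  -> true
//   Thread A: itemCount.fetch_add(1, ...)      -> itemCount is now 1
//   Thread A: itemCount.notify_one()           -> nobody is blocked yet
//   Thread B: blocks in the OS wait            -> sleeps despite itemCount == 1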

I wasn't able to reproduce this situation manually since the timing is close to impossible to get right.

Is this a serious concern or can it not happen at all? What alternatives (preferably atomic ones) exist to account for such rare situations?

EDIT: Just to mention, the queue is not blocking and only utilizes atomic operations.


The reason I'm asking is that I don't understand how it's possible to implement an atomic wait operation. Although the standard says the whole operation is atomic (consisting of a load, a predicate check, and the wait itself), in the implementation I'm using, std::atomic<T>::wait is implemented roughly as follows:

void wait(const _TVal _Expected, const memory_order _Order = memory_order_seq_cst) const noexcept {
    _Atomic_wait_direct(this, _Atomic_reinterpret_as<long>(_Expected), _Order);
}

where _Atomic_wait_direct is defined as

template <class _Ty, class _Value_type>
void _Atomic_wait_direct(
    const _Atomic_storage<_Ty>* const _This, _Value_type _Expected_bytes, const memory_order _Order) noexcept {
    const auto _Storage_ptr = _STD addressof(_This->_Storage);
    for (;;) {
        const _Value_type _Observed_bytes = _Atomic_reinterpret_as<_Value_type>(_This->load(_Order));
        if (_Expected_bytes != _Observed_bytes) {
            return;
        }

        __std_atomic_wait_direct(_Storage_ptr, &_Expected_bytes, sizeof(_Value_type), _Atomic_wait_no_timeout);
    }
}

We can clearly see that there is an atomic load with the specified memory order to check the state of the atomic itself. However, I don't see how the whole operation can be considered atomic, since there is a comparison right before the call to __std_atomic_wait_direct.

With condition variables, the predicate itself is secured by a mutex but how is the atomic itself secured here?

Heid answered 4/1, 2021 at 1:13 Comment(18)
The memory_order_acquire and memory_order_acq_rel are the possible problem here. Without them, your described problem does not happen. With them, I can't tell you. And if you cannot prove your code correct without those flags, don't use those flags.Eldwin
I already suspected this to be an issue, but why exactly is memory_order_seq_cst safe in this case?Heid
If there are others like me who wonder in what alternate universe C++ atomics have condition-variable-like APIs: the answer is this universe, as of C++20. See Difference between std::atomic and std::condition_variable wait, notify_* methods.Omor
Seems to me you're trying to use an std::atomic<uint32_t> as a counting semaphore. I'd just use an std::counting_semaphore instead. I'd also build that into the queue itself, so push and pop deal with it, and the outside world doesn't. As an aside, I'd also pass a timeout to the pop, so it'll automatically wait up to some specified amount of time before failing.Adorl
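A minimal sketch of that suggestion, reusing the queue from the question (the semaphore bound and the exact structure are assumptions, not part of the comment):

#include <functional>
#include <semaphore>

// A counting semaphore tracks the number of queued items, so the consumer
// blocks in acquire() instead of waiting on an atomic counter.
std::counting_semaphore<1024> itemsAvailable{0};

// Thread A
void run(std::function<void()> function) {
    if (queue.push(std::move(function)))
        itemsAvailable.release();  // +1 permit; wakes a blocked acquire()
}

// Thread B (single consumer)
void threadMain() {
    std::function<void()> function;
    for (;;) {
        itemsAvailable.acquire();  // blocks until a matching release()
        if (queue.pop(function))   // one permit corresponds to one pushed item,
            function();            // so pop succeeds for a single consumer
    }
}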
@Heid I can reason about memory_order_seq_cst, so I can show that your scenario cannot happen; I mean, read how wait works. I cannot reliably reason about acquire or acq_rel, so I cannot say anything of note (I do not fully understand how they work). I claim that if you cannot see that memory_order_seq_cst is safe, then maybe you shouldn't be using the more exotic memory model flags.Eldwin
I am aware of how memory_order_seq_cst works, but still, looking at the implementation of wait I don't think it can guarantee this can't happen. Actually, this is not a matter of atomicity itself; as I said, the first load in wait (the predicate check) could read 0, then fetch_add sets it to 1, so the notify could still be issued before the call into the OS wait. I might be horribly off, but I want to justify at least why I think memory ordering isn't the main issue.Heid
@Heid Without sequential consistency, do you know if you'll see the +1 being set before you see the notification arriving? I don't know. With it, I do.Eldwin
Exactly. But that misses the point. Like I said, it's not necessarily about ordering here. Consider the case I proposed. Take the strongest ordering, no optimizations at all. Can you guarantee that the notify happens after the wait call into the OS considering all possible timings between operations in the two threads?Heid
wait is an atomic operation, and so is fetch_add. They don't interleave, but execute in some global order. If wait happens to execute first, then it would be followed by fetch_add and notify_one, which would wake it up. If fetch_add happens to execute first, then by the time wait is executed, itemCount.load() != 0 and wait will return immediately. You think of wait as executing in steps - check the value of the atomic, then enter the wait - but that's as wrong as thinking that fetch_add fetches first then adds, and worrying that another operation would squeeze in between.Ally
If that's the case, then indeed there is no way the described situation could ever happen. Seems like I got confused by the standard library's code because the whole function itself didn't look like one atomic operation. Oh well.Heid
@IgorTandetnik That is definitely true with sequential consistency. But without sequential consistency, what guarantee do you have that the thread doing the wait doesn't "see" the notify before the increment? The OP explicitly asked because things are allowed to happen out of sequence. Do you know that the acquire/release and notify rules still provide sufficient guarantees? I don't.Eldwin
@IgorTandetnik: I'm skeptical that it's that simple. I've been doing some tests that suggest that (on a common implementation), if you have a x.fetch_add(1, memory_order_seq_cst) followed by a y.load(memory_order_relaxed), the load can be reordered in between the load and store of the fetch_add(). The x.fetch_add() is atomic in the sense that no stores to x can come between its load and store, and that other observers will not see intermediate values for x, but AFAIK the atomicity isn't related to its ordering.Discerning
In that case, it's reasonable to wonder whether itemCount.notify_one() could also be reordered in between the load and store of itemCount.fetch_add(), which would appear to cause this algorithm to deadlock.Discerning
@NateEldredge I added an answer with the detailed analysis. See if you find it convincing.Ally
@Yakk-AdamNevraumont I added an answer with the detailed analysis. See if you find it convincing.Ally
If you just spin until queue.pop returns true you don't need that itemCount at all.Hautegaronne
@MaximEgorushkin I really want to avoid spinning if possible. Some implementations of std::atomic::wait default to a short spin, then revert to something less busy (e.g. libstdc++ from GCC). Of course there is no guarantee, but I prefer a solution that at least allows non-spinning too.Heid
If you'd like to avoid spinning you should use a blocking queue, possibly with a spin-mutex, rather than trying to invent a blocking pop for a non-blocking queue.Hautegaronne
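One hypothetical shape of such a blocking queue, using a plain std::mutex and std::condition_variable rather than a spin-mutex (an illustration of the comment's suggestion, not code from the thread):

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical blocking queue: pop() sleeps until an item arrives, so no
// separate itemCount/wait/notify protocol is needed at all.
template <typename T>
class BlockingQueue {
public:
    void push(T&& value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
        }
        ready_.notify_one();  // notify after releasing the lock
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate is checked under the mutex, which is exactly what
        // rules out the lost wakeup the question worries about.
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};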

Here's what the standard has to say:

[intro.races]/4 All modifications to a particular atomic object M occur in some particular total order, called the modification order of M.

[atomics.wait]/4 A call to an atomic waiting operation on an atomic object M is eligible to be unblocked by a call to an atomic notifying operation on M if there exist side effects X and Y on M such that:
(4.1) — the atomic waiting operation has blocked after observing the result of X,
(4.2) — X precedes Y in the modification order of M, and
(4.3) — Y happens before the call to the atomic notifying operation.

You posit the following scenario:

  1. The current value of itemCount is zero, either from original initialization or a previous fetch_sub.
  2. wait loads itemCount and observes the value of 0.
  3. The other thread calls fetch_add and notify_one
  4. wait goes blocking, as it believes the now-stale value of 0.

In this scenario, M is itemCount, X is the old fetch_sub that brought the value to 0 (we assume that it happened long ago and is properly visible to all threads) and Y is the fetch_add that changes the value to 1.

The standard says that the wait call (spanning steps 2 and 4) is in fact eligible to be unblocked by the notify_one call in step 3. Indeed:

(4.1) - wait has blocked after observing itemCount == 0 (if it has not, then the problem fails to arise).
(4.2) - fetch_sub precedes fetch_add in the modification order of itemCount (by assumption that fetch_sub happened long ago).
(4.3) - fetch_add happens before (in fact, is sequenced before) notify_one; they are called one after the other by the same thread.

Therefore, a conforming implementation must either not allow wait to block in the first place, or have notify_one wake it up; it cannot allow the notification to be missed.
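To make that concrete, here is the case analysis as a sketch (both possible orders of the two atomic operations on itemCount):

// wait(0) and fetch_add(1) are both single atomic operations on itemCount,
// so one of them comes first in itemCount's modification order:
//
// Case 1: wait's observation precedes fetch_add.
//   wait observes 0 and blocks; fetch_add then produces 1, and the
//   notify_one sequenced after it must (by [atomics.wait]/4) wake the waiter.
//
// Case 2: fetch_add precedes wait's observation.
//   wait observes 1 != 0 and returns immediately; no notification is needed.
//
// There is no third case in which the check and the block are separated by
// fetch_add: "blocked after observing the result of X" ties them together.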


The only place where memory orders figure in this discussion is in (4.1), "the atomic waiting operation has blocked after observing the result of X". Perhaps fetch_add actually occurred before wait (by wall clock), but was not visible to wait, so it went blocking anyway. But it doesn't matter to the outcome - either wait observes the result of fetch_add and doesn't block at all; or it observes the result of the old fetch_sub and blocks, but then notify_one is required to wake it up.

Ally answered 4/1, 2021 at 13:34 Comment(2)
So, I guess the point is that eligible to be unblocked by permits fairly weak ordering. Loading the result of X, where X precedes Y in modification order, does not prove that your load happened-before Y, but happens-before isn't what eligible to be unblocked by requires.Discerning
Such a clarifying answer -- thank you!Adalai

Just to point out what may have led to the confusion in the first place: the function __std_atomic_wait_direct, all by itself, does the compare and block in an atomic fashion. It boils down to either a call to WaitOnAddress, or else uses a condition variable, where the compare is done while holding the associated mutex.

The if (_Expected_bytes != _Observed_bytes) is merely an optimization. If the variable is already different from the expected value, we can return right away without making the relatively expensive call to an underlying atomic function. It doesn't need to be atomic with the block, and could be removed altogether without affecting the semantics.
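For illustration, a heavily simplified sketch of such a condition-variable fallback (one global mutex/condvar pair instead of the hashed table of them that real implementations use; the names are hypothetical):

#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Simplified sketch of a condition-variable based wait/notify fallback.
namespace fallback {
    std::mutex m;
    std::condition_variable cv;

    void wait(const std::atomic<std::uint32_t>& a, std::uint32_t expected) {
        std::unique_lock<std::mutex> lock(m);
        // The compare happens while holding the mutex. A notifier must take
        // the same mutex, so it cannot slip between this check and cv.wait().
        while (a.load() == expected)
            cv.wait(lock);
    }

    // Called after the caller has already modified the atomic.
    void notify_one(const std::atomic<std::uint32_t>&) {
        { std::lock_guard<std::mutex> lock(m); }  // order against the check
        cv.notify_one();
    }
}

Because a notifier has to acquire the same mutex before notifying, it can never fire between the waiter's compare and its block; that is exactly the atomicity the early-return optimization doesn't need to provide.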

Discerning answered 4/12, 2021 at 22:39 Comment(0)
