There are universally accepted definitions of lock-freedom and wait-freedom, and the definitions provided in your question are consistent with them. I would strongly assume that the C++ standard committee sticks to the definitions that are established in the scientific community.
In general, publications on lock-free/wait-free algorithms assume that CPU instructions are wait-free; the arguments about progress guarantees instead focus on the software algorithm.
Based on this assumption I would argue that any `std::atomic` method that can be translated to a single atomic instruction for some architecture is wait-free on that architecture. Whether such a translation is possible sometimes depends on how the method is used, though. Take for example `fetch_or`. On x86 this can be translated to `lock or`, but only if you do not use its return value, because this instruction does not provide the original value. If you use the return value, the compiler will instead create a CAS loop, which is lock-free but not wait-free. (The same goes for `fetch_and`/`fetch_xor`.)
So which methods are actually wait-free depends not only on the compiler, but mostly on the target architecture.
Whether it is technically correct to assume that a single instruction is actually wait-free is rather a philosophical question, IMHO. True, it might not be guaranteed that an instruction finishes execution within a bounded number of "steps" (whatever such a step might be), but the machine instruction is still the smallest unit on the lowest level that we can see and control. Actually, if you cannot assume that a single instruction is wait-free, then strictly speaking it is not possible to run any real-time code on that architecture, because real-time also requires strict bounds on time/the number of steps.
This is what the C++17 standard states in [intro.progress]:
Executions of atomic functions that are either defined to be lock-free (32.8) or indicated as lock-free (32.5) are lock-free executions.
- If there is only one thread that is not blocked (3.6) in a standard library function, a lock-free execution in that thread shall complete. [ Note: Concurrently executing threads may prevent progress of a lock-free execution. For example, this situation can occur with load-locked store-conditional implementations. This property is sometimes termed obstruction-free. — end note ]
- When one or more lock-free executions run concurrently, at least one should complete. [ Note: It is difficult for some implementations to provide absolute guarantees to this effect, since repeated and particularly inopportune interference from other threads may prevent forward progress, e.g., by repeatedly stealing a cache line for unrelated purposes between load-locked and store-conditional instructions. Implementations should ensure that such effects cannot indefinitely delay progress under expected operating conditions, and that such anomalies can therefore safely be ignored by programmers. Outside this document, this property is sometimes termed lock-free. — end note ]
The other answer correctly pointed out that my original answer was a bit imprecise, since there exist two stronger subtypes of wait-freedom.
- wait-free - A method is wait-free if it guarantees that every call finishes its execution in a finite number of steps, i.e., there need not be a determinable upper bound, but the number of steps must still be guaranteed to be finite.
- wait-free bounded - A method is wait-free bounded if it guarantees that every call finishes its execution in a bounded number of steps, where this bound may depend on the number of threads.
- wait-free population oblivious - A method is wait-free population oblivious if it guarantees that every call finishes its execution in a bounded number of steps, and this bound does not depend on the number of threads.
So strictly speaking, the definition in the question is consistent with the definition of wait-free bounded.
In practice, most wait-free algorithms are actually wait-free bounded or even wait-free population oblivious, i.e., it is possible to determine an upper bound on the number of steps.
Comments:

- Is `lock add dword [rdi], 1` wait-free? Depends what you mean by a "step". One instruction, but HW arbitration for access to the cache line means that only one core can own the cache line at once, so if "step = clock cycle" it's still more like lock-free than wait-free. This is similar to "Are X86 atomic RMW instructions wait free". And BTW, `load()` is wait-free on everything, except maybe 16-byte loads on x86 implemented with `lock cmpxchg16b`, which old GCC called lock-free. – Pistachio
- … `is_lock_free()` means in terms of being wait-free vs. lock-free in the computer science / en.wikipedia.org/wiki/Non-blocking_algorithm sense. I think C++ is just using it in the sense of "lockless", no hidden mutex. Anyway, I tagged this computer-science because it's about terminology and putting a label on what real implementations actually do (the pure ISO C++ part is unanswerable). cpu-architecture could fit instead... – Pistachio
- `std::atomic<double>::fetch_add()` is lock-free but not wait-free; it does a compare-and-swap loop, so if another thread is modifying the variable very frequently, we may have to loop arbitrarily many times. See on godbolt. – Sandpaper
- … wait-free guarantee from the C++ Standard. However, the question (right after where you quoted from) says "... and what is implemented in Clang and/or GCC (your version of choice)." So if the C++ Standard does not ask for it, what is the situation on GCC and Clang (over the architectures that you are more familiar with)? – Deedradeeds