The definition of lock-free

There are three different types of "lock-free" algorithms. The definitions given in Concurrency in Action are:

  1. Obstruction-Free: If all other threads are paused, then any given thread will complete its operation in a bounded number of steps.
  2. Lock-Free: If multiple threads are operating on a data structure, then after a bounded number of steps one of them will complete its operation.
  3. Wait-Free: Every thread operating on a data structure will complete its operation in a bounded number of steps, even if other threads are also operating on the data structure.

Herb Sutter says in his talk Lock-Free Programming:

Informally, "lock-free" ≈ "doesn't use mutexes" == any of these.

I do not see why lock-based algorithms can't fall into the lock-free definition given above. Here is a simple lock-based program:

#include <iostream>
#include <mutex>
#include <thread>

std::mutex x_mut;

void print(int& x) {
    std::lock_guard<std::mutex> lk(x_mut);
    std::cout << x;
}

void add(int& x, int y) {
    std::lock_guard<std::mutex> lk(x_mut);
    x += y;
}

int main() {
    int i = 3;

    std::thread thread1{print, std::ref(i)};
    std::thread thread2{add, std::ref(i), 4};

    thread1.join();
    thread2.join();
}

If both of these threads are operating, then after a bounded number of steps, one of them must complete. Why does my program not satisfy the definition of "Lock-free"?

Palawan answered 28/5, 2022 at 13:14 Comment(15)
If you pause one of your threads (e.g. by adding while(1) sleep(1); to the bottom of its thread function) then the other thread will likely block indefinitely inside the lock_guard constructor. Thus even definition 1 (obstruction-free) is not satisfied.Inactivate
"Bounded number of steps" is a rather vague term. What is a step, anyway? Typically an algorithm is called non-blocking if the failure or suspension of one thread cannot cause the failure or suspension of another thread, and obstruction-free, lock-free and wait-free algorithms are all assumed to be non-blocking. Note that in your case, if one of the threads is killed before releasing the lock (for whatever reason), then you are deadlocked and cannot make progress. In other words, your code theoretically does not have the non-blocking property (even though in practice it does).Nagel
"Concurrency in action", while being a pretty good book, doesn't try to be too academic and w/o it a rigorous notion is hard to achieve. Sutter's talks are even more informal so even less expectations should be made. If you want a better theory around lock/wait free you are better off consulting with "The Art of Multiprocessor Programming".Inclinometer
@freakish, Re, "What is a step anyway?" It's an abstraction for "thing that a computing machine can do." When a computer scientist says, "in a bounded number of steps," they aren't interested in knowing how many steps or what the steps actually are. They only are interested in the proof that the number of them either is or is not bounded.Clementia
@SolomonSlow "that thing that a computing machine can do" is hand waving. This is not computer science. And in this particular case it matters. Because waiting for a lock can be interpreted as a single step. Is the number of steps bounded? Sure, even when an algorithm can't make progress because it is deadlocked. I don't like it when people don't care about the meaning of the words they use.Nagel
Because the number of steps (an ambiguous description, but may be measured in units of CPU time, clock cycles, number of machine instructions executed by the system as a whole, etc) a thread has to wait for a mutex is unbounded. If the system scheduler assigns another thread (whether it attempts to lock the mutex or not) a higher priority, a thread trying to lock the mutex may be starved of resources, and NEVER lock the mutex.Brandling
@Brandling waiting for a lock doesn't eat any resources except for time. There is no such thing as "trying to lock" and there is no CPU activity involved here. And measuring time itself is of course not a good way. Because depending on the machine, instructions can take arbitrarily long time. Even on a single machine. Meaning no algorithm is time bounded in that sense.Nagel
@Nagel - Nothing is zero cost. A thread waiting for a lock relies on something outside the thread or its containing process (e.g. the operating system scheduler) explicitly allowing that thread to succeed in grabbing the lock. That something uses some system resources, even if that usage is not measurable by any of the threads that are using that mutex.Brandling
@Brandling sure, nothing is for free. The underlying OS will keep track of waiting threads via some kind of queue. My point is that this can be done in wait-free manner. And AFAIK at least Linux kernel does this. Meaning both operations of acquiring and waiting for a lock are bounded in the number of steps, depending on implementation. Also this depends too much on implementation. It's just those steps can be arbitrarily long (e.g. deadlock during which there is no cpu activity). It is not convenient to think like this about locks. It is easier to abandon this "bounded number of steps" idea...Nagel
...and take wikis approach: en.m.wikipedia.org/wiki/Non-blocking_algorithm and talk about failure/suspension of thread and system-wide/per-thread progress.Nagel
@freakish: The definitions of categories of non-blocking algorithms quoted by the question, (obstruction|lock|wait) free, are abstract computer science definitions. en.wikipedia.org/wiki/Non-blocking_algorithm It's very non-obvious how to map those "steps" to real-world CPUs like x86 where lock xadd [rdi], eax implements fetch_add with one asm instruction (no SW-visible retrying), but which can take thousands of clock cycles, or much more if all cores in a large machine are trying to do it on the same cache line at once.Molina
@Brandling (and others): Anything in std::atomic is wait-free? has a couple answers that take a stab at defining terminology and how it maps to CPUs. One notable point is that ISO C++ won't guarantee anything stronger than lock-free because atomic RMWs need LL/SC retry loops on some machines. .load() does scale perfectly with parallel readers, though, for types where .is_lock_free() is true.Molina
@PeterCordes I don't see how this abstract "steps" distinguish between locks and lock-free. Or in other words: why is acquire/release lock operations not a single step, if you say that lock-prefixed instruction is? Or a bounded sequence of steps (like most implementations of locks are)? My point is that we need to look at other properties (thread suspension and system-wide/per-thread progress) for those terms (obstruction-free, lock-free, wait-free) to be meaningful.Nagel
@freakish: Oh I see, yes you need some sane definition for "step" that rules out blocking forever on a mutex as one step. Sounds a lot like a computer science problem to me: formal definitions, where you can't assume common sense and an understanding of the point of the exercise (like a thread sleeping forever not blocking other threads' progress, and avoidance of livelock)Molina
@Nagel I continued watching Herb Sutter's talk and he gave a "lock-free" algorithm for the producer-consumer problem that did satisfy the non-blocking condition you gave. I believe your condition is correct. Since SO rules require one question per post, I have asked it here.Palawan

Your quote from Concurrency in Action is taken out of context.

In fact, what the book actually says is:

7.1 Definitions and consequences

Algorithms and data structures that use mutexes, condition variables, and futures to synchronize the data are called blocking data structures and algorithms.

Data structures and algorithms that don’t use blocking library functions are said to be nonblocking. Not all these data structures are lock-free ...

Then it proceeds to further break down nonblocking algorithms into Obstruction-Free, Lock-Free and Wait-Free.

So a Lock-Free algorithm is

  1. a nonblocking algorithm (it does not use locks such as a mutex), and
  2. one that makes progress in at least one thread within a bounded number of steps.

So both Herb Sutter and Anthony Williams are correct.

Cayser answered 28/5, 2022 at 19:27 Comment(5)
An algorithm using only std::atomic doesn't automatically make it non-blocking, though. You can (accidentally) implement a lock, and that's not always a bad thing when limited to corner cases. As discussed in Lock-free Progress Guarantees in a circular buffer queue - a writer that sleeps at the wrong time will eventually lead to all other writers seeing it as full, and all readers seeing the queue as empty. Even if they return failure so a thread can work on something else before trying again, no progress is being made on the queue algorithm.Molina
Herb Sutter is correct that "lock-free" is also used to mean "lockless", using atomic operations manually, whether or not the resulting algorithm fits the computer-science definition of non-blocking. (That one thread sleeping forever wouldn't get the rest of them stuck.) Since real systems don't have threads sleep forever, a sufficiently large buffer can insulate against likely sleep times and give you the desired performance, only blocking in corner cases which are rarely or never encountered.Molina
@Cayser Thank you. I have asked a follow up question on how this applies to Herb Sutter's purportedly lock-free producer-consumer algorithm. He uses semaphores, which I believe you would stick in the category of blocking.Palawan
@PeterCordes Yes, of course one could essentially implement a mutex using a busy wait. That would put the algorithm back into the blocking category, even if all it ever uses is std::atomic. The book also addresses this, IIRC.Cayser
This is not the correct answer. 1 is a result of 2, and should not be part of the definition. Specifically, Herlihy mentions that using a lock blocks system-wide progress, ‘because an unexpected delay by one thread can prevent others from making progress’.Reggiereggis

I would be careful about saying "bounded" without defining bounded by what.

The canonical lock-free primitive, the CAS loop, gives no such bound: under heavy contention a thread can be repeatedly unlucky and wait forever, and that is allowed. The defining property of lock-free algorithms is that there is always system-wide progress: at any point in time, some thread is making progress.

The stronger guarantee, some progress for every thread at every point in time, is called wait-free.

In other words, lock-free guarantees that a misbehaving thread cannot fatally impact all other threads; wait-free guarantees that it cannot fatally impact any other thread.

If both of these threads are operating, then after a bounded number of steps, one of them must complete. Why does my program not satisfy the definition of "Lock-free"?

Because an (unfair) scheduler must be taken into account.* If a thread holding the lock is put to sleep, no other thread will be able to make any progress -> not lock-free, and there is certainly no bound. That cannot happen with lock-free programming: resources are always available, and unfortunate scheduling of one thread can only make other threads' operations complete faster, never slower.

* In particular, where the suspension time for any thread is not limited in frequency or duration. If it was, any algorithm would be wait-free (with some huge constant) by definition.

Bathelda answered 28/5, 2022 at 14:2 Comment(5)
So the basics of a lock-free algorithm, then, is that a thread can either cut the queue and finish before a slower/sleeping thread, or take over the work of said thread and finish it. Either ability means there is system-wide progress.Recrudesce
@GoswinvonBrederlow I'm not 100 percent sure I understand what you mean. E.g. in case of CAS loops for queue's push a failure for thread #1 means that #2 was quicker and actually got its push done. Thus some progress has been made. The main difference compared to locks is that if during push #2 is put to sleep, it cannot harm other threads because it is holding no locks.Bathelda
which then leads to the new thread either cutting ahead or finishing the other threads work, depending on the algorithm.Recrudesce
@GoswinvonBrederlow Yes, that is true.Bathelda
Herlihy and Shavit 2012 also mentions that cache misses, page faults, etc. can cause unexpected delays....Reggiereggis