I want to understand what is meant by the lock-free property of atomic variables in C++11. I googled and read the other relevant questions on this forum, but I still have only a partial understanding. I would appreciate it if someone could explain it end-to-end and in a simple way.
Jerry already mentioned common correctness problems with locks, i.e. they're hard to understand and program correctly.
Another danger with locks is that you lose determinism regarding your execution time: if a thread that has acquired a lock gets delayed (e.g. descheduled by the operating system, or "swapped out"), then it is possible that the entire program is delayed because it is waiting for the lock. By contrast, a lock-free algorithm is always guaranteed to make some progress, even if any number of threads are held up somewhere else.
By and large, lock-free programming is often slower (sometimes significantly so) than locked programming using non-atomic operations, because atomic operations cause a significant hit on caching and pipelining; however, it offers determinism and upper bounds on latency (at least overall latency of your process; as @J99 observed, individual threads may still be starved as long as enough other threads are making progress). Your program may get a lot slower, but it never locks up entirely and always makes some progress.
The very nature of hardware architectures allows certain small operations to be intrinsically atomic. In fact, this is a necessity for any hardware that supports multitasking and multithreading. At the very heart of any synchronisation primitive, such as a mutex, you need some sort of atomic instruction that guarantees correct locking behaviour.
So, with that in mind, we now know that it is possible for certain types like booleans and machine-sized integers to be loaded, stored and exchanged atomically. Thus when we wrap such a type in the std::atomic template, we can expect that the resulting data type will indeed offer load, store and exchange operations that do not use locks. By contrast, your library implementation is always allowed to implement an atomic Foo as an ordinary Foo guarded by a lock.
To test whether an atomic object is lock-free, you can use the is_lock_free member function. Additionally, there are ATOMIC_*_LOCK_FREE macros that tell you whether atomic primitive types potentially have a lock-free instantiation. If you are writing concurrent algorithms that you want to be lock-free, you should thus include an assertion that your atomic objects are indeed lock-free, or a static assertion that the macro has the value 2 (meaning that every object of the corresponding type is always lock-free).
It's probably easiest to start by talking about what would happen if it were not lock-free.
The most obvious way to handle most atomic tasks is by locking. For example, to ensure that only one thread writes to a variable at a time, you can protect it with a mutex. Any code that's going to write to the variable needs to obtain the mutex before doing the write (and release it afterwards). Only one thread can own the mutex at a time, so as long as all the threads follow the protocol, no more than one can write at any given time.
If you're not careful, however, this can be open to deadlock. For example, let's assume you need to write to two different variables (each protected by a mutex) as an atomic operation -- i.e., you need to ensure that when you write to one, you also write to the other. Let's call the two mutexes A and B. Thread 1 obtains mutex A, then tries to get mutex B. At the same time, thread 2 obtains mutex B, and then tries to get mutex A. Since each is holding one mutex, neither can obtain both, and neither can progress toward its goal.
There are various strategies to avoid such deadlocks (e.g., all threads always try to obtain the mutexes in the same order, or upon failure to obtain a mutex within a reasonable period of time, each thread releases the mutex it holds, waits some random amount of time, and then tries again).
With lock-free programming, however, we (obviously enough) don't use locks. This means that a deadlock like the one above simply cannot happen. When done properly, you can guarantee that all threads continuously progress toward their goal. Contrary to popular belief, it does not mean the code will necessarily run any faster than well-written code using locks. It does, however, mean that deadlocks like the above (and some other types of problems like livelocks and some types of race conditions) are eliminated.
Now, as to exactly how you do that: the answer is short and simple: it varies -- widely. In a lot of cases, you're looking at specific hardware support for doing certain specific operations atomically. Your code either uses those directly, or extends them to give higher level operations that are still atomic and lock free. It's even possible (though only rarely practical) to implement lock-free atomic operations without hardware support (but given its impracticality, I'll pass on trying to go into more detail about it, at least for now).
Lock-free is one of the non-blocking techniques. For an algorithm, it involves a global progress property: whenever a thread of the program is active, some thread can make a forward step, either that thread itself or eventually another one.
Lock-free algorithms are supposed to behave better under heavy contention, where threads acting on a shared resource may spend a lot of time waiting for their next active time slice. They are also almost mandatory in contexts where you can't lock, like interrupt handlers.
Implementations of lock-free algorithms almost always rely on compare-and-swap (some may use things like LL/SC, load-linked/store-conditional) and a strategy in which the visible modification can be reduced to a single value change (usually a pointer), which becomes the linearization point, looping over this modification if the value has changed in the meantime. Most of the time, these algorithms try to complete the jobs of other threads when possible. A good example is the lock-free queue of Michael & Scott (http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html).
For lower-level instructions like compare-and-swap, it means that the implementation (probably the microcode of the corresponding instruction) is wait-free (see http://www.diku.dk/OLD/undervisning/2005f/dat-os/skrifter/lockfree.pdf).
For the sake of completeness, a wait-free algorithm enforces progress for every thread: each operation is guaranteed to terminate in a finite number of steps.
There are some detailed answers that dive into the computer science of lock-free vs lock-based algorithms, yet some practical aspects are not covered; I'll try to address them:
- Non-is_lock_free atomics are meant to make code with std::atomic portable, but they are not an efficient mode of operation.
- Specifically, lock-free isn't necessarily faster. A faster lock-based algorithm is one that uses mutexes or other primitives that can wait, instead of any atomics. When a lock-free algorithm is implemented using non-is_lock_free atomics, it is not faster than the same algorithm implemented using is_lock_free atomics.
- Some exceptions to this are possible. On x86-64, there is a 128-bit compare-exchange, but plain atomic 128-bit loads and stores were only recently documented by Intel as properly atomic. As a result, a lock-free 128-bit atomic may rely on compare-exchange even for plain loads, with the result that a lock-based implementation may scale better for simultaneous loads by using a shared mutex for its lock.
- It may also matter for inter-process use. A lock-free atomic is inherently address-free, i.e. its representation is self-contained and does not depend on its own memory address, so it may reside in shared memory. A lock-based atomic is not necessarily address-free, as the lock may reside in a per-process hash table, and the lock itself may not be address-free.
© 2022 - 2024 — McMap. All rights reserved.