I want to understand what is meant by the lock-free property of atomic variables in C++11. I googled and read the other relevant questions on this forum, but I still have only a partial understanding. I would appreciate it if someone could explain it end-to-end and in a simple way.
Jerry already mentioned common correctness problems with locks, i.e. they're hard to understand and program correctly.
Another danger with locks is that you lose determinism regarding your execution time: if a thread that has acquired a lock gets delayed (e.g. descheduled by the operating system, or "swapped out"), then it is possible that the entire program is delayed because it is waiting for the lock. By contrast, a lock-free algorithm is always guaranteed to make some progress, even if any number of threads are held up somewhere else.
By and large, lock-free programming is often slower (sometimes significantly so) than locked programming using non-atomic operations, because atomic operations cause a significant hit on caching and pipelining; however, it offers determinism and upper bounds on latency (at least overall latency of your process; as @J99 observed, individual threads may still be starved as long as enough other threads are making progress). Your program may get a lot slower, but it never locks up entirely and always makes some progress.
The very nature of hardware architectures allows certain small operations to be intrinsically atomic. In fact, this is a necessity for any hardware that supports multitasking and multithreading. At the very heart of any synchronisation primitive, such as a mutex, you need some sort of atomic instruction that guarantees correct locking behaviour.
So, with that in mind, we now know that it is possible for certain types like booleans and machine-sized integers to be loaded, stored and exchanged atomically. Thus when we wrap such a type in the std::atomic template, we can expect that the resulting data type will indeed offer load, store and exchange operations that do not use locks. By contrast, your library implementation is always allowed to implement an atomic Foo as an ordinary Foo guarded by a lock.
To test whether an atomic object is lock-free, you can use the is_lock_free member function. Additionally, there are ATOMIC_*_LOCK_FREE macros that tell you whether atomic primitive types potentially have a lock-free instantiation. If you are writing concurrent algorithms that you want to be lock-free, you should thus include an assertion that your atomic objects are indeed lock-free, or a static assertion that the macro has the value 2 (meaning that every object of the corresponding type is always lock-free).
It's probably easiest to start by talking about what would happen if it were not lock-free.
The most obvious way to handle most atomic tasks is by locking. For example, to ensure that only one thread writes to a variable at a time, you can protect it with a mutex. Any code that's going to write to the variable needs to obtain the mutex before doing the write (and release it afterwards). Only one thread can own the mutex at a time, so as long as all the threads follow the protocol, no more than one can write at any given time.
If you're not careful, however, this can be open to deadlock. For example, let's assume you need to write to two different variables (each protected by a mutex) as an atomic operation -- i.e., you need to ensure that when you write to one, you also write to the other. Let's call the two mutexes A and B. Thread 1 obtains mutex A, then tries to get mutex B. At the same time, thread 2 obtains mutex B, and then tries to get mutex A. Since each is holding one mutex, neither can obtain both, and neither can progress toward its goal.
There are various strategies to avoid such deadlocks (e.g., all threads always try to obtain the mutexes in the same order, or upon failure to obtain a mutex within a reasonable period of time, each thread releases the mutex it holds, waits some random amount of time, and then tries again).
With lock-free programming, however, we (obviously enough) don't use locks. This means that a deadlock like the one above simply cannot happen. When done properly, you can guarantee that all threads continuously progress toward their goal. Contrary to popular belief, it does not mean the code will necessarily run any faster than well-written code using locks. It does, however, mean that deadlocks like the above (and some other types of problems like livelocks and some types of race conditions) are eliminated.
Now, as to exactly how you do that: the answer is short and simple: it varies -- widely. In a lot of cases, you're looking at specific hardware support for doing certain specific operations atomically. Your code either uses those directly, or extends them to give higher level operations that are still atomic and lock free. It's even possible (though only rarely practical) to implement lock-free atomic operations without hardware support (but given its impracticality, I'll pass on trying to go into more detail about it, at least for now).
Lock-free is one of the non-blocking techniques. For an algorithm, it involves a global progress property: whenever a thread of the program is active, some thread can make a forward step, either that thread itself or eventually another one.
Lock-free algorithms are supposed to behave better under heavy contention, where threads acting on a shared resource may spend a lot of time waiting for their next active time slice. They are also almost mandatory in contexts where you can't lock, like interrupt handlers.
Implementations of lock-free algorithms almost always rely on compare-and-swap (some may use things like LL/SC, load-linked/store-conditional) and a strategy in which the visible modification can be reduced to a single value change (usually a pointer), which becomes the linearization point, looping over this modification if the value has changed in the meantime. Most of the time, these algorithms try to complete the jobs of other threads when possible. A good example is the lock-free queue of Michael & Scott (http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html).
For lower-level instructions like compare-and-swap, it means that the implementation (probably the microcode of the corresponding instruction) is wait-free (see http://www.diku.dk/OLD/undervisning/2005f/dat-os/skrifter/lockfree.pdf).
For the sake of completeness, a wait-free algorithm enforces progress for every thread: each operation is guaranteed to terminate in a finite number of steps.
There are some detailed answers that dive into the computer science of lock-free vs lock-based algorithms, yet some practical aspects are not covered; I'll try to address them:
- Non-is_lock_free atomics are meant to make code with std::atomic portable, but they are not an efficient mode of operation.
- Specifically, lock-free isn't necessarily faster. A faster lock-based algorithm is one that uses mutexes or other primitives that can wait, instead of any atomics. When a lock-free algorithm is implemented using non-is_lock_free atomics, it is not faster than the same algorithm implemented using is_lock_free atomics.
- Some exceptions to this are possible. On x86-64, there is a 128-bit compare-exchange, but plain atomic 128-bit loads and stores were only recently documented by Intel as properly atomic. As a result, a lock-free 128-bit atomic may rely on compare-exchange even for plain loads, with the result that a lock-based implementation may scale better for simultaneous loads by using a shared mutex for its lock.
- It may also matter for inter-process use. A lock-free atomic is inherently address-free, i.e. its representation is self-contained and does not depend on its own memory address, so it may reside in shared memory. A lock-based atomic is not necessarily address-free, as the lock may reside in a per-process hash table, and the lock itself may not be address-free.
© 2022 - 2024 — McMap. All rights reserved.