Lock-free memory reclamation with hazard pointers
Asked Answered
E

2

14

Hazard pointers are a technique for safely reclaiming memory in lock-free code without garbage-collection.

The idea is that before accessing an object that can be deleted concurrently, a thread sets its hazard pointer to point to that object. A thread that wants to delete an object will first check whether any hazard pointers are set to point to that object. If so, deletion will be postponed, so that the accessing thread does not end up reading deleted data.

Now, imagine our deleting thread starts to iterate the list of hazard pointers and at the i+1 element it gets preempted. Now another thread sets the hazard pointer at i to the object that the deleting thread is currently trying to delete. Afterwards, the deleting thread resumes, checks the rest of the list, and deletes the object, even though there is now a hazard pointer at position i pointing to the object.

So clearly just setting the hazard pointer is not enough, as a deleting thread might already have checked our hazard pointer and decided that our thread does not want to access the object. How can I make sure, after setting a hazard pointer, that the object I'm trying to access won't be deleted from under my hands?

Elea answered 8/8, 2014 at 13:20 Comment(5)
It seems that, at the very least, the deleting thread would have to set its own hazard pointer before checking the hazard pointer of other threads.Laodicean
@VaughnCato What would you accomplish by that? Setting a hazard pointer is done without checking the list first, so you cannot synchronize with accessing threads this way. You can also not synchronize with deleting threads to avoid double-deletion, because you would run into the exact same race described in the question.Elea
You might find it helpful to read the example code in Maged Michael's original paper.Shipentine
Wouldn't any other thread that had a pointer to the object being deleted already have it in its hazard list? I think that before an object was retired, nothing would be pointing to it except the thread that is going to delete it.Laodicean
Thanks for the suggestions everyone. I think I got it now. I added a community wiki answer to flesh out how it is supposed to work. I hope you find it useful.Elea
E
17

The Authoritative Answer

The original paper by Maged M. Michael places this important restriction on algorithms using hazard pointers:

The methodology requires lock-free algorithms to guarantee that no thread can access a dynamic node at a time when it is possibly removed from the object, unless at least one of the thread’s associated hazard pointers has been pointing to that node continuously, from a time when the node was guaranteed to be reachable from the object’s roots. The methodology prevents the freeing of any retired node continuously pointed to by one or more hazard pointers of one or more threads from a point prior to its removal.

What it means for the deleting thread

As pointed out in Anton's answer, deletion is a two-phase operation: First you have to 'unpublish' the node, remove it from the data structure so that it can no longer be accessed from the public interface.

At this point, the node is possibly removed, in Michael's terms. It is no longer safe for concurrent threads to access it (unless they already have been holding a hazard pointer to it throughout).

Thus, once a node is possibly removed, it is safe for the deleting thread to iterate the list of hazard pointers. Even if the deleting thread gets preempted, a concurrent thread may not access the node anymore. After verifying that no hazard pointers are set to the node, the deleting thread can safely proceed to the second phase of deletion: The actual deallocation.

In summary, the order of operations for the deleting thread is

D-1. Remove the node from the data structure.
D-2. Iterate the list of hazard pointers.
D-3. If no hazards were found, delete the node.

The real algorithm is slightly more involved, as we need to maintain a list of those nodes that cannot be reclaimed and ensure that they get deleted eventually. This has been skipped here, as it is not relevant to explain the issue raised in the question.

What it means for the accessing thread(s)

Setting the hazard pointer is not enough to guarantee safe access to it. After all, the node might be possibly removed by the time we set our hazard pointer.

The only way to ensure safe access is if we can guarantee that our hazard pointer has been pointing to that node continuously, from a time when the node was guaranteed to be reachable from the object’s roots.

Since the code is supposed to be lock-free, there is only one way to achieve this: We optimistically set our hazard pointer to the node and then check whether that node has been marked as possibly deleted (that is, it is no longer reachable from the public root) afterwards.

Thus the order of operations for the accessing thread is

A-1. Obtain a pointer to the node by traversing the data structure.
A-2. Set the hazard pointer to point to the node.
A-3. Check that the node is still part of the data structure.
     That is, it has not been possibly removed in the meantime.
A-4. If the node is still valid, access it.

Potential Races affecting the deleting thread

After a node has been possibly removed (D-1), the deleting thread could be preempted. Thus it is still possible for concurrent threads to optimistically set their hazard pointer to it (even though they are not allowed to access it) (A-2).

Therefore, the deleting thread might detect a spurious hazard, preventing it from deleting the node right away, even though none of the other threads will access the node anymore. This will simply delay deletion of the node in the same way a legitimate hazard would.

The important point is that the node will still be deleted eventually.

Potential Races affecting the accessing thread

An accessing thread may get preempted by a deleting thread before verifying that the node has not been potentially removed (A-3). In such a case, it is no longer allowed to access the object.

Note that in case the preemption occurs after A-2, it would even be safe for the accessing thread to access the node (since there was a hazard pointer pointing to the node throughout), but since it is impossible for the accessing thread to distinguish this case, it must fail spuriously.

The important point is that a node will only ever be accessed if it has not been deleted.

Elea answered 8/8, 2014 at 13:20 Comment(1)
I find it weird to use "preempted" to talk about cases where multiple threads are doing things at the same time. If you only had a uniprocessor system, so pre-emption was the only way for threads to interleave their operations asynchronously, synchronization would be much cheaper.Shifrah
Z
4

A thread that wants to delete an object will first check whether any hazard pointers are set to point to that object.

Here is the problem. 'delete' actually is two-phase operation:

  1. remove from a container or any other public structure. Generally speaking, unpublish it.
  2. deallocate the memory

So, the iteration through the hazard pointers must go between them to prevent the situation you described as:

another thread sets the hazard pointer at i to the object that the deleting thread is currently trying to delete

because there must be no way for another thread to acquire the object being deleted.

Zaria answered 8/8, 2014 at 13:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.