The Authoritative Answer
The original paper by Maged M. Michael places this important restriction on algorithms using hazard pointers:
The methodology requires lock-free algorithms to guarantee that no
thread can access a dynamic node at a time when it is possibly removed
from the object, unless at least one of the thread’s associated hazard
pointers has been pointing to that node continuously, from a time when
the node was guaranteed to be reachable from the object’s roots. The
methodology prevents the freeing of any retired node continuously
pointed to by one or more hazard pointers of one or more threads from
a point prior to its removal.
What it means for the deleting thread
As pointed out in Anton's answer, deletion is a two-phase operation: First you have to 'unpublish' the node, remove it from the data structure so that it can no longer be accessed from the public interface.
At this point, the node is possibly removed, in Michael's terms. It is no longer safe for concurrent threads to access it (unless they already have been holding a hazard pointer to it throughout).
Thus, once a node is possibly removed, it is safe for the deleting thread to iterate the list of hazard pointers. Even if the deleting thread gets preempted, a concurrent thread may not access the node anymore. After verifying that no hazard pointers are set to the node, the deleting thread can safely proceed to the second phase of deletion: The actual deallocation.
In summary, the order of operations for the deleting thread is
D-1. Remove the node from the data structure.
D-2. Iterate the list of hazard pointers.
D-3. If no hazards were found, delete the node.
The real algorithm is slightly more involved, as we need to maintain a list of those nodes that cannot be reclaimed and ensure that they get deleted eventually. This has been skipped here, as it is not relevant to explain the issue raised in the question.
What it means for the accessing thread(s)
Setting the hazard pointer is not enough to guarantee safe access to it. After all, the node might be possibly removed by the time we set our hazard pointer.
The only way to ensure safe access is if we can guarantee that our hazard pointer has been pointing to that node continuously, from a time when the node was guaranteed to be reachable from the object’s roots.
Since the code is supposed to be lock-free, there is only one way to achieve this: We optimistically set our hazard pointer to the node and then check whether that node has been marked as possibly deleted (that is, it is no longer reachable from the public root) afterwards.
Thus the order of operations for the accessing thread is
A-1. Obtain a pointer to the node by traversing the data structure.
A-2. Set the hazard pointer to point to the node.
A-3. Check that the node is still part of the data structure.
That is, it has not been possibly removed in the meantime.
A-4. If the node is still valid, access it.
Potential Races affecting the deleting thread
After a node has been possibly removed (D-1
), the deleting thread could be preempted. Thus it is still possible for concurrent threads to optimistically set their hazard pointer to it (even though they are not allowed to access it) (A-2
).
Therefore, the deleting thread might detect a spurious hazard, preventing it from deleting the node right away, even though none of the other threads will access the node anymore. This will simply delay deletion of the node in the same way a legitimate hazard would.
The important point is that the node will still be deleted eventually.
Potential Races affecting the accessing thread
An accessing thread may get preempted by a deleting thread before verifying that the node has not been potentially removed (A-3
). In such a case, it is no longer allowed to access the object.
Note that in case the preemption occurs after A-2
, it would even be safe for the accessing thread to access the node (since there was a hazard pointer pointing to the node throughout), but since it is impossible for the accessing thread to distinguish this case, it must fail spuriously.
The important point is that a node will only ever be accessed if it has not been deleted.