A lock-free stack can be implemented as a singly linked list. This seems simple until we have to think about what to do with nodes after they've been popped. One strategy is to simply move them to a per-stack LIFO freelist (from which nodes can be reused by subsequent push operations) until eventually all threads are done with the stack, at which point a single thread destroys all nodes in the stack and all nodes in the freelist. Boost.Lockfree uses this strategy. So does Chris Wellons's C11 implementation. I will refer to the latter, because it's easier to read and the details are essentially the same, since C11 atomics are very similar to C++11 atomics.
In Wellons's implementation, which can be found on GitHub here, all lstack_node
objects are non-atomic. In particular, this means that all accesses to the next
member of an lstack_node
object are non-atomic. What I am unable to understand is: why is it that such accesses never race with each other?
The next
member is read at lstack.c:30. It is written at lstack.c:39. If these two lines can execute concurrently on the same lstack_node
object, then the program contains a race. Is this possible? It seems possible to me:
- Thread 1 calls
lstack_pop
, which callspop
. It atomically loads the head node's value into the local variableorig
. Now,orig.node
is a pointer to the node that was at the top of the stack just now. (Note that up until this point, only local variables have been modified, so it is impossible for anything that thread 1 has done so far to make a CAS fail in any other thread.) Meanwhile... - Thread 2 calls
lstack_pop
.pop
succeeds and returnsnode
, a pointer to the node that has just been excised from the stack; this is the same node thatorig.node
points to in thread 1. It then begins to callpush
in order to addnode
to the freelist. The freelist head node is atomically loaded, andnode->next
is set to point to the first node in the freelist. - Oops. This races with the read to
orig.node->next
in thread 1.
Could Wellons's implementation simply be wrong? I doubt it. If his implementation is wrong, then so is the Boost one, because the only way to fix (what appears to me to be) the race condition is to make next
atomic. But I don't think the Boost implementation could be wrong in such a basic way without it having been noticed and fixed by now. So I must have made a mistake in my reasoning.