Brief Description:
The code below is an implementation of a lock-free queue from C++ Concurrency in Action 2nd Edition. The queue uses a linked list as the underlying data structure and is implemented with split reference counting technology.
What confuses me is that there are multiple compare_exchange_strong
in the code that use the acquire
memory order, but I don't understand why this order is used.
For example, among all accesses to the count
member of a node
object (see below), there are no store operations with release
order, but there are many compare_exchange_strong
operations with acquire
order.
For the complete code for this lock-free queue, my explanation of the code, and a detailed description of my problem, please read the full description section.
Full description:
I am reading the second edition of C++ Concurrency in Action by Anthony Williams. The book describes how to implement a lock-free queue using split reference counting technique. I'll start by briefly explaining how the code works based on my understanding to help you read the code quickly. The full implementation of this queue will be given later.
The implementation uses a singly linked list to implement the queue. The queue holds pointers to the head node and tail node of the linked list, which are head
and tail
in the code respectively, and tail
points to a dummy node.
When pushing an element into the queue, we need to put the element value in the dummy node pointed to by tail
, then add a dummy node after the node, and let tail
point to the new dummy node.
When popping an element from the queue, we should pop the element pointed to by head
, and then set head
to head->next
. When head
and tail
point to the same node (i.e. a dummy node), the queue is empty.
The code uses reference counting to manage the lifetime of deleted nodes. The reference count associated with a node is divided into two parts, the external reference count and the internal reference count. The external count plus the internal count is the full reference count value for that node. The external count is stored in the pointer to the node (i.e. counted_node_ptr
in the code), while the internal count is stored inside the node object (i.e. count
in the code).
In order to prevent the object pointed to by a thread from being deleted by another thread before dereferencing the pointer, the external counter must be incremented first to ensure that the node is not deleted. This is done by increase_external_count()
in the code.
When an external reference no longer refers to a node, the external count stored in it must be added to the node's internal count, which is done by free_external_counter()
. The internal count is stored in count.internal_count
in the node object. And count.external_counters
represents the number of external references referencing this node. Because these external references each hold their own external counts, these counts must all be taken into account. The node can only be safely deleted if and only if the internal count is 0 and the number of the external counter is also 0.
For references in pop
that did not successfully point to the node being popped (usually because another thread has already popped it), those references are simply discarded, but their reference counts must be subtracted. This is done by release_ref()
. This function decrements the internal count of the node by one. Every time the reference count is decremented, or the inner count and the external count are merged, a check is made to see if the condition for the node to be deleted is met.
Here is the complete implementation of the lock-free queue in the book:
template<typename T>
class lock_free_queue
{
private:
struct node;
struct counted_node_ptr
{
int external_count;
node* ptr;
};
std::atomic<counted_node_ptr> head;
std::atomic<counted_node_ptr> tail;
struct node_counter
{
unsigned internal_count:30;
unsigned external_counters:2;
};
struct node
{
std::atomic<T*> data;
std::atomic<node_counter> count;
counted_node_ptr next;
node()
{
node_counter new_count;
new_count.internal_count=0;
new_count.external_counters=2;
count.store(new_count);
next.ptr=nullptr;
next.external_count=0;
data.store(nullptr);
}
void release_ref()
{
node_counter old_counter=
count.load(std::memory_order_relaxed);
node_counter new_counter;
do
{
new_counter=old_counter;
--new_counter.internal_count;
}
while(!count.compare_exchange_strong(
old_counter,new_counter,
std::memory_order_acquire,std::memory_order_relaxed));
if(!new_counter.internal_count &&
!new_counter.external_counters)
{
delete this;
}
}
};
static void increase_external_count(
std::atomic<counted_node_ptr>& counter,
counted_node_ptr& old_counter)
{
counted_node_ptr new_counter;
do
{
new_counter=old_counter;
++new_counter.external_count;
}
while(!counter.compare_exchange_strong(
old_counter,new_counter,
std::memory_order_acquire,std::memory_order_relaxed));
old_counter.external_count=new_counter.external_count;
}
static void free_external_counter(counted_node_ptr &old_node_ptr)
{
node* const ptr=old_node_ptr.ptr;
int const count_increase=old_node_ptr.external_count-2;
node_counter old_counter=
ptr->count.load(std::memory_order_relaxed);
node_counter new_counter;
do
{
new_counter=old_counter;
--new_counter.external_counters;
new_counter.internal_count+=count_increase;
}
while(!ptr->count.compare_exchange_strong(
old_counter,new_counter,
std::memory_order_acquire,std::memory_order_relaxed));
if(!new_counter.internal_count &&
!new_counter.external_counters)
{
delete ptr;
}
}
public:
lock_free_queue() {
counted_node_ptr h;
h.ptr = new node;
h.external_count = 1;
head.store(h);
tail.store(h);
}
lock_free_queue(const lock_free_queue&) = delete;
lock_free_queue& operator=(const lock_free_queue&) = delete;
~lock_free_queue() {
while (pop())
;
}
void push(T new_value)
{
std::unique_ptr<T> new_data(new T(new_value));
counted_node_ptr new_next;
new_next.ptr=new node;
new_next.external_count=1;
counted_node_ptr old_tail=tail.load();
for(;;)
{
increase_external_count(tail,old_tail);
T* old_data=nullptr;
if(old_tail.ptr->data.compare_exchange_strong(
old_data,new_data.get()))
{
old_tail.ptr->next=new_next;
old_tail=tail.exchange(new_next);
free_external_counter(old_tail);
new_data.release();
break;
}
old_tail.ptr->release_ref();
}
}
std::unique_ptr<T> pop()
{
counted_node_ptr old_head=head.load(std::memory_order_relaxed);
for(;;)
{
increase_external_count(head,old_head);
node* const ptr=old_head.ptr;
if(ptr==tail.load().ptr)
{
ptr->release_ref();
return std::unique_ptr<T>();
}
if(head.compare_exchange_strong(old_head,ptr->next))
{
T* const res=ptr->data.exchange(nullptr);
free_external_counter(old_head);
return std::unique_ptr<T>(res);
}
ptr->release_ref();
}
}
};
I think I fully understand how push
and pop
work and think they are correct. But what I don't understand is why some atomic operations in the code use that memory order, and the book doesn't give a reason.
The count
member of the node
object stores the node's internal count value, as well as the number of external references to the node. The count
member is used only a few times in the code:
- In the constructor of
node
, there is a seq-cststore
forcount
:
node()
{
node_counter new_count;
new_count.internal_count=0;
new_count.external_counters=2;
count.store(new_count); // seq-cst store
next.ptr=nullptr;
next.external_count=0;
}
- In
release_ref
, there arerelaxed
load
andcompare_exchange_strong
forcount
:
void release_ref()
{
node_counter old_counter=
count.load(std::memory_order_relaxed); // relaxed load
node_counter new_counter;
do
{
new_counter=old_counter;
--new_counter.internal_count;
}
while(!count.compare_exchange_strong(
old_counter,new_counter,
std::memory_order_acquire,std::memory_order_relaxed)); // acquire RMW operation
// ...
}
- In
free_external_counter
, there are alsorelaxed
load
andacquire
compare_exchange_strong
:
static void free_external_counter(counted_node_ptr &old_node_ptr)
{
node* const ptr=old_node_ptr.ptr;
int const count_increase=old_node_ptr.external_count-2;
node_counter old_counter=
ptr->count.load(std::memory_order_relaxed); // relaxed load
node_counter new_counter;
do
{
new_counter=old_counter;
--new_counter.external_counters;
new_counter.internal_count+=count_increase;
}
while(!ptr->count.compare_exchange_strong(
old_counter,new_counter,
std::memory_order_acquire,std::memory_order_relaxed)); // acquire RMW
// ...
}
It can be seen that, except for a seq-cst store performed in the constructor of node
, other operations are load operations with relaxed
or acquire
order, but no store operations with release
order.
So what is the purpose of using acquire
memory order here? What operation should it be synchronized with?
In addition, increase_external_count
also uses acquire
order, what is the reason here? While other atomic operations use the default seq-cst ordering, some of these operations seem to be able to be replaced with weaker memory ordering as well.
queue
, I'll edit the question right away. – Filagreefree_external_counter
, the dereference ofptr
can theoretically be reordered after thecompare_exchange_strong
. That doesn't make sense in real life, since you had to finish dereferencingptr
in order to do thecompare_exchange
in the first place, but the memory model allows it. So if thecompare_exchange
decrements the count to 1, another thread can be deletingptr
in a race with the delayed dereference. – Crosscurrentcompare_exchange_strong
infree_external_counter
fromacquire
toseq-cst
, the sanitizer will not report errors. – Filagreerelease_ref
has a similar race that the sanitizer does not detect. In both cases, at first glance it looks likeacq_rel
would suffice, but I'd want to sit down and write a proof. – Crosscurrentfree_external_counter
does here) has a release order, and once the reference count reaches 0, an acquire load operation should be added to establish synchronization. I just tested the same code by modifyingcompare_exchange_strong
to release order, and adding a load with acquire order to the success branch of theif
below, and now Sanitizer doesn't report an error. – Filagreedata
bug in the question. – Crosscurrentcompare_exchange
inincrease_external_count
can be weakened to relaxed. When it's called, the counter must already be positive, or else accessing the counter itself is unsafe. So code that follows thecompare_exchange
would be equally well protected if it were reordered before it. (This is not a proof, of course.) – Crosscurrent