How does the split reference counting work in a lock free stack?
Asked Answered
U

1

8

I am reading C++ concurrency in action 2nd. It introduces split reference counts for a lock free stack.

One possible technique involves the use of not one but two reference counts for each node: an internal count and an external count. The sum of these values is the total number of references to the node. The external count is kept alongside the pointer to the node and is increased every time the pointer is read. When the reader is finished with the node, it decreases the internal count. A simple operation that reads the pointer will leave the external count increased by one and the internal count decreased by one when it’s finished.

Above phrase explains the brief concept of split reference count. It sounds like that external count is always increased and internal count is always decreased.

When the external count/pointer pairing is no longer required (the node is no longer accessible from a location accessible to multiple threads), the internal count is increased by the value of the external count minus one and the external counter is discarded. Once the internal count is equal to zero, there are no outstanding references to the node and it can be safely deleted. It’s still important to use atomic operations for updates of shared data. Let’s now look at an implementation of a lock-free stack that uses this technique to ensure that the nodes are reclaimed only when it’s safe to do so.

But in the above phrase, the internal count should be increased by the value of the external count minus one when the node is no longer accessible. I'm very confused about this explanation.

(1) What is the exact purpose of using internal and external count?

(2) Why does it require two reference counts instead of one?

Edit: I added example code below from the book.

template <typename T>
class lock_free_stack {
 private:
  struct node;
  struct counted_node_ptr {
    int external_count;
    node* ptr;
  };
  struct node {
    std::shared_ptr<T> data;
    std::atomic<int> internal_count;
    counted_node_ptr next;
    node(T const& data_)
        : data(std::make_shared<T>(data_)), internal_count(0) {}
  };
  std::atomic<counted_node_ptr> head;

  void increase_head_count(counted_node_ptr& old_counter) {
    counted_node_ptr new_counter;
    do {
      new_counter = old_counter;
      ++new_counter.external_count;
    } while (!head.compare_exchange_strong(old_counter, new_counter,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed));
    old_counter.external_count = new_counter.external_count;
  }

 public:
  ~lock_free_stack() {
    while (pop())
      ;
  }

  void push(T const& data) {
    counted_node_ptr new_node;
    new_node.ptr = new node(data);
    new_node.external_count = 1;
    new_node.ptr->next = head.load();
    while (!head.atomic_compare_exchange_weak(new_node.ptr->next, new_node,
                                              std::memory_order_release,
                                              std::memory_order_relaxed))
      ;
  }

  std::shared_ptr<T> pop() {
    counted_node_ptr old_head = head.load(std::memory_order_relaxed);
    for (;;) {
      increase_head_count(old_head);
      node* const ptr = old_head.ptr;

      if (!ptr) {
        return std::shared_ptr<T>();
      }

      if (head.compare_exchange_strong(old_head, ptr->next,
                                       std::memory_order_relaxed)) {
        std::shared_ptr<T> res;
        res.swap(ptr->data);
        int const count_increase = old_head.external_count - 2;
        if (ptr->internal_count.fetch_add(
                count_increase, std::memory_order_release) == -count_increase) {
          delete ptr;
        }
        return res;
      } else if (ptr->internal_count.fetch_add(-1, std::memory_order_relaxed) ==
                 1) {
        ptr->internal_count.load(std::memory_order_acquire);
        delete ptr;
      }
    }
  }
};
Unwitting answered 3/5, 2021 at 14:51 Comment(5)
Less contention when increment and decrement operations are split?Mousetail
I don't see how the internal count ever gets to zero. Is there some detail left out that tells what finally increments the internal counter by 1 to get it back to zero? Seems like it would be stuck at -1 by explanation contained in the post. Does the discarding of the external counter increment the internal counter???Cheslie
@Cheslie I added the example code. Can you understand it?Unwitting
@Cheslie When adding external_count to internal_count, the absolute value of external_count is always greater than absolute value of internal_count (because we protect head and old_head pointers by incrementing external_count and at the moment of adding we have not decremented internal_count to stop protecting them). Please see my answer below.Catholicism
This relies on that your dw-atomic is lock-free, that's not true with most compilers.Suprematism
C
6

Short explanation

Split reference count decreases contention. Adding external_count-2 to internal_count separates two phases of reference counting in this implementation: in the first phase reference count is split and in the second phase reference count is in internal_count only.

Please read below for detailed explanation.

Reference counting

Let's start with why we need reference counting in this case of lock-free stack from the book. We need to prevent accessing next field of node pointed by head, if this node has been deleted. And we want to delete node pointed by head when it is not used anymore. To achieve this, we delay deleting this node using reference counting.

It is important that node can be referenced only by head (when node is first in the list) or next pointer of previous node (when node is in the middle or in the end of the list), but not both at the same time - because there are no situations when the head points to the node in the middle or in the end of the list. This is why we do not need a separate control block to be pointed from multiple counted node pointers (as in shared_ptr). So, we can put counter directly into the head pointer.

Although we need to protect access to fields of node pointed by head.ptr only (actually to field next only), putting counter into head pointer is not enough. We also have to put counter into the next pointer - to continue protecting the node from being deleted even if the node is pushed and popped before it. We save and restore the head pointer counter into next pointer if we push the node B before the node A that is being popped by other thread Tb and then popping node B before thread Tb.

Incrementing counter and loading current pointer should be a single atomic operation to ensure that pointer that is dereferenced gets its counter increased.

Why split reference counting?

We split counter into internal and external for two reasons:

  1. We do not need an atomic operation on both counter and pointer when decrementing counter. It is enough to atomically change counter. If we had a single counter, we would have to decrement pointer in a loop like we do in increase_head_count.

  2. Using two atomic counters reduces contention, although it increases code complexity, memory usage and requires to add two counters for checking.

To protect node from being deleted we increment external_count. internal_count is inside the node that counted pointer is pointing to.

Back to non-split reference counting

As soon as header pointer is moved to the next node in compare_exchange operation during pop operation, no thread can increment pointer counter, because threads which have read previous head will fail compare_exchange operation, and threads that have not read head yet will never read this head, as head has moved to the next node. This is why external count/pointer pairing is no longer needed. Its value is added to internal_count. After this, counter is no more split: we are working with a single counter internal_count now, which has aggregated all counter increases and decreases in a single variable.

We can think about this as two phases: in the first phase our counter is split (into external_count and internal_count) and in the second phase our counter is merged into a single variable - internal_count. In the first phase we need to add external_count to internal_count to get real counter value. In the second phase we have to use internal_count only, because external_count has some old value that does not make any sense now (as the book says, now "external counter is discarded").

Why subtract 2 from external_count?

When adding external_count to internal_count, external_count value is decreased by 2. This operation can be divided into three for simplicity:

  1. Add external_count to internal_count moving to the second phase (now we do not use external_count anymore).
  2. Decrement internal_count by 1 because head no longer points to this node (remember that we set external_count to 1 for every new node because head points to this node).
  3. Decrement internal_count by 1 to signal that this pointer protection is no longer needed in the current thread (remember that we increased external_count by 1 in increase_head_count to protect pointer)

Comparing result of internal_count.fetch_add(external_count - 2) to 2 - external_count is just the atomic way of comparing resulting internal_count to zero.

In the code external_count is decremented by 2 before adding to internal_count, and in the book text "internal count is increased by the value of the external count minus one". I think this is a mistake in the text (probably, author was initially planning to additionally decrement internal_count by 1 separately from adding external_count-1).

Catholicism answered 28/8, 2021 at 11:9 Comment(7)
It's very kind of you to post such a detailed answer. But still I'm not clear. I want to ask my questions step by step. From my understanding internal_count should be always decreased, but internal_count.fetch_add(external_count - 2) can increase internal_count in case of external_count is greater than 2. Why is this ?Unwitting
And you said the meaning of split is moving external count to internal_count. This is also not matched with the fact that internal_count is the count which is decreased by one when this node(which has internal_count) is disconnected from other node(which references to this node). At this point, I'm very confused even if you explain very nicely.Unwitting
@Unwitting I actually answered your questions 1 and 2 and explained why internal_counter is not always decreasing. I did not mention your "It sound like..." phrase directly, because it is correct if we talk about the first splitted phase. I think here is the cause of miunderstanding: you are right that internal_count is always decreased if we talk about splitted counter phase. When we move to the second phase, internal_count can be increased, this is by design.Catholicism
@Unwitting adding external_count to internal_count is the movement from splitted phase to internal_count-only phase.Catholicism
@Unwitting I think I have an idea that will really help you: try to explore every line of code in the sample and think what can happen in other threads when current thread is at this particular line. Also please read my explanation sentence by sentence (returning to the book as needed) and tell me exactly the sentence where you are lost - this will help me to explain better.Catholicism
@Unwitting Another idea: do you understand all the previous examples in this book? If you skipped some of them, you should return to them. Also, here is one more book will help a lot to understand the idea: The Art of Multiprocessor Programming by Maurice Herlihy et al. It has examples in Java, but language is not important that much for understanding the concepts.Catholicism
@Unwitting Please mark this answer as correct or ask additional questionsCatholicism

© 2022 - 2024 — McMap. All rights reserved.