How many ABA tag bits are needed in lock-free data structures?

Asked 28/2, 2017 at 16:58 Answered 10/3, 2017 at 17:58

Solved multithreading lock-free thread-synchronization

One popular solution to the ABA problem in lock-free data structures is to tag pointers with an additional monotonically incrementing tag.

 struct aba {
      void *ptr;
      uint32_t tag;
 };

However, this approach has a problem. It is really slow and has huge cache problems. I can obtain a speed-up of twice as much if I ditch the tag field. But this is unsafe?

So my next attempt stuff for 64 bit platforms stuffs bits in the ptr field.

struct aba {
    uintptr __ptr;
};
uint32_t get_tag(struct aba aba) { return aba.__ptr >> 48U; }

But someone said to me that only 16 bits for the tag is unsafe. My new plan is to use pointer alignment to cache-lines to stuff more tag bits in but I want to know if that'll work.

If that fails to work my next plan is to use Linux's MAP_32BIT mmap flag to allocated data so I only need 32 bits of pointer space.

How many bits do I need for the ABA tag in lock-free data-structures?

Appointment answered 28/2, 2017 at 16:58 Comment(2)

I know you started with a monotonically increasing tag assignment strategy, and I confess I don't know much about the problem, but in general, wouldn't a cheap hash-like function (say, hyperlog-distributed numeric buckets) guarantee that the tags don't collide? – Cantle 5/3, 2017 at 19:49

@Cantle I was considering using a hash function myself but I can't construct a good argument for using one over just incrementing the tag. It does seem like a very interesting idea though. – Appointment 8/3, 2017 at 1:22

The amount of tag bits that is practically safe can be estimated based on the preemption time and the frequency of pointer modifications.

To remind, the ABA problem happens when a thread reads the value it wants to change with compare-and-swap, gets preempted, and when it resumes the actual value of the pointer happens to be equal to what the thread read before. Therefore the compare-and-swap operation may succeed despite data structure modifications possibly done by other threads during the preemption time.

The idea of adding the monotonically incremented tag is to make each modification of the pointer unique. For it to succeed, increments must produce unique tag values during the time when a modifying thread might be preempted; i.e. for guaranteed correctness the tag may not wraparound during the whole preemption time.

Let's assume that preemption lasts a single OS scheduling time slice, which is typically tens to hundreds of milliseconds. The latency of CAS on modern systems is tens to hundreds of nanoseconds. So rough worst-case estimate is that there might be millions of pointer modifications while a thread is preempted, and so there should be 20+ bits in the tag in order for it to not wraparound.

In practice it can be possible to make a better estimate for a particular real use case, based on known frequency of CAS operations. One also need to estimate the worst-case preemption time more accurately; for example, a low-priority thread preempted by a higher-priority job might end up with much longer preemption time.

Griffis answered 5/3, 2017 at 18:9 Comment(1)

On a subjective note, I find the approach with using spare address bits for tag values quite fragile, and rather non-portable and not future-proof (e.g. what if future processor generations will use more than 48 bits for memory addressing) - and therefore dangerous for practical use. – Griffis 5/3, 2017 at 18:10

According to the paper

http://web.cecs.pdx.edu/~walpole/class/cs510/papers/11.pdf Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects (IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 15, NO. 6, JUNE 2004 p. 491) by PhD Maged M. Michael

tag bits should be sized to make wraparound impossible in real lockfree scenarios (I can read this as if you may have N threads running and each may access the structure, you should have N+1 different states for tags at least):

6.1.1 IBM ABA-Prevention Tags

The earliest and simplest lock-free method for node reuse is the tag (update counter) method introduced with the documentation of CAS on the IBM System 370 [11]. It requires associating a tag with each location that is the target of ABA-prone comparison operations. By incrementing the tag when the value of the associated location is written, comparison operations (e.g., CAS) can determine if the location was written since it was last accessed by the same thread, thus preventing the ABA problem. The method requires that the tag contains enough bits to make full wraparound impossible during the execution of any single lock-free attempt. This method is very efficient and allows the immediate reuse of retired nodes.

Ferebee answered 4/3, 2017 at 8:34 Comment(3)

There is no guarantee that another thread can only modify the value once, so I believe any limit based on the number of threads would not be safe. – Griffis 5/3, 2017 at 17:2

The paper lists the condition. Do you have any other paper with different condition? – Ferebee 5/3, 2017 at 17:12

The stated condition is correct, but I interpret it differently. In particular, "the execution of any single lock-free attempt" includes the time during which a thread is preempted, and other threads might do much more than a single operation during that time. I've composed an answer to clarify this. – Griffis 5/3, 2017 at 18:12

Depending on your data structure you could be able to steal some extra bits from the pointers. For example if the objects are 64 bytes and always aligned on 64 byte boundaries, the lower 6 bits of each pointer could be used for the tags (but that's probably what you already suggested for your new plan).

Another option would be to use an index into your objects instead of pointers.

In case of contiguous objects that would of course simply be an index into an array or vector. In case of lists or trees with objects allocated on the heap, you could use a custom allocator and use an index into your allocated block(s).

For say 17M objects you would only need 24 bits, leaving 40 bits for the tags.

This would need some (small and fast) extra calculation to get the address, but if the alignment is a power of 2 only a shift and an addition are needed.

Pennate answered 10/3, 2017 at 17:58 Comment(0)

Recommended topics

Hot tags