C code of lock-free queue

How could I implement this lock-free queue pseudocode in C?

ENQUEUE(x)
    q ← new record
    q^.value ← x
    q^.next ← NULL
    repeat
        p ← tail
        succ ← COMPARE&SWAP(p^.next, NULL, q)
        if succ ≠ TRUE
            COMPARE&SWAP(tail, p, p^.next)
    until succ = TRUE
    COMPARE&SWAP(tail, p, q)
end

DEQUEUE()
    repeat
        p ← head
        if p^.next = NULL
            error queue empty
    until COMPARE&SWAP(head, p, p^.next)
    return p^.next^.value
end

How would this be done using the GCC built-in functions for atomic memory access?

bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)

I currently have

typedef struct queueelem {
    queuedata_t data;
    struct queueelem *next;
} queueelem_t;

typedef struct queue {
    int capacity;
    int size;
    queueelem_t *head;
    queueelem_t *tail;
} queue_t;

queue_t *
queue_init(int capacity)
{
    queue_t *q = (queue_t *) malloc(sizeof(queue_t));
    q->head = q->tail = NULL;
    q->size = 0;
    q->capacity = capacity;
    return q;
}
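
Something along these lines is what I am aiming for with enqueue, though I am not sure it is a correct translation of the pseudocode (untested sketch; it assumes the queue starts with a dummy node so that head and tail are never NULL, which my queue_init above does not set up yet):

void
queue_enqueue(queue_t *q, queuedata_t data)
{
    queueelem_t *node = (queueelem_t *) malloc(sizeof(queueelem_t));
    node->data = data;
    node->next = NULL;

    queueelem_t *tail;
    for (;;) {
        tail = q->tail;
        /* try to link the new node after the current tail */
        if (__sync_bool_compare_and_swap(&tail->next, NULL, node))
            break;
        /* tail is lagging behind; help swing it forward and retry */
        __sync_bool_compare_and_swap(&q->tail, tail, tail->next);
    }
    /* swing tail to the new node (another thread may already have done it) */
    __sync_bool_compare_and_swap(&q->tail, tail, node);
}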
Carliecarlile answered 23/5, 2011 at 2:24 Comment(6)
The only "convenience" of using global variables is that it lets you be lazy and avoid some typing. Since you're doing concurrency programming, you should be ashamed that you even asked that question. :-) – Maundy
It would help if you implemented a complete non-thread-safe variant first and tried to make it thread-safe later. As for globals, read the comment above. – Circumambulate
How is this different from (i.e. not a duplicate of) the last question you asked? – Marsupium
Note that names ending in _t are reserved in some standards (POSIX, not sure about the C standard itself) and shouldn't be used in your code. You should use a different naming convention. – Maidenhead
possible duplicate of Lock-free queue – Latona
some rules to live by. 1) Always add/delete from the HEAD of the list. Faster! Simpler! A single CAS will add/remove items from the list, and you eliminate a ton of race conditions. If you have to do more than one CAS, you will have to deal with race conditions. Changing the next ptr on the tail item and then changing the head or the tail at the same time is not possible with a single CAS; changing the head pointer of the list is (see the sketch just below)! 2) Only worry about ABA if you are removing an item and putting it back. If you add new ptrs to the list only, you won't have ABA. – Mouthful
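
To illustrate the single-CAS head insertion the last comment describes, a push onto the head of a singly linked list can be sketched like this (it reuses the queueelem_t from the question; list_push is a made-up name, and popping safely still runs into ABA once nodes are freed and reused):

/* Push a node onto the head of a shared singly linked list with one CAS.
   head is a pointer to the shared list-head pointer. */
void
list_push(queueelem_t **head, queueelem_t *node)
{
    queueelem_t *old_head;
    do {
        old_head = *head;
        node->next = old_head;   /* node is still private here */
    } while (!__sync_bool_compare_and_swap(head, old_head, node));
}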

https://www.liblfds.org/

Public domain, no license, portable implementation of lock-free algorithms in C.

Builds out of the box for Windows and Linux.

It builds with GCC on Linux, so it uses the GCC atomic intrinsics (well, apart from 128-bit CAS; there's no intrinsic for that, so it uses inline assembly).

Contains the M&S queue. Have a look at the source code and see how it's done.

Waterworks answered 26/5, 2011 at 15:1 Comment(1)
Looks like the site is down. – Stamina

If your goal is production code, simply don't do that; use locks.

In your previous question, you got enough information explaining why. Correct lock-free implementations of even simple data structures such as queues and stacks, in the absence of a garbage collector, are tricky and sophisticated due to the (in)famous ABA problem. Unfortunately, some research papers do not take ABA into account for whatever reason; your pseudo-code seems to be taken from one such paper. If you translate it to C and use heap-allocated memory for nodes, it will cause nondeterministic bugs when used in real code.
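
To make the ABA hazard concrete, here is one hypothetical interleaving on the DEQUEUE pseudo-code above when nodes are malloc'd and free'd (the node names and addresses are only illustrative):

    Thread 1: p = head                     (p == node A, A->next == B)
    Thread 1: sees p->next != NULL, is preempted just before the CAS
    Thread 2: dequeues several nodes; A is free()d
    Thread 2: enqueues new nodes; malloc reuses A's address, and after a few
              more operations head happens to equal A again
    Thread 1: COMPARE&SWAP(head, p, p->next) succeeds, since head == p == A,
              and installs a stale (possibly dangling) next pointer as head

The CAS cannot distinguish "the same pointer value" from "the same logical node", which is why techniques such as tagged/counted pointers, hazard pointers, or garbage collection are needed.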

If you are doing this stuff to gain experience, then don't expect SO fellows to solve it for you. You have to read all the cited materials and much more, make sure you really understand all nuances of lock-free algorithms such as ABA, study various techniques intended to address the issue, study existing lock-free implementations, etc.

Finally, a little guidance for translating the given pseudo-code into C:

q^.value ← x means q_elem->data = x;
repeat ... until COMPARE&SWAP(head, p, p^.next) is equivalent to do {...} while (!__sync_bool_compare_and_swap(&q_obj->head, q_elem, q_elem->next));

where q_obj is an instance of type queue_t (i.e. a queue) and q_elem is an instance of type queueelem_t (i.e. a queue node).
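
Putting those correspondences together, a direct transliteration of DEQUEUE might look like the sketch below. It is still ABA-unsafe and is only meant to show the mapping; queue_dequeue is a made-up name, it assumes head always points at a dummy node, and it uses the data field from the question's struct in place of value:

queuedata_t
queue_dequeue(queue_t *q_obj)
{
    queueelem_t *p;
    do {
        p = q_obj->head;
        if (p->next == NULL) {
            /* queue empty: handle this however the application requires */
            abort();
        }
    } while (!__sync_bool_compare_and_swap(&q_obj->head, p, p->next));
    /* p (the old dummy) cannot simply be free()d here: another thread may
       still be reading it. This is exactly where ABA/safe-reclamation
       schemes (tagged pointers, hazard pointers, ...) become necessary. */
    return p->next->data;
}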

Towering answered 23/5, 2011 at 6:39 Comment(3)
So what do I do if I need more performance in production? IMHO locks are overkill for almost every race condition. – Mouthful
If I can get a lock-free queue I'd prefer that, especially in production code. – Poverty
You will need to go lock-free if you have a hard real-time requirement. It depends on the application. – Pressurize

While not exactly C, check out the proposed Boost.Lockfree library. The internals are pretty easy to grok and could be ported to C, or, conversely, you could wrap Boost.Lockfree in a C API and use that.

Similarly, BoostCon 2010 had a lot of discussion about lock-free programming and STM, which is worth looking at if you're interested in this subject. I can't find a link to the videos, but the talks from Intel, IBM and AMD were worth watching since they deal with STM at the CPU level.

Toulon answered 23/5, 2011 at 5:3 Comment(0)

It sounds like what you want is called an MCS queue lock (although deceptively named, it's really lock-free, just not wait-free), and there is some good pseudo-code available here: http://www.cs.rochester.edu/research/synchronization/pseudocode/ss.html#mcs

Grim answered 23/5, 2011 at 5:30 Comment(1)
It's not deceptively named: it implements a lock, i.e. a mutual exclusion algorithm. – Towering

I used C to write a minimal lock-free queue implementation.

lfq.

It supports multiple producers and multiple consumers.

Dee answered 22/2, 2017 at 1:17 Comment(13)
You don't need asm volatile( "lfence" ) for a load barrier. On x86, asm("" ::: "memory") is sufficient. You seem to be confusing C++'s compile-time memory model with x86's runtime memory model. See When should I use _mm_sfence, _mm_lfence and _mm_mfence. You really don't need any hand-rolled asm stuff or volatile; just use C11 stdatomic (see the sketch after these comments). – Solicit
Also, your writer and reader contend with each other for ctx->count. In lfq_enqueue, shouldn't p->next = insert_node; be inside the CAS loop or something? Your current CAS loop in that function could be optimized to an unconditional ctx->tail = insert_node, so it's probably not doing what you intend. The mb() is also unnecessary; you've only modified private objects at that point. – Solicit
Anyway, writing your own lock-free queue is a good way to learn stuff, but yours doesn't look ready for other people to use yet. (And normally you'd use a fixed-size circular buffer to avoid an allocate/free on every queue operation. On some implementations there's a single global free pool, so calloc/free contend with each other for access to the same data structure!) – Solicit
How can I use ctx->tail = insert_node and still prevent the ABA problem? Could you show me any sample code? – Dee
The mb() is at line 120? If you comment out line 120, the writer will get insert_node before you have assigned insert_node into ctx->tail. I cannot pass a 1-hour test if I comment out line 120. I'm glad someone can help me improve my knowledge of lock-free structures, but I cannot remove these memory barriers without a crash. – Dee
This is my speed test. lmb() version: real 0m12.473s, user 1m24.386s, sys 0m5.008s. mb() version: real 0m15.719s, user 1m25.282s, sys 0m3.778s. The mb() version always takes more time. Could you tell me the reason not to use lfence? – Dee
You don't need lfence because you aren't doing movntdqa loads from video RAM (or another WC memory region); that's basically its only memory-ordering use-case. I already linked When should I use _mm_sfence, _mm_lfence and _mm_mfence, which explains this, and what you do need for an acquire-load in C vs. x86 asm. I don't see why you're using a load barrier in inHP() anyway. I could see maybe once outside the loop to make it an acquire, if you're rolling your own atomics with volatile. – Solicit
mfence is slower than lfence, so of course it's slower with mb(). But you sometimes do need mfence for correctness (or better, an xchg store). If your mb() is avoiding bugs, it's maybe from that function inlining into the caller and optimizing, not from runtime memory ordering with it in that position. None of the assignments before mb() in lfq_enqueue are to objects that other threads can currently read, so you just need to make sure you have release semantics on the store that publishes. See also en.wikipedia.org/wiki/Non-blocking_linked_list – Solicit
I get a crash if I comment out the lmb() at lfq.c:8 and remove the volatile at lfq.h:16. Even though what you say makes sense, it does not work. – Dee
I think your CAS in enqueue isn't actually useless; you're using it to do an atomic exchange. You're loading, then CASing with that value to make sure you have the correct old value. It would be much more efficient to just exchange directly to update ctx->tail to point at your new node, with an xchg instruction (there's a __sync builtin for that, too). But yeah, I think it is actually that simple to append to the end of a linked list, and it's OK to only update the old_tail->next pointer after updating tail. – Solicit
If you get crashes from removing barriers, it's probably because something somewhere else isn't actually safe, maybe in dequeue. I haven't looked at it; it looks complicated and weird. IDK why you don't just CAS(&ctx->head, old_head, next) to atomically advance head and "claim" the old head for this thread, after checking inside the retry loop that next is non-NULL. CAS failure indicates that another reader won the race for that node. – Solicit
You mean the old commit? github.com/darkautism/lfqueue/blob/… Sorry, this version of the code is not really a lock-free queue, because you swap the empty node out and must swap it back in at line 56. This version of the code is actually a mini spin lock, so CAS(&ctx->head, old_head, next) is not really a lock-free queue. – Dee
This code uses CAS with zero to prevent swapping out the empty struct. If you just use CAS(&ctx->head, old_head, next), you will swap out the empty struct. – Dee
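
As a concrete version of the stdatomic suggestion in the comments above: with C11 <stdatomic.h>, release/acquire ordering comes from the atomic operations themselves, so no hand-rolled lfence/mfence or volatile is needed. The following is only an illustrative publication sketch (a Treiber-style push with made-up names, not code from lfq):

#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static _Atomic(struct node *) list_head = NULL;

/* Producer: fill in the node while it is still private, then publish it
   with a release CAS; readers pair this with an acquire load. */
void publish(int v)
{
    struct node *n = malloc(sizeof *n);
    n->value = v;
    struct node *old = atomic_load_explicit(&list_head, memory_order_relaxed);
    do {
        n->next = old;   /* on CAS failure, old is reloaded automatically */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list_head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer: an acquire load is enough to see the fully initialized node. */
struct node *peek(void)
{
    return atomic_load_explicit(&list_head, memory_order_acquire);
}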
