Looking for critique of my thread safe, lock-free queue implementation

Asked 4/2, 2014 at 19:4 Answered 4/2, 2014 at 21:14

So, I've written a queue, after a bit of research. It uses a fixed-size buffer, so it's a circular queue. It has to be thread-safe, and I've tried to make it lock-free. I'd like to know what's wrong with it, because these kinds of things are difficult to predict on my own.

Here's the header:

template <class T>
class LockFreeQueue
{
public:
    LockFreeQueue(uint buffersize) : buffer(NULL), ifront1(0), ifront2(0), iback1(0), iback2(0), size(buffersize) { buffer = new atomic <T>[buffersize]; }
    ~LockFreeQueue(void) { if (buffer) delete[] buffer; }

    bool pop(T* output);
    bool push(T input);

private:
    uint incr(const uint val)
        {return (val + 1) % size;}

    atomic <T>* buffer;
    atomic <uint> ifront1, ifront2, iback1, iback2;
    uint size;
};

And here's the implementation:

template <class T>
bool LockFreeQueue<T>::pop(T* output)
{
    while (true)
    {
        /* Fetch ifront and store it in i. */
        uint i = ifront1;

        /* If ifront == iback, the queue is empty. */
        if (i == iback2)
            return false;

        /* If i still equals ifront, increment ifront, */
        /* Incrememnting ifront1 notifies pop() that it can read the next element. */
        if (ifront1.compare_exchange_weak(i, incr(i)))
        {
            /* then fetch the output. */
            *output = buffer[i];
            /* Incrememnting ifront2 notifies push() that it's safe to write. */
            ++ifront2;
            return true;
        }

        /* If i no longer equals ifront, we loop around and try again. */
    }
}

template <class T>
bool LockFreeQueue<T>::push(T input)
{
    while (true)
    {
        /* Fetch iback and store it in i. */
        uint i = iback1;

        /* If ifront == (iback +1), the queue is full. */
        if (ifront2 == incr(i))
            return false;

        /* If i still equals iback, increment iback, */
        /* Incrememnting iback1 notifies push() that it can write a new element. */
        if (iback1.compare_exchange_weak(i, incr(i)))
        {
            /* then store the input. */
            buffer[i] = input;
            /* Incrementing iback2 notifies pop() that it's safe to read. */
            ++iback2;
            return true;
        }

        /* If i no longer equals iback, we loop around and try again. */
    }
}

EDIT: I made some major modifications to the code, based on comments (Thanks KillianDS and n.m.!). Most importantly, ifront and iback are now ifront1, ifront2, iback1, and iback2. push() will now increment iback1, notifying other pushing threads that they can safely write to the next element (as long as it's not full), write the element, then increment iback2. iback2 is all that gets checked by pop(). pop() does the same thing, but with the ifrontn indices.

Now, once again, I fall into the trap of "this SHOULD work...", but I don't know anything about formal proofs or anything like that. At least this time, I can't think of a potential way that it could fail. Any advice is appreciated, except for "stop trying to write lock-free code".

Herron answered 4/2, 2014 at 19:4 Comment(14)

A concurrent container doensn't have a size, nor has it "full" or "empty" states. Those are meaningless. – Watkins 4/2, 2014 at 19:6

@KerrekSB: A circular buffer does have a size, and they are often used as concurrent containers, so ... – Dollfuss 4/2, 2014 at 19:9

A side note. if (ifront == iback) return true; else return false; in isEmpty() should just be return ifront == iback; same in isFull(). Avoid avoid extra if-else. – Torrence 4/2, 2014 at 19:20

@concept3d Its not necessarily thread safe, it may not have to be...It certainly doesn't follow the rule of 3, probably should set the buffer to NULL, copy constructor and assignment, ect – Cachexia 4/2, 2014 at 19:22

@ZanLynx It makes no sense for the size and "emptiness" to be part of the public interface if other threads can concurrently change that state. – Cholinesterase 4/2, 2014 at 19:29

To me, it seems your concerns are justified: the compare_and_exchange in push both notifies writers that the element may not be written to any more and readers that the element can now be read from. There's no guarantee the atomic write to buffer[i] occurs before the read in the first scenario. – Gehenna 4/2, 2014 at 19:29

In the same line with @murrekatt's code, incr can be simplified to return (val+1)%size. It does not change anything, but less code is easier to grasp in concurrency. – Polysemy 4/2, 2014 at 19:29

I do not think comparison atomicx == atomicy is atomic (speaking of isEmpty() and isFull()) – Medicine 4/2, 2014 at 19:30

It seems to me that if the writer is quick enough it can overwrite the node after you've incremented the pointer and before you've fetched the data from the node, in pop() – Medicine 4/2, 2014 at 19:36

I've removed isEmpty() and isFull() from the interface. I didn't actually have any plans that required them, I just figured that they would be useful maybe, but if they cause problems then it's best if they go away. – Herron 4/2, 2014 at 19:46

Have a read: drdobbs.com/article/… – Torrence 4/2, 2014 at 19:59

"which gives thread B time" -- how much time? Is it enough to do what it needs to do? Unless you can fill in the numbers, this is not a valid line of reasoning. Stick to "X happens before Y". – Darrelldarrelle 4/2, 2014 at 20:16

If one statement is executed after another on thread B, with no pauses in between statements, then it works. If both threads are running on the same processor, the operating system may choose to execute something else in between those two statements. Therein lies the potential for errors. – Herron 4/2, 2014 at 20:28

I think you can successfully implements a lock-free circular buffer if you use two indices for back and front (two of each). At the push side you advance iback1, push, then advance iback2. The pop side only looks at iback2. – Darrelldarrelle 5/2, 2014 at 7:58

The proper way to approach a lock free data structure is to write a semi formal proof that your design works in pseudo code. You shouldn't be asking "is this lock free code thread safe", but rather "does my proof that this lock free code is thread safe have any errors?"

Only after you have a formal proof that a pseudo code design works do you try to implement it. Often this brings to light issues like garbage collection that have to be handled carefully.

Your code should be the formal proof and pseudo code in comments, with the relatively unimportant implementation interspersed within.

Verifying your code is correct then consists of understanding the pseudo code, checking the proof, then checking for failure for your code to map to your pseudo code and proof.

Directly taking code and trying to check that it is lock free is impractical. The proof is the important thing in correctly designing this kind of thing, the actual code is secondary, as the proof is the hard part.

And after and while you have done all of the above, and have other people validate it, you have to put your code through practical tests to see if you have a blind spot and there is a hole, or don't understand your concurrency primitives, or if your concurrency primitives have bugs in them.

If you aren't interested in writing semi formal proofs to design your code, you shouldn't be hand rolling lock free algorithms and data structures and putting them into place in production code.

Determining if a pile of code "is thread safe" is putting all of the work load on other people. You need to have an argument why your code "is thread safe" arranged in such a way that it is as easy as possible for others to find holes in it. If your argument why your code "is thread safe" is arranged in ways that makes it harder to find holes, your code cannot be presumed to be thread safe, even if nobody can spot a hole in your code.

The code you posted above is a mess. It contains commented out code, no formal invariants, no proofs that the lines, no strong description of why it is thread safe, and in general does not put forward an attempt to show itself as thread safe in a way that makes it easy to spot flaws. As such, no reasonable reader will consider the code thread safe, even if they cannot find any errors in it.

Olszewski answered 4/2, 2014 at 21:14 Comment(2)

-1 This is an opinion based scolding, when the asker simply asked a straight-forward question - which admittedly he should not have asked without more work on his part. – Killoran 4/2, 2014 at 22:54

@Killoran it also tells him how to write a thread safe algorithm. Start with the pseudo code and the proof. Even if nobody here could find an error in his code, the lack of proof means that the code cannot be considered reliably thread safe. (Note a proof is not sufficient: rather necessary). – Olszewski 4/2, 2014 at 23:44

No, it's not thread safe - consider the following sequence if events:

First thread completes if (ifront.compare_exchange_weak(i, incr(i))) in pop and goes to sleep by scheduler.
Second thread calls push size times (just enough to make ifront be equal to value of i in the first thread).
First thread wakes.

In this case pop buffer[i] will contain the last pushed value, which is wrong.

Cott answered 4/2, 2014 at 20:28 Comment(2)

Is it possible to notify the scheduler to not put the thread to sleep between compare_exchange and buffer[i]'s read/write? – Herron 4/2, 2014 at 20:41

@HaydnV.Harach No, unless you are on some real-time OS with kernel-level critical section (but that will make your code non-lock-free). I wonder why do you need lock-free queue at all. If just for self-education, than it's ok, but do not ever use self-written lock-free code in production code. – Cott 4/2, 2014 at 20:45

There are some issues when considering wrap-around but I think the main issue of your code is that it may pop invalid values from the buffer.

Consider this:

ifront = iback = 0

Push gets called and CAS increases the value of iback 0 -> 1. However the thread now get's stalled before buffer[0] is assigned.

ifront = 0, iback = 1

Pop is now called. CAS increases ifront 1 -> 1 and buffer[0] is read before it's assigned.

A stale or invalid value is popped.

PS. Some researches therefore have asked for a DCAS or TCAS (Di and Tri CAS).

Varese answered 4/2, 2014 at 20:38 Comment(1)

A potential way forward is implementing a spin lock. – Varese 4/2, 2014 at 20:39

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags