Atomic operations for lock-free doubly linked list
Asked Answered
A

3

6

I am writing a lock-free doubly linked list based on these papers:

"Efficient and Reliable Lock-Free Memory Reclamation Based on Reference Counting" Anders Gidenstam,Member, IEEE,Marina Papatriantafilou, H˚ akan Sundell and Philippas Tsigas

"Lock-free deques and doubly linked lists" Håkan Sundell, Philippas Tsigas

For this question we can put aside first paper.

In this paper, they use a smart way for storing a deletion flag and a pointer in a word. (More info here)

Pseudo code for this section in the paper:

union Link
    : word
    (p,d): {pointer to Node, boolean} 

structure Node
    value: pointer to word
    prev: union Link
    next: union Link

And my code for above pseudo code:

template< typename NodeT >
struct LockFreeLink
{
public:
    typedef NodeT NodeType;

private:

protected:
    std::atomic< NodeT* > mPointer;

public:
    bcLockFreeLink()
    {
        std::atomic_init(&mPointer, nullptr);
    }
    ~bcLockFreeLink() {}

    inline NodeType* getNode() const throw()
    {
        return std::atomic_load(&mPointer, std::memory_order_relaxed);
    }
    inline std::atomic< NodeT* >* getAtomicNode() const throw()
    {
        return &mPointer;
    }
};

struct Node : public LockFreeNode
{
    struct Link : protected LockFreeLink< Node >
    {
        static const int dMask = 1;
        static const int ptrMask = ~dMask;

        Link() { } throw()
        Link(const Node* pPointer, bcBOOL pDel = bcFALSE) throw()
        { 
            std::atomic_init(&mPointer, (reinterpret_cast<int>(pPointer) | (int)pDel)); 
        }

        Node* pointer() const throw() 
        { 
            return reinterpret_cast<Node*>(
                std::atomic_load(&data, std::memory_order_relaxed) & ptrMask); 
        }
        bool del() const throw() 
        { 
            return std::atomic_load(&data, std::memory_order_relaxed) & dMask; 
        }
        bool compareAndSwap(const Link& pExpected, const Link& pNew) throw() 
        { 
            Node* lExpected = std::atomic_load(&pExpected.mPointer, std::memory_order_relaxed);
            Node* lNew = std::atomic_load(&pNew.mPointer, std::memory_order_relaxed);

            return std::atomic_compare_exchange_strong_explicit(
                &mPointer,
                &lExpected,
                lNew,
                std::memory_order_relaxed,
                std::memory_order_relaxed); 
        }

        bool operator==(const Link& pOther) throw() 
        { 
            return std::atomic_load(data, std::memory_order_relaxed) == 
                std::atomic_load(pOther.data, std::memory_order_relaxed); 
        }
        bool operator!=(const Link& pOther) throw() 
        { 
            return !operator==(pOther); 
        }
    };

    Link mPrev;
    Link mNext;
    Type mData;

    Node() {};
    Node(const Type& pValue) : mData(pValue) {};
};

In this paper there is this function for set deletion mark of link to true:

procedure SetMark(link: pointer to pointer to Node)
    while true do
       node = *link;
       if node.d = true or CAS(link, node, (node.p, true)) then break;

And my code for this function:

void _setMark(Link* pLink)
{
    while (bcTRUE)
    {
        Link lOld = *pLink;
        if(pLink->del() || pLink->compareAndSwap(lOld, Link(pLink->pointer(), bcTRUE)))
            break;
    }
}

But my problem is in compareAndSwap function where i must compare and swap three atomic variable. Information about problem is here

(Actually new variable in compare and swap function isn't important because it is thread local)

Now my question: how can i write compareAndSwap function to compare and swap three atomic varialbe or where am i making mistake?

(Excuse me for long question)

Edit:

similar problem is in memory manager paper:

function CompareAndSwapRef(link:pointer to pointer toNode,
old:pointer toNode, new:pointer toNode):boolean
    if CAS(link,old,new) then
        if new=NULL then
            FAA(&new.mmref,1);
            new.mmtrace:=false;
    if old=NULLthen FAA(&old.mmref,-1);
    return true;
return false; 

here again i must compare and swap three atomic variable. (Note that my arguments are type of Link and i must compare and swap mPointer of Link)

Arletha answered 26/10, 2013 at 17:4 Comment(0)
D
3

Unless you can make your three data items that you are comparing/swapping fit into two pointer-size elements, you can't do this with compare and swap (certainly not on x86, and I've not heard of any other machine architecture that has such a thing).

If you rely on the data being stored on an address that is (at least) aligned to an even byte-address, you could potentially use bitwise OR to set the lowest bit when deleting the element. In the past, people have been using the upper parts of the address to store extra data, but in x86-64 at least, this is not possible, as the upper part of the address must be "canonical", meaning that any address bits above the "usable limit" (defined by the processor architecture, currently this is 48 bits), must all be the same as the highest bit of the usable limit (so, same as bit 47).

Edit: This section of code does exactly what I describe:

    static const int dMask = 1;
    static const int ptrMask = ~dMask;

    Link() { } throw()
    Link(const Node* pPointer, bcBOOL pDel = bcFALSE) throw()
    { 
        std::atomic_init(&mPointer, (reinterpret_cast<int>(pPointer) | (int)pDel)); 
    }

    Node* pointer() const throw() 
    { 
        return reinterpret_cast<Node*>(
            std::atomic_load(&data, std::memory_order_relaxed) & ptrMask); 
    }

It uses the lowest bit to store the pDel flag.

You should be able to do this for a double-linked list by using the a form of cmpxchg16b (on x86). In a Windows system, that would be the _InterlockedCompareExchange128. In gcc (for Unix type OS's, such as Linux/MacOS) you will need to first construct a int128 from your two pointers. If you are compiling for 32-bit code, you will probably need to make a 64-bit int for both Windows and Unix OS's.

Dissyllable answered 26/10, 2013 at 17:18 Comment(22)
I think it's still OK to use the top bit, as long as you mask out the boolean before attempting to use the pointer. Although technically it would be undefined behaviour.Pamalapamela
your talk about bitwise OR is correct but, in general i want use CAS.Arletha
@AlanStokes: It is not OK on x86-64 - ALL bits above the "valid" bits need to be the correct value (same value as bit 47). Of course, you could use a high bit, but you then need to find a way to set it to the correct one or zero value based on bit 47, which tends to get quite messy.Dissyllable
The paper assumes that the pointers have a 4-byte or 8-byte alignment on 32-bit or 64-bit architectures. This gives either 2 or 3 bits to play with. The paper uses one for a deletion marker and one for a lock.Warnke
@MohammadRB: Yes, and my point is that you then need to ensure that both the bool and the pointer fit within one machineword. Using the OR solution will achieve this (obviously, you need the relevant AND on the other side to remove it again). There is no OTHER way to achieve this as far as I understand (and I kind of think I know fairly well how most processor architectures work and what you can/can't do).Dissyllable
So, if I understand sfstewman's comment correctly, it uses the fact that pointers are aligned (and thus the lower bits in the pointer is always zero), and this allows other data to be stored in the lower portion of the pointer. However, this does require the ability to do bitwise or/and to add/remove those bits within the pointer.Dissyllable
@MatsPetersson do you know of a (modern) architecture where pointers, at least in assembly, are stored in non-arithmetic registers? The paper is not C/C++-specific, and you can always coerce a pointer into a intptr_t type (and back to a void*) in C.Warnke
@MatsPetersson The problem of storing bool inside pointer has solved. but i don't know, how can i compare and swap those atomics in a consistence way as i say in last link that i have provided in questionArletha
@sfstewman: Yes, that was my point entirely, that they are "coercing" pointers + booleans by using the fact that you can convert to integer and back to pointer and retain all bits.Dissyllable
@MohammadRB: Like my first sentence says, there is no way to use compare and swap type instructions to work on three different values. You can use the cmpxchg8b (in 32-bit) or cmpxchg16b (in 64-bit) to do a "double linked list update", assuming your next and prev links are located next to each other. As my edit says, you may need to convert your pointers into int64 or int128 types.Dissyllable
the whole point of paper is to use CAS instead of an instruction that will work on double word. are you saying that i store two pointer in one 64bit int and update two pointer with one instruction? but it isn't cross platform ;(Arletha
There is no cross-platform way to do 64-bit updates on 32-bit platforms, so there is no way you can solve that without using a 64-bit CAS type operation supported by the 32-bit processor you have. Many processors have some way to achieve this, but it's certainly far from 100% portable. Like my edit says, MS and GCC compilers have builtin functions that provide 32-, 64- and 128-bit compare/swap operation (128-bit only for 64-bit architectures).Dissyllable
The point of the referenced paper is that only one of the pointers is actually locked at a time. The paper treats the structure as a singly-linked list with prev pointers that may not be up to date. This scheme only requires a single CAS operation at a time. It helps to actually read the paper.Warnke
@Warnke i have read two paper and already implemented memory manager. but actual problem is that i can't do CAS on non-atomic types in c++, and i must first read expected pointer (Node* lExpected = std::atomic_load(&pExpected.mPointer, std::memory_order_relaxed);) and then use it in CAS instruction, and in between, another thread can modify expected pointerArletha
@MohammadRB Why can't you use one of the std::atomic_compare_exchange_* functions? They directly implement CAS. The point of CAS is that it only exchanges if the values are equal, and returns false if the values are not. The terminating condition of the SetMark pseudocode is that either the deletion mark was set by another thread or the compare-and-swap succeeded, setting the deletion mark.Warnke
@Warnke the setMark method can be replaced with atomic OR but in last pseudo code that i have added to my question, i first read expected value then compare it with atomic variable, and in between these two operation, another thread can change actual value of expected, but current thread will compare atomic variable with previous value of expected As i said here(https://mcmap.net/q/1174331/-compareandexchange-on-three-atomic-variable/2279977)Arletha
@MohammadRB This is why the SetMark method cannot be replaced by an atomic OR. It requires an outer loop to ensure that the deleted bit is set, either by the current thread or another.Warnke
@Warnke Why setMark cannot be replaced with atomic OR? we want first bit become 1 and it can be done with atomic OR. i add a loop to my CAS but at that post other users say it cann't be safeArletha
@Mats x64 requires the top bits to be consistent when accessing an address. That isn't necessarily the same as requiring the same of all stored pointer values.Pamalapamela
@Warnke another user has implemented this papers. i see his codes and for CAS he has read expected atomic variable and then compare it with source atomic variable. here: https://mcmap.net/q/1040128/-is-a-lock-wait-free-doubly-linked-list-possible. is this approach correct or not? am i making mistake in this case?Arletha
@AlanStokes: This is true only for Windows; virtual addresses with the high order bit set aren't assigned unless 4 GB tuning is enabled on the application.Deering
@AlanStokes: yes, like I said, you can store anything in the 64-bit pointer value, but as soon as you use it to actually dereference memory, it is a GP fault to have non-canonical address. Which means that you need to fill it in appropriately before using it as a pointer. Not terribly hard, but harder than masking off some low order bits (a single, non-conditional operation, where setting the top bit needs at the very least two operations, and probably more, but I haven't really checked what is the simplest way (probably shuft up 16 bits and then down again using a signed shift)Dissyllable
C
3

http://www.drdobbs.com/cpp/lock-free-code-a-false-sense-of-security/210600279

But replacing locks wholesale by writing your own lock-free code is not the answer. Lock-free code has two major drawbacks. First, it's not broadly useful for solving typical problems—lots of basic data structures, even doubly linked lists, still have no known lock-free implementations. Coming up with a new or improved lock-free data structure will still earn you at least a published paper in a refereed journal, and sometimes a degree.

I don't think it would be efficient enough to use it, but anyway it's interesting to read.

Clad answered 13/1, 2014 at 16:36 Comment(1)
This answer is misleading, first in that there exist imperfect but pretty good lock free doubly linked list implementations and second in that it advises against using lock free code at all. Lock free code is able to perform significantly better than code using locks in situations where there could be high contention for the lock.Tumbling
F
3

On x64, only 44 bits of address space are used. If your pointers are aligned to 8 bytes then you are only using 41 bits. 41x2 is still too large for 64 bits. There is a 128 bit compare and swap although I can't vouch for its speed. I always try to use the 64 bit one.

Maybe you only need up to 2 billion nodes. So what you could do is preallocate a pool of nodes that the list pulls from. You create nodes by grabbing the next free pool index using atomic ops of course. Then instead of next and prev being pointers, they could be 31 bit indexes into the node pool and you have 2 bits left over for delete flags. Assuming you don't need 2 billion nodes, you have even more bits left over. The only downside is you have to know how many nodes you are going to need at startup, although you could realloc the nodes if you had too.

What I have done is used virtual memory functions to reserve GB of address space and then map physical ram into that space as I need it to extend my pool without having to reallocate.

Forta answered 15/5, 2014 at 6:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.