Ensuring usage of double-compare-and-swap instruction, for lock-free stack?
Asked Answered
M

2

3

(Assume 64-bit x86-64 architecture and Intel 3rd/4th generation CPU)

Here is a lock-free implementation for a stack from Concurrency in Action book, page 202:

template<typename T>
class lock_free_stack
{
private:
    struct node;

    struct counted_node_ptr
    {
        int external_count;
        node* ptr;
    };

    struct node
    {
        std::shared_ptr<T> data;
        std::atomic<int> internal_count;
        counted_node_ptr next;

        node(T const& data_):data(std::make_shared<T>(data_)),internal_count(0){}
    };

    std::atomic<counted_node_ptr> head;

public:
    ~lock_free_stack()
    {
        while(pop());
    }

    void push(T const& data)
    {
        counted_node_ptr new_node;
        new_node.ptr=new node(data);
        new_node.external_count=1;
        new_node.ptr->next=head.load();
        while(!head.compare_exchange_weak(new_node.ptr->next,new_node));
    }
};

It says below the code:

On those platforms that support a double-word-compare-and-swap operation, this structure will be small enough for std::atomic to be lock-free.

I believe x86-64 does have support for the double CAS (I cannot remember the name of the instruction off the top of my head).

If I were to check the assembly (and I couldn't see the double CAS instruction) what inline assembly function would I need to write to ensure double-CAS is used?

UPDATE - I think I have found what I was looking for here:

http://blog.lse.epita.fr/articles/42-implementing-generic-double-word-compare-and-swap-.html

template<typename T>
struct DPointer <T,sizeof (uint64_t)> {
public:
  union {
    uint64_t ui[2];
    struct {
      T* ptr;
      size_t count;
    } __attribute__ (( __aligned__( 16 ) ));
  };

  DPointer() : ptr(NULL), count(0) {}
  DPointer(T* p) : ptr(p), count(0) {}
  DPointer(T* p, size_t c) : ptr(p), count(c) {}

  bool cas(DPointer<T,8> const& nval, DPointer<T,8> const& cmp)
  {
    bool result;
    __asm__ __volatile__ (
        "lock cmpxchg16b %1\n\t"
        "setz %0\n"
        : "=q" ( result )
         ,"+m" ( ui )
        : "a" ( cmp.ptr ), "d" ( cmp.count )
         ,"b" ( nval.ptr ), "c" ( nval.count )
        : "cc"
    );
    return result;
  }

  // We need == to work properly
  bool operator==(DPointer<T,8> const&x)
  {
    return x.ptr == ptr && x.count == count;
  }
};
Menado answered 31/5, 2014 at 15:33 Comment(2)
You don't need inline-asm. Modern gcc (with -mcx16) will use LOCK CMPXCHG16B when compiling a compare_exchange_weak on a 16B object like std::atomic<my_struct>. See this answer where I included code to do it with (mostly) portable C++11.Curvilinear
I can close this question as a duplicate of that one, if you think that's appropriate (let me know).Curvilinear
T
2

The oldest versions of the x86_64 do not support this instruction (CMPXCHG16B), which is required for Windows 8.1/64-bit and newer. Afaik this is most of the Athlon64 range (socket 751, 939 and some of the X2's, maybe the first generation (8xx) of Pentium D too)

How to force a compiler to use a certain instruction varies, usually one must use a not wholly portable intrinsic.

Teryn answered 31/5, 2014 at 15:57 Comment(3)
"Assume 64-bit x86-64 architecture and Intel 3rd/4th generation CPU" :)Menado
@Menado You can certainly assume it, but you need to tell the compiler to assume it too. How you do so depends on the compiler you are using, which you haven't told us.Drogue
True. I responded to the later line " x86-64 does have support for the double CAS". IOW I meant to state that it is not a generic architecture feature, and thus not part of the base instruction and featureset of compilers. Though in general I don't think you can rely on the compiler reliably doing such things without additional specifiers, so that point is a bit moot.Teryn
I
1

You can assert

std::atomic<T>::is_lock_free()
Irv answered 16/12, 2015 at 10:36 Comment(2)
While this code may answer the question, it is better to explain what it does and add some references to it.Ils
is_lock_free() tells if the atomic operations of an atomic type are implemented using no locks on the target platform, i.e using atomic instructions on CPU level. To read further check en.cppreference.com/w/cpp/atomic/atomic/is_lock_free or check the book C++ Concurrency in Action which is a phenomenal sourceWoeful

© 2022 - 2024 — McMap. All rights reserved.