Compare and swap in C++

We're using a fairly old version of Boost for now, and until we upgrade I need an atomic CAS operation in C++ for my code (we're not using C++0x yet either).

I created the following CAS function:

inline uint32_t CAS(volatile uint32_t *mem, uint32_t with, uint32_t cmp)
{
    uint32_t prev = cmp;
    // This version by Mans Rullgard of Pathscale
    __asm__ __volatile__ ( "lock\n\t"
            "cmpxchg %2,%0"
            : "+m"(*mem), "+a"(prev)
              : "r"(with)
                : "cc");

    return prev;
}

The code that uses this function looks roughly like this:

void myFunc(uint32_t &masterDeserialize )
{
    std::ostringstream debugStream;

    unsigned int tid = pthread_self();
    debugStream << "myFunc, threadId: " << tid << " masterDeserialize= " << masterDeserialize << " masterAddress = " << &masterDeserialize << std::endl;

    // memory fence
    __asm__ __volatile__ ("" ::: "memory");
    uint32_t retMaster = CAS(&masterDeserialize, 1, 0);
    debugStream << "After cas, threadid = " << tid << " retMaster = " << retMaster << " MasterDeserialize = " << masterDeserialize << " masterAddress = " << &masterDeserialize << std::endl;
    if(retMaster != 0) // not master deserializer.
    {
       debugStream << "getConfigurationRowField, threadId: " << tid << " NOT master.  retMaster = " << retMaster << std::endl;

       // DO SOMETHING...
    }
    else
    {
        debugStream << "getConfigurationRowField, threadId: " << tid << " MASTER. retMaster = " << retMaster << std::endl;

        // DO SOME LOGIC

        // Signal we're done deserializing.
        masterDeserialize = 0;
    }
    std::cout << debugStream.str();
}

My test spawns 10 threads and signals all of them to call the function with the same masterDeserialize variable.

This works well most of the time, but once every few thousand to few million test iterations, two threads both take the MASTER path.

I'm not sure how this is possible, or how to avoid it.

I tried adding a memory fence before resetting masterDeserialize, thinking that out-of-order execution in the CPU might have an effect, but this made no difference to the result.

Obviously this runs on a machine with many cores, and it is compiled in debug mode, so GCC should not reorder execution for optimizations.

Any suggestions as to what is wrong with the above?

EDIT: I tried using the GCC primitive instead of the assembly code and got the same result.

inline uint32_t CAS(volatile uint32_t *mem, uint32_t with, uint32_t cmp)
{
    return __sync_val_compare_and_swap(mem, cmp, with);
}

I am running on a multi-core, multi-CPU machine, but it is a virtual machine. Is it possible that this behavior is somehow caused by the VM?

In theory, not just two but any number of threads can become "masters" in this code. The problem is that the thread that took the master path sets masterDeserialize back to 0 when it finishes, which makes the flag available to be "acquired" again by any thread that reaches its CAS very late (e.g. because it was preempted).
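To make the interleaving concrete, here is a minimal sketch that replays it sequentially in a single thread, so the outcome is deterministic. The names casTwoState and replayLateArrival, and the labels A, B and C for the threads, are made up for this illustration; the CAS itself is the GCC builtin from the question's EDIT.

```cpp
#include <cassert>
#include <stdint.h>

// Same contract as CAS() in the question: returns the value that was in
// *mem before the operation (hypothetical name for this sketch).
inline uint32_t casTwoState(volatile uint32_t *mem, uint32_t with, uint32_t cmp)
{
    return __sync_val_compare_and_swap(mem, cmp, with);
}

// Replays the problematic schedule step by step and returns how many
// callers ended up on the master path.
inline int replayLateArrival()
{
    volatile uint32_t flag = 0;
    int masters = 0;

    // Thread A arrives first and wins the CAS (0 -> 1).
    if (casTwoState(&flag, 1, 0) == 0) ++masters;

    // Thread C arrives while A is still working: it sees 1, so it is not master.
    if (casTwoState(&flag, 1, 0) == 0) ++masters;
    assert(flag == 1);

    // Thread A finishes and resets the flag, exactly as myFunc does.
    flag = 0;

    // Thread B was preempted before its CAS and only runs now.
    // It finds 0 again and also becomes master.
    if (casTwoState(&flag, 1, 0) == 0) ++masters;

    return masters;
}
```

With this schedule replayLateArrival() counts two masters even though every CAS is perfectly atomic, which suggests the issue is in the protocol rather than in the CAS implementation or the VM.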

The fix is actually simple: add a third state to this flag (e.g. the value 2) meaning "master has completed", and have the master set the flag to this state, instead of the initial state of 0, when its job is done. That way only one of the threads that call myFunc can ever see 0, which gives you the guarantee you need. To reuse the flag, you'd need to explicitly reinitialize it to 0 at a point where no thread can still be on its way to the CAS.
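A minimal sketch of the three-state flag, assuming GCC's __sync builtins as in the question's EDIT. The constant and function names (FLAG_IDLE, FLAG_BUSY, FLAG_DONE, tryBecomeMaster, masterDone, resetFlag) are invented for this example and are not part of the original code.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical state values for the flag.
static const uint32_t FLAG_IDLE = 0; // nobody has claimed the master role yet
static const uint32_t FLAG_BUSY = 1; // a master is currently deserializing
static const uint32_t FLAG_DONE = 2; // the master has finished

// Returns true for exactly one caller per IDLE -> BUSY transition.
inline bool tryBecomeMaster(volatile uint32_t *flag)
{
    return __sync_val_compare_and_swap(flag, FLAG_IDLE, FLAG_BUSY) == FLAG_IDLE;
}

// Called by the master when its work is complete. The full barrier keeps the
// master's earlier writes ordered before the state change becomes visible.
inline void masterDone(volatile uint32_t *flag)
{
    assert(*flag == FLAG_BUSY);
    __sync_synchronize();
    *flag = FLAG_DONE; // not 0, so late arrivals cannot win the CAS
}

// Explicit re-arm. Only safe once no thread from the previous round can
// still be about to call tryBecomeMaster (e.g. after joining them all).
inline void resetFlag(volatile uint32_t *flag)
{
    __sync_synchronize();
    *flag = FLAG_IDLE;
}
```

In myFunc, the CAS call would become tryBecomeMaster(&masterDeserialize), and the line masterDeserialize = 0; would become masterDone(&masterDeserialize). A thread that sees FLAG_DONE knows the work is finished, while one that sees FLAG_BUSY knows it is still in progress; the test harness would call resetFlag between iterations, after all ten threads have returned.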
