Compare and swap in C++
Asked Answered
A

1

11

So we're using a version of boost which is pretty old for now, and until upgrading I need to have an atomic CAS operation in C++ for my code. (we're not using C++0x yet either)

I created the following cas function:

inline uint32_t CAS(volatile uint32_t *mem, uint32_t with, uint32_t cmp)
{
    uint32_t prev = cmp;
    // This version by Mans Rullgard of Pathscale
    __asm__ __volatile__ ( "lock\n\t"
            "cmpxchg %2,%0"
            : "+m"(*mem), "+a"(prev)
              : "r"(with)
                : "cc");

    return prev;
}

My code which uses the function is somewhat as following:

void myFunc(uint32_t &masterDeserialize )
{
    std::ostringstream debugStream;

    unsigned int tid = pthread_self();
    debugStream << "myFunc, threadId: " << tid << " masterDeserialize= " << masterDeserialize << " masterAddress = " << &masterDeserialize << std::endl;

    // memory fence
    __asm__ __volatile__ ("" ::: "memory");
    uint32_t retMaster = CAS(&masterDeserialize, 1, 0);
    debugStream << "After cas, threadid = " << tid << " retMaster = " << retMaster << " MasterDeserialize = " << masterDeserialize << " masterAddress = " << &masterDeserialize << std::endl;
    if(retMaster != 0) // not master deserializer.
    {
       debugStream << "getConfigurationRowField, threadId: " << tid << " NOT master.  retMaster = " << retMaster << std::endl;

       DO SOMETHING...
    }
    else
    {
        debugStream << "getConfigurationRowField, threadId: " << tid << " MASTER. retMaster = " << retMaster << std::endl;

        DO SOME LOGIC  

        // Signal we're done deserializing.
        masterDeserialize = 0;
    }
    std::cout << debugStream.str();
}

My test of this code spawns 10 threads, and signals all of them to call the function with the same masterDeserialize variable.

This works well most of the time, but once every couple of thousand - couple of million test iterations 2 threads can both enter the path of acquiring the MASTER lock.

I'm not sure how this is possible, or how to avoid it.

I tried to use a memory fence before the resetting of the masterDeserialize, thinking that the cpu OOO can have affect, but this has no affect on the result.

Obviously this runs on a machine with many cores, and it is compiled in debug mode, so GCC should not reorder execution for optimizations.

Any suggestions as to what is wrong with the above?

EDIT: I tried using gcc primitive instead of assembly code, got the same result.

inline uint32_t CAS(volatile uint32_t *mem, uint32_t with, uint32_t cmp)
{
    return __sync_val_compare_and_swap(mem, cmp, with);
}

I am running on a multi core, multi cpu machine, but it is a Virtual machine, is it possible that this behavior is caused somehow by the VM?

Asinine answered 13/1, 2015 at 10:23 Comment(3)
How do you know they are actually both entering the same code path at the same time? Your debug output isn't guarded by any lock. Also, just for completeness sake and it only matters on weakly ordered architectures (and this looks a lot like x86/x86-64): The setting of masterDeserialize should be an atomic operation as well.Correspondent
1. Code path: all the threads which calls this functions, are loaded and wait on some some flag which is shared among them. I set that flag to true from a control thread. 2. This is a x86-64 architecture, I compile as 32 bit for this program. 3. Just setting masterDeserialize is atomic, but I want only 1 thread to be found in the code path of masterDeserialize== true simultaneously, hence just setting it is not enough - I need to also test the old value so that each thread will know which path to take.Asinine
It might be also worth to mention that once a single thread enteres with masterDeserialize = true, no other thread in that iteration should do so. (the code section does some deserialization and sets a pointer to be not null, when that pointer is not null the flow does no go into action. Another thing worth to mention is that the debug info is there to debug a problem in the code which is exhibited by such odd behaviour.Asinine
I
1

Not only two but any number of threads can in theory become "masters" in this code. The problem is that the thread that took the master path after completion sets the masterDeserialize variable back to 0, thus making it possible to "acquire" again by a thread that might arrive very late to CAS (e.g. due to preemption).

The fix is actually simple - add the third state (with e.g. the value of 2) to this flag to mean "master has completed", and use this state (instead of the initial state of 0) at the end of the master's path to signal its job is done. Thus, only one of the threads that call myFunc can ever see 0, which gives you the guarantee you need. To reuse the flag, you'd need to explicitly reinitialize it to 0.

Identification answered 13/1, 2015 at 18:12 Comment(1)
Thanks Alexey! I feel really stupid for missing this.Asinine

© 2022 - 2024 — McMap. All rights reserved.