Double-checked locking pattern

In C++ and the Perils of Double-Checked Locking, the authors suggest pseudocode that implements the pattern correctly. See below:

Singleton* Singleton::instance () {
    Singleton* tmp = pInstance;
    ... // insert memory barrier (1)
    if (tmp == 0) {
        Lock lock;
        tmp = pInstance;
        if (tmp == 0) {
            tmp = new Singleton;
            ... // insert memory barrier (2)
            pInstance = tmp;
        }
    }
    return tmp;
}
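
For reference, here is a rough sketch of the same structure written with C++11 atomics and explicit fences; the std::atomic<Singleton*> pInstance and the mtx mutex are my own additions, not from the paper:

#include <atomic>
#include <mutex>

class Singleton {
public:
    static Singleton* instance();
private:
    Singleton() {}
    static std::atomic<Singleton*> pInstance;
    static std::mutex mtx;
};

std::atomic<Singleton*> Singleton::pInstance{nullptr};
std::mutex Singleton::mtx;

Singleton* Singleton::instance() {
    Singleton* tmp = pInstance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);          // barrier (1)
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(mtx);
        tmp = pInstance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            std::atomic_thread_fence(std::memory_order_release);  // barrier (2)
            pInstance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}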

I just wonder whether the first memory barrier can be moved to right above the return statement?

EDIT: Another question: In the linked article, as vidstige quoted

Technically, you don’t need full bidirectional barriers. The first barrier must prevent downwards migration of Singleton’s construction (by another thread); the second barrier must prevent upwards migration of pInstance’s initialization. These are called “acquire” and “release” operations, and may yield better performance than full barriers on hardware (such as Itanium) that makes the distinction.

It says that the second barrier doesn't need to be bidirectional, so how can it prevent the assignment to pInstance from being moved before that barrier? Even if the first barrier prevents upwards migration, another thread could still have a chance to see the uninitialized members.

EDIT: I think I almost understand the purpose of the first barrier. As sonicoder noted, branch prediction may cause tmp to be NULL when the if evaluates to true. To avoid that problem, there must be an acquire barrier to prevent the read of tmp in the return statement from happening before the read in the if.

The first barrier is paired with the second barrier to achieve a synchronizes-with relationship, so it can be moved down.

EDIT: For those who are interested in this question, I strongly recommend reading memory-barriers.txt.

Unitive answered 19/2, 2011 at 12:26 Comment(7)
You need the mem-barrier to enforce no out-of-order memory accesses in the instructions. Think of issues like branch prediction and how it could screw up the inner if-statement.Synn
You want to remove the memory barrier. Does the article you linked not explain why it is needed?Renner
@David: I saw in another book, Concurrent Programming on Windows, that the author placed a barrier before the return statement. So I just got a little confused. The first barrier is to prevent the thread from seeing the uninitialized members, right?Unitive
@David: he doesn't want to remove it, he wants to move it further down. If the only danger is that without the barrier, the function could return a pointer to an object that (in this thread) is/appears uninitialized, then a barrier immediately before return would be OK. So the question is, is there some other danger that means the specific position of the barrier matters?Pomcroy
@Alex: one possibility is that on Windows you can make stronger assumptions about the memory/threading model than the authors of the first paper do. For example, Windows uses Intel-based architectures with coherent memory caches, but some other OSes on other architectures do not. I don't know though whether that makes a difference in this case. Consider that this code assumes that writing a pointer is atomic, which is fair enough as a constraint but again might not be true of all compilers and all hardware everywhere, ever.Pomcroy
@Alex it's possible that Joe Duffy was making use of specifics of the memory models that Windows software runs on.Renner
@Alex.Shen Do you still have this question? It's been more than three years, after all. The existing answer is not satisfying, which is why I'm giving my own.Spanking

I didn't see any correct answer here related to your question, so I decided to post one even after more than three years ;)

I just wonder whether the first memory barrier can be moved to right above the return statement?

Yes, it can.

It's for threads that won't enter the if statement, i.e., pInstance has already been constructed and initialized correctly, and is visible.

The second barrier (the one right before pInstance = tmp;) guarantees that the initialization of the singleton's member fields is committed to memory before pInstance = tmp; is committed. But this does NOT necessarily mean that other threads (on other cores) will see these memory effects in the same order (counter-intuitive, right?). A second thread may see the new value of the pointer in its cache but not those member fields yet. When it accesses a member by dereferencing the pointer (e.g., p->data), the address of that member may already be in its cache, but holding stale data rather than the desired value. Bang! Wrong data is read. Note that this is more than theoretical: there are systems on which you need to perform a cache-coherence instruction (e.g., a memory barrier) to pull new data from memory.

That's why the first barrier is there. It also explains why it's OK to place it right before the return statement (but it has to stay after Singleton* tmp = pInstance;).
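
To illustrate, a sketch in C++11 terms, reusing the std::atomic<Singleton*> pInstance and mutex mtx assumed in the sketch in the question: the acquire fence can be pushed down to just before the return, as long as it stays after the initial load.

Singleton* Singleton::instance() {
    Singleton* tmp = pInstance.load(std::memory_order_relaxed);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(mtx);
        tmp = pInstance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            std::atomic_thread_fence(std::memory_order_release);  // second barrier
            pInstance.store(tmp, std::memory_order_relaxed);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);          // first barrier, moved down
    return tmp;
}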

It says that the second barrier doesn't need to be bidirectional, so how can it prevent the assignment to pInstance from being moved before that barrier?

A write barrier guarantees that every write preceding it will effectively happen before every write following it. It's a stop sign, and no write can cross it to the other side. For a more detailed description, refer to here.
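
As a small illustration of that "stop sign" behaviour in C++11 terms (this flag/payload example is mine, not from the article, and uses a release store, which has a similar write-ordering effect):

#include <atomic>
#include <cassert>

int payload = 0;
std::atomic<bool> ready{false};

void writer() {
    payload = 42;                                   // write before the "barrier"
    ready.store(true, std::memory_order_release);   // no earlier write may be moved below this
}

void reader() {
    if (ready.load(std::memory_order_acquire))      // pairs with the release store
        assert(payload == 42);                      // guaranteed to see the payload
}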

Spanking answered 19/8, 2014 at 5:46 Comment(3)
You're right, a write barrier is a full fence; I probably misunderstood it and confused it with a one-way fence, i.e., a release fence.Unitive
A write barrier is NOT a full barrier. When people talk about a "full barrier", what they really mean is one that prevents both reads and writes from crossing the line. A write barrier (or release barrier) only restricts writes. FYI, there is also a weaker relation in C++11 called "write-release", but it's not necessarily a barrier. You can regard them as full barrier > write barrier >= write-release, where > means stronger than. Here is an excellent article about write-release vs. write barrier.Spanking
One thing worth mentioning is that a standalone release fence in C++, std::atomic_thread_fence(std::memory_order_release), is stronger than a write barrier, which generally restricts writes only. This is because a release fence also prevents previous reads from being reordered. A release fence = a write/write barrier + a read/write barrier. So I should have pointed out that "release fence" and "write barrier" are different.Spanking

No, the memory barrier cannot be moved below the assignment statement, since the memory barrier protects the assignment from upwards migration. From the linked article:

The first barrier must prevent downwards migration of Singleton’s construction (by another thread); the second barrier must prevent upwards migration of pInstance’s initialization.

On a side note: the double-checked locking pattern for singletons is only useful if you have huge performance requirements.

Have you profiled your binaries and observed singleton access as a bottleneck? If not, chances are you do not need to bother with the double-checked locking pattern at all.

I recommend using a simple lock.
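
For example, a minimal sketch of that simpler approach (the mtx mutex member is assumed, not part of the original code); since C++11, a function-local static is an even easier option, because its initialization is guaranteed to be thread-safe:

Singleton* Singleton::instance() {
    std::lock_guard<std::mutex> lock(mtx);   // take the lock on every call
    if (pInstance == nullptr)
        pInstance = new Singleton;
    return pInstance;
}

// Or, since C++11, the Meyers singleton:
Singleton& Singleton::instance() {
    static Singleton instance;               // thread-safe initialization per the standard
    return instance;
}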

Charyl answered 19/2, 2011 at 13:52 Comment(4)
You mention upwards migration, but then the quote says the first memory barrier is for downwards migration.Owing
@Thomas Edleson yes. The quote also mentions the first barrier, but the question and answer were about the second.Charyl
Then why does the question say "I just wonder that whether the first memory barrier can be moved..."?Owing
@Thomas Edleson You're right. I completely misunderstood the question.Charyl
