Double-Checked Lock Singleton in C++11

H

3

43

Is the following singleton implementation data-race free?

static std::atomic<Tp *> m_instance;
...

static Tp &
instance()
{
    if (!m_instance.load(std::memory_order_relaxed))
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_instance.load(std::memory_order_acquire))
        {
            Tp * i = new Tp;
            m_instance.store(i, std::memory_order_release);    
        }    
    }

    return * m_instance.load(std::memory_order_relaxed);
}

Is the std::memory_model_acquire of the load operation superfluous? Is it possible to further relax both load and store operations by switching them to std::memory_order_relaxed? In that case, is the acquire/release semantic of std::mutex enough to guarantee its correctness, or a further std::atomic_thread_fence(std::memory_order_release) is also required to ensure that the writes to memory of the constructor happen before the relaxed store? Yet, is the use of fence equivalent to have the store with memory_order_release?

EDIT: Thanks to the answer of John, I came up with the following implementation that should be data-race free. Even though the inner load could be non-atomic at all, I decided to leave a relaxed load in that it does not affect the performance. In comparison to always have an outer load with the acquire memory order, the thread_local machinery improves the performance of accessing the instance of about an order of magnitude.

static Tp &
instance()
{
    static thread_local Tp *instance;

    if (!instance && 
        !(instance = m_instance.load(std::memory_order_acquire)))
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!(instance = m_instance.load(std::memory_order_relaxed)))
        {
            instance = new Tp; 
            m_instance.store(instance, std::memory_order_release);    
        }    
    }
    return *instance;
}

Honebein answered 22/5, 2011 at 8:48 Comment(11)

I see one bug, your code is not singleton-free. – Underwing 22/5, 2011 at 12:9

If you don't know how to implement a Singleton safely, then you probably shouldn't be using one. – Milkwhite 22/5, 2011 at 12:15

Upvoted. C++11 <atomic> is a new header with new terminology. Asking how to use it to safely implement double checked locking is an excellent question that I imagine many people will be asking. – Gonroff 22/5, 2011 at 14:19

The double-checked locking pattern has been shown to be unsafe by Scott Meyers and Andrei Alexandrescu. See aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf for details. Matthew – Boaten 23/5, 2011 at 12:29

Where would m_mutex be defined? – Briscoe 25/5, 2011 at 21:31

@Matthew Gilman: that paper specifically refers to [15] ISO/IEC 14882: 1998 – Josie 26/5, 2011 at 9:39

If it's thread local, how is it a singleton? And if it's thread local and its address is not passed around, why do you need atomic loading at all? But without the thread_local it seems good. – Taranto 30/8, 2011 at 12:46

In my opinion, the entire notion of double-checked locking as such is rubbish. The only safe (and not totally convoluted, unintellegible) way to achieve what double-checked locking tries to do is create a singleton from the main thread before any other threads are created. Trivial to do, no complicated voodoo needed, considerably less overhead, and guaranteed to work. – Magenta 13/12, 2011 at 12:54

@Nicola Bonelli: Do you realize that the code above with thread_local is does not implement singleton pattern? Your instances will be multiple, one per thread. – Ruggiero 29/1, 2013 at 12:56

@wilx: No they won't, since all threads share the same instance via m_instance. The thread_local instance is just a cache for the instance to avoid generating an atomic load per access – Intensive 4/3, 2013 at 14:42

@Grizzly: You are right, I have missed that. Though given how are atomic reads implemented on most platforms and how is thread_local implemented in GCC, currently, I think it would be better to do without the thread_local variable and incur atomic read instead. – Ruggiero 4/3, 2013 at 18:33

P

20

That implementation is not race-free. The atomic store of the singleton, while it uses release semantics, will only synchronize with the matching acquire operation—that is, the load operation that is already guarded by the mutex.

It's possible that the outer relaxed load would read a non-null pointer before the locking thread finished initializing the singleton.

The acquire that is guarded by the lock, on the other hand, is redundant. It will synchronize with any store with release semantics on another thread, but at that point (thanks to the mutex) the only thread that can possibly store is the current thread. That load doesn't even need to be atomic—no stores can happen from another thread.

See Anthony Williams' series on C++0x multithreading.

Petula answered 22/5, 2011 at 12:43 Comment(3)

A possible fix could be to make the outer load with memory_order_acquire and the inner with memory_order_relaxed. What do you think about this? – Honebein 22/5, 2011 at 14:23

That should make it race-free. But the inner load doesn't need to be atomic at all, in any sense. No reordering can happen to make the inner load happen before the mutex is locked, and once the mutex is locked there's no way to get "part" of another value back, even if Tp is bigger than a hardware word. – Petula 22/5, 2011 at 21:44

Great link. Anthony Williams does a great job explaining the what and why of the memory ordering used. – Briscoe 23/5, 2011 at 16:6

B

27

I think this a great question and John Calsbeek has the correct answer.

However, just to be clear a lazy singleton is best implemented using the classic Meyers singleton. It has garanteed correct semantics in C++11.

§ 6.7.4

... If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization. ...

The Meyer's singleton is preferred in that the compiler can aggressively optimize the concurrent code. The compiler would be more restricted if it had to preserve the semantics of a std::mutex. Furthermore, the Meyer's singleton is 2 lines and virtually impossible to get wrong.

Here is a classic example of a Meyer's singleton. Simple, elegant, and broken in c++03. But simple, elegant, and powerful in c++11.

class Foo
{
public:
   static Foo& instance( void )
   {
      static Foo s_instance;
      return s_instance;
   }
};

Briscoe answered 23/5, 2011 at 16:4 Comment(5)

The downvoter (not me) may not be aware of C++11's new guarantee of thread-safety for static local variables. Mind adding a standard citation? – Chyme 24/5, 2011 at 1:17

I thought I had provided an explanation. This question is only tangentially about singletons, and the Meyer's singleton isn't particularly robust, even if it is thread-safe now. However, I'll remove my downvote since you did flesh out the bit that does pertain to the actual question. – Cognition 25/5, 2011 at 21:39

i think it should be noted that MS still not support this C++11 feature even in Visual Studio 2013 – Ablative 12/2, 2014 at 8:59

I can confirm that VS2013 does NOT support C++11 thread-safe construction of function local statics. – Offenbach 16/7, 2014 at 17:43

I find the drawback with this solution is that one has no control of the lifespan of the singleton. In some cases it is good to be able to delete and recreate one e.g. when doing unit testing. – Amersham 19/10, 2018 at 7:43

P

20

That implementation is not race-free. The atomic store of the singleton, while it uses release semantics, will only synchronize with the matching acquire operation—that is, the load operation that is already guarded by the mutex.

It's possible that the outer relaxed load would read a non-null pointer before the locking thread finished initializing the singleton.

The acquire that is guarded by the lock, on the other hand, is redundant. It will synchronize with any store with release semantics on another thread, but at that point (thanks to the mutex) the only thread that can possibly store is the current thread. That load doesn't even need to be atomic—no stores can happen from another thread.

See Anthony Williams' series on C++0x multithreading.

Petula answered 22/5, 2011 at 12:43 Comment(3)

A possible fix could be to make the outer load with memory_order_acquire and the inner with memory_order_relaxed. What do you think about this? – Honebein 22/5, 2011 at 14:23

That should make it race-free. But the inner load doesn't need to be atomic at all, in any sense. No reordering can happen to make the inner load happen before the mutex is locked, and once the mutex is locked there's no way to get "part" of another value back, even if Tp is bigger than a hardware word. – Petula 22/5, 2011 at 21:44

Great link. Anthony Williams does a great job explaining the what and why of the memory ordering used. – Briscoe 23/5, 2011 at 16:6

C

7

See also call_once. Where you'd previously use a singleton to do something, but not actually use the returned object for anything, call_once may be the better solution. For a regular singleton you could do call_once to set a (global?) variable and then return that variable...

Simplified for brevity:

template< class Function, class... Args>
void call_once( std::once_flag& flag, Function&& f, Args&& args...);

Exactly one execution of exactly one of the functions, passed as f to the invocations in the group (same flag object), is performed.
No invocation in the group returns before the abovementioned execution of the selected function is completed successfully

Cockaleekie answered 20/11, 2011 at 16:31 Comment(0)

Recommended topics

Hot tags