Is the following singleton implementation data-race free?
static std::atomic<Tp *> m_instance;
...
static Tp &
instance()
{
if (!m_instance.load(std::memory_order_relaxed))
{
std::lock_guard<std::mutex> lock(m_mutex);
if (!m_instance.load(std::memory_order_acquire))
{
Tp * i = new Tp;
m_instance.store(i, std::memory_order_release);
}
}
return * m_instance.load(std::memory_order_relaxed);
}
Is the std::memory_model_acquire
of the load operation superfluous? Is it possible to further relax both load and store operations by switching them to std::memory_order_relaxed
? In that case, is the acquire/release semantic of std::mutex
enough to guarantee its correctness, or a further std::atomic_thread_fence(std::memory_order_release)
is also required to ensure that the writes to memory of the constructor happen before the relaxed store? Yet, is the use of fence equivalent to have the store with memory_order_release
?
EDIT: Thanks to the answer of John, I came up with the following implementation that should be data-race free. Even though the inner load could be non-atomic at all, I decided to leave a relaxed load in that it does not affect the performance. In comparison to always have an outer load with the acquire memory order, the thread_local machinery improves the performance of accessing the instance of about an order of magnitude.
static Tp &
instance()
{
static thread_local Tp *instance;
if (!instance &&
!(instance = m_instance.load(std::memory_order_acquire)))
{
std::lock_guard<std::mutex> lock(m_mutex);
if (!(instance = m_instance.load(std::memory_order_relaxed)))
{
instance = new Tp;
m_instance.store(instance, std::memory_order_release);
}
}
return *instance;
}
m_mutex
be defined? – Briscoe[15] ISO/IEC 14882:
1998 – Josiethread_local
is does not implement singleton pattern? Your instances will be multiple, one per thread. – Ruggierom_instance
. Thethread_local instance
is just a cache for the instance to avoid generating an atomic load per access – Intensivethread_local
implemented in GCC, currently, I think it would be better to do without thethread_local
variable and incur atomic read instead. – Ruggiero