Is my Double-Checked Locking Pattern implementation right?
An example from Meyers's book Effective Modern C++, Item 16:

For a class caching an expensive-to-compute int, you might try to use a pair of std::atomic variables instead of a mutex:

class Widget {
public:
    int magicValue() const {
        if (cacheValid) {
            return cachedValue;
        } else {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();

            cachedValue = val1 + val2;
            cacheValid = true;
            return cachedValue;
        }
    }
private:
    mutable std::atomic<bool> cacheValid { false };
    mutable std::atomic<int> cachedValue;
};

This will work, but sometimes it will work a lot harder than it should. Consider: a thread calls Widget::magicValue, sees cacheValid as false, performs the two expensive computations, and assigns their sum to cachedValue. At that point, a second thread calls Widget::magicValue, also sees cacheValid as false, and thus carries out the same expensive computations that the first thread has just finished.

Then Meyers gives a solution with a mutex:

class Widget {
public:
    int magicValue() const {
        std::lock_guard<std::mutex> guard(m);
        if (cacheValid) {
            return cachedValue;
        } else {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();

            cachedValue = val1 + val2;
            cacheValid = true;
            return cachedValue;
        }
    }
private:
    mutable std::mutex m;
    mutable bool cacheValid { false };
    mutable int cachedValue;
};

But I think that solution is not so efficient, so I considered combining a mutex and atomics to make up a Double-Checked Locking Pattern, as below.

class Widget {
public:
    int magicValue() const {
        if (!cacheValid)  {
            std::lock_guard<std::mutex> guard(m);
            if (!cacheValid) {
                auto val1 = expensiveComputation1();
                auto val2 = expensiveComputation2();

                cachedValue = val1 + val2;
                cacheValid = true;
            }
        }
        return cachedValue;
    }
private:
    mutable std::mutex m;
    mutable std::atomic<bool> cacheValid { false };
    mutable std::atomic<int> cachedValue;
};

Because I am a newbie in multithreaded programming, I want to know:

  • Is my code right?
  • Does it perform better?

EDIT:


Fixed the code: if (!cachedValue) -> if (!cacheValid).

Anthropophagi answered 5/5, 2015 at 8:49 Comment(8)
In principle, yes, you are right. But if we assume the computations take a lot longer than setting the atomics, the likelihood of your scenario is very low, and using the atomics avoids expensive mutexes.Brendin
I think your approach is correct and better than the previous two. It is exactly the Double-Checked Locking Pattern.Assignment
I don't think it is correct: a second thread may evaluate cacheValid after the first has evaluated it but before the guard is instantiated. I think the impact may be equal to the first example.Krawczyk
It's hard to tell if your code is legal/safe without seeing how cachedValue gets set to false.Ieshaieso
@Assignment I have just fixed the code.Anthropophagi
@DavidSchwartz mutable std::atomic<bool> cacheValid { false };. magicValue can only be called on a constructed object. And for this, cacheValid must have already been initialized to false before that.Assignment
@Krawczyk I have fixed the typo, sorry for misleading youAnthropophagi
@prehistoricpenguin: the cachedValid variable is actually named cacheValid. Please fix this in all code you provide (even in the book's snippets).Analphabetic

Is my code right?

Yes. Your application of the Double-Checked Locking Pattern is correct. But see below for some improvements.

Does it perform better?

When compared with the fully locked variant (the 2nd in your post), it mostly has better performance, unless magicValue() is called only once (but even in that case the performance loss is negligibly small).

When compared with the lockless variant (the 1st in your post), your code shows better performance unless computing the value is faster than waiting on the mutex.

E.g., a sum of 10 values is (usually) faster than waiting on a mutex; in that case the 1st variant is preferable. On the other hand, 10 reads from a file are slower than waiting on a mutex, so your variant is better than the 1st.


Actually, there are simple improvements to your code which make it faster (at least on some machines) and easier to understand:

  1. The cachedValue variable doesn't require atomic semantics at all. It is protected by the cacheValid flag, whose atomicity does all the work. Moreover, a single atomic flag can protect several non-atomic values.

  2. Also, as noted in that answer https://mcmap.net/q/1817231/-is-my-double-checked-locking-pattern-implementation-right, when accessing the cacheValid flag you don't need sequentially consistent ordering (which is applied by default when you simply read or write an atomic variable); release-acquire ordering is sufficient.


class Widget {
public:
    int magicValue() const {
        //'Acquire' semantic when read flag.
        if (!cacheValid.load(std::memory_order_acquire))  { 
            std::lock_guard<std::mutex> guard(m);
            // Reading flag under mutex locked doesn't require any memory order.
            if (!cacheValid.load(std::memory_order_relaxed)) {
                auto val1 = expensiveComputation1();
                auto val2 = expensiveComputation2();

                cachedValue = val1 + val2;
                // 'Release' semantic when write flag
                cacheValid.store(true, std::memory_order_release);
            }
        }
        return cachedValue;
    }
private:
    mutable std::mutex m;
    mutable std::atomic<bool> cacheValid { false };
    mutable int cachedValue; // Atomic isn't needed here.
};
Analphabetic answered 5/5, 2015 at 11:32 Comment(6)
cachedValue need not be atomic in this solution, because writing to it is protected by the mutex. In the code in Meyers' book (and in the suggested solution by Maxim Egorushkin below), there is no mutex preventing multiple threads from simultaneously assigning to cachedValue (all such threads will have seen a cacheValid value of false at the top of the function). In that case, it's important that cachedValue be atomic, because otherwise simultaneous writes to it yield undefined behavior.Torment
@KnowItAllWannabe: cachedValue is already not atomic in this solution. If you mean cacheValid, it should be atomic, because it is checked (the first time) outside of mutex protection. This is the main feature of DCLP: the flag should be either volatile (in languages where this term is thread-related, e.g. Java) or atomic.Analphabetic
My comment was just that: a comment. I agree that the given implementation is fine (and said so), but in your point 1 above the code, you write that "cachedValue variable doesn't require atomic semantic at all [because] it is protected by cacheValid flag." That's not true. cachedValue need not be atomic, because it's protected by the mutex. As I noted in my comment on Maxim Egorushkin's answer below, if there is no mutex, both cacheValid and cachedValue must be atomic.Torment
@KnowItAllWannabe: The expression return cachedValue; is not protected by the mutex, so one cannot say that the cachedValue variable is protected by the mutex. The mutex protects only some accesses, and prevents concurrent writing.Analphabetic
The return statement is a read. Simultaneous reads are not a problem, hence don't need to be protected.Torment
@KnowItAllWannabe: No, accesses to a variable are said to be protected by a mutex only when all accesses to that variable are performed with the mutex locked. Without the appropriate accesses to the cacheValid variable, the given example would have a race on the cachedValue variable (and be broken).Analphabetic

As pointed out by HappyCactus, the second check if (!cachedValue) should actually be if (!cacheValid). Except for this typo, I think your demonstration of the Double-Checked Locking Pattern is correct. However, I think it is unnecessary to use std::atomic on cachedValue. The only place where cachedValue is written to is cachedValue = val1 + val2;. Before that write completes, no thread can reach the statement return cachedValue;, which is the only place cachedValue is read. Therefore, it is impossible for a write and a read to be concurrent. And there is no problem with concurrent reads.

Assignment answered 5/5, 2015 at 9:43 Comment(2)
Making it atomic is necessary: cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html. Since the flag is atomic, its write may be synchronized across processors, while the non-atomic value is not. So you could see cacheValid == true but cachedValue == /* uninitialized on this processor */Stringent
@BillyONeal No. The memory order semantics used on cacheValid is memory_order_seq_cst. It establishes the required synchronization relationship. When cacheValid is read as true, the modification to cachedValue must also be seen on the same processor. This is actually why atomics can be used to order non-atomics.Assignment

You can make your solution slightly more efficient by reducing the memory ordering requirements. The default sequential consistency memory order for atomic operations is not necessary here.

The performance difference may be negligible on x86, but noticeable on ARM, because sequential consistency memory order is expensive on ARM. See “Strong” and “weak” hardware memory models by Herb Sutter for more details.

Suggested changes:

class Widget {
public:
    int magicValue() const {
        if (cacheValid.load(std::memory_order_acquire)) { // Acquire semantics.
            return cachedValue;
        } else {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();

            cachedValue = val1 + val2; // Non-atomic write.

            // Release semantics.
            // Prevents compiler and CPU store reordering.
            // Makes this and preceding stores by this thread visible to other threads.
            cacheValid.store(true, std::memory_order_release); 
            return cachedValue;
        }
    }
private:
    mutable std::atomic<bool> cacheValid { false };
    mutable int cachedValue; // Non-atomic.
};
Secessionist answered 5/5, 2015 at 10:4 Comment(7)
Have you considered this situation: a thread calls Widget::magicValue, sees cacheValid as false, performs the two expensive computations, and assigns their sum to cachedValue. At that point, a second thread calls Widget::magicValue, also sees cacheValid as false, and thus carries out the same expensive computations that the first thread has just finished.Anthropophagi
@Anthropophagi If calculating cachedValue must only be done once, you may want to use a mutex.Secessionist
Downvoted, because if two threads read cacheValid as false, both will compute cachedValue, and if both make an assignment to cachedValue at the same time, you have two threads writing to a non-atomic at once, and that results in undefined behavior.Torment
@KnowItAllWannabe: Concurrent writing of the same value into a variable is OK on every known C++ implementation. Probably the C++ standard also notes that, but I don't remember where.Analphabetic
@Tsyvarev: As far as I know, it's undefined behavior, and there's an example of how it can go wrong in this paper.Torment
@KnowItAllWannabe: Section 2.4 of the paper you refer to says: "It seems unlikely on any conventional architecture that a store would be implemented such that rewriting the same bits would result in a final value different from either of the written values". All further examples are not applicable to the given code, as it doesn't access cachedValue before writing it. The only exception is "note 6", which is about SPARC's "block initializing store" instruction.Analphabetic
@Tsyvarev: In terms of undefined behavior, the only document that matters is the C++ standard. To the best of my knowledge, simultaneous writes to a single memory location always yield undefined behavior, even if both writes are of the same value. The paper I cited simply explains circumstances--even theoretical ones--that could explain how UB could arise. Are you aware of something in the Standard that says that unsynchronized writes to a single memory location yield well-defined behavior if both writes are of the same value?Torment

It is not correct:

int magicValue() const {
    if (!cachedValid)  {

        // this part is unprotected, what if a second thread evaluates
        // the previous test when this first is here? it behaves 
        // exactly like in the first example.

        std::lock_guard<std::mutex> guard(m);
        if (!cachedValue) {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();

            cachedValue = val1 + val2;
            cachedValid = true;
        }
    }
    return cachedValue;
}
Krawczyk answered 5/5, 2015 at 9:21 Comment(3)
I don't think it's a problem for multiple threads to be in this part.Assignment
It is, unless the second test should be read as "CachedValid" and not "CachedValue". I didn't guess it might be a typo, but it is obvious it is.Krawczyk
You have a pair of sharp eyes :-) Didn't notice the typo.Assignment

© 2022 - 2024 — McMap. All rights reserved.