What are some use cases for memory_order_relaxed

The C++ memory model provides relaxed atomics, which place no ordering guarantees on memory operations. Apart from the mailbox example in C that I found here:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1525.htm

which is based on the motivating example in this paper:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2153.pdf

I was curious about other use cases for this type of synchronization mechanism.

Ethnic answered 6/5, 2014 at 6:27 Comment(0)

A simple example that I see in my work frequently is a stats counter. If you want to count the number of times an event happens but don't need any sort of synchronization across threads aside from making the increment safe, using memory_order_relaxed makes sense.

static std::atomic<size_t> g_event_count_;

void HandleEvent() {
  // Increment the global count. This operation is safe and correct even
  // if there are other threads concurrently running HandleEvent or
  // PrintStats.
  g_event_count_.fetch_add(1, std::memory_order_relaxed);

  [...]
}

void PrintStats() {
  // Snapshot the "current" value of the counter. "Current" is in scare
  // quotes because the value may change while this function is running.
  // But unlike a plain old size_t, reading from std::atomic<size_t> is
  // safe.
  const size_t event_count =
      g_event_count_.load(std::memory_order_relaxed);

  // Use event_count in a report.
  [...]
}

In both cases, there is no need to use a stronger memory order. On some platforms, doing so could have a negative performance impact.

Maura answered 12/6, 2014 at 12:7 Comment(11)
Would it also be appropriate to use relaxed memory order in cases where something is lazily computed, and computing it more than once would be slightly inefficient but otherwise harmless? If a value will be read millions of times, even a tiny reduction in the cost of each read could more than make up for the cost of a few redundant compuations.Candlefish
That seems fine to me, but you have to be very careful that you're not trying to synchronize using the value. For example, if you compute a struct and then try to use std::atomic<Struct*> with std::memory_order_relaxed, you're going to have a bad time, because you haven't ensured that other threads see the writes initializing the struct before the write setting the pointer.Maura
OK, so you have writers which atomically increment a counter. But eventually you will want to read the counter somewhere. Like the PrintStats() in your example. So is this only applicable when you have count increments that don't necessarily have to propagate immediately? When you read the counter with std::memory_order_relaxed could it be possible that you read an outdated g_event_count, or not?Fregger
I found an answer to my own question: "The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add()" https://mcmap.net/q/18507/-concurrency-atomic-and-volatile-in-c-11-memory-modelFregger
If you read the value, you're guaranteed to see any updates before the most recent synchronizing operation. For example, if thread A updates the counter, then unlocks a mutex that thread B takes and then reads the counter, thread B will see thread A's write. (This is compatible with your link because the mutex access is like a read-modify-write op.) In the absence of such a synchronizing event there is no such thing as the "latest" value, because without such an event there's no way to prove that you got a stale value. The write and read are happening concurrently.Maura
Since there is no synchronization between threads that call HandleEvent() and PrintStats(), wouldn't declaring a plain static size_t g_event_count_ have the same effect?Discommend
No, that will cause undefined behavior due to a data race. The likely outcome based on the code compilers will actually generate is lost increments, but in theory anything could happen.Maura
@Maura I see your point -- the coder should rely on what the standard guarantees, not on the specific architecture (e.g. the binary produced for x86-64 may be the same for accesses of <= 64 bits with or without std::memory_order_relaxed, but that is architecture specific).Discommend
@HCSF: if you're writing assembly code then you get to reason about what the architecture does, but not if the compiler is writing assembly for you. There is no such thing as a benign data race. And in this case even forgetting the data race, you will still lose increments: the compiler may generate a naive "load, add 1, store", which is not atomic.Maura
What if any other thread(s) call g_event_count_.store(0, std::memory_order_relaxed); to reset the counter? Would that break the correctness of the code?Magisterial
It would of course reset the counter, but assuming that’s what you wanted to do it would still be correct. Each atomic has an underlying single total order of modifications, so the counter would now represent increments that came after the reset in that total order. There is no issue of data race because the operations remain atomic.Maura

The event reader in this case could be connected to an X11 socket, where the frequency of events depends on user actions (resizing a window, typing, etc.). If the GUI thread's event dispatcher checks for events at regular intervals (e.g. due to timer events in the user application), we don't want to needlessly block the event reader thread by acquiring a lock on the shared event queue when we know it is empty. We can simply check whether anything has been queued using the dataReady atomic. This is also known as the "double-checked locking" pattern.

#include <atomic>
#include <chrono>
#include <deque>
#include <iostream>
#include <mutex>
#include <thread>

namespace {
std::mutex mutex;
std::atomic_bool dataReady(false);
std::atomic_bool done(false);
std::deque<int> events; // shared event queue, protected by mutex
}

void eventReaderThread()
{
    static int eventId = 0;
    std::chrono::milliseconds ms(100);
    while (true) {
        std::this_thread::sleep_for(ms);
        mutex.lock();
        eventId++; // populate the event queue, e.g. from pending messages on a socket
        events.push_back(eventId);
        dataReady.store(true, std::memory_order_release);
        mutex.unlock();
        if (eventId == 10) {
            done.store(true, std::memory_order_release);
            break;
        }
    }
}

void guiThread()
{
    // Keep running until the reader is done AND the queue has been drained.
    while (!done.load(std::memory_order_acquire) ||
           dataReady.load(std::memory_order_acquire)) {
        if (dataReady.load(std::memory_order_acquire)) { // double-checked locking pattern
            mutex.lock();
            std::cout << events.front() << std::endl;
            events.pop_front();
            // If guiThread() runs again before eventReaderThread() has added
            // new events, it will see the value set via this relaxed store.
            // If eventReaderThread() has added new events, it will see that
            // too, via the normal acquire->release pairing.
            // The docs for relaxed say it "guarantees atomicity and
            // modification order consistency".
            // Only clear the flag when the queue is empty; otherwise events
            // already queued would be stranded until the next push.
            if (events.empty())
                dataReady.store(false, std::memory_order_relaxed);
            mutex.unlock();
        }
    }
}

int main()
{
    std::thread producerThread(eventReaderThread);
    std::thread consumerThread(guiThread);
    producerThread.join();
    consumerThread.join();
}
Cedar answered 27/8, 2018 at 17:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.