confused about atomic class: memory_order_relaxed
I am studying this page: https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync, which is very helpful for understanding atomics.

But this example about relaxed mode is hard to understand:

    /* Thread 1 */
    y.store (20, memory_order_relaxed);
    x.store (10, memory_order_relaxed);

    /* Thread 2 */
    if (x.load (memory_order_relaxed) == 10)
      {
        assert (y.load (memory_order_relaxed) == 20); /* assert A */
        y.store (10, memory_order_relaxed);
      }

    /* Thread 3 */
    if (y.load (memory_order_relaxed) == 10)
      assert (x.load (memory_order_relaxed) == 10); /* assert B */

To me, assert B should never fail: x must be 10, and y must be 10, because thread 2 only stores 10 into y after it has seen x == 10.

But the website says either assert in this example can actually FAIL.

Psychosocial answered 6/9, 2017 at 2:12 Comment(5)
If neither x nor y is atomic, do you understand why the asserts may fail? memory_order_relaxed removes the possibility of a data race, but in other respects it is the same as when they are not atomic.Loft
If you wanted both stores to happen sequentially, why didn't you use the sequential consistent memory order?Urology
Think of it this way, what if x had been loaded with another value before, and the cache line of thread 3 where x rests still hasn't been updated since 10 was stored in x, but y has been updated properly. By asking for a relaxed memory order, you're basically saying you don't care if it's the right value, only that it doesn't raceUrology
@Urology What could cause the cache line to contain stale values?Maneater
@Maneater Have you found the answers incomplete on that regard? If so, why not ask a new question?Urology

I find it much easier to understand atomics with some knowledge of the mechanisms that might be behind them, so here's some background. Note that these concepts are in no way mandated by the C++ language itself; they are merely some of the possible reasons why things are the way they are.

Compiler reordering

Compilers, when optimizing, may reorder and refactor the program as long as its observable effects on a single-threaded program stay the same. This is constrained by the use of atomics, which tell the compiler (among other things) that the variable might change at any moment and that its value might be read elsewhere.

Formally, atomics guarantee one thing: there will be no data races. That is, accessing the variable concurrently will not make your computer explode.

CPU reordering

The CPU might reorder instructions as it executes them, which means instructions can be reordered at the hardware level, independently of how you wrote the program.

Caching

Finally, there are the effects of caches: smaller, faster memories that hold partial copies of main memory. Caches are not always in sync, meaning they don't always agree on what is "correct". Different cores (and thus different threads) may not be using the same cache, and because of this they may not agree on what values variables have.

Back to the problem

What the above amounts to is pretty much what C++ says about the matter: unless explicitly stated otherwise, the ordering of the side effects of each instruction is completely unspecified. It might not even be the same when viewed from different threads.

Formally, the guarantee of an ordering between side effects is called a happens-before relation. Unless one side effect happens-before another, no ordering between them is guaranteed. Loosely, we just call it synchronization.

Now, what is memory_order_relaxed? It tells the compiler that the variable is shared (so there is no data race), but asks for no ordering guarantees from the CPU, the caches, or anything else. Therefore, one possible explanation of the "impossible" assert failure is

  1. Thread 1 stores 20 to y and then 10 to x, but only in its own cache.
  2. Thread 2 reads both new values and stores 10 to y, again in its own cache.
  3. Thread 3 reads thread 2's store to y but not thread 1's store to x, and assert B fails.

This might be completely different from what happens in reality; the point is that anything can happen.

To ensure a happens-before relation between the multiple reads and writes, see Brian's answer.

Another construct that provides happens-before relations is std::mutex, which is why mutex-protected code is free from such insanities.

Bioecology answered 6/9, 2017 at 6:23 Comment(6)
@Psychosocial No, that's not true. A mutex is easier to reason about most of the time, yes. But it doesn't necessarily reduce concurrency; here is some more on the topic of locksBioecology
What are caches out of sync?Maneater
@Maneater Caches hold copies of values, saying they're out of sync just means they hold different values.Bioecology
@PasserBy Yes but when will they get different values?Maneater
@Maneater Any time the contents are modified and before they're flushed.Bioecology
@PasserBy Do you know how long other caches can hold the stale value?Maneater

To me, assert B should never fail: x must be 10, and y must be 10, because thread 2 only stores 10 into y after it has seen x == 10.

In effect, your argument is that since in thread 2 the store of 10 into x occurred before the store of 10 into y, in thread 3 the same must be the case.

However, since you are only using relaxed memory operations, there is nothing in the code that requires two different threads to agree on the ordering between modifications of different variables. So indeed thread 2 may see the store of 10 into x before the store of 10 into y while thread 3 sees those two operations in the opposite order.

In order to ensure that assert B succeeds, you would, in effect, need to ensure that when thread 3 sees the value 10 of y, it also sees any other side effects performed by the thread that stored 10 into y before the time of the store. That is, you need the store of 10 into y to synchronize with the load of 10 from y. This can be done by having the store perform a release and the load perform an acquire:

    // thread 2
    y.store (10, memory_order_release);

    // thread 3
    if (y.load (memory_order_acquire) == 10)

A release operation synchronizes with an acquire operation that reads the value stored. Now because the store in thread 2 synchronizes with the load in thread 3, anything that happens after the load in thread 3 will see the side effects of anything that happens before the store in thread 2. Hence the assertion will succeed.

Of course, we also need to make sure assertion A succeeds, by making the x.store in thread 1 use release and the x.load in thread 2 use acquire.

Europe answered 6/9, 2017 at 3:46 Comment(1)
Thank you. Passer By also added some useful explanations in addition to yours.Psychosocial

The answer to your question is the C++ standard.

The section [intro.races] is surprisingly clear (which is unusual for normative text: formal consistency often hurts readability).

I have read many books and tutorials that treat the subject of memory ordering, but they just confused me. Finally I read the C++ standard, and the section [intro.multithread] is the clearest explanation I have found. Taking the time to read it carefully (twice) may save you some time!

The answer to your question is in [intro.races]/4:

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M. [ Note: There is a separate order for each atomic object. There is no requirement that these can be combined into a single total order for all objects. In general this will be impossible since different threads may observe modifications to different objects in inconsistent orders. — end note ]

You were expecting a single total order on all atomic operations. There is such an order, but only for atomic operations that are memory_order_seq_cst as explained in [atomics.order]/3:

There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations [...]

Photolysis answered 6/9, 2017 at 7:33 Comment(0)
