Does memory_order_relaxed respect data dependencies within the same thread?

Given:

#include <atomic>
#include <cstdint>

std::atomic<std::uint64_t> x;

std::uint64_t f()
{
    // std::memory_order_relaxed is spelled the same way in C++11 through C++20
    x.store(20, std::memory_order_relaxed);
    x.store(10, std::memory_order_relaxed);
    return x.load(std::memory_order_relaxed);
}

Is it ever possible for f to return a value other than 10, assuming there is only one thread writing to x? For a non-atomic variable this would obviously be impossible, but I don't know whether relaxed is so relaxed that it ignores data dependencies within the same thread.

Sexual answered 26/10, 2021 at 15:2 Comment(2)
As a general rule, memory ordering never affects observable behavior within a single thread. Each thread will behave as if its own code executes precisely in program order, and relaxed is no exception to this. Anything else would be utter madness. – Moonraker
This is generally the case at the level of machine code too: all out-of-order execution is completely transparent to an individual thread. So for instance, if the stores to x are put into the core's store buffer and written to L1 cache out of order, then a later load of x within the same core has to fulfill it from the latest store buffer entry; it can't load the stale value from L1 cache. (Here "later" again means "in program order", i.e. in the order that the instructions appear in memory, taking into consideration all jumps and so forth.) – Moonraker

The result of the load is always 10 (assuming there is only one thread). Even a relaxed atomic variable is "stronger" than a non-atomic variable:

  1. as with a non-atomic variable, all threads must agree on a single order in which all modifications to that variable occur,
  2. as with a non-atomic variable, that single order is consistent with the "sequenced before" relationship, and
  3. the implementation will guarantee that potentially concurrent accesses will somehow sort themselves out into some order that all threads will agree on (and thus satisfy requirement 1). On the other hand, in the case of a non-atomic variable, potentially concurrent accesses result in undefined behaviour.
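To illustrate point 3, here is a minimal sketch in which several threads increment one relaxed atomic counter at the same time. The helper name count_relaxed and its parameters are made up for this example. Because every increment is an atomic read-modify-write, the increments fall into a single modification order and none is lost; the same loop on a plain uint64_t would be a data race and therefore undefined behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each perform `per_thread`
// relaxed increments on one shared atomic counter.
std::uint64_t count_relaxed(unsigned threads, std::uint64_t per_thread)
{
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> pool;

    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }

    // join() synchronizes with the completion of each thread, so the
    // load below happens after every increment.
    for (auto& th : pool)
        th.join();

    return counter.load(std::memory_order_relaxed);
}
```

Calling count_relaxed(4, 10000) should return exactly 40000 on a conforming implementation.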

A relaxed atomic variable can't be used to synchronize different threads with each other, unless accompanied by explicit fences. That's the sense in which it's relaxed, compared with the other memory orderings that are applicable to atomic variables.
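As a rough sketch of the "explicit fences" case: the flag below is only ever accessed with relaxed operations, and the release and acquire fences are what make the plain write to payload visible to the consumer. The names pass_message, payload, ready and seen are hypothetical, chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: passes one int from a producer thread to a consumer
// thread using a relaxed flag plus release/acquire fences.
int pass_message()
{
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // accessed with relaxed operations only
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release fence: orders the write to payload before the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed))
            std::this_thread::yield();
        // Acquire fence: pairs with the release fence above, so the read
        // of payload below is not a data race and observes 42.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Without the two fences, the relaxed flag alone would not establish a happens-before relationship between the threads, and reading payload in the consumer would be a data race.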

For language lawyering, see C++20 [intro.races]/10:

An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if:

  • A is sequenced before B, or [...]

and [intro.races]/15:

If an operation A that modifies an atomic object M happens before an operation B that modifies M, then A shall be earlier than B in the modification order of M. [Note: This requirement is known as write-write coherence. — end note]

and [intro.races]/18:

If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M. [Note: This requirement is known as write-read coherence. — end note]

Thus, in your program, the store of 20 happens before the store of 10 (since it is sequenced before it) and the store of 10 happens before the load. The write-write coherence requirement guarantees that the store of 10 occurs later in the modification order of x than the store of 20. When the load occurs, it is required to take its value from the store of 10, since the store of 10 happens before it and there is no other modification that can follow the store of 10 in the modification order of x.
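The reasoning above can be exercised with a small single-threaded check. This is only a sketch, and count_stale_reads is a hypothetical helper name: it repeats the store/store/load pattern many times and counts how often the load fails to return the most recent store.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts how many times a relaxed load did not observe
// the latest relaxed store made earlier by the same thread.
std::uint64_t count_stale_reads(std::uint64_t rounds)
{
    std::atomic<std::uint64_t> x{0};
    std::uint64_t stale = 0;

    for (std::uint64_t i = 1; i <= rounds; ++i) {
        x.store(2 * i, std::memory_order_relaxed);   // earlier in the modification order
        x.store(i, std::memory_order_relaxed);       // later in the modification order
        if (x.load(std::memory_order_relaxed) != i)  // must take its value from the second store
            ++stale;
    }
    return stale;
}
```

By write-write and write-read coherence, the returned count has to be 0 as long as no other thread writes to x.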

Rudin answered 26/10, 2021 at 15:25
