Can atomic loads be merged in the C++ memory model?
Asked Answered
E

2

22

Consider the C++ 11 snippet below. For GCC and clang this compiles to two (sequentially consistent) loads of foo. (Editor's note: compilers do not optimize atomics, see this Q&A for more details, especially http://wg21.link/n4455 standards discussion about the problems this could create which the standard doesn't give programmers tools to work around. This language-lawyer Q&A is about the current standard, not what compilers do.)

Does the C++ memory model allow the compiler to merge these two loads into a single load and use the same value for x and y?

(Editor's note: this is something the standards group is working on: http://wg21.link/n4455 and http://wg21.link/p0062. The current standard on paper allows behaviours that are undesirable.)


I think it cannot merge these loads, because that means that polling an atomic doesn't work anymore, but I cannot find the relevant part in the memory model documentation.

#include <atomic>
#include <cstdio>

std::atomic<int> foo;

int main(int argc, char **argv)
{
    int x = foo;
    int y = foo;

    printf("%d %d\n", x, y);
    return 0;
}
Eyla answered 14/10, 2015 at 14:22 Comment(24)
I believe a sufficiently smart compiler could merge these two loads under as-if rule: any optimization is legal as long as a conforming program cannot tell the difference. On the other hand, if you are polling in a loop, then the compiler does have to issue multiple reads.Ever
I think they''re semantically different. What if foo is modified immediately after the first initialization? The semantic allows x and y to have different values. In your case, however, since no one modifies foo, the compiler may do the optimization.Beverleebeverley
To guarantee two loads, use volatile. That's what it's for.Trevethick
@Nawaz Define "immediately after". The term doesn't make much sense in a multithreaded program in the absence of synchronization.Ever
@IgorTandetnik: re "On the other hand, if you are polling in a loop, then the compiler does have to issue multiple reads", why. The optimization rests on whether the compiler can prove that foo is not modified (as far as the C++ code is concerned), not on where the code that uses it, is.Trevethick
@IgorTandetnik: "immediately after" means between the two initializations.Beverleebeverley
@Cheersandhth.-Alf Yes, the more I look at the text of the standard, the less I'm convinced that the standard says what I originally claimed it said. It says an atomic load takes its value from some atomic store that didn't happen-after the load - but it doesn't have to be the most recent store (nor is there a good way to define "most recent").Ever
@Nawaz That's begging the question. Define "between".Ever
@IgorTandetnik: Define "define".Beverleebeverley
@Nawaz Explain how the term "one evaluation is between two others" relates to the terms and definitions in the C++ standard. The closest I can think of is "X is between A and B iff A happens-before ([intro.multithread]/14) X and X happens-before B" - but in the OP's example, no evaluation can satisfy this predicate, so you must be meaning something different.Ever
@IgorTandetnik: I meant exactly that. Also, I said that in general case, not referring to the question (the question is not useful in general case, e.g why use std::atomic when there is no thread? Or if I'm allowed to assume there are threads then one of them may execute X between A and B). Though in this exact form of the question, none of these apply (please read my first comment again).Beverleebeverley
Sorry, I dont understand. Can any one can tell me why y = (x = foo) is not correct? the value set to x will be assigned to y after x is set.Wiencke
@Nawaz Even if there were another thread busily modifying foo, no side effect would happen-after int x = foo;, nor would int y = foo; happen-after any such side effect, in the way the term happens-after is defined by the standard. You are thinking in terms of linear passage of time - but there is no single consistent timeline of evaluations and side effects in a parallel program.Ever
@KenmanTsang Why are you bringing up y = (x = foo)? I don't see anyone claiming that it's not "correct"; wherever did you get that idea? Looks perfectly OK to me, though completely irrelevant to the discussion at hand.Ever
@IgorTandetnik: So you mean x and y are guaranteed to have same value, irrespective of other threads busily modifying foo?Beverleebeverley
@Nawaz No, not guaranteed - a program may observe x and y to have different values. However, a conforming program may also legitimately observe x and y to always be equal - and that gives the optimizer an opportunity to eliminate one load, because a program won't be able to tell the difference between x and y being equal by pure coincidence, or through a deliberate optimization. That's the very crux of the as-if rule, the rule that permits optimizations in the first place.Ever
"different values? then "always be equal"? You're not contradicting. Please consider only this case. Not a general case. Or at least do mention which case you're considering. Your comment led me to ask my question, so please stick to the context of your own comment.Beverleebeverley
@Nawaz I must admit I do not understand the nature of your objection.Ever
@IgorTandetnik: Please ignore my previous comment. Now if x and y could have different values, why is that? Somebody has modified foo after x is initialized and before y is initialized?Beverleebeverley
@Nawaz Imagine this line added right after the opening brace in main: std::thread t( []() { for(;;) {++foo;} } ); This is what I mean by another thread busily modifying foo. With the program so adjusted, it is possible for x and y to end up with different values, but it's also possible for them to legitimately end up with the same value. The optimizer can take advantage of this latter fact, by only loading foo once.Ever
@IgorTandetnik: I understand that part. I asked this: Now if x and y could have different values, why is that? Somebody has modified foo after x is initialized and before y is initialized? –Beverleebeverley
@Nawaz Yes, of course. How else? That "somebody" being the second thread.Ever
@IgorTandetnik: When exactly did it do that?Beverleebeverley
@Nawaz Doesn't matter, as long as it didn't happen-after the read. Quoth the standard: "[intro.multithread]/16 The value of an atomic object M, as determined by evaluation B, shall be the value stored by some side effect A that modifies M, where B does not happen before A." So a load can reflect any write that doesn't affirmatively happen-after it - that is, any write that happened-before, or that was unsynchronized with, the load (further subject to coherence rules: y cannot reflect the write that happened-before the one that x reflects).Ever
S
18

Yes, because we can not observe the difference!

An implementation is allowed to turn your snippet into the following (pseudo-implementation).

int __loaded_foo = foo;

int x = __loaded_foo;
int y = __loaded_foo;

The reason is that there is no way for you to observe the difference between the above, and two separate loads of foo given the guarantees of sequential-consistency.

Note: It is not just the compiler that can make such an optimization, the processor can simply reason that there is no way in which you can observe the difference and load the value of foo once — even though the compiler might have asked it to do it twice.





Explanation

Given a thread that keeps on updating foo in an incremental fashion, what you are guaranteed is that y will have either the same, or a later written value, when compared to the contents of x.

// thread 1 - The Writer
while (true) {
  foo += 1;
}
// thread 2 - The Reader
while (true) {
  int x = foo;
  int y = foo;

  assert (y >= x); // will never fire, unless UB (foo has reached max value)
}                  

Imagine the writing thread for some reason pauses its execution on every iteration (because of a context-switch or other implementation defined reason); there is no way in which you can prove that this is what is causing both x and y to have the same value, or if it is because of a "merge optimization".


In other words, we have to potential outcomes given the code in this section:

  1. No new value is written to foo between the two reads (x == y).
  2. A new value is written to foo between the two reads (x < y).

Since any of the two can happen, an implementation is free to narrow down the scope to simply always execute one of them; we can in no way observe the difference.





What does the Standard say?

An implementation can make whatever changes it wants as long as we cannot observe any difference between the behavior which we expressed, and the behavior during execution.

This is covered in [intro.execution]p1:

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

Another section which makes it even more clear [intro.execution]p5:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.

Further Reading:





What about polling in a loop?

// initial state
std::atomic<int> foo = 0;
// thread 1
while (true) {
  if (foo)
    break;
}
// thread 2
foo = 1

Question: Given the reasoning in the previous sections, could an implementation simply read foo once in thread 1, and then never break out of the loop even if thread 2 writes to foo?

The answer; No.

In a sequentially-consistent environment we are guaranteed that a write to foo in thread 2 will become visible in thread 1; this means that when that write has happened, thread 1 must observe this change of state.

Note: An implementation can turn two reads into a single one because we cannot observe the difference (one fence is just as effective as two), but it cannot completely disregard a read that exists by itself.

Note: The contents of this section is guaranteed by [atomics.order]p3-4.





What if I really want to prevent this form of "optimization"?

If you would like to force the implementation to actually read the value of some variable at every point where you have written it you should look into usage of volatile (note that this in no way enhances thread-safety).

But in practice compilers don't optimize atomics, and the standards group has recommended against using volatile atomic for this kind of reason until the dust settles on this issue. See

Sepulture answered 14/10, 2015 at 14:32 Comment(6)
Let us continue this discussion in chat.Crackbrain
The polling secton needs work. The fact that each unblocked thread must experience time 'after' other threads at some point is hard to tease out from the standard. Then again, maybe a Q&A in a separate post might be better than making this one bigger.Twirp
@Yakk I certainly agree with you, the previous comment that links to a discussion regarding the polling section certainly clarifies it; do you think I need to incorporate it in the post itself?Crackbrain
@Yakk I mean, just including [atomics.order]p3-4 isn't a too big of addition, thought as you said between the line; it isn't the clearest of wording (even though that is the definite bit that guarantees what the post is saying).Crackbrain
@Yakk I will probably create a self-answered Q&A that addresses any confusion in regards of what we are talking about (as a build up from this question).Crackbrain
In practice compilers don't optimize atomics. You're correct about the in-theory as far as the C++ standard on paper, but in practice there are potential problems with that so compilers aren't going to do that. I don't think this question is an exact duplicate of Why don't compilers merge redundant std::atomic writes? but there should probably be a more prominent note in your answer and/or the question.Wakashan
H
-1

Yes, in your particular example (no otherwise).

Your particular example has a single thread of execution, foo has static storage duration and initialization (that is, before main is entered) and it is otherwise never modified during the program's lifetime.
In other words, there is no externally observable difference, and the as-if rule can legally be applied. In fact, the compiler could do away with atomic instructions, alltogether. There is no way the value of x or y could be anything different, ever.

In a program with concurrency which modifies foo, this is not the case.

You do not specify a memory model, so the default model is used, which is sequential consistency. Sequencial consistency is defined as giving the same happens-before / memory ordering guarantees as release/acquire and establish a single total modification order of all atomic operations. That last bit is the important part.

Single total modification order means that if you have three (atomic) operations, e.g. A, B, and C which happen in that order (maybe concurrently, in two threads), and B is a write operation while A and C are read operations, then C must see the state established by B, not some other earlier state. That is, the value seen at points A and C will be different.

In terms of your code sample, if another thread modifies foo immediately after you are reading it into x (but before you are reading the value into y), the value that is placed into y must be the value that was written. Because if the operations happen in that order, they must also be realized in that order.

Of course, a write that happens exactly in between two consecutive load instructions is a rather unlikely thing (since the time window is very small, a mere single tick), but it does not matter whether it is unlikely.
The compiler must produce code that makes sure that if this constellation arises, operations are still seen in exactly the order in which they happened.

Heterotopia answered 14/10, 2015 at 15:20 Comment(13)
-1: this is incorrect. Given A, B, and C, there is no guarantee that the order will be A -> B -> C (this doesn't even hold up in theory) given two threads where one executes A and C, and the other B; as such there is nothing saying that the happens-before relationship is broken with a single read (instead of two).Crackbrain
then C must see the state established by B, not some other earlier state. This presupposes that C happens-after B. But what would establish this relationship?Ever
@IgorTandetnik: This relationship is established where I say "... which happen in that order". It is not necessary that the operations always happen in that order (with several threads, for example). But if they do happen in that order (the precondition), they must be realized in that order, too. Even if unlikely, as long as you can get this exact order A-B-C (i.e. with two threads), the compiler must thus use 2 loads. The generated code is required to work as-specified not only in the likely case, but also in the unlikely case.Heterotopia
@Heterotopia see this comment and this answer, [intro.execution]p1-5 is very important.Crackbrain
"this is incorrect. Given A, B, and C, there is no guarantee that the order will be A -> B -> C " -- it is certainly correct. There is no need for a guarantee. The standard requires that if this is the order in which A, B, and C happen, then C must see the result of B. That is, if this constellation can happen (and it can happen if there is concurrency) then the code must be prepared for it.Heterotopia
@Heterotopia I expected you to write something like that, see my previous comment.Crackbrain
Yes, but that comment is irrelevant. It is of course correct in the scope of the single-threaded code that you have posted. Which, as I said, makes the as-if rule applicable. However, in general this is not true. In general, you will use atomics in presence of concurrency (it is nonsensical to do so without!). And in presence of concurrency, the assumption "no visible difference" is not true, if the standard is followed.Heterotopia
@Heterotopia What does "happen in this order" mean, exactly, if not a happens-after relationship? There is indeed a total modification order - a total order of writes - but there isn't a total order of reads and writes combined.Ever
deleted my comment since that could be interpreted in a misleading manner, @IgorTandetnik's comment is a better approach to explain it.Crackbrain
Proof of concept: coliru.stacked-crooked.com/a/54ccae97b0126292 If the reads of x and y couldn't be different, why does the assert fail? Well, because if the write happens before y is read, they are different.Heterotopia
No one has said that the assert cannot fail, but we got two potential outcomes (which is why the assert in my post is y >= x); given these two outcomes, the implementation can choose to only make it possible to observe one (as stated in [intro.execution]p5).Crackbrain
@Heterotopia No one says that x and y absolutely cannot be different. What we say is this: they may or may not be different, a conforming program cannot rely on them being different, and that gives leeway to the optimizer, should it so choose, to eliminate one load and make them always the same. Such an optimization would be legal - but of course, like any other optimization, the compiler isn't required to actually perform it. Apparently, in your example, it doesn't.Ever
@IgorTandetnik: In practice, compilers don't optimize atomics, they basically treat them like volatile atomic. Why don't compilers merge redundant std::atomic writes?Wakashan

© 2022 - 2024 — McMap. All rights reserved.