Yes, because we cannot observe the difference!
An implementation is allowed to turn your snippet into the following (pseudo-implementation).
int __loaded_foo = foo;
int x = __loaded_foo;
int y = __loaded_foo;
The reason is that there is no way for you to observe the difference between the above and two separate loads of foo, given the guarantees of sequential consistency.
Note: It is not just the compiler that can make such an optimization; the processor can also reason that there is no way for you to observe the difference and load the value of foo once, even though the compiler might have asked it to do so twice.
Explanation
Given a thread that keeps on updating foo in an incremental fashion, what you are guaranteed is that y will have either the same value as x or a later written one.
// thread 1 - The Writer
while (true) {
    foo += 1;
}
// thread 2 - The Reader
while (true) {
    int x = foo;
    int y = foo;
    assert (y >= x); // cannot fire until foo wraps around (overflow of std::atomic<int> is well-defined, not UB)
}
Imagine that the writing thread pauses its execution on every iteration (because of a context switch or some other implementation-defined reason); there is no way for you to prove whether x and y have the same value because of that pause or because of a "merge optimization".
In other words, we have two potential outcomes given the code in this section:
- No new value is written to foo between the two reads (x == y).
- A new value is written to foo between the two reads (x < y).
Since either of the two can happen, an implementation is free to narrow things down and always produce just one of them; we can in no way observe the difference.
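The writer/reader fragments above can be turned into a bounded, self-contained sketch. The helper name count_backward_reads and its two parameters are made up for illustration, and the loops are bounded so that foo never wraps around. It counts how often the reader sees y < x, which the reasoning above says should never happen, whether or not the implementation merges the two loads.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the original answer): runs one writer and one
// reader on a shared atomic and returns how many times the reader saw y < x.
inline long count_backward_reads(int writes, int reads) {
    std::atomic<int> foo{0};

    // The Writer: a bounded number of seq_cst increments, so foo never overflows.
    std::thread writer([&foo, writes] {
        for (int i = 0; i < writes; ++i) {
            foo += 1;
        }
    });

    // The Reader: two seq_cst loads per iteration. An implementation may
    // legally perform both loads, or merge them into a single one.
    long violations = 0;
    for (int i = 0; i < reads; ++i) {
        int x = foo;
        int y = foo;
        if (y < x) {
            ++violations;
        }
    }

    writer.join();
    return violations;
}
```

Calling count_backward_reads(100000, 100000) from main should return 0. Note that the result cannot tell you whether the loads were merged, which is exactly the point.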
What does the Standard say?
An implementation can make whatever changes it wants as long as we cannot observe any difference between the behavior which we expressed, and the behavior during execution.
This is covered in [intro.execution]p1:
The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.
Another paragraph makes it even clearer, [intro.execution]p5:
A conforming implementation executing a well-formed program shall
produce the same observable behavior as one of the possible executions
of the corresponding instance of the abstract machine with the same
program and the same input.
What about polling in a loop?
// initial state
std::atomic<int> foo = 0;
// thread 1
while (true) {
    if (foo)
        break;
}
// thread 2
foo = 1;
Question: Given the reasoning in the previous sections, could an implementation simply read foo once in thread 1, and then never break out of the loop even if thread 2 writes to foo?
The answer: no.
In a sequentially-consistent environment we are guaranteed that a write to foo in thread 2 will become visible in thread 1; this means that when that write has happened, thread 1 must observe this change of state.
Note: An implementation can turn two reads into a single one because we cannot observe the difference (one fence is just as effective as two), but it cannot completely disregard a read that exists by itself.
Note: The contents of this section are guaranteed by [atomics.order]p3-4.
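The polling scenario above can be sketched as a self-contained function. The name poll_until_set and the observed flag are made up for illustration. If the implementation were allowed to hoist the load out of the loop, the poller thread could spin forever and the function would never return.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the original answer): returns true once the
// polling thread has observed the store made by the other thread.
inline bool poll_until_set() {
    std::atomic<int> foo{0};
    bool observed = false;

    // thread 1: polls foo; each iteration performs its own seq_cst load.
    std::thread poller([&foo, &observed] {
        while (true) {
            if (foo) {
                observed = true;
                break;
            }
        }
    });

    // thread 2: the single write the poller is waiting for.
    std::thread setter([&foo] { foo = 1; });

    setter.join();
    poller.join();  // joining synchronizes with the poller, so reading observed is safe
    return observed;
}
```

Calling poll_until_set() from main is expected to return true after a short spin.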
What if I really want to prevent this form of "optimization"?
If you would like to force the implementation to actually read the value of a variable at every point where your code reads it, you should look into using volatile (note that this in no way enhances thread safety).
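A minimal sketch of what that could look like, assuming a lock-free std::atomic<int> (whose volatile-qualified load overload is used here) and assuming the implementation treats each such load as a volatile access that it may not merge. The function name read_twice_volatile is made up for illustration.

```cpp
#include <atomic>

// Hypothetical helper (not from the original answer): performs two loads
// through a volatile-qualified atomic object.
inline bool read_twice_volatile() {
    volatile std::atomic<int> foo{0};

    // Both calls select the volatile-qualified overload of load(). The
    // assumption is that the implementation keeps them as two separate reads.
    int x = foo.load();
    int y = foo.load();

    // volatile adds no ordering or atomicity guarantees of its own; those
    // still come from std::atomic.
    return y >= x;
}
```

In a single thread this simply returns true; the interesting part is the generated code, which should contain two loads rather than one.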
But in practice compilers don't optimize atomics, and the standards group has recommended against using volatile atomic for this kind of reason until the dust settles on this issue.
[...] foo is modified immediately after the first initialization? The semantic allows x and y to have different values. In your case, however, since no one modifies foo, the compiler may do the optimization. – Beverleebeverley
[...] volatile. That's what it's for. – Trevethick
[...] foo is not modified (as far as the C++ code is concerned), not on where the code that uses it, is. – Trevethick
[...] std::atomic when there is no thread? Or if I'm allowed to assume there are threads then one of them may execute X between A and B). Though in this exact form of the question, none of these apply (please read my first comment again). – Beverleebeverley
[...] foo, no side effect would happen-after int x = foo;, nor would int y = foo; happen-after any such side effect, in the way the term happens-after is defined by the standard. You are thinking in terms of linear passage of time - but there is no single consistent timeline of evaluations and side effects in a parallel program. – Ever
[...] y = (x = foo)? I don't see anyone claiming that it's not "correct"; wherever did you get that idea? Looks perfectly OK to me, though completely irrelevant to the discussion at hand. – Ever
[...] x and y are guaranteed to have same value, irrespective of other threads busily modifying foo? – Beverleebeverley
[...] x and y to have different values. However, a conforming program may also legitimately observe x and y to always be equal - and that gives the optimizer an opportunity to eliminate one load, because a program won't be able to tell the difference between x and y being equal by pure coincidence, or through a deliberate optimization. That's the very crux of the as-if rule, the rule that permits optimizations in the first place. – Ever
[...] x and y could have different values, why is that? Somebody has modified foo after x is initialized and before y is initialized? – Beverleebeverley
[...] main: std::thread t( []() { for(;;) {++foo;} } ); This is what I mean by another thread busily modifying foo. With the program so adjusted, it is possible for x and y to end up with different values, but it's also possible for them to legitimately end up with the same value. The optimizer can take advantage of this latter fact, by only loading foo once. – Ever
[...] M, as determined by evaluation B, shall be the value stored by some side effect A that modifies M, where B does not happen before A." So a load can reflect any write that doesn't affirmatively happen-after it - that is, any write that happened-before, or that was unsynchronized with, the load (further subject to coherence rules: y cannot reflect the write that happened-before the one that x reflects). – Ever