First of all: you are in the land of UB because there is a data race. Both flag and a are written and read from different threads without proper synchronization, and that is always a data race. The C++ standard does not impose any requirements on implementations when you give them such a program.
There is therefore no way to "guarantee" a specific behavior.
However, we can look at the assembly output to determine what a given compiled program can or cannot do. I was not successful at using reordering alone to show the problem with volatile as a synchronization mechanism, but below is a demonstration using a related optimization.
Here is an example of a program that has no data races:
std::atomic<int> a = 0;
std::atomic<int> flag = 0;
std::thread t1([&]() {
    while (flag != 1);
    int b = a;
    std::cout << "b = " << b << std::endl;
});
std::thread t2([&]() {
    a = 5;
    int x = 1000000;
    while (x-- > 1) flag = 0;
    flag = 1;
    x = 1000000;
    while (x-- > 1) flag = 1;
    flag = 0;
    a = 0;
});
t1.join();
t2.join();
https://wandbox.org/permlink/J1aw4rJP7P9o1h7h
Indeed, the usual output of this program is b = 5 (other outputs are possible, or the program might not terminate at all with "unlucky" scheduling, but there is no UB).
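To make the guarantee that proper synchronization buys more concrete, here is a minimal sketch of a simplified handoff. It is not the program above: the function name run_handoff is hypothetical, a is deliberately a plain int, and the explicit release/acquire memory orders are my choice (the snippet above uses the default, sequentially consistent operations). Under these assumptions the reader can only ever see 5:

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: hands one value from a writer thread to a reader thread.
int run_handoff() {
    int a = 0;                 // plain, non-atomic payload
    std::atomic<int> flag{0};  // the only variable used for synchronization
    int b = -1;

    std::thread reader([&] {
        // Acquire load: once it observes 1, the writer's earlier store to `a`
        // is guaranteed to be visible to this thread.
        while (flag.load(std::memory_order_acquire) != 1) {}
        b = a;
    });
    std::thread writer([&] {
        a = 5;
        // Release store: publishes the preceding write to `a`.
        flag.store(1, std::memory_order_release);
    });

    reader.join();
    writer.join();
    return b;  // safe to read here: join() synchronizes with the thread's completion
}
```

Calling run_handoff() from main should return 5 on every run, because the release store and the acquire load establish a happens-before relationship between a = 5 and b = a. A volatile int flag provides no such relationship.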
If we use improper synchronization instead, we can see in the assembly that this output is no longer in the realm of possibility (given the guarantees of the x86 platform):
int a = 0;
volatile int flag = 0;
std::thread t1([&]() {
    while (flag != 1);
    int b = a;
    std::cout << "b = " << b << std::endl;
});
std::thread t2([&]() {
    a = 5;
    int x = 1000000;
    while (x-- > 1) flag = 0;
    flag = 1;
    x = 1000000;
    while (x-- > 1) flag = 1;
    flag = 0;
    a = 0;
});
t1.join();
t2.join();
The assembly for the second thread body, as per https://godbolt.org/z/qsjca1:
std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#2}> > >::_M_run():
    mov rcx, QWORD PTR [rdi+8]
    mov rdx, QWORD PTR [rdi+16]
    mov eax, 999999
.L4:
    mov DWORD PTR [rdx], 0
    sub eax, 1
    jne .L4
    mov DWORD PTR [rdx], 1
    mov eax, 999999
.L5:
    mov DWORD PTR [rdx], 1
    sub eax, 1
    jne .L5
    mov DWORD PTR [rdx], 0
    mov DWORD PTR [rcx], 0
    ret
Notice how a = 5; has been completely optimized away. Nowhere in the compiled program does a get a chance to take the value 5.
As you can see in https://wandbox.org/permlink/Pnbh38QpyqKzIClY, the program will always output 0 (or not terminate), even though the original C++ code for thread 2 would, in a "naive" interpretation, always have a == 5 while flag == 1.
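One way to read the missing store is as dead-store elimination. The following is only a sketch of that reasoning, using a hypothetical function writer_body that reduces thread 2 to its essentials (the time-burning loops are dropped). Since a is neither volatile nor atomic, the compiler only has to preserve what a single thread could observe, and no single-threaded observer can tell whether a = 5 ever happened:

```cpp
// Hypothetical single-threaded reduction of thread 2's body.
void writer_body(int& a, volatile int& flag) {
    a = 5;     // dead store from the compiler's view: overwritten below,
               // and never read by this thread in between
    flag = 1;  // volatile accesses must all be emitted, in this order
    flag = 0;
    a = 0;     // only this store to `a` has to survive optimization
}
```

After writer_body(a, flag) returns, a == 0 and flag == 0 regardless of whether the compiler kept a = 5, which is exactly why it is allowed to drop it. Whether a particular compiler actually does so depends on the compiler and optimization level.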
The while loops are of course there to "burn time" and give the other thread a chance to interleave. A sleep or other system call would generally act as a memory barrier for the compiler and might break the effect of the second snippet.
volatiles are not adequate for synchronization. This looks like undefined behavior. If you introduce synchronization, I suspect you will find that it makes it impossible to see the reordering. – Capitol
a = 0 after flag = 1 and suddenly there is no more a = 5: godbolt.org/z/iivl7A – Vagrancy
volatile has no meaning basically. The reproducibility here means that you are able to find two specific compilers (and even corresponding OSes) such that they produce a different result, or that changing the code lets int b = a happen before a = 5. – Marla
Instruction reordering indeed happens and influences the sequential consistency. – Marla