How to write observable example for instruction reorder?
Asked Answered
M

1

8

Given an example:

#include <thread>
#include <iostream>

int main() {
    int a = 0;
    volatile int flag = 0;

    std::thread t1([&]() {
        while (flag != 1);

        int b = a;
        std::cout << "b = " << b << std::endl;
    });

    std::thread t2([&]() {
        a = 5;
        flag = 1;
    });

    t1.join();
    t2.join();
    return 0;
}

It is conceptually understandable that flag = 1; could be reordered and executed before a = 5;, therefore the result of b could be 5 or 0.

However, realistically, I cannot produce a result that output 0 on my machine. How could we guarantee the behavior or instruction reorder reproducibly? How to change the code example specifically?

Marla answered 16/7, 2019 at 14:14 Comment(7)
volatiles are not adequate for synchronization. This looks like undefined behavior. If you introduce synchronization, I suspect you will find that it makes it impossible to see the reordering.Capitol
@FrançoisAndrieux The goal kind of is to show the UB. But I agree that this is outside the realm of the C++ specification.Vagrancy
Not purely instruction reordering, but try adding a = 0 after flag = 1 and suddenly there is no more a = 5: godbolt.org/z/iivl7AVagrancy
Guarantee order with threads? No. Full stop. That's one of the reasons for using locks, mutexes and so on.Apparitor
@FrançoisAndrieux I doubt I mis-expressed my question. The idea here is to explore processor or compiler instruction reorder. volatile has no meaning basically. The reproducebility here means that you are able to find two specific compilers (and even corresponding OS) such that they produce a different result or by changing the code to let int b = a happens before a = 5.Marla
@Apparitor I aware of synchronizations and UB here. The question is not about guarantee order of threads. It is about sequential consistency. Say, for educational purpose, you have to let student observe instruction reorder indeed happens and influence the sequential consistency.Marla
@Jakob You are using C++ abstract machine terms such as "sequential consistency" and "happens before" to talk about a program that the C++ standard does not apply to (well, places no restrictions on). Yes, those terms also may have meaning outside of C++, but you would first need to identify a C++ implementation where "happens before" and "sequential consistency" are defined even for programs with C++ standard UB.Vagrancy
V
4

First of all: You are in the land of UB because there is a race condition: Both flag and a are written and read from different threads without proper synchronization - this is always a data race. The C++ standard does not impose any requirements on implementations when you give them such a program.

There is therefore no way to "guarantee" a specific behavior.

However, we can look at assembly output to determine what a given compiled program can or cannot do. I was not successful at using reordering alone to show the problem with volatile as synchronization mechanism, but below is a demonstration using a related optimization.


Here is an example of a program that has no data races:

std::atomic<int> a = 0;
std::atomic<int> flag = 0;

std::thread t1([&]() {
    while (flag != 1);

    int b = a;
    std::cout << "b = " << b << std::endl;
});

std::thread t2([&]() {
    a = 5;
    int x = 1000000;
    while (x-- > 1) flag = 0;
    flag = 1;
    x = 1000000;
    while (x-- > 1) flag = 1;
    flag = 0;
    a = 0;
});

t1.join();
t2.join();

https://wandbox.org/permlink/J1aw4rJP7P9o1h7h

Indeed, the usual output of this program is b = 5 (other outputs are possible, or the program might not terminate at all with "unlucky" scheduling, but there is no UB).


If we use improper synchronization instead, we can see in the assembly that this output is no longer in the realm of possibility (given the guarantees of the x86 platform):

int a = 0;
volatile int flag = 0;

std::thread t1([&]() {
    while (flag != 1);

    int b = a;
    std::cout << "b = " << b << std::endl;
});

std::thread t2([&]() {
    a = 5;
    int x = 1000000;
    while (x-- > 1) flag = 0;
    flag = 1;
    x = 1000000;
    while (x-- > 1) flag = 1;
    flag = 0;
    a = 0;
});

t1.join();
t2.join();

The assembly for the second thread body, as per https://godbolt.org/z/qsjca1:

std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#2}> > >::_M_run():
        mov     rcx, QWORD PTR [rdi+8]
        mov     rdx, QWORD PTR [rdi+16]
        mov     eax, 999999
.L4:
        mov     DWORD PTR [rdx], 0
        sub     eax, 1
        jne     .L4
        mov     DWORD PTR [rdx], 1
        mov     eax, 999999
.L5:
        mov     DWORD PTR [rdx], 1
        sub     eax, 1
        jne     .L5
        mov     DWORD PTR [rdx], 0
        mov     DWORD PTR [rcx], 0
        ret

Notice how a = 5; has been completely optimized away. Nowhere in the compiled program does a get a chance to take the value 5.

As you can see in https://wandbox.org/permlink/Pnbh38QpyqKzIClY, the program will always output 0 (or not terminate), even though the original C++ code for thread 2 would - in a "naive" interpretation - always have a == 5 while flag == 1.

The while loops are of course to "burn time" and give the other thread a chance to interleave - sleep or other system calls would generally constitute a memory barrier for the compiler and might break the effect of the second snippet.

Vagrancy answered 16/7, 2019 at 15:8 Comment(4)
@Eric Wouldn't that last a = 0 be unsequenced with respect to b = a? Thread 1 could escape the endless while when flag is still 1, then get preempted, with thread 2 setting flag = 0, then both threads racing on a = 0 and b = a, no?Vagrancy
Yep, you’re right. Had to read the code a little closer. The last store to a in t2 and the load of a in t1 would be racy if not atomic and ordered such that flag=0 is sequenced before a=0.Strawflower
Thank you for your answer. I don't know if I expressed my question clearly. In your example, by appending a=0; in the end of t2 in fact introduces the order of a. In my original example, a = 5; and flag = 1; has no coupling inside t2 and therefore I learned from instruction reorder that processor or compiler can reorganize the execution order of these two lines. It seems your answer does not fit the intention here very well, itn't it?Marla
@Jakob I don't understand what you mean with "introduces the order of a". In any case: As I explicitly said in the answer, I could not give you an example that is pure reordering. However, the reason why a = 5; flag = 1; a = 0; can be optimized to flag = 1; a = 0; is exactly the same as why reordering is allowed to happen. One might argue that the compiler first reordered it to flag = 1; a = 5; a = 0; and then decided that a = 5 has no effect so it can be eliminated.Vagrancy

© 2022 - 2024 — McMap. All rights reserved.