std::memory_order and instruction order, clarification

Asked 8/1, 2020 at 17:53 Answered 3/2, 2020 at 9:57

This is a follow-up question to this one.

I want to figure out exactly what instruction ordering means, and how it is affected by std::memory_order_acquire, std::memory_order_release, etc.

The question I linked already provides some detail, but I felt the answer there isn't really about the ordering itself (which is what I was looking for); it mostly motivates why these orderings are necessary.

I'll quote the same example, which I'll use as a reference:

#include <thread>
#include <atomic>
#include <cassert>
#include <string>

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p  = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42); // never fires
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join(); t2.join();
}

In a nutshell, I want to figure out what exactly happens to the instruction order at both lines

ptr.store(p, std::memory_order_release);

and

while (!(p2 = ptr.load(std::memory_order_acquire)))

Focusing on the first one, the documentation says:

... no reads or writes in the current thread can be reordered after this store ...

I've watched a few talks to understand this ordering issue, and I now understand why it is important. What I can't quite figure out yet is how the compiler translates the order specification. I also think the example given in the documentation isn't particularly useful, because after the store operation in the thread running producer there is no other instruction, hence nothing would be reordered anyway. However, it's also possible I'm misunderstanding. Is it possible they mean that the equivalent assembly of

std::string* p  = new std::string("Hello");
data = 42;
ptr.store(p, std::memory_order_release);

will be such that the first two lines, once translated, will never be moved after the atomic store? Likewise, in the thread running consumer, is it possible that none of the asserts (or the equivalent assembly) will ever be moved before the atomic load? And suppose I had a third instruction after the store: what would happen to that instruction, which would already be after the atomic store?

I've also tried compiling the code with the -S flag to save the intermediate assembly, but the output is quite large and I can't really make sense of it.

Again, to clarify: this question is about how the ordering works, not about why these mechanisms are useful or necessary.

Cow asked 8/1, 2020 at 17:53

It can help to understand modernescpp.com/index.php/sequential-consistency – Rick 8/1, 2020 at 18:29

It's not that lines of code will be reordered. Any given compiler will produce the same machine code instructions for the same input program. The reordering atomics actually address is that the physical hardware itself is allowed to reorder and execute those machine code instructions out of order so long as the end result is not changed. But in the case of atomics this can be breaking. The memory orders translate into machine code instructions telling the processor not to reorder in certain ways across these boundaries. – Sailfish 8/1, 2020 at 18:44

@Cruz Jean, this is great but still my question is how exactly those two modes I highlighted affect the order? – Cow 8/1, 2020 at 19:58

preshing.com/20120625/memory-ordering-at-compile-time – Charest 8/1, 2020 at 20:8

@Cow acquire means you want the most up to date values in memory, which means stores cannot be reordered to be after it. release is the counterpart, meaning loads cannot be reordered to be before it. Together they can be used as a synchronization mechanism between threads. – Sailfish 8/1, 2020 at 20:20

@Jean Cruz, they don't talk about stores in the documentation but read/write operations (atomic and not). Still I'd like to see a concrete example or even a diagram explaining once you use these relaxed models what sort of orderings can happen that wouldn't break the program anyway. – Cow 8/1, 2020 at 23:58

@CruzJean "means stores cannot be reordered to be after it" which stores? – Present 10/1, 2020 at 4:21

@Present Any stores anywhere. It's the same thing that makes it safe to read/write an int in a multithreaded environment under a mutex lock despite the mutex having nothing to do with the int itself. – Sailfish 10/1, 2020 at 6:29

@CruzJean Why wouldn't these store be reordered? Can you give an example of forbidden reordering? – Present 10/1, 2020 at 7:47

I know that when it comes to memory orderings, people usually try to argue whether and how operations can be reordered, but in my opinion this is the wrong approach! The C++ standard does not state how instructions can be reordered; instead it defines the happens-before relation, which is itself based on the sequenced-before, synchronizes-with and inter-thread-happens-before relations.

A store-release synchronizes-with an acquire-load that reads the value written by that store, thereby establishing a happens-before relation. Due to the transitivity of the happens-before relation, operations that are "sequenced-before" the store-release also "happen-before" the acquire-load and everything sequenced after it. Any argument about the correctness of an implementation using atomics should always rely on the happens-before relation. If and how instructions can be reordered is merely a consequence of applying the rules for the happens-before relation.
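A minimal sketch of that chain, using hypothetical names (payload, ready, run_handoff) rather than anything from the question: the write to payload is sequenced-before the release store, the acquire load that reads true synchronizes-with that store, so the read of payload sequenced after the load must see 42. The reasoning goes entirely through happens-before; no statement about instruction order is needed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // the flag that carries the synchronization

int run_handoff()
{
    payload = 0;
    ready.store(false, std::memory_order_relaxed); // reset before the threads start

    int seen = -1;
    std::thread producer([] {
        payload = 42;                                  // (1) sequenced-before (2)
        ready.store(true, std::memory_order_release);  // (2) store-release
    });
    std::thread consumer([&seen] {
        while (!ready.load(std::memory_order_acquire)) // (3) acquire-load reads (2),
            ;                                          //     so (2) synchronizes-with (3)
        seen = payload;                                // (4) sequenced after (3), hence
                                                       //     (1) happens-before (4)
    });
    producer.join();
    consumer.join();  // join makes `seen` safe to read here
    return seen;
}
```

Calling run_handoff() any number of times should always yield 42.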

For a more detailed explanation of the C++ memory model you can take a look at Memory Models for C/C++ Programmers.

Hobo answered 3/2, 2020 at 9:57

Without atomic:

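#include <cassert>
#include <string>

std::string* ptr;
int data;

void producer()
{
    std::string* p  = new std::string("Hello");
    data = 42;
    ptr = p;     // plain store: no ordering guarantee, data race with consumer
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr))   // plain load: data race, undefined behavior
        ;
    assert(*p2 == "Hello"); // may fire
    assert(data == 42); // may fire
}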

In producer, the compiler is free to move the assignment to data after the assignment to ptr. Because ptr becomes non-null before data is set, that can trigger the corresponding assert.

The release-store forbids the compiler from doing that.

In consumer, the compiler is free to move the assert on data to before the loop.

The load-acquire forbids the compiler from doing that.

Unrelated to ordering: the compiler is also free to omit the loop entirely. If ptr is null when the loop starts, nothing can validly make it non-null (another thread writing it would be a data race), so the loop would never terminate, and the compiler may assume that a loop without side effects like this one does terminate.
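Making the pointer atomic, even with std::memory_order_relaxed, removes the data race on the pointer itself, so the compiler must keep re-reading it and can no longer turn the loop into if (!ptr) infinite_loop;. Relaxed, however, provides no ordering for anything else. A sketch, assuming hypothetical names rptr and publish_relaxed:

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> rptr{nullptr};  // hypothetical name

std::string* publish_relaxed()
{
    rptr.store(nullptr, std::memory_order_relaxed);

    std::thread producer([] {
        std::string* p = new std::string("Hello");
        rptr.store(p, std::memory_order_relaxed);  // not ordered after the construction
    });
    std::string* p2 = nullptr;
    std::thread consumer([&p2] {
        // Atomic load: re-read on every iteration, so the loop cannot be hoisted.
        while (!(p2 = rptr.load(std::memory_order_relaxed)))
            ;
        // Do NOT dereference p2 here: with relaxed ordering the string's
        // contents are not guaranteed to be visible, so *p2 would be a data race.
    });
    producer.join();
    consumer.join();
    // After both joins the construction happens-before this point,
    // so the caller may safely use *p2.
    return p2;
}
```

Only after the joins is dereferencing the returned pointer well-defined; inside the consumer thread, release/acquire would be needed.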

Supererogate answered 9/1, 2020 at 2:43

I still feel like my question isn't clear, my question is more like it follows. Suppose (in the producer for example) you add few more statements after the atomic store, for example data_2 = 175 and maybe a data_3 = 10, where data_2 and data_3 are globals. How exactly is the re-ordering affected now? I understand you probably covered this in your answer, so I do apologize if I'm being annoying. – Cow 9/1, 2020 at 9:22

data_2 and data_3 assignments after the store have no restrictions on them. They can be reordered to before the store. – Supererogate 9/1, 2020 at 15:21

This is exactly the bit I don't understand: by "reordered before the store", do they mean "move/permute the instructions above the store statement"? – Cow 9/1, 2020 at 16:43

Yes. The compiler is free to transform the program as if the programmer wrote the data_2 assignment before the store. – Supererogate 9/1, 2020 at 17:47

@JeffGarrett Important note: even if the instructions generated by the compiler are in some given order, the processor itself might reorder them in absence of a memory order barrier – Sailfish 10/1, 2020 at 16:20

@curiousguy, What for? – Cow 12/1, 2020 at 13:1

@Cow To make clear what you're asking on data_2 and data_3 – Present 12/1, 2020 at 13:16

As you say, without atomic at all, the compiler is free to hoist the load of ptr out of the consumer loop (because it can assume the value isn't changed by other threads; that would be data race UB). So it can optimize it to if(!ptr) inf_loop;. Your answer would be more useful if you used std::atomic<> with mo_relaxed. – Fantastic 12/1, 2020 at 16:36

I think also the example given by the documentation isn't particularly useful as well because after the store operation in the thread running producer there's no other instruction, hence nothing would be re-ordered anyway.

If there were, they could be executed in advance anyway. How would that hurt?

The only thing a producer must guarantee is that the "production" in memory is completely written before the flag is set; otherwise there would be nothing a consumer could do to avoid reading uninitialized memory (or an old value of an object).

Setting up the published object too late would be catastrophic. But how would starting to set up another published object (say, a second one) "too early" be a problem?

How would you even know what a producer does too early? The only thing you are allowed to do is check the flag, and only once the flag is set can you observe the published object.

So if anything is reordered before the modification of the flag, you shouldn't be able to see it.
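A sketch of that argument, with two hypothetical published slots (first, second and their flags are not from the question): the write to second comes after the first release store, so it may legally be performed earlier, but a consumer that only reads first after acquiring first_ready cannot tell.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical two-slot publication, for illustration only.
int first = 0, second = 0;
std::atomic<bool> first_ready{false}, second_ready{false};

int consume_first()
{
    first = second = 0;
    first_ready.store(false, std::memory_order_relaxed);
    second_ready.store(false, std::memory_order_relaxed);

    std::thread producer([] {
        first = 1;
        first_ready.store(true, std::memory_order_release);
        // May be performed before the store above: a release only keeps
        // earlier accesses from moving after it, not later ones from moving up.
        second = 2;
        second_ready.store(true, std::memory_order_release);
    });
    int seen = -1;
    std::thread consumer([&seen] {
        while (!first_ready.load(std::memory_order_acquire))
            ;
        seen = first;  // guaranteed to be 1; `second` is not read here, because
                       // reading it without acquiring second_ready would be a data race
    });
    producer.join();
    consumer.join();
    return seen;
}
```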

But there is nothing to see in the assembly output of GCC on x86-64:

producer():
        sub     rsp, 8
        mov     edi, 32
        call    operator new(unsigned long)
        mov     DWORD PTR data[rip], 42
        lea     rdx, [rax+16]
        mov     DWORD PTR [rax+16], 1819043144
        mov     QWORD PTR [rax], rdx
        mov     BYTE PTR [rax+20], 111
        mov     QWORD PTR [rax+8], 5
        mov     BYTE PTR [rax+21], 0
        mov     QWORD PTR ptr[abi:cxx11][rip], rax
        add     rsp, 8
        ret

(If you were wondering, ptr[abi:cxx11] is a decorated name not some funky asm syntax, so ptr[abi:cxx11][rip] means ptr[rip].)

which can be summarized to:

set up stack frame
assign data
set up string object
assign ptr
remove frame and return

So really nothing notable, except ptr is assigned last.

You would have to select another target to see something more interesting.
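On x86-64 an ordinary store already has release semantics, so for the release store GCC only has to keep the writes to data and to the string before the final mov; no extra instruction is needed. As an aside, the same guarantee can also be spelled as a standalone release fence followed by a relaxed store, where on targets such as ARM or POWER the fence typically becomes a barrier instruction. This is a different formulation, not what GCC emitted above; the names fptr, fdata, producer_with_fence and run_fence_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> fptr{nullptr};  // hypothetical names
int fdata = 0;

void producer_with_fence()
{
    std::string* p = new std::string("Hello");
    fdata = 42;
    // Release fence + relaxed store: the fence keeps the accesses above from
    // being reordered past any later store, so an acquire load that reads
    // fptr still synchronizes with this thread.
    std::atomic_thread_fence(std::memory_order_release);
    fptr.store(p, std::memory_order_relaxed);
}

bool consumer_sees_all()
{
    std::string* p2;
    while (!(p2 = fptr.load(std::memory_order_acquire)))
        ;
    bool ok = (*p2 == "Hello") && (fdata == 42);
    delete p2;
    return ok;
}

bool run_fence_demo()
{
    fptr.store(nullptr, std::memory_order_relaxed);
    fdata = 0;
    bool ok = false;
    std::thread t1(producer_with_fence);
    std::thread t2([&ok] { ok = consumer_sees_all(); });
    t1.join();
    t2.join();
    return ok;
}
```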

Present answered 12/1, 2020 at 19:2

Can you provide the line you used to get that assembly? Is it just the -S option? – Cow 13/1, 2020 at 7:26

@user8469759: it's copy-pasted from the Godbolt compiler explorer linked right above the code. (Godbolt runs some filtering and demangling on the gcc -S output; it includes the exact GCC command line if you click on the checkmark for compiler-options.) – Fantastic 14/1, 2020 at 13:32

It may be useful to answer your comment:

I still feel like my question isn't clear, my question is more like it follows. Suppose (in the producer for example) you add few more statements after the atomic store, for example data_2 = 175 and maybe a data_3 = 10, where data_2 and data_3 are globals. How exactly is the re-ordering affected now? I understand you probably covered this in your answer, so I do apologize if I'm being annoying

Let's fiddle with your producer():

void producer()
{
    data = 41;
    std::string* p  = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}

Can consumer() find the value 41 in data? No. The value 42 has been (logically) stored to data at the point of the release store, and if consumer() found the value 41, the store of 42 would (at least appear to) have taken place after the release store.
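A runnable sketch of that case, with hypothetical names ptr41, data41 and observe_data: once the consumer's acquire load sees the pointer, the store of 42 happens-before its read of data41, and coherence rules out the older value 41.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> ptr41{nullptr};  // hypothetical names
int data41 = 0;

int observe_data()
{
    ptr41.store(nullptr, std::memory_order_relaxed);
    data41 = 0;

    std::thread producer([] {
        data41 = 41;
        std::string* p = new std::string("Hello");
        data41 = 42;  // last write to data41, sequenced-before the release store
        ptr41.store(p, std::memory_order_release);
    });
    int seen = -1;
    std::thread consumer([&seen] {
        std::string* p2;
        while (!(p2 = ptr41.load(std::memory_order_acquire)))
            ;
        seen = data41;  // 42 is guaranteed; 41 was overwritten before the release
        delete p2;
    });
    producer.join();
    consumer.join();
    return seen;
}
```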

OK, now let's tinker further...

void producer()
{
    data = 0xFF01;
    std::string* p  = new std::string("Hello");
    data = 0xFF02;
    ptr.store(p, std::memory_order_release);
    data = 0x0003;  // written after the release store: races with consumer's read
}

Now all bets are off. data isn't atomic and there's no guarantee what consumer might find. On most architectures, the only candidates in practice are 0xFF02 and 0x0003, but there are certainly architectures where it might find 0xFF03 and/or 0x0002. That might happen on an architecture with an 8-bit bus where a 16-bit int is written as two single-byte operations (from either 'end').

But in principle there is now simply no guarantee what will be observed in the face of such a data race. It's a data race because nothing orders consumer's read of data with respect to that additional write.
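One way to remove the race, sketched under the assumption that data may be changed to an atomic (the names aptr, adata and observe_after_publication are hypothetical): with std::atomic<int> there are no torn values, and because the store of 0xFF02 happens-before the consumer's read, coherence leaves exactly two possible results, 0xFF02 or 0x0003.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> aptr{nullptr};  // hypothetical names
std::atomic<int> adata{0};                // atomic: no data race, no torn values

int observe_after_publication()
{
    aptr.store(nullptr, std::memory_order_relaxed);
    adata.store(0, std::memory_order_relaxed);

    std::thread producer([] {
        adata.store(0xFF01, std::memory_order_relaxed);
        std::string* p = new std::string("Hello");
        adata.store(0xFF02, std::memory_order_relaxed);
        aptr.store(p, std::memory_order_release);
        adata.store(0x0003, std::memory_order_relaxed);  // after publication
    });
    int seen = -1;
    std::thread consumer([&seen] {
        std::string* p2;
        while (!(p2 = aptr.load(std::memory_order_acquire)))
            ;
        seen = adata.load(std::memory_order_relaxed);  // 0xFF02 or 0x0003, nothing else
        delete p2;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Which of the two values the consumer sees is still timing-dependent; the atomic only makes the outcome well-defined.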

Prefabricate answered 14/1, 2020 at 13:13
