Understanding C++11 memory fences

I'm trying to understand memory fences in C++11. I know there are better ways to do this (atomic variables and so on), but I wondered whether this usage is correct. I realize that this program doesn't do anything useful; I just want to confirm that the fence functions do what I think they do.

Basically, my understanding is that the release fence ensures that any changes made in this thread before the fence are visible to other threads after the fence, and that the acquire fence in the second thread ensures that any changes to the variables are visible in that thread immediately after the fence.

Is my understanding correct? Or have I missed the point entirely?

#include <iostream>
#include <atomic>
#include <thread>

int a;

void func1()
{
    for(int i = 0; i < 1000000; ++i)
    {
        a = i;
        // Ensure that changes to a to this point are visible to other threads
        atomic_thread_fence(std::memory_order_release);
    }
}

void func2()
{
    for(int i = 0; i < 1000000; ++i)
    {
        // Ensure that this thread's view of a is up to date
        atomic_thread_fence(std::memory_order_acquire);
        std::cout << a;
    }
}

int main()
{
    std::thread t1 (func1);
    std::thread t2 (func2);

    t1.join(); t2.join();
}
Horsehide answered 29/11, 2012 at 18:31 Comment(8)
AFAIK you need to use atomics here because of data races. - Effy
Yes thank you, I understand that the code I wrote isn't correct for other reasons, but I was struggling a bit to write a simple example to demonstrate my question. - Horsehide
I notice that my compiler, Visual Studio 12 in 64-bit mode, doesn't generate different code even if I remove the fences. Is that because they are not needed on that architecture? This is probably a new question, I guess... - Horsehide
x64 is strongly ordered and ints are atomic, so I don't think the fences will do anything. - Effy
The best explanation I've seen of the C++ memory model is Mathematizing C++ Concurrency by Batty et al. from POPL 2011. The diagrams are particularly helpful. - Kasandrakasevich
@Pubby, as long as memory order is enforced correctly, non-atomic operations can also be used here. - Beholden
@Beholden In code with fences, how can you enforce memory order without using any atomic object? - Flacon
@Effy Non-atomic objects are not atomic objects. - Flacon
W
51

Your usage does not actually ensure the things you mention in your comments. That is, your usage of fences does not ensure that your assignments to a are visible to other threads or that the value you read from a is 'up to date.' This is because, although you seem to have the basic idea of where fences should be used, your code does not actually meet the exact requirements for those fences to "synchronize".

Here's a different example that I think demonstrates correct usage better.

#include <iostream>
#include <atomic>
#include <thread>

std::atomic<bool> flag(false);
int a;

void func1()
{
    a = 100;
    atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);
}

void func2()
{
    while(!flag.load(std::memory_order_relaxed))
        ;

    atomic_thread_fence(std::memory_order_acquire);
    std::cout << a << '\n'; // guaranteed to print 100
}

int main()
{
    std::thread t1 (func1);
    std::thread t2 (func2);

    t1.join(); t2.join();
}

The load and store on the atomic flag do not synchronize by themselves, because they both use the relaxed memory ordering. Without the fences this code would have a data race: we're performing conflicting operations on a non-atomic object in different threads, and without the synchronization the fences provide there would be no happens-before relationship between the conflicting operations on a.

However, with the fences we do get synchronization: thread 2 is guaranteed to read the flag value written by thread 1 (because we loop until we see that value), the atomic write is sequenced after the release fence, and the atomic read is sequenced before the acquire fence, so the fences synchronize (see § 29.8/2 for the specific requirements).

This synchronization means anything that happens-before the release fence happens-before anything that happens-after the acquire fence. Therefore the non-atomic write to a happens-before the non-atomic read of a.
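For comparison, here is a minimal sketch of the same handshake with the ordering attached to the flag operations themselves instead of to standalone fences. The function name and the use of lambdas are assumptions made for illustration and are not part of the example above; the happens-before reasoning is the same, with the release store taking the role of the release fence and the acquire load taking the role of the acquire fence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the example above): runs the writer/reader
// handshake once and returns the value the reader observed in `a`.
int run_release_acquire_handshake()
{
    std::atomic<bool> flag(false);
    int a = 0;         // plain, non-atomic payload
    int observed = -1; // written by the reader, read after join()

    std::thread writer([&] {
        a = 100;
        // The release store publishes every write sequenced before it.
        flag.store(true, std::memory_order_release);
    });

    std::thread reader([&] {
        // The acquire load that reads `true` synchronizes with the release store.
        while (!flag.load(std::memory_order_acquire))
            ;
        observed = a; // ordered after `a = 100`, so this is not a data race
    });

    writer.join();
    reader.join();
    assert(observed == 100);
    return observed;
}
```

Calling run_release_acquire_handshake() from main should return 100 for the same reason the fence version prints 100.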

Things get trickier when you're writing a variable in a loop, because you might establish a happens-before relation for some particular iteration, but not other iterations, causing a data race.

std::atomic<int> f(0);
int a;

void func1()
{
    for (int i = 0; i<1000000; ++i) {
        a = i;
        atomic_thread_fence(std::memory_order_release);
        f.store(i, std::memory_order_relaxed);
    }
}

void func2()
{
    int prev_val = 0;
    // func1's last store to f is 999999, so stop once that value has been seen
    while (prev_val < 999999) {
        while (true) {
            int new_val = f.load(std::memory_order_relaxed);
            if (prev_val < new_val) {
                prev_val = new_val;
                break;
            }
        }

        atomic_thread_fence(std::memory_order_acquire);
        std::cout << a << '\n';
    }
}

This code still causes the fences to synchronize but does not eliminate data races. For example, if f.load() happens to return 10, then we know that a=1, a=2, ..., a=10 have all happened-before that particular cout<<a, but we don't know that cout<<a happens-before a=11. Those are conflicting operations on different threads with no happens-before relation: a data race.
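One way to avoid that race while keeping the same fence pattern is to never overwrite a value that has already been published. The following is only a sketch under that assumption: the slot array, the published counter, and the function name are all hypothetical and not part of the code above. Each value gets its own slot, so once the reader learns that a slot is ready, no later write conflicts with its read.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical variant: every value gets its own slot, so a published slot
// is never written again. Returns how many slots held the expected value.
int run_slot_publication(int count)
{
    std::vector<int> slots(count, -1); // plain, non-atomic payload
    std::atomic<int> published(0);     // number of slots that are ready
    int correct = 0;                   // written by the reader, read after join()

    std::thread writer([&] {
        for (int i = 0; i < count; ++i) {
            slots[i] = i;
            std::atomic_thread_fence(std::memory_order_release);
            published.store(i + 1, std::memory_order_relaxed);
        }
    });

    std::thread reader([&] {
        int seen = 0;
        while (seen < count) {
            int ready = published.load(std::memory_order_relaxed);
            if (ready == seen)
                continue; // nothing new yet
            std::atomic_thread_fence(std::memory_order_acquire);
            // Slots [seen, ready) were written before the release fence that
            // preceded the store we just read, and are never written again.
            for (; seen < ready; ++seen)
                if (slots[seen] == seen)
                    ++correct;
        }
    });

    writer.join();
    reader.join();
    assert(correct == count);
    return correct;
}
```

The trade-off is memory: the single variable a becomes one slot per value, which is what removes the conflicting later write.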

Whichever answered 29/11, 2012 at 19:38 Comment(8)
Thank you for this. I think I'm struggling to completely understand it. I feel reasonably confident using the default atomic types, but there is something about this that I don't quite understand. I think I need a book or some better articles; that's probably a different question! - Horsehide
The book C++ Concurrency In Action covers the C++ memory model, memory orderings, atomics, and fences very well, in addition to covering the higher level constructs. - Whichever
@J99 or if you have specific questions about the examples I can try to answer them. - Whichever
In the first example, can we guarantee that the while loop will eventually terminate? Or is there only a guarantee that if the loop terminates then the program will print 100? - Jostle
@Jostle Technically I believe that's a 'quality of implementation' issue. Implementations are supposed to ensure that writes eventually become visible to other threads. In practice implementations do, so that loop is in practice guaranteed to terminate. - Whichever
@Whichever Would your first example work if I drop all the fences, replace flag.store(true, std::memory_order_relaxed); with flag.store(true, std::memory_order_release);, and replace flag.load(std::memory_order_relaxed) with flag.load(std::memory_order_acquire)? The release-acquire operations would create the synchronization, so all the changes that happen before the release operation would be visible after the acquire operation, no? - Compeer
Nice examples! Can I use this example code in my blog? - Gerge
@Compeer Yes, load-acquire and store-release work in the first example as well. Two good examples here: riptutorial.com/cplusplus/example/25796/fence-example and riptutorial.com/cplusplus/example/25795/need-for-memory-model - Firstly
D
8

Your usage is correct, but insufficient to guarantee anything useful.

For example, the compiler is free to internally implement a = i; like this if it wants to:

 while(a != i)
 {
    ++a;
    atomic_thread_fence(std::memory_order_release);
 }

So the other thread may see any values at all.

Of course, the compiler would never implement a simple assignment like that. However, there are cases where similarly perplexing behavior is actually an optimization, so it's a very bad idea to rely on ordinary code being implemented internally in any particular way. This is why we have things like atomic operations, and why fences only produce guaranteed results when used with such operations.
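As a rough sketch of what "used with atomic operations" can look like for the loop in the question, the shared variable itself can be made atomic. The function name and the monotonicity check below are assumptions added for illustration. With a single atomic object and relaxed operations there is no data race, and the coherence rules for one atomic object mean a thread never reads an older value after a newer one, although it may skip values or see the same value repeatedly.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical rework of the question's loop: `a` is itself atomic, so
// concurrent access is not a data race. Returns true if the reader never
// saw the value go backwards.
bool run_atomic_counter(int count)
{
    std::atomic<int> a(0);
    bool monotonic = true; // written by the reader, read after join()

    std::thread writer([&] {
        for (int i = 0; i < count; ++i)
            a.store(i, std::memory_order_relaxed);
    });

    std::thread reader([&] {
        int last = 0;
        for (int i = 0; i < count; ++i) {
            int now = a.load(std::memory_order_relaxed);
            if (now < last)
                monotonic = false; // would violate per-object coherence
            last = now;
        }
    });

    writer.join();
    reader.join();
    assert(monotonic);
    return monotonic;
}
```

No fences are needed here because only one object is shared; fences become relevant when an atomic operation is used to publish other, non-atomic data.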

Downspout answered 29/11, 2012 at 18:43 Comment(5)
Yes thank you, I understand that the code I wrote isn't correct for other reasons, but I was struggling a bit to write a simple example to demonstrate my question. - Horsehide
Then it sounds like you get it. - Downspout
Are you sure about this? I don't see that the example code meets the requirements in 29.8/1 for the fences to synchronize at all, and that would mean this code has data races and therefore undefined behavior. - Whichever
@bames53: Correct. If I understand him correctly, he's just trying to understand the semantics of the fences. He's trying to use them with normal assignments, which of course can't work because they don't have sufficiently precise semantics. (I clarified this in the end of my answer.) - Downspout
@David Schwartz Yes, I was only interested in what the fences did, and tried to come up with an example around them. Clearly it has only added to the confusion, though :) - Horsehide
S
0
#include <iostream>
#include <atomic>
#include <thread>

std::atomic<bool> flag(false);
int a;

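void func1()
{
    for (int i = 0; i<1000000; ++i) {
        // wait until the reader has consumed the previous value
        while(flag.load(std::memory_order_relaxed))
            ;
        // pairs with the release fence in func2, so the reader's last read
        // of a happens-before the write below
        atomic_thread_fence(std::memory_order_acquire);
        a = i;
        atomic_thread_fence(std::memory_order_release);
        flag.store(true, std::memory_order_relaxed);
    }
}

void func2()
{
    for (int i = 0; i<1000000; ++i) {
        while(!flag.load(std::memory_order_relaxed))
            ;

        atomic_thread_fence(std::memory_order_acquire);
        std::cout  << a << '\n';
        // order the read of a before handing the variable back to func1
        atomic_thread_fence(std::memory_order_release);
        flag.store(false, std::memory_order_relaxed);
    }
}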

int main()
{
    std::thread t1 (func1);
    std::thread t2 (func2);

    t1.join(); t2.join();
}

I'm not sure my code implements your intention; please test.

Strontian answered 18/6 at 9:23 Comment(0)
