Memory ordering issues
Asked Answered
I

4

9

I'm experimenting with C++0x support and there is a problem, that I guess shouldn't be there. Either I don't understand the subject or gcc has a bug.

I have the following code, initially x and y are equal. Thread 1 always increments x first and then increments y. Both are atomic integer values, so there is no problem with the increment at all. Thread 2 is checking whether the x is less than y and displays an error message if so.

This code fails sometimes, but why? The issue here is probably memory reordering, but all atomic operations are sequentially consistent by default and I didn't explicitly relax of those any operations. I'm compiling this code on x86, which as far as I know shouldn't have any ordering issues. Can you please explain what the problem is?

#include <iostream>
#include <atomic>
#include <thread>

std::atomic_int x;
std::atomic_int y;

void f1()
{
    while (true)
    {
        ++x;
        ++y;
    }
}

void f2()
{
    while (true)
    {
        if (x < y)
        {
            std::cout << "error" << std::endl;
        }
    }
}

int main()
{
    x = 0;
    y = 0;

    std::thread t1(f1);
    std::thread t2(f2);

    t1.join();
    t2.join();
}

The result can be viewed here.

Incontestable answered 25/10, 2010 at 5:48 Comment(8)
Actually this is experimental implementation of C++0x, so the second one is possible, but I believe the first one is more likely :PIncontestable
The code posted above will always produce "error", (x will always be greater than or equal to y) is this what you wanted?Chemise
Why if (x < y) should be always true, if x will always be greater than or equal to y?Incontestable
What is the default memory ordering for the increment? Does a store guarantee a release? [My experience with the C++0x memory model is limited]Kielty
@Paul: I immediately agreed with you, but you and I were thinking "if it's not what he tests, print "error". But the code is if it is, print it.Thread 2 can see pairs of the form (n, n) and (n + 1, n). In both cases, x < y is false. There is one sequence that triggers it, though. @James: Yeah.Adorable
"What is the default memory ordering for the increment?" - sequentially consistent is default.Incontestable
@confucius: I figured; a relaxed ordering by default would be terrible.Kielty
@GMan: You're right of course. I think I've spent far too long looking at code for one day.Chemise
C
11

The problem could be in your test:

if (x < y)

the thread could evaluate x and not get around to evaluating y until much later.

Corrupt answered 25/10, 2010 at 6:3 Comment(4)
I guess that is the problem, the question is why? Actually sequentially consistent atomic operations must prevent this, mustn't they?Incontestable
Thanks a lot! The problem is unspecified sequence of expression evaluation, so simple :)Incontestable
@confucius: while your scenario might have a dependency on the order that the variables might be read, the more general issue is that reading 2 different atomic instances isn't atomic.Corrupt
Of course, I think in the current situation misleading line in the code was (x < y) condition, which hides implicit load operations. (x.load() < y.load()) instead would make it easier to see.Incontestable
K
13

There is a problem with the comparison:

x < y

The order of evaluation of subexpressions (in this case, of x and y) is unspecified, so y may be evaluated before x or x may be evaluated before y.

If x is read first, you have a problem:

x = 0; y = 0;
t2 reads x (value = 0);
t1 increments x; x = 1;
t1 increments y; y = 1;
t2 reads y (value = 1);
t2 compares x < y as 0 < 1; test succeeds!

If you explicitly ensure that y is read first, you can avoid the problem:

int yval = y;
int xval = x;
if (xval < yval) { /* ... */ }
Kielty answered 25/10, 2010 at 6:20 Comment(1)
Thanks! That's it. Sorry guys, cannot plus, https is blocked in the office and I can't login :(Incontestable
C
11

The problem could be in your test:

if (x < y)

the thread could evaluate x and not get around to evaluating y until much later.

Corrupt answered 25/10, 2010 at 6:3 Comment(4)
I guess that is the problem, the question is why? Actually sequentially consistent atomic operations must prevent this, mustn't they?Incontestable
Thanks a lot! The problem is unspecified sequence of expression evaluation, so simple :)Incontestable
@confucius: while your scenario might have a dependency on the order that the variables might be read, the more general issue is that reading 2 different atomic instances isn't atomic.Corrupt
Of course, I think in the current situation misleading line in the code was (x < y) condition, which hides implicit load operations. (x.load() < y.load()) instead would make it easier to see.Incontestable
L
4

Every now and then, x will wrap around to 0 just before y wraps around to zero. At this point y will legitimately be greater than x.

Lipscomb answered 25/10, 2010 at 8:16 Comment(2)
Took a shot at editing it. It might as well be noted that signed overflow leads to undefined behavior, though unsigned overflow wraps as expected.Adorable
Agree. Overflow was not supposed to happen, it was done just for test, but this could be a problem too. Thanks.Incontestable
A
-3

First, I agree with "Michael Burr" and "James McNellis". Your test is not fair, and there's a legitime possibility to fail. However even if you rewrite the test the way "James McNellis" suggests the test may fail.

First reason for this is that you don't use volatile semantics, hence the compiler may do optimizations to your code (that are supposed to be ok in a single-threaded case).

But even with volatile your code is not guaranteed to work.

I think you don't fully understand the concept of memory reordering. Actually memory read/write reorder can occur at two levels:

  1. Compiler may exchange the order of the generated read/write instructions.
  2. CPU may execute memory read/write instructions in arbitrary order.

Using volatile prevents the (1). However you've done nothing to prevent (2) - memory access reordering by the hardware.

To prevent this you should put special memory fence instructions in the code (that are designated for CPU, unlike volatile which is for compiler only).

In x86/x64 there're many different memory fence instructions. Also every instruction with lock semantics by default issues full memory fence.

More information here:

http://en.wikipedia.org/wiki/Memory_barrier

Abagael answered 25/10, 2010 at 7:2 Comment(1)
valdo - no need to use volatile here, because memory barriers by default generated by C++0x atomic operations prevent both (1) and (2).Incontestable

© 2022 - 2024 — McMap. All rights reserved.