Race condition on x86

Could someone explain this statement:

shared variables
x = 0, y = 0

Core 1       Core 2
x = 1;       y = 1;
r1 = y;      r2 = x;

How is it possible to have r1 == 0 and r2 == 0 on x86 processors?

Source "The Language of Concurrency" by Bartosz Milewski.

Levitan answered 8/7, 2011 at 11:12 Comment(2)
preshing.com/20120515/memory-reordering-caught-in-the-act uses an assembly version of this example, with and without mfence, to demonstrate memory reordering in practice, with an explanation of why it is allowed to happen. In C/C++ it is plain UB if the variables are not atomic; if they are atomic, the reordering can only happen with memory_order_release or weaker. – Fixing
Unfortunately(?) the memory-order tag is a synonym for memory-barriers, so this question about memory ordering can only be tagged with a tag that would prevent the reordering. (At least a strong enough memory barrier would...) – Fixing

The problem can arise due to reordering of memory operations. In other words, each processor may perform the read into r1 or r2 before its own write to x or y becomes visible to the other core, if it finds that this yields better performance. This can be solved by adding a memory barrier, which enforces the ordering constraint.

To quote the slideshow you mentioned in your post:

Modern multicores/languages break sequential consistency.

Regarding the x86 architecture, the best resource to read is the Intel® 64 and IA-32 Architectures Software Developer's Manual (Section 8.2, Memory Ordering). Sections 8.2.1 and 8.2.2 describe the memory ordering implemented by the Intel486, Pentium, Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, Intel Xeon, and P6 family processors: a memory model called processor ordering, as opposed to the program ordering (strong ordering) of the older Intel386 architecture, where read and write instructions were always issued in the order they appeared in the instruction stream.

The manual describes many ordering guarantees of the processor ordering memory model (such as Loads are not reordered with other loads, Stores are not reordered with other stores, Stores are not reordered with older loads etc.), but it also describes the allowed reordering rule which causes the race condition in the OP's post:

8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations

On the other hand, if the original order of the instructions was switched:

shared variables
x = 0, y = 0

Core 1       Core 2
r1 = y;      r2 = x;
x = 1;       y = 1;

In this case, the processor guarantees that the outcome r1 == 1 and r2 == 1 is not allowed (due to the 8.2.3.3 guarantee, Stores Are Not Reordered With Earlier Loads), meaning those instructions are never reordered within the individual cores: both results being 1 would require each store to become visible before the other core's load, which contradicts the program order on both cores at once.

To compare this with different architectures, check out this article: Memory Ordering in Modern Microprocessors. You can see that Itanium (IA-64) does even more reordering than the IA-32 architecture:

[Table: possible CPU reorderings for various architectures]

Larrikin answered 8/7, 2011 at 11:18 Comment(3)
That makes sense: the first operation writes a constant into a variable, and the second reads a variable into r1 or r2, so the core may well perform the read while the earlier write is still in flight. – Levitan
The compiler can also be the culprit; cf. 'How to Miscompile Programs with "Benign" Data Races': usenix.org/events/hotpar11/tech/final_files/Boehm.pdf – Misuse
@Brian: While interesting, a lot of the examples in that paper are false (not valid transformations the compiler can make) for C/POSIX due to signal handling semantics, especially if the elided code contains calls to sigprocmask. – Bespeak

On processors with a weaker memory consistency model (such as SPARC, PowerPC, Itanium, ARM, etc.), the above outcome can also occur because writes are not guaranteed to become visible to other cores in program order without an explicit memory barrier instruction. So basically Core 1 sees the write to x before the write to y, while Core 2 sees the write to y before the write to x. Note that for this particular store-then-load pattern, write (release) semantics alone are not enough: each core must make its store globally visible before performing its subsequent load, which is what a full (store-load) fence provides. Even processor architectures with strong memory consistency models like x86 allow this one reordering, and as Groo points out, the compiler itself can also reorder the operations.

You can use the volatile keyword in C and C++ to prevent the compiler from reordering operations within a given thread. That is not to say that volatile will create thread-safe code that manages the visibility of reads and writes between threads; a memory barrier is required for that. So while the use of volatile alone can still produce unsafe threaded code, within a given thread it does enforce the source-code order of volatile accesses in the compiled machine code.

Eyla answered 8/7, 2011 at 13:4 Comment(3)
The CPU could reorder instructions too (while keeping cache coherency with respect to the new ordering). – Bohlin
Volatile won't prevent reorders, except on older compilers that used "volatile" as a cue to completely disable the optimizer. You need a memory fence instruction. – Denisse
If volatile didn't prevent compiler reordering within a given thread, then it would be absolutely pointless for memory-mapped I/O: the compiler would still reorder the reads and writes, and you would get all sorts of undefined hardware behavior. The volatile keyword does not prevent CPU reordering of instructions, nor does it enforce the visibility of reads and writes between threads (that requires a memory barrier), but it does prevent the compiler itself from reordering the operations within a given thread as defined in the source code. – Eyla

This is why some say: Threads Considered Harmful

The problem is that neither thread enforces any ordering between its own two statements, because they are not interdependent.

  • The compiler knows that x and y are not aliased, and so it is not required to order the operations.

  • The CPU knows that x and y are not aliased, so it may reorder them for speed. A good example of when this happens is when the CPU detects an opportunity for write combining. It may merge one write with another if it can do so without violating its coherency model.

The mutual dependency looks odd, but it's really no different from any other race condition. Directly writing shared-memory threaded code is quite difficult, and that's why parallel languages and message-passing parallel frameworks have been developed: they isolate the parallel hazards in a small kernel and keep them out of the applications themselves.

Lutestring answered 9/7, 2011 at 18:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.