Out-of-order execution and reordering: can I see effects issued after a barrier before the barrier?
According to Wikipedia: "A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier."

Usually, articles talk about something like this (I will use monitors instead of membars):

class ReadWriteExample {
    int A = 0;
    int Another = 0;

    //thread1 runs this method
    void writer () {
      lock monitor1;   //a new value will be stored
      A = 10;          //stores 10 to memory location A
      unlock monitor1; //a new value is ready for reader to read
      Another = 20;    //@see my question
    }

    //thread2 runs this method
    void reader () {
      lock monitor1;   //a new value will be read
      assert A == 10;  //loads from memory location A
      print Another;   //@see my question
      unlock monitor1; //a new value was just read
    }
}

But I wonder: is it possible that the compiler or the CPU will shuffle things around in such a way that the code prints 20? I don't need a guarantee.

That is, by definition, operations issued prior to the barrier can't be pushed down past it by the compiler; but is it possible that operations issued after the barrier would occasionally be seen before it? (Just a probability.)
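To make the pseudocode concrete, here is a hedged Java sketch of the same example, using a synchronized block as the monitor. The class layout follows the question; main() uses join() only to make this particular run deterministic. Without the join, the reader could observe A == 0, and Another as either 0 or 20.

```java
// A concrete Java sketch of the pseudocode above, using a synchronized
// block as the monitor. join() in main() makes this run deterministic.
class ReadWriteExample {
    int A = 0;
    int Another = 0;
    final Object monitor1 = new Object();

    // thread1 runs this method
    void writer() {
        synchronized (monitor1) { // lock monitor1
            A = 10;               // stores 10 to memory location A
        }                         // unlock monitor1
        Another = 20;             // outside the monitor: the JMM permits this
                                  // store to move up into the synchronized block
    }

    // thread2 runs this method
    void reader() {
        synchronized (monitor1) { // lock monitor1
            System.out.println("A = " + A);
            System.out.println("Another = " + Another);
        }                         // unlock monitor1
    }

    public static void main(String[] args) throws InterruptedException {
        ReadWriteExample ex = new ReadWriteExample();
        Thread t1 = new Thread(ex::writer);
        t1.start();
        t1.join();                // happens-before: reader must now see both stores
        Thread t2 = new Thread(ex::reader);
        t2.start();
        t2.join();
    }
}
```

With the join() in place this prints `A = 10` and `Another = 20`; the interesting question, addressed in the answers below, is what the reader may observe when the threads actually race.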

Thanks

Monosyllable answered 8/4, 2015 at 16:7 Comment(3)
I'm pretty far from an expert on this topic, but here are my thoughts: it seems like it would depend on the compiler and the OS. In C#, for example, there is a volatile keyword for variables that may be hit from multiple threads. But ultimately, it's up to the OS to pass the thread handling to the CPU, and its order may depend on a lot of things.Aggiornamento
The short answer is yes, depending on the language semantics of the monitor, the OS, and the CPU. Even if nothing were re-ordered, there would still be a race in your example where 20 could be written and its effects fully propagated before reader is even called.Dentoid
On the CPU side, it depends on what your compiler and language semantics will issue, and on the architecture used. x86, for example, provides fencing barriers that prevent reordering of memory operations in either direction (like mfence), and also stronger serializing instructions that block out-of-order execution across them (like a lock-prefixed operation). IA64 had uni-directional fences (acquire/release).Callable

My answer below only addresses Java's memory model. The answer really can't be given for all languages, as each may define the rules differently.

But I wonder: is it possible that the compiler or the CPU will shuffle things around in such a way that the code prints 20? I don't need a guarantee.

Your question seems to be: "Is it possible for the store Another = 20 to be re-ordered above the monitor unlock?"

The answer is yes, it can be. If you look at the JSR-133 Cookbook, the first grid shown explains which re-orderings are permitted.

In your writer case, the first operation would be a MonitorExit and the second operation would be a NormalStore. The grid shows that, yes, this sequence is permitted to be re-ordered.

This is known as Roach Motel ordering; that is, memory accesses can be moved into a synchronized block but cannot be moved out.
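The Roach Motel transformation can be sketched as two behaviorally equivalent methods (method names are mine, for illustration): a JIT may turn the first into the second by hoisting the store into the region, but it may never move A = 10 out past the MonitorExit.

```java
// A sketch of the roach-motel transformation the cookbook's grid permits.
// Both methods are equivalent under the JMM: a compiler may turn the
// first into the second (store moved INTO the synchronized region).
class RoachMotelSketch {
    int A = 0;
    int Another = 0;
    final Object monitor1 = new Object();

    void writerAsWritten() {
        synchronized (monitor1) {
            A = 10;
        }                 // MonitorExit
        Another = 20;     // NormalStore issued after the exit
    }

    void writerAsReordered() {
        synchronized (monitor1) {
            A = 10;
            Another = 20; // the same store, hoisted above the MonitorExit
        }
    }

    public static void main(String[] args) {
        RoachMotelSketch s = new RoachMotelSketch();
        s.writerAsWritten();
        System.out.println(s.A + " " + s.Another); // same final state either way
    }
}
```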


What about another language? Well, this question is too broad to answer for every language, as each may define the rules differently. If that is what you need, you should refine your question.

Cuspidate answered 8/4, 2015 at 16:30 Comment(0)

In Java there is the concept of happens-before. You can read all the details about it in the Java Language Specification. A Java compiler or runtime engine can re-order code, but it must abide by the happens-before rules. These rules are important for a Java developer who wants detailed control over how their code is re-ordered. I myself have been burnt by re-ordering code: it turned out I was referencing the same object via two different variables, and the runtime engine re-ordered my code, not realizing that the operations were on the same object. If I had had either a happens-before relationship (between the two operations) or used the same variable, the re-ordering would not have occurred.

Specifically:

It follows from the above definitions that:

An unlock on a monitor happens-before every subsequent lock on that monitor.

A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.

A call to start() on a thread happens-before any actions in the started thread.

All actions in a thread happen-before any other thread successfully returns from a join() on that thread.

The default initialization of any object happens-before any other actions (other than default-writes) of a program.
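The volatile rule from the list above can be sketched in a few lines (class and field names are mine): the plain write to `data` is published along with the volatile write to `ready`, so a thread that sees `ready == true` must also see `data == 42`.

```java
// Hedged sketch of the volatile happens-before rule: a write to a
// volatile field happens-before every subsequent read of that field,
// so everything written before it is published with it.
class HappensBeforeDemo {
    static int data = 0;
    static volatile boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;      // ordinary write
            ready = true;   // volatile write: publishes the write above
        });
        writer.start();
        writer.join();      // join() is itself also a happens-before edge
        if (ready) {        // volatile read: sees true, so data must be 42
            System.out.println(data);
        }
    }
}
```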

Elery answered 8/4, 2015 at 16:58 Comment(1)
Thanks for your input, I am quite familiar with spec..but it lacks an answer on my question. +1 because others may need to read spec first :)Monosyllable

Short answer - yes. This is very compiler- and CPU-architecture dependent. What you have here is the definition of a race condition. The scheduling quantum won't end mid-instruction (you can't have two simultaneous writes to the same location). However, the quantum could end between instructions - and how they are executed out of order in the pipeline is architecture dependent (outside of the monitor block).

Now come the "it depends" complications. The CPU guarantees little (see race condition above). You might also look at NUMA (ccNUMA) - it is a method of scaling CPU and memory access by grouping CPUs (nodes) with local RAM and a group owner, plus a special bus between nodes.

The monitor doesn't prevent the other thread from running. It only prevents it from entering the code between the lock and unlock. Therefore, when the writer exits the monitor section it is free to execute the next statement, regardless of the other thread being inside the monitor. Monitors are gates that block access. Also, the quantum could interrupt the second thread after the assert A == 10 statement, allowing Another to change value. Again, the quantum won't interrupt mid-instruction. Always think of threads as executing in perfect parallel.
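The point that a monitor only gates entry can be demonstrated directly. In this hedged sketch (latch names are mine), the writer has already left its synchronized block, and while the reader holds the monitor, the writer freely stores Another = 20:

```java
import java.util.concurrent.CountDownLatch;

// Runnable sketch: the monitor gates entry to the synchronized region,
// nothing more. Latches force the exact interleaving described above.
class MonitorGateDemo {
    static int A = 0;
    static int Another = 0;
    static final Object monitor1 = new Object();
    static final CountDownLatch writerUnlocked = new CountDownLatch(1);
    static final CountDownLatch readerInside = new CountDownLatch(1);
    static final CountDownLatch writerDone = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            synchronized (monitor1) { A = 10; }
            writerUnlocked.countDown();
            try {
                readerInside.await();     // reader now holds the monitor...
                Another = 20;             // ...yet we can still store here
                writerDone.countDown();
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread reader = new Thread(() -> {
            try {
                writerUnlocked.await();
                synchronized (monitor1) {
                    readerInside.countDown();
                    writerDone.await();   // the latch provides visibility here
                    System.out.println(Another);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```

This deterministically prints 20 because the latches establish the needed ordering and visibility; in the original example, without them, printing 20 is merely one possible outcome of the race.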

How do you apply this? I'm a bit out of date (sorry, C#/Java these days) with current Intel processors and how their pipelines work (hyperthreading etc.). Years ago I worked with a processor called MIPS, and it had (through compiler instruction ordering) the ability to execute an instruction placed serially AFTER a branch instruction (the delay slot). On that CPU/compiler combination, YES, what you describe could happen. If Intel offers the same, then yes, it could happen - especially with NUMA (both Intel and AMD have this; I'm most familiar with AMD's implementation).

My point: if the threads were running across NUMA nodes, and the access was to a common memory location, then it could occur. Of course, the OS tries hard to schedule operations within the same node.

You might be able to simulate this. I know C++ on MS allows access to NUMA technology (I've played with it). See if you can allocate memory across two nodes (placing A on one and Another on the other), then schedule the threads to run on specific nodes.

What happens in this model is that there are two pathways to RAM. I suppose this isn't what you had in mind - probably only a single-path/node model, in which case I go back to the MIPS model I described above.

I assumed a processor that interrupts - there are others that have a Yield model.

Mating answered 8/4, 2015 at 17:13 Comment(3)
Wow, too complex for me, but +1)Monosyllable
Hi Vadim - I wasn't sure if you meant high-level languages or low-level CPUs. As the others have stated, in high-level compilers many things can happen (strength reduction, instruction re-ordering). The compiler outputs the instructions to be executed. However, the CPU can also "change" the order of execution within some limits - this is pipelining. Virtual machines like .NET and Java add another level.Mating
Well, it's good to know about things on any level, so, thank you again)Monosyllable
