Here are four approaches to achieving sequential consistency on x86/x86_64:
- LOAD (without fence) and STORE + MFENCE
- LOAD (without fence) and LOCK XCHG
- MFENCE + LOAD and STORE (without fence)
- LOCK XADD(0) and STORE (without fence)
As described at http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html:
C/C++11 operation: x86 implementation
- Load Seq_Cst: MOV (from memory)
- Store Seq_Cst: (LOCK) XCHG // alternative: MOV (into memory), MFENCE
Note: there is an alternative mapping of C/C++11 to x86 which, instead of locking (or fencing) the Seq_Cst store, locks/fences the Seq_Cst load:
- Load Seq_Cst: LOCK XADD(0) // alternative: MFENCE, MOV (from memory)
- Store Seq_Cst: MOV (into memory)
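The two mappings above can be sketched in portable C++11 atomics. This is only an illustration under assumptions: the helper names are hypothetical, and it relies on the fact that on x86 an acquire load and a release store normally compile to a plain MOV, while `std::atomic_thread_fence(std::memory_order_seq_cst)` normally becomes a full barrier (MFENCE or a LOCKed instruction). The exact instructions are the compiler's choice.

```cpp
#include <atomic>

// Hypothetical helpers, not a standard API. Each group sketches one mapping.
// On x86, acquire loads and release stores are typically plain MOVs.

// Mapping 1: plain load, fenced store (MOV; MFENCE).
inline int load_plain(const std::atomic<int>& a) {
    return a.load(std::memory_order_acquire);
}
inline void store_then_fence(std::atomic<int>& a, int v) {
    a.store(v, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst); // full barrier
}

// Mapping 1 alternative: locked store. XCHG with a memory operand is
// implicitly LOCKed on x86; the returned old value is discarded.
inline void store_by_exchange(std::atomic<int>& a, int v) {
    a.exchange(v, std::memory_order_seq_cst);
}

// Mapping 2: fenced load, plain store (MFENCE; MOV).
inline int fence_then_load(const std::atomic<int>& a) {
    std::atomic_thread_fence(std::memory_order_seq_cst); // full barrier
    return a.load(std::memory_order_acquire);
}
inline void store_plain(std::atomic<int>& a, int v) {
    a.store(v, std::memory_order_release);
}

// Mapping 2 alternative: locked read-modify-write that adds 0,
// typically emitted as LOCK XADD; it returns the current value.
inline int load_by_fetch_add0(std::atomic<int>& a) {
    return a.fetch_add(0, std::memory_order_seq_cst);
}
```

The groups are not interchangeable: pairing `store_plain` with `load_plain` puts no full barrier between a store and a later load, which is why a program (or compiler) has to pick one mapping and use it consistently.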
GCC 4.8.2 (disassembled with GDB on x86_64) uses the first approach for C++11 std::memory_order_seq_cst, i.e. LOAD (without fence) and STORE + MFENCE:
std::atomic<int> a;
int temp = 0;
a.store(temp, std::memory_order_seq_cst);
0x4613e8 <+0x0058> mov 0x38(%rsp),%eax
0x4613ec <+0x005c> mov %eax,0x20(%rsp)
0x4613f0 <+0x0060> mfence
If we assume that MFENCE = LFENCE + SFENCE, this code can be rewritten as: LOAD (without fence) and STORE + LFENCE + SFENCE.
Questions:
- Why don't we need an LFENCE before the LOAD here, yet do need one after the STORE (given that LFENCE only makes sense before a LOAD)?
- Why doesn't GCC use the approach LOAD (without fence) and STORE + SFENCE for std::memory_order_seq_cst?
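The ordering that seq_cst has to enforce can be illustrated with the classic store-buffering (Dekker-style) litmus test, sketched below with a hypothetical function name. Each thread stores to its own variable and then loads the other one. Sequential consistency forbids the outcome `r1 == 0 && r2 == 0`, and on x86 that outcome is exactly what a later load overtaking an earlier store (StoreLoad reordering, which x86 permits) would produce. That is why a seq_cst mapping needs a full barrier such as MFENCE or a LOCKed instruction between the store and the later load.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times using seq_cst operations and returns how often the outcome
// r1 == 0 && r2 == 0 was observed. The C++ memory model forbids that
// outcome for seq_cst, so the result must be 0.
int count_forbidden_outcomes(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst); // store own flag
            r1 = y.load(std::memory_order_seq_cst); // then read the other
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++forbidden;
    }
    return forbidden;
}
```

Replacing the seq_cst orders with release stores and acquire loads makes the outcome legal in C++ and possible on x86 hardware. Note that creating fresh threads per iteration means they rarely overlap, so this sketch is a poor detector of reordering; a real litmus harness would reuse threads and synchronize their start.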
movnt loads/stores, which are weakly ordered as well as bypassing the cache. See stackoverflow.com/questions/32705169/… – Scheers