What is the difference in logic and performance between the x86 instructions LOCK XCHG and MOV+MFENCE for doing a sequentially consistent store?
(We ignore the load result of the XCHG; compilers other than gcc use it for the store + memory barrier effect.)
Is it true that, for sequential consistency, during the execution of an atomic operation, LOCK XCHG locks only a single cache line, whereas MOV+MFENCE locks the whole L3 cache (LLC)?
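To make the comparison concrete, here is a minimal C++11 sketch of the two source-level forms that correspond to the instruction sequences in the question. The global flag and the three helper names are hypothetical, chosen only for illustration. Which instructions a compiler actually emits depends on the compiler and version, so the comments describe typical x86 code generation rather than a guarantee.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, used only for this illustration.
std::atomic<int> flag{0};

// Sequentially consistent store. On x86 a compiler typically implements
// this either as mov + mfence or as a single xchg with memory.
void store_seq_cst(int v) {
    flag.store(v, std::memory_order_seq_cst);
}

// The same store written as an exchange whose load result is discarded.
// On x86 this typically compiles to xchg, which is implicitly locked.
void store_via_exchange(int v) {
    (void)flag.exchange(v, std::memory_order_seq_cst);
}

// Sequentially consistent load, for checking the stored value.
int load_flag() {
    return flag.load(std::memory_order_seq_cst);
}
```

Both functions leave the same value in flag and give the same ordering guarantee at the C++ level; the question is only about which machine-level implementation is cheaper.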
[…] C11 (atomic)/C++11 (std::atomic) for all orderings on x86 except SC (sequential consistency): en.cppreference.com/w/cpp/atomic/memory_order But I said that MFENCE provides sequential consistency for atomic variables, as we can see in C11 (atomic)/C++11 (std::atomic) in GCC 4.8.2: stackoverflow.com/questions/19047327/… – Hoofer
mov may be atomic for what it does, but xchg can't be expressed as a single mov. – Harter
([…] mov is atomic for unaligned access, by the way.) – Harter
[…] MOV+MFENCE (SC in GCC 4.8.2) we can replace with LOCK XCHG for SC, as we can see in the video, where at 0:28:20 it is said that MFENCE is more expensive than XCHG: channel9.msdn.com/Shows/Going+Deep/… – Hoofer
[…] lock is already implicit for xchg [mem], reg. Hopefully when people say LOCK XCHG, they're just talking about the implied behaviour. I'm not sure if any assemblers will omit the lock prefix from the machine code if you write lock xchg, but they could. – Inspan
[…] xchg and just use it to do a store + memory barrier. Turns out it's more efficient to use xchg on Intel Skylake at least, where mfence blocks out-of-order exec of independent non-memory instructions. I'm closing this as a dup for now because I addressed this in an answer on a related question, but maybe this question deserves its own answer. "Which is a better write barrier on x86: lock+addl or xchgl?" is related. – Inspan
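As an illustration of why the store needs a barrier at all, here is a rough sketch of the classic store-buffer litmus test in C++11. The function names are hypothetical. With sequentially consistent stores and loads, the C++ memory model forbids the outcome where both threads read 0, whichever of the two instruction sequences the compiler picks for the store. With plain x86 mov stores and no barrier, that outcome would be allowed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs one round of the store-buffer litmus test.
// Returns true if both threads read 0, which seq_cst ordering forbids.
bool both_read_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // store + full barrier
        r1 = y.load(std::memory_order_seq_cst);  // must not pass the store
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appears.
int count_forbidden_outcomes(int rounds) {
    int n = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_read_zero_once()) ++n;
    }
    return n;
}
```

A call such as count_forbidden_outcomes(200) should return 0 on a conforming implementation. A passing run is not a proof of correctness, since a litmus test can only expose a reordering, not rule one out.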