Why can _mm_pause() significantly improve performance? [duplicate]

According to Intel's manual (on page 112):

void _mm_pause(void)

The execution of the next instruction is delayed an implementation specific amount of time. The instruction does not modify the architectural state. This intrinsic provides especially significant performance gain.

That is to say:

while (!acquire_spin_lock()) _mm_pause(); // code snippet 1

is faster and has lower power consumption than

while (!acquire_spin_lock()) continue; // code snippet 2

I can understand why code snippet 1 has lower power consumption than code snippet 2.

What I cannot understand is:

Why is code snippet 1 faster than code snippet 2?

Kenley answered 17/10, 2019 at 3:2

Comments (7):
This looks interesting: gitlab.haskell.org/ghc/ghc/issues/8578 "For spinlock implementations, Intel at least has a PAUSE instruction which specifically informs the CPU that this is a spinlock wait-loop, which allows it to optimize cache and memory accesses if possible to avoid memory ordering violations, requiring the CPUs to synchronize. This can be quite dramatic on older processors I believe .... PAUSE might also actually delay the CPU for this optimization to happen, where as rep nop will merely run as fast as possible." - Oblige

It avoids a memory-order mis-speculation pipeline nuke (aka full clear of the pipeline, like a branch miss but worse) on the iteration that leaves the loop. - Package

Also related: What are the latency and throughput costs of producer-consumer sharing of a memory location between hyper-siblings versus non-hyper siblings? - Package

See also software.intel.com/en-us/articles/… with benchmarks on a Nehalem Xeon. Also aloiskraus.wordpress.com/2018/06/16/… makes some inflammatory claims, but also quotes Intel's optimization manual re: the benefit of increasing PAUSE delay on SKL from ~10 to ~100 cycles as speeding up threaded workloads on HT systems by stealing fewer cycles from the other hyperthread. (And causing less contention from re-reading often while spin-waiting.) - Package

@racraman, It makes no sense to draw a distinction between PAUSE and REP NOP because they are exactly the same instruction. If you read a bit more of the thread you linked, you’ll see that the writer of that quotation retracted it. - Utter

@prl: I think (hope) they mean a CPU that doesn't recognize rep nop as special and just decodes it as a 2-byte NOP. But the sentence it's part of doesn't make sense with that interpretation either. - Package

@Utter No, they aren't. One is an instruction, the other is two instructions. They have the same representation and the interpretation depends on whether or not the CPU supports PAUSE. - Madalynmadam
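
To make the comments above concrete, here is a minimal sketch of a spin lock whose wait loop uses _mm_pause(), assuming a C11 compiler with <stdatomic.h>. The names demo_spinlock, cpu_relax, spin_try_lock, spin_lock and spin_unlock are hypothetical and chosen for illustration; they do not come from Intel's manual or from the question. The waiter spins on a plain load and pauses on every iteration, which is the kind of loop where, per the comments, PAUSE avoids the memory-order mis-speculation on the iteration that leaves the loop and takes fewer cycles from the other hyperthread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>              /* declares _mm_pause() */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)       /* assumption: plain no-op on non-x86 targets */
#endif

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_bool locked;
} demo_spinlock;

/* Try once to take the lock; returns true on success. */
static bool spin_try_lock(demo_spinlock *l)
{
    /* exchange returns the previous value: false means the lock was free */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(demo_spinlock *l)
{
    while (!spin_try_lock(l)) {
        /* Wait with read-only loads, pausing between them, and only
         * retry the atomic exchange once the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static void spin_unlock(demo_spinlock *l)
{
    /* Debug check: releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A caller would initialize the flag with atomic_init(&lock.locked, false), then bracket the critical section with spin_lock(&lock) and spin_unlock(&lock). Replacing cpu_relax() with an empty statement gives the equivalent of code snippet 2, so the two variants can be benchmarked against each other under contention.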
