According to Intel's manual (on page 112):
void _mm_pause(void)
The execution of the next instruction is delayed an implementation specific amount of time. The instruction does not modify the architectural state. This intrinsic provides especially significant performance gain.
That is to say:
while (!acquire_spin_lock()) _mm_pause(); // code snippet 1
is faster and has lower power consumption than
while (!acquire_spin_lock()) continue; // code snippet 2
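For concreteness, here is a minimal sketch of the two loops as complete C functions. The lock itself is an assumption on my part: acquire_spin_lock() and release_spin_lock() are hypothetical helpers built on a C11 atomic_flag, and cpu_relax() is a made-up wrapper that falls back to a no-op where _mm_pause() is unavailable.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>              /* declares _mm_pause() */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)       /* assumption: no-op fallback on non-x86 targets */
#endif

/* Hypothetical lock state; the real lock is not shown in the question. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Try once to take the lock; returns true on success. */
static bool acquire_spin_lock(void)
{
    /* test_and_set returns the previous value: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire);
}

/* Release the lock so that another thread can take it. */
static void release_spin_lock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Code snippet 1: spin with a PAUSE hint between attempts. */
static void lock_with_pause(void)
{
    while (!acquire_spin_lock())
        cpu_relax();
}

/* Code snippet 2: spin with no hint between attempts. */
static void lock_without_pause(void)
{
    while (!acquire_spin_lock())
        continue;
}
```

Both functions return only once the lock is held, so the only difference between them is the hint executed between failed attempts.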
I can understand why code snippet 1 consumes less power than code snippet 2.
What I cannot understand is:
Why is code snippet 1 also faster than code snippet 2?
rep nop as special and just decodes it as a 2-byte NOP. But the sentence it's part of doesn't make sense with that interpretation either. – Package