The _mm_pause() intrinsic is fully documented by Intel and supported by all the major x86 compilers portably across OSes. IDK if MS's docs were lacking in the past, or if you just missed it ~7 years ago.
Just #include <immintrin.h> and use it. (Or for ancient compilers, #include <emmintrin.h> for SSE2.)
    #include <immintrin.h>
    void test() {
        _mm_pause();
        _mm_pause();
    }
compiles to this asm on all 4 of gcc/clang/ICC/MSVC (on the Godbolt compiler explorer):
    test():                                 # @test()
            pause
            pause
            ret
On CPUs without SSE2, it decodes as rep nop, which is just a nop. See: Cross-platform implementation of the x86 pause instruction
GCC even knows this, and still accepts _mm_pause() when compiling with -mno-sse. (Normally gcc and clang reject intrinsics for instructions that aren't enabled, unlike MSVC.) Amusingly, gcc even emits rep nop in its asm output, while the other three emit pause. They assemble to the same machine code, of course.
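Because both spellings encode to the same two bytes (F3 90), GNU-style inline asm is another way to get the instruction without including the intrinsics header. This is only a sketch, assuming gcc or clang targeting x86; the function names are hypothetical, and MSVC does not accept this syntax, so the intrinsic remains the portable choice.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Both bodies assemble to the same machine code: F3 90.
// "rep; nop" is the prefix-plus-nop spelling; "pause" is the dedicated mnemonic.
inline void pause_via_rep_nop()  { __asm__ __volatile__("rep; nop"); }
inline void pause_via_mnemonic() { __asm__ __volatile__("pause"); }
#else
// Assumption: on other compilers or targets this sketch falls back to a no-op.
inline void pause_via_rep_nop()  {}
inline void pause_via_mnemonic() {}
#endif
```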
Pause idles the front-end of that hyperthread for about 5 cycles on Sandybridge-family CPUs before Skylake. On Skylake, Intel increased it to ~100 cycles to save more power in spin-wait loops and increase overall throughput, at the possible expense of latency, especially on Hyperthreaded cores.
On all CPUs it also avoids memory-order mis-speculation when leaving a spin-loop. So it does reduce latency right when it finally matters again.
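For illustration, a spin-wait loop using the intrinsic might look like the sketch below. The names cpu_relax and spin_until_set are hypothetical (not from any library), the no-op fallback for non-x86 targets is an assumption, and the spin bound is only there so the caller can fall back to something else (e.g. yielding or sleeping) instead of spinning forever.

```cpp
#include <atomic>
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// x86: emit the pause instruction via the documented intrinsic.
inline void cpu_relax() { _mm_pause(); }
#else
// Assumption: on non-x86 targets this sketch simply does nothing.
inline void cpu_relax() {}
#endif

// Hypothetical helper: poll `flag` up to `max_spins` times, executing pause
// between polls. Returns true if the flag was observed set.
inline bool spin_until_set(const std::atomic<bool>& flag, std::size_t max_spins) {
    for (std::size_t i = 0; i < max_spins; ++i) {
        if (flag.load(std::memory_order_acquire)) {
            return true;
        }
        cpu_relax();  // be polite to the sibling hyperthread while waiting
    }
    // One last check after the final pause.
    return flag.load(std::memory_order_acquire);
}
```

Given the ~100 cycle cost on Skylake, the right value for max_spins depends on the microarchitecture, so treat any fixed number as a tuning knob rather than a constant.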
See also What is the purpose of the "PAUSE" instruction in x86?.
__yield is documented). Sometimes it's a good idea just to comb over intrin.h looking for names similar to what you're after (this is how I found _mm_pause, though that macro of yours seems way better for portability, +1) – Mikvah