Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.
These days you should normally just include <immintrin.h>. It includes everything.
GCC and clang will stop you from using intrinsics for instructions you haven't enabled at compile time (e.g. with -march=native or -mavx2 -mbmi2 -mpopcnt -mfma -mcx16 -mtune=znver1 or whatever).
MSVC and ICC will let you use intrinsics without enabling anything at compile time, but you still should enable AVX before using AVX intrinsics.
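To see which of these extensions a given build actually has enabled, one option is to look at the macros the compiler predefines. The sketch below assumes the usual names (__AVX512F__, __AVX2__, __AVX__, __SSE4_2__ and __SSE2__ on GCC/clang; MSVC sets the AVX ones under /arch: and otherwise exposes _M_X64 / _M_IX86_FP). The function names enabled_simd_level and print_simd_level are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: names the highest SIMD level enabled at compile time.
   It inspects predefined macros only; it does not query the CPU. */
const char *enabled_simd_level(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "none detected";
#endif
}

/* Hypothetical helper: writes the detected level to the given stream. */
void print_simd_level(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "compiled for: %s\n", enabled_simd_level());
}
```

Compiling the same file with and without -mavx2 (or /arch:AVX2) should change the reported level. Nothing here checks what the CPU running the program supports.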
Historically (before immintrin.h pulled in everything) you had to manually include a header for the highest level of intrinsics you wanted.
This may still be useful with MSVC and ICC to stop yourself from using instruction-sets you don't want to require.
<mmintrin.h> MMX
<xmmintrin.h> SSE
<emmintrin.h> SSE2
<pmmintrin.h> SSE3
<tmmintrin.h> SSSE3
<smmintrin.h> SSE4.1
<nmmintrin.h> SSE4.2
<ammintrin.h> SSE4A
<wmmintrin.h> AES
<immintrin.h> AVX, AVX2, FMA
Including one of these pulls in all previous ones (except AMD-only SSE4A: immintrin.h doesn't pull that in).
Some compilers also have <zmmintrin.h> for AVX512.
[...] #include <x86intrin.h> which pulls in everything you need. – Pemberton
[...] <zmmintrin.h> directly; gcc doesn't even provide it. Just use <immintrin.h> or the even-more-complete <x86intrin.h>. This answer is basically obsolete, unless you're intentionally avoiding including intrinsics for newer versions of SSE because your compiler doesn't complain when you use an SSE4.1 instruction while compiling for SSE2. (gcc/clang do complain, so you should just use immintrin.h for them. IDK about others.) – Highup
[...] <x86intrin.h>? – Diamond
[...] <emmintrin.h> (SSE2), although in v10.3 the intrinsics headers are old and unusable due to making use of retired Clang builtins. – Waiver
[...] /arch:AVX. Especially if you use only 256-bit intrinsics, not mixing in _mm_add_epi32 sometimes; if you do, check the asm and/or profile to check that you avoid SSE/AVX transition stalls. (There should be a HW event counter for that.) – Highup
On GCC/clang, if you use just
#include <x86intrin.h>
it will include all SSE/AVX headers which are enabled according to compiler switches like -march=haswell or just -march=native. Additionally, some x86-specific instructions like bswap or ror become available as intrinsics.
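As a rough sketch of the non-SIMD intrinsics mentioned above, the following wraps a 32-bit rotate-right. It assumes the _rotr spelling, which recent GCC and clang versions expose through x86intrin.h and MSVC through intrin.h; older compilers may name it differently. The wrapper rotr32 and the macro HAVE_ROTR_INTRINSIC are illustrative names, not part of any header.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>      /* MSVC: declares _rotr */
#define HAVE_ROTR_INTRINSIC 1
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>   /* GCC/clang on x86: provides _rotr */
#define HAVE_ROTR_INTRINSIC 1
#endif

/* Hypothetical wrapper: rotate x right by count bits (0 <= count < 32). */
uint32_t rotr32(uint32_t x, unsigned count)
{
    assert(count < 32);
#if defined(HAVE_ROTR_INTRINSIC)
    return _rotr(x, (int)count);
#else
    /* Portable fallback for other targets; the mask keeps count == 0 defined. */
    return (x >> count) | (x << ((32u - count) & 31u));
#endif
}
```

Modern compilers usually recognise the fallback expression as a rotate as well, so the intrinsic is mostly a matter of making the intent explicit.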
The MSVC equivalent of this header is <intrin.h>.
If you just want portable SIMD, use #include <immintrin.h>
MSVC, ICC, and gcc/clang (and other compilers like Sun I think) all support this header for the SIMD intrinsics documented by Intel's online intrinsics finder / search tool: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
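A minimal sketch of what "portable SIMD through immintrin.h" can look like, using only SSE2 intrinsics listed in Intel's guide. The helper add_i32 and the macro USE_SSE2 are made-up names; the preprocessor test assumes __SSE2__ on GCC/clang and _M_X64 / _M_IX86_FP on MSVC, and a plain loop is used when none of them is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <immintrin.h>
#define USE_SSE2 1
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_i32(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
    assert(out != NULL && a != NULL && b != NULL);
#if defined(USE_SSE2)
    /* Four 32-bit lanes per iteration; unaligned loads and stores are used
       so the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar loop: the remaining tail, or everything without SSE2. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

SSE2 is part of the x86-64 baseline, so on 64-bit targets this should build without extra switches; 32-bit builds may need -msse2 or /arch:SSE2 to take the vector path.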
[...] <x86intrin.h>, but <intrin.h> achieves a similar effect. You still need conditional compilation, of course. :-( – Anglesey
[...] #include <immintrin.h>. Use that for SIMD intrinsics. You only need the even-larger (and slightly slower to compile) x86intrin.h or intrin.h if you need stuff like integer rotate / bit-scan intrinsics (although Intel documents some of those as being available in immintrin.h in their intrinsics guide). – Highup
[...] x86intrin.h / intrin.h but not in immintrin.h. – Highup
The header name depends on your compiler and target architecture.
- For Microsoft C++ (targeting x86, x86-64 or ARM) and Intel C/C++ Compiler for Windows use intrin.h
- For gcc/clang/icc targeting x86/x86-64 use x86intrin.h
- For gcc/clang/armcc targeting ARM with NEON use arm_neon.h
- For gcc/clang/armcc targeting ARM with WMMX use mmintrin.h
- For gcc/clang/xlcc targeting PowerPC with VMX (aka Altivec) and/or VSX use altivec.h
- For gcc/clang targeting PowerPC with SPE use spe.h
You can handle all these cases with conditional preprocessing directives:
#if defined(_MSC_VER)
/* Microsoft C/C++-compatible compiler */
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC-compatible compiler, targeting x86/x86-64 */
#include <x86intrin.h>
#elif defined(__GNUC__) && defined(__ARM_NEON__)
/* GCC-compatible compiler, targeting ARM with NEON */
#include <arm_neon.h>
#elif defined(__GNUC__) && defined(__IWMMXT__)
/* GCC-compatible compiler, targeting ARM with WMMX */
#include <mmintrin.h>
#elif (defined(__GNUC__) || defined(__xlC__)) && (defined(__VEC__) || defined(__ALTIVEC__))
/* XLC or GCC-compatible compiler, targeting PowerPC with VMX/VSX */
#include <altivec.h>
#elif defined(__GNUC__) && defined(__SPE__)
/* GCC-compatible compiler, targeting PowerPC with SPE */
#include <spe.h>
#endif
From this page
+----------------+------------------------------------------------------------------------------------------+
| Header | Purpose |
+----------------+------------------------------------------------------------------------------------------+
| x86intrin.h | Everything, including non-vector x86 instructions like _rdtsc(). |
| mmintrin.h | MMX (Pentium MMX!) |
| mm3dnow.h | 3dnow! (K6-2) (deprecated) |
| xmmintrin.h | SSE + MMX (Pentium 3, Athlon XP) |
| emmintrin.h | SSE2 + SSE + MMX (Pentium 4, Athlon 64) |
| pmmintrin.h | SSE3 + SSE2 + SSE + MMX (Pentium 4 Prescott, Athlon 64 San Diego) |
| tmmintrin.h | SSSE3 + SSE3 + SSE2 + SSE + MMX (Core 2, Bulldozer) |
| popcntintrin.h | POPCNT (Nehalem (Core i7), Phenom) |
| ammintrin.h | SSE4A + SSE3 + SSE2 + SSE + MMX (AMD-only, starting with Phenom) |
| smmintrin.h | SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Penryn, Bulldozer) |
| nmmintrin.h | SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Nehalem (aka Core i7), Bulldozer) |
| wmmintrin.h | AES (Core i7 Westmere, Bulldozer) |
| immintrin.h | AVX, AVX2, AVX512, all SSE+MMX (except SSE4A and XOP), popcnt, BMI/BMI2, FMA |
+----------------+------------------------------------------------------------------------------------------+
So in general you can just include immintrin.h to get all Intel extensions, or x86intrin.h if you want everything, including _bit_scan_forward and _rdtsc, as well as all vector intrinsics, including AMD-only ones. If you are against including more than you actually need, then you can pick the right include by looking at the table.
x86intrin.h is the recommended way to get intrinsics for AMD XOP (Bulldozer-only, not even future AMD CPUs), rather than having its own header.
Some compilers will still generate error messages if you use intrinsics for instruction-sets you haven't enabled (e.g. _mm_fmadd_ps without enabling fma, even if you include immintrin.h and enable AVX2).
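One way to keep such code building whether or not FMA is enabled is to guard the intrinsic with the matching feature macro. The sketch below assumes __FMA__, which GCC and clang define under -mfma or a -march= that implies it; as far as I know MSVC does not define it, so the fallback path would be taken there. The helper muladd and the macro HAVE_SSE are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <immintrin.h>
#define HAVE_SSE 1
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i in [0, n). */
void muladd(float *out, const float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
    assert(out != NULL && a != NULL && b != NULL && c != NULL);
#if defined(HAVE_SSE)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
#if defined(__FMA__)
        /* FMA enabled at compile time: one fused multiply-add. */
        __m128 r = _mm_fmadd_ps(va, vb, vc);
#else
        /* FMA not enabled: separate multiply and add, which only needs SSE. */
        __m128 r = _mm_add_ps(_mm_mul_ps(va, vb), vc);
#endif
        _mm_storeu_ps(out + i, r);
    }
#endif
    /* Scalar loop: the remaining tail, or everything without SSE. */
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

The fused and unfused paths can differ in the last bit because FMA rounds once instead of twice, so results are not guaranteed to be bit-identical across builds.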
smmintrin (SSE4.1) is Penryn (45nm Core2), not Nehalem ("i7"). Can we stop using "i7" as an architecture name? It's meaningless now that Intel has kept using it for SnB-family. – Highup
immintrin.h doesn't appear to include the _popcnt32 and _popcnt64 intrinsics (not to be confused with those in popcntintrin.h!) on GCC 9.1.0. So it appears x86intrin.h still serves a purpose. – Charpentier
20200914: latest best practice: <immintrin.h> (also supported by MSVC)
I'll leave the rest of the answer for historic purposes; it might be useful for older compiler / platform combinations...
As many of the answers and comments have stated, <x86intrin.h> is the comprehensive header for x86[-64] SIMD intrinsics. It also provides intrinsics supporting instructions for other ISA extensions. gcc, clang, and icc have all settled on this. I needed to do some digging on versions that support the header, and thought it might be useful to list some findings...
- gcc: support for x86intrin.h first appears in gcc-4.5.0. The gcc-4 release series is no longer being maintained, while gcc-6.x is the current stable release series. gcc-5 also introduced the __has_include extension present in all clang-3.x releases. gcc-7 is in pre-release (regression testing, etc.) and, following the current versioning scheme, will be released as gcc-7.1.0.
- clang: x86intrin.h appears to have been supported for all clang-3.x releases. The latest stable release is clang (LLVM) 3.9.1. The development branch is clang (LLVM) 5.0.0. It's not clear what's happened to the 4.x series.
- Apple clang: annoyingly, Apple's versioning doesn't correspond with that of the LLVM projects. That said, the current release, clang-800.0.42.1, is based on LLVM 3.9.0. The first LLVM 3.0 based version appears to be Apple clang 2.1 back in Xcode 4.1. LLVM 3.1 first appears with Apple clang 3.1 (a numeric coincidence) in Xcode 4.3.3.
  Apple also defines __apple_build_version__, e.g., 8000042. This seems about the most stable, strictly ascending versioning scheme available. If you don't want to support legacy compilers, make one of these values a minimum requirement.
Any recent version of clang, including Apple versions, should therefore have no issue with x86intrin.h. Of course, along with gcc-5, you can always use the following:
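/* Nested #if: compilers without __has_include cannot parse __has_include(<...>). HAVE_X86INTRIN_H is an illustrative name. */
#if defined(__has_include)
#if __has_include(<x86intrin.h>)
#include <x86intrin.h>
#define HAVE_X86INTRIN_H 1
#endif
#endif
#ifndef HAVE_X86INTRIN_H
#error "upgrade your compiler. it's free..."
#endif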
One trick you can't really rely on is using the __GNUC__ versions in clang. The versioning is, for historical reasons, stuck at 4.2.1, a version that precedes the x86intrin.h header. It's occasionally useful for, say, simple GNU C extensions that have remained backwards compatible.
- icc: as far as I can tell, the x86intrin.h header is supported since at least Intel C++ 16.0. The version test can be performed with: #if (__INTEL_COMPILER >= 1600). This version (and possibly earlier versions) also provides support for the __has_include extension.
- MSVC: it appears that MSVC++ 12.0 (Visual Studio 2013) is the first version to provide the intrin.h header - not x86intrin.h... this suggests: #if (_MSC_VER >= 1800) as a version test. Of course, if you're trying to write code that's portable across all these different compilers, the header name on this platform will be the least of your problems.
[...] __has_builtin instead of annoying version checks. Also note GCC still has some bugs on specific builtins at present; in this case, I'd consider target-specific ones, even undocumented. – Housecarl
ammintrin.h also has the XOP instructions. – Hajj