Header files for x86 SIMD intrinsics
P

5

179

Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.

Piave answered 27/6, 2012 at 14:44 Comment(0)
P
233

These days you should normally just include <immintrin.h>. It includes everything.

GCC and clang will stop you from using intrinsics for instructions you haven't enabled at compile time (e.g. with -march=native or -mavx2 -mbmi2 -mpopcnt -mfma -mcx16 -mtune=znver1 or whatever.)

MSVC and ICC will let you use intrinsics without enabling anything at compile time, but you still should enable AVX before using AVX intrinsics.
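As a minimal sketch of the "just include <immintrin.h>" advice: the function below adds four 32-bit integers with SSE2 intrinsics when the compiler reports SSE2 support, and falls back to a plain loop otherwise. The names `add4_i32` and `HAVE_X86_SIMD` are made up for this example and are not part of any header.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>      /* one header for all Intel SIMD intrinsics */
#define HAVE_X86_SIMD 1     /* hypothetical macro, local to this example */
#endif

/* Element-wise add of four 32-bit integers: out[i] = a[i] + b[i]. */
static void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
#if defined(HAVE_X86_SIMD) && (defined(__SSE2__) || defined(_M_X64))
    /* SSE2 path: unaligned loads, one packed add, unaligned store. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Portable fallback when SSE2 is not enabled at compile time. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

SSE2 is baseline on x86-64, so this needs no extra flags there. For AVX2 intrinsics you would build with something like -mavx2 or -march=native on GCC/clang, as described above.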


Historically (before immintrin.h pulled in everything) you had to manually include a header for the highest level of intrinsics you wanted.

This may still be useful with MSVC and ICC to stop yourself from using instruction-sets you don't want to require.

<mmintrin.h>  MMX
<xmmintrin.h> SSE
<emmintrin.h> SSE2
<pmmintrin.h> SSE3
<tmmintrin.h> SSSE3
<smmintrin.h> SSE4.1
<nmmintrin.h> SSE4.2
<ammintrin.h> SSE4A
<wmmintrin.h> AES
<immintrin.h> AVX, AVX2, FMA

Including one of these pulls in all the previous ones (except the AMD-only SSE4A: immintrin.h doesn't pull that in).
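To illustrate the "pulls in all previous ones" point, here is a small sketch that includes only <emmintrin.h> (SSE2) yet also uses SSE intrinsics that are declared in <xmmintrin.h>. The helper names `sum4_f32`, `sum2_f64` and the `USE_SSE2` macro are invented for this example; the scalar branch is only there so the code still builds when SSE2 is not enabled.

```c
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2; also brings in xmmintrin.h (SSE) and mmintrin.h (MMX) */
#define USE_SSE2 1       /* hypothetical macro, local to this example */
#endif

/* Horizontal sum of four floats, using SSE (xmmintrin.h) intrinsics. */
static float sum4_f32(const float *p)
{
#ifdef USE_SSE2
    __m128 v = _mm_loadu_ps(p);
    v = _mm_add_ps(v, _mm_movehl_ps(v, v));      /* lane 0 = p0+p2, lane 1 = p1+p3 */
    v = _mm_add_ss(v, _mm_shuffle_ps(v, v, 1));  /* lane 0 = (p0+p2) + (p1+p3) */
    return _mm_cvtss_f32(v);
#else
    return (p[0] + p[2]) + (p[1] + p[3]);
#endif
}

/* Horizontal sum of two doubles, using SSE2 (emmintrin.h) intrinsics. */
static double sum2_f64(const double *p)
{
#ifdef USE_SSE2
    __m128d v = _mm_loadu_pd(p);
    v = _mm_add_sd(v, _mm_unpackhi_pd(v, v));    /* lane 0 = p0 + p1 */
    return _mm_cvtsd_f64(v);
#else
    return p[0] + p[1];
#endif
}
```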

Some compilers also have <zmmintrin.h> for AVX512.

Piave answered 27/6, 2012 at 14:45 Comment(12)
I think ammintrin.h also has the XOP instructions.Hajj
Or you can just #include <x86intrin.h> which pulls in everything you need.Pemberton
zmmintrin.h has the AVX-512 intrinsics.Intrust
Why are p, t, s and n for SSE3/SSSE3/SSE4.1 and 4.2? What do those characters represent?Checker
@LưuVĩnhPhúc I don't have the slightest clue, sorry.Piave
@LưuVĩnhPhúc SSE3 = Prescott new instructions, SSSE3 = Tejas new instructions. I think SSE4.2 and AES refer to the processor family they were introduced on (Nehalem and Westmere)Hotchpot
Don't include <zmmintrin.h> directly; gcc doesn't even provide it. Just use <immintrin.h> or the even-more-complete <x86intrin.h>. This answer is basically obsolete, unless you're intentionally avoiding including intrinsics for newer versions of SSE because your compiler doesn't complain when you use an SSE4.1 instruction while compiling for SSE2. (gcc/clang do complain, so you should just use immintrin.h for them. IDK about others.)Highup
Does MSVC have something equivalent of <x86intrin.h>?Diamond
C++Builder v10.x has only up to <emmintrin.h> (SSE2), although in v10.3 the intrinsics headers are old and unusable due to making use of retired Clang builtins.Waiver
Wow, prefixes of m, x, e, p, t, s, n, a, w, i. Are these random, or is there a method...Deploy
"MSVC and ICC will let you use intrinsics without enabling anything at compile time, but you still should enable AVX before using AVX intrinsics" - I want to try to support pre-haswell CPUs, so detect AVX2 at runtime, and pick the implementation based on that check. What you said isn't compatible with that, I think. Am I thinking about this wrong, or is the "should" here making assumptions that don't match my use case (e.g. assuming that my code will only running the code on AVX2+ processors)?Ina
@MerlynMorgan-Graham: Indeed, code that needs to run on non-AVX CPUs should not be built with AVX enabled. Modern MSVC might make ok asm with AVX intrinsics in some functions in a file built without /arch:AVX. Especially if you use only 256-bit intrinsics, not mixing in _mm_add_epi32 sometimes; if you do, check the asm and/or profile to check that you avoid SSE/AVX transition stalls. (There should be a HW event counter for that.)Highup
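For the runtime-detection case discussed in the last two comments, one possible approach on GCC/clang is sketched below. It assumes a reasonably recent GCC or clang that supports the `target` function attribute and `__builtin_cpu_supports`; it does not cover MSVC, where you would typically query `__cpuid` yourself. The names `add_i32`, `add_i32_avx2`, `add_i32_scalar` and `CAN_DISPATCH_AVX2` are hypothetical. The file itself is built without -mavx2, so only the one marked function contains AVX2 instructions.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define CAN_DISPATCH_AVX2 1   /* hypothetical macro, local to this sketch */

/* Only this function is compiled with AVX2 enabled. */
__attribute__((target("avx2")))
static void add_i32_avx2(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {   /* eight 32-bit lanes per 256-bit vector */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; ++i)             /* leftover elements */
        dst[i] = a[i] + b[i];
}
#endif

/* Baseline version that runs on any CPU the file is compiled for. */
static void add_i32_scalar(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Picks an implementation at run time. */
static void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
#ifdef CAN_DISPATCH_AVX2
    if (__builtin_cpu_supports("avx2")) {
        add_i32_avx2(dst, a, b, n);
        return;
    }
#endif
    add_i32_scalar(dst, a, b, n);
}
```

In a real project you would probably do the CPU check once and cache a function pointer rather than test on every call.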
B
94

On GCC/clang, if you use just

#include <x86intrin.h>

it will include all SSE/AVX headers which are enabled according to compiler switches like -march=haswell or just -march=native. Additionally some x86 specific instructions like bswap or ror become available as intrinsics.
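As a rough illustration of those non-SIMD intrinsics, the sketch below wraps a bit scan and a byte swap. The wrapper names `lowest_set_bit` and `byte_swap32` are made up for this example. It assumes `_bit_scan_forward` and `__bswapd` come from <x86intrin.h> on GCC/clang and `_BitScanForward` from <intrin.h> on MSVC; anything else gets a portable fallback.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define HAVE_X86INTRIN 1   /* hypothetical macro, local to this example */
#endif

/* Index of the lowest set bit. The result is undefined for x == 0. */
static int lowest_set_bit(uint32_t x)
{
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, x);
    return (int)idx;
#elif defined(HAVE_X86INTRIN)
    return _bit_scan_forward((int)x);   /* maps to the bsf instruction */
#else
    int i = 0;
    while (!(x & 1u)) { x >>= 1; ++i; }
    return i;
#endif
}

/* Reverses the byte order of a 32-bit value. */
static uint32_t byte_swap32(uint32_t x)
{
#if defined(HAVE_X86INTRIN)
    return (uint32_t)__bswapd((int)x);  /* maps to the bswap instruction */
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```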


The MSVC equivalent of this header is <intrin.h>.


If you just want portable SIMD, use #include <immintrin.h>

MSVC, ICC, and gcc/clang (and other compilers like Sun I think) all support this header for the SIMD intrinsics documented by Intel's online intrinsics finder / search tool: https://software.intel.com/sites/landingpage/IntrinsicsGuide/

Burrell answered 27/6, 2012 at 15:59 Comment(4)
I wasn't sure if the newer versions might... Anyway, as long as gcc, icc and clang have it, it's OK to use I think :-)Burrell
MSVC doesn't have <x86intrin.h>, but <intrin.h> achieves a similar effect. You still need conditional compilation, of course. :-(Anglesey
All the major x86 compilers have #include <immintrin.h>. Use that for SIMD intrinsics. You only need the even-larger (and slightly slower to compile) x86intrin.h or intrin.h if you need stuff like integer rotate / bit-scan intrinsics (although Intel documents some of those as being available in immintrin.h in their intrinsics guide).Highup
IIRC, there are some non-SIMD intrinsics which Intel documents as being in immintrin.h, but which gcc, clang, and/or MSVC only have in x86intrin.h / intrin.h but not in immintrin.h.Highup
K
65

The header name depends on your compiler and target architecture.

  • For Microsoft C++ (targeting x86, x86-64 or ARM) and Intel C/C++ Compiler for Windows use intrin.h
  • For gcc/clang/icc targeting x86/x86-64 use x86intrin.h
  • For gcc/clang/armcc targeting ARM with NEON use arm_neon.h
  • For gcc/clang/armcc targeting ARM with WMMX use mmintrin.h
  • For gcc/clang/xlcc targeting PowerPC with VMX (aka Altivec) and/or VSX use altivec.h
  • For gcc/clang targeting PowerPC with SPE use spe.h

You can handle all these cases with conditional preprocessing directives:

#if defined(_MSC_VER)
     /* Microsoft C/C++-compatible compiler */
     #include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
     /* GCC-compatible compiler, targeting x86/x86-64 */
     #include <x86intrin.h>
#elif defined(__GNUC__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
     /* GCC-compatible compiler, targeting ARM with NEON */
     #include <arm_neon.h>
#elif defined(__GNUC__) && defined(__IWMMXT__)
     /* GCC-compatible compiler, targeting ARM with WMMX */
     #include <mmintrin.h>
#elif (defined(__GNUC__) || defined(__xlC__)) && (defined(__VEC__) || defined(__ALTIVEC__))
     /* XLC or GCC-compatible compiler, targeting PowerPC with VMX/VSX */
     #include <altivec.h>
#elif defined(__GNUC__) && defined(__SPE__)
     /* GCC-compatible compiler, targeting PowerPC with SPE */
     #include <spe.h>
#endif
Kovar answered 10/3, 2014 at 3:22 Comment(1)
Here's some more to add to your list: On UltraSPARC+VIS with gcc, use visintrin.h; if you have Sun's VSDK, vis.h offers a different set of intrinsics. Documentation can be found here: GCC VIS builtins, Sun VIS user's guide.Intrust
P
54

From this page

+----------------+------------------------------------------------------------------------------------------+
|     Header     |                                         Purpose                                          |
+----------------+------------------------------------------------------------------------------------------+
| x86intrin.h    | Everything, including non-vector x86 instructions like _rdtsc().                         |
| mmintrin.h     | MMX (Pentium MMX!)                                                                       |
| mm3dnow.h      | 3dnow! (K6-2) (deprecated)                                                               |
| xmmintrin.h    | SSE + MMX (Pentium 3, Athlon XP)                                                         |
| emmintrin.h    | SSE2 + SSE + MMX (Pentium 4, Athlon 64)                                                  |
| pmmintrin.h    | SSE3 + SSE2 + SSE + MMX (Pentium 4 Prescott, Athlon 64 San Diego)                        |
| tmmintrin.h    | SSSE3 + SSE3 + SSE2 + SSE + MMX (Core 2, Bulldozer)                                      |
| popcntintrin.h | POPCNT (Nehalem (Core i7), Phenom)                                                       |
| ammintrin.h    | SSE4A + SSE3 + SSE2 + SSE + MMX (AMD-only, starting with Phenom)                         |
| smmintrin.h    | SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Penryn, Bulldozer)                             |
| nmmintrin.h    | SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Nehalem (aka Core i7), Bulldozer)     |
| wmmintrin.h    | AES (Core i7 Westmere, Bulldozer)                                                        |
| immintrin.h    | AVX, AVX2, AVX512, all SSE+MMX (except SSE4A and XOP), popcnt, BMI/BMI2, FMA             |
+----------------+------------------------------------------------------------------------------------------+

So in general you can just include immintrin.h to get all Intel extensions, or x86intrin.h if you want everything, including _bit_scan_forward and _rdtsc, as well as all vector intrinsics, including AMD-only ones. If you are against including more than you actually need, then you can pick the right include by looking at the table.

x86intrin.h is the recommended way to get intrinsics for AMD XOP (Bulldozer-only, not even future AMD CPUs), rather than having its own header.

Some compilers will still generate error messages if you use intrinsics for instruction-sets you haven't enabled (e.g. _mm_fmadd_ps without enabling fma, even if you include immintrin.h and enable AVX2).
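One way to cope with that, sketched here under the assumption that your compiler defines `__FMA__` when FMA is enabled (GCC and clang do with -mfma or a suitable -march), is to guard the FMA intrinsic behind that macro and fall back to a separate multiply and add. The function name `madd4` is hypothetical.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_SSE 1   /* hypothetical macro, local to this example */
#endif

/* r[i] = a[i] * b[i] + c[i] for four floats. */
static void madd4(const float *a, const float *b, const float *c, float *r)
{
#if defined(HAVE_SSE) && defined(__FMA__)
    /* FMA enabled at compile time: one fused multiply-add, single rounding. */
    _mm_storeu_ps(r, _mm_fmadd_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), _mm_loadu_ps(c)));
#elif defined(HAVE_SSE)
    /* SSE only: separate multiply and add, two roundings. */
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(r, _mm_add_ps(prod, _mm_loadu_ps(c)));
#else
    for (int i = 0; i < 4; ++i)
        r[i] = a[i] * b[i] + c[i];
#endif
}
```

Note that the fused and unfused paths can differ in the last bit of the result, so this kind of switch is only appropriate when that difference is acceptable.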

Pyrrha answered 2/7, 2015 at 13:23 Comment(2)
smmintrin (SSE4.1) is Penryn (45nm Core2), not Nehalem ("i7"). Can we stop using "i7" as an architecture name? It's meaningless now that Intel has kept using it for SnB-family.Highup
immintrin.h doesn't appear to include _popcnt32 and _popcnt64 (not to be confused with those in popcntintrin.h!) intrinsics on GCC 9.1.0. So it appears x86intrin.h still serves a purpose.Charpentier
S
24

20200914: latest best practice: <immintrin.h> (also supported by MSVC)

I'll leave the rest of the answer for historic purposes; it might be useful for older compiler / platform combinations...


As many of the answers and comments have stated, <x86intrin.h> is the comprehensive header for x86[-64] SIMD intrinsics. It also provides intrinsics supporting instructions for other ISA extensions. gcc, clang, and icc have all settled on this. I needed to do some digging on versions that support the header, and thought it might be useful to list some findings...

  • gcc : support for x86intrin.h first appears in gcc-4.5.0. The gcc-4 release series is no longer being maintained, while gcc-6.x is the current stable release series. gcc-5 also introduced the __has_include extension present in all clang-3.x releases. gcc-7 is in pre-release (regression testing, etc.) and following the current versioning scheme, will be released as gcc-7.1.0.

  • clang : x86intrin.h appears to have been supported for all clang-3.x releases. The latest stable release is clang (LLVM) 3.9.1. The development branch is clang (LLVM) 5.0.0. It's not clear what's happened to the 4.x series.

  • Apple clang : annoyingly, Apple's versioning doesn't correspond with that of the LLVM projects. That said, the current release: clang-800.0.42.1, is based on LLVM 3.9.0. The first LLVM 3.0 based version appears to be Apple clang 2.1 back in Xcode 4.1. LLVM 3.1 first appears with Apple clang 3.1 (a numeric coincidence) in Xcode 4.3.3.

    Apple also defines __apple_build_version__ e.g., 8000042. This seems about the most stable, strictly ascending versioning scheme available. If you don't want to support legacy compilers, make one of these values a minimum requirement.

Any recent version of clang, including Apple versions, should therefore have no issue with x86intrin.h. Of course, along with gcc-5, you can always use the following:

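/* Nested so that compilers without __has_include never parse its argument. */
#if defined(__has_include)
#  if __has_include(<x86intrin.h>)
#    include <x86intrin.h>
#  else
#    error "upgrade your compiler. it's free..."
#  endif
#else
#  error "upgrade your compiler. it's free..."
#endif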

One trick you can't really rely on is using the __GNUC__ version macros in clang. For historical reasons, clang reports itself as GCC 4.2.1, a version that predates the x86intrin.h header. The macros are occasionally useful for, say, simple GNU C extensions that have remained backwards compatible.

  • icc : as far as I can tell, the x86intrin.h header is supported since at least Intel C++ 16.0. The version test can be performed with: #if (__INTEL_COMPILER >= 1600). This version (and possibly earlier versions) also provides support for the __has_include extension.

  • MSVC : It appears that MSVC++ 12.0 (Visual Studio 2013) is the first version to provide the intrin.h header - not x86intrin.h... this suggests: #if (_MSC_VER >= 1800) as a version test. Of course, if you're trying to write code that's portable across all these different compilers, the header name on this platform will be the least of your problems.

Sideway answered 27/2, 2017 at 19:11 Comment(1)
I'd prefer __has_builtin instead of annoying version checks. Also note GCC still has some bugs on specific builtins at present; in this case, I'd consider target-specific ones, even undocumented.Housecarl
