We compile our code with g++ -march=ivybridge -mtune=skylake
. If somebody runs it on an older or incompatible architecture, I want the app to inform the user and exit gracefully. How do I do this? How about AMD processors? Is there some sort of parity of architectures/instructions?
This is surprisingly difficult, as well as being compiler-specific. I'll cover techniques usable with gcc, then clang, then alternatives. Bear in mind that the information presented is correct as of late 2023; it could change in future.
Determining the target processor
Note: you can skip this if you fully control the build system, since you can just pass e.g. -march=skylake -DMARCH=skylake
, but it's worth reading anyway since some techniques will be reused later.
gcc (in ix86_target_macros_internal
) sets preprocessor macros corresponding to the processor; so if you have a list of supported processors (e.g. extracted from processor_names
) you can test each of these macros (using #ifdef
or, inside a macro, using https://mcmap.net/q/1045110/-test-if-preprocessor-symbol-is-defined-inside-macro). Schematically:
char const* target_processor() {
    // ...
#if __skylake__
    return "skylake";
#endif
#if __skylake_avx512__
    return "skylake-avx512";
#endif
    // ...
}
Note that you will probably want to ensure that a compile error occurs if none or more than one of the macros is set, since in that case you would want to update your code.
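One way to get that compile error is to count the macros that are set and assert on the total. This is only a sketch assuming gcc: the names `kTargetMatches` and `target_processor_checked` are made up for illustration, and only two processors are listed.

```cpp
// Sketch only: list every processor you support; two are shown here.
// Each "+ 1" is kept only when gcc defined that macro for the -march in use.
constexpr int kTargetMatches = 0
#ifdef __skylake__
    + 1
#endif
#ifdef __skylake_avx512__
    + 1
#endif
    ;

// More than one match means the list needs attention: fail the build.
// In a build that always passes a listed -march, tighten this to == 1 so
// that an unlisted processor is a compile error as well.
static_assert(kTargetMatches <= 1,
              "ambiguous target processor: update target_processor_checked()");

const char* target_processor_checked() {
#if defined(__skylake__)
    return "skylake";
#elif defined(__skylake_avx512__)
    return "skylake-avx512";
#else
    return "unlisted";  // reachable only while the assertion allows 0 matches
#endif
}
```

The `<= 1` form is used here so the sketch still compiles without any `-march` flag; the stricter `== 1` form is what the note above describes.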
Determining the runtime processor
gcc provides __builtin_cpu_is
for exact processor detection. Note that the argument must be a string literal, since internally this is converted to checking integer fields on the __cpu_model
global, which is set by cpu_indicator_init
called from __builtin_cpu_init
. So again you can list known processors:
char const* runtime_processor() {
    // ...
    if (__builtin_cpu_is("skylake"))
        return "skylake";
    if (__builtin_cpu_is("skylake-avx512"))
        return "skylake-avx512";
    // ...
    return "unknown";
}
Note that you will need to handle the case of unknown processors, since your code could be run on a newer processor than gcc knows about!
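Since the question also asks about AMD, the same builtin can report the vendor. The following is a rough sketch, assuming a gcc or clang recent enough to know these names; `RETURN_IF_CPU`, `runtime_vendor` and `runtime_processor_or_unknown` are illustrative names, and the processor list is deliberately short.

```cpp
// The argument must be a string literal, so a runtime table of names cannot
// be passed to the builtin; a small macro keeps the list compact instead.
#define RETURN_IF_CPU(name) \
    if (__builtin_cpu_is(name)) return name

const char* runtime_vendor() {
    // Only strictly needed when running before static constructors,
    // but harmless to call again.
    __builtin_cpu_init();
    RETURN_IF_CPU("intel");
    RETURN_IF_CPU("amd");
    return "unknown";
}

const char* runtime_processor_or_unknown() {
    __builtin_cpu_init();
    RETURN_IF_CPU("ivybridge");
    RETURN_IF_CPU("haswell");
    RETURN_IF_CPU("skylake");
    RETURN_IF_CPU("skylake-avx512");
    RETURN_IF_CPU("znver1");
    // Newer (or simply unlisted) processors end up here; treat this as
    // "cannot tell", not as "unsupported".
    return "unknown";
}
```

This is suitable for a diagnostic message, but refusing to run on "unknown" would lock out processors released after the list was written.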
Determining feature support
Even if you know the target and runtime processors, that doesn't always mean you know that the processor will be able to run your code; features are added to server and client lines at different times, they can be removed (e.g. 3dNow), they depend on vendor (Intel vs. AMD), and some are dependent on OS support (e.g. AVX via OSXSAVE, detected at runtime via XGETBV: does gcc's __builtin_cpu_supports check for OS support?). So it is better to detect whether the ISA features that -march
enables are present at runtime, using __builtin_cpu_supports
.
You can use the processor_features
enumeration to iterate over known features, use the macros set by ix86_target_macros_internal
to detect which are in use, and the ISA_NAMES_TABLE
table to map between these and feature name strings. Schematically:
void check_features() {
    // ...
#if __F16C__
    if (!__builtin_cpu_supports("f16c"))
        error("f16c not present");
#endif
#if __RDSEED__
    if (!__builtin_cpu_supports("rdseed"))
        error("rdseed not present");
#endif
    // ...
}
By constructing this code, you can be sure that the runtime environment actually does support all the features that you told gcc are available (via the -march=
flag).
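A slightly fuller sketch of that check, with a result that can drive a graceful exit, might look like the following. It is an illustration only: `REQUIRE_FEATURE`, `count_missing_features` and `exit_code_for_cpu` are hypothetical names, and the list covers just a handful of long-standing feature names, not everything a given `-march` enables.

```cpp
#include <cstdio>

// Returns how many features this translation unit was compiled to use
// that the running CPU/OS does not report as available.
int count_missing_features() {
    int missing = 0;
    __builtin_cpu_init();
#define REQUIRE_FEATURE(name)                                              \
    do {                                                                   \
        if (!__builtin_cpu_supports(name)) {                               \
            std::fprintf(stderr, "CPU feature not available: %s\n", name); \
            ++missing;                                                     \
        }                                                                  \
    } while (0)
#ifdef __SSE4_2__
    REQUIRE_FEATURE("sse4.2");
#endif
#ifdef __POPCNT__
    REQUIRE_FEATURE("popcnt");
#endif
#ifdef __AVX__
    REQUIRE_FEATURE("avx");
#endif
#ifdef __AES__
    REQUIRE_FEATURE("aes");
#endif
#ifdef __PCLMUL__
    REQUIRE_FEATURE("pclmul");
#endif
#undef REQUIRE_FEATURE
    return missing;
}

// Hypothetical helper: a process exit code, 0 when every check passed.
int exit_code_for_cpu() {
    return count_missing_features() == 0 ? 0 : 1;
}
```

A program could call `exit_code_for_cpu()` first thing in `main` and return early on a nonzero result. One caveat, raised in the comments on the question: code built with `-march=xyz` is not guaranteed to get as far as the check on an older machine, so in practice the check probably wants to live in a source file built without `-march`, with the feature list supplied by the build system instead of read from the macros.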
clang
Unfortunately, things are a bit trickier with clang. Firstly, it does not set __processor__
macros, so if you want to inform the user of the target processor you will need to pass this in via the build system.
Detecting runtime processor version should work, using X86TargetParser.def; __builtin_cpu_is
works much the same as on gcc.
Secondly, clang is a long way behind in support for the ISA features that gcc knows about - at present, it doesn't expose support via __builtin_cpu_supports
for anything past avx512vp2intersect
(e.g., 3dNow, ADX and later instructions). You could examine the __cpu_model
and __cpu_features2
globals directly, although that won't work if you're using -rtlib=compiler-rt
, since compiler-rt only sets a limited and outdated set of flags.
Hopefully this will be fixed soon, but in the meantime you might need to write the checks yourself consulting cpuid flags; see get_available_features
for how libgcc does this, and getAvailableFeatures
for the corresponding compiler-rt code.
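As a rough idea of what hand-written checks look like, the sketch below reads individual cpuid bits through `<cpuid.h>`, which both gcc and clang ship. The helper names (`CpuidReg`, `cpuid_bit`, `cpu_has_*`) are invented for this example, and the bit positions are taken from the Intel SDM. Note that this only tests the CPU's own flag: for features that need OS-managed state (AVX and anything built on it, such as F16C), a real check would also have to look at OSXSAVE and XGETBV, which is not shown here.

```cpp
#include <cpuid.h>

enum class CpuidReg { eax = 0, ebx = 1, ecx = 2, edx = 3 };

// True when the given bit of the given CPUID output register is set.
// False when the processor does not implement the requested leaf.
bool cpuid_bit(unsigned leaf, unsigned subleaf, CpuidReg reg, unsigned bit) {
    unsigned regs[4] = {0, 0, 0, 0};
    if (!__get_cpuid_count(leaf, subleaf, &regs[0], &regs[1], &regs[2], &regs[3]))
        return false;
    return ((regs[static_cast<int>(reg)] >> bit) & 1u) != 0;
}

bool cpu_has_sse2()   { return cpuid_bit(1, 0, CpuidReg::edx, 26); }
bool cpu_has_f16c()   { return cpuid_bit(1, 0, CpuidReg::ecx, 29); }
bool cpu_has_rdseed() { return cpuid_bit(7, 0, CpuidReg::ebx, 18); }
bool cpu_has_adx()    { return cpuid_bit(7, 0, CpuidReg::ebx, 19); }
```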
A just-barely viable technique might be to compile separate TUs with different -march=
flags and use that to detect statically which ISA features clang expects to be available for each processor; however that would not work for conditionally-present features or for newer processors that your version of clang does not know about.
Microarchitecture levels
A better alternative may be to set your target architecture to one of the four microarchitecture levels defined by the x86-64 psABI (the x86-64 baseline, then x86-64-v2 through x86-64-v4), i.e. in your case -march=x86-64-v2
(v3 is Haswell, and v4 is Skylake-AVX512 / Cannon Lake), and use -mtune
and ifunc multiversion support (__attribute__ ((target)) / [[gnu::target]]
) for intermediate and specialized processor support for performance-critical code.
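To give a feel for the target attribute, here is a minimal sketch that dispatches by hand instead of relying on the ifunc-based multiversioning mentioned above (so it does not need ifunc support from the platform). The function names are hypothetical, and whether the AVX2 variant is actually faster depends on optimization level and the loop in question.

```cpp
#include <cstddef>

// Baseline variant: built with whatever -march the translation unit uses.
static long sum_baseline(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Same loop, but the compiler is allowed to use AVX2 inside this function only.
__attribute__((target("avx2")))
static long sum_avx2(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Manual runtime dispatch: pick the variant the running CPU can execute.
long sum_ints(const int* data, std::size_t n) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? sum_avx2(data, n)
                                          : sum_baseline(data, n);
}
```

With the compiler's own multiversioning, the two variants would share one name and the dispatch would be generated automatically; the manual form just makes the mechanism visible.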
The advantage of this approach is that you can write __builtin_cpu_supports("x86-64-v2")
(since gcc 12) and know that you have at least a processor supporting that microarchitecture level. A downside is that support for this has not yet been released in clang, but it will be in clang 18.
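A hedged sketch of such a check follows. The version guard reflects the compiler versions stated above (gcc 12, clang 18); the fallback for older compilers is my own approximation, not an official definition, since it only tests the SSE-family and POPCNT parts of x86-64-v2 and skips CMPXCHG16B and LAHF/SAHF. `has_x86_64_v2` is an illustrative name.

```cpp
bool has_x86_64_v2() {
    __builtin_cpu_init();
#if (defined(__clang__) && __clang_major__ >= 18) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 12)
    // The compiler understands microarchitecture level names directly.
    return __builtin_cpu_supports("x86-64-v2") != 0;
#else
    // Approximation for older compilers: checks part of the v2 feature set only.
    return __builtin_cpu_supports("popcnt") && __builtin_cpu_supports("sse3") &&
           __builtin_cpu_supports("ssse3") && __builtin_cpu_supports("sse4.1") &&
           __builtin_cpu_supports("sse4.2");
#endif
}
```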
Hybrid architectures
There could be concern over how this will work if you run on a processor (e.g. Alder Lake, Raptor Lake) that has performance and efficiency cores built on different architectures, since their silicon supports different instruction sets. If hardware exposed different feature sets, it could be a problem if your process might start on a P-core and then be rescheduled (or have some threads scheduled) on an E-core, which is why CPUs don't do that. Current software isn't ready for that (and there aren't plans for that to change).
Alder Lake disabled AVX-512 on the P cores, and added AVX2+BMI2 support to the E cores, bringing all cores to x86-64-v3 plus various other extensions other than AVX-512. Do efficiency cores support the same instructions as performance cores? (yes). Appropriate tuning choices can still differ between cores, since performance characteristics are different. (See Agner Fog's blog.)
gcc -mtune=alderlake
is (I think) for the P cores. -mtune=gracemont
is for the E cores. (Or for CPUs that only have E cores.) As -march
settings, they enable the same set of extensions.
Early Alder Lake systems allowed AVX-512 to be enabled if E cores were disabled in the BIOS, or not present at all on that model, but unfortunately Intel changed their mind on that, with newer microcode not allowing AVX-512 (which some BIOS vendors worked around), and newer steppings of the CPU physically fusing off AVX-512 so even an old microcode version couldn't enable it.
Agner Fog's answer on How to detect P/E-Core in Intel Alder Lake CPU? says both P and E cores report the same family/model numbers via cpuid
(except on old Alder Lake with AVX-512 enabled), but that some other cpuid
leaves have different data, and according to Intel documentation for hybrid CPUs, there's a leaf for detecting core type.
(Unless even this difference is disabled for compatibility with bad DRM that detects the P and E cores as different systems trying to play a game on the same key. There is a legacy game compatibility feature in some BIOSes; it might work by interfering with detection mechanisms.)
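For reference, the core-type leaf can be read roughly as follows. This is a sketch based on my reading of Intel's documentation (hybrid flag in CPUID leaf 7, subleaf 0, EDX bit 15; core type in leaf 0x1A, EAX bits 31:24, with 0x20 for Atom/E-cores and 0x40 for Core/P-cores); `CoreType` and `current_core_type` are made-up names.

```cpp
#include <cpuid.h>

enum class CoreType { not_hybrid, atom, core, unknown };

// Reports the type of the core this thread happens to be running on.
CoreType current_core_type() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Hybrid flag: CPUID.(EAX=07H, ECX=0):EDX[15].
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
        ((edx >> 15) & 1u) == 0)
        return CoreType::not_hybrid;
    // Core type: CPUID.1AH:EAX[31:24].
    if (!__get_cpuid_count(0x1A, 0, &eax, &ebx, &ecx, &edx))
        return CoreType::unknown;
    switch (eax >> 24) {
        case 0x20: return CoreType::atom;  // E-core
        case 0x40: return CoreType::core;  // P-core
        default:   return CoreType::unknown;
    }
}
```

The answer is only meaningful for the moment of the call: unless the thread is pinned, the scheduler can move it to a different kind of core afterwards.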
-march=gracemont vs. -march=alderlake, though. – Drais
vpermb and vpternlogd to be used on mainstream systems, which often were already only using 256-bit. – Drais
-march=alderlake vs. -march=gracemont do exist as separate options. (There are E-core-only CPUs like i3-N305, which Intel calls "Alder Lake-N" - Why performance for this index-of-max function over many arrays of 256 bytes is so slow on Intel i3-N305 compared to AMD Ryzen 7 3800X? is an example of something it's bad at, and where GCC and clang -mtune=gracemont doesn't help.) –
Drais
/proc/cpuinfo might have some useful info – Dionysian
__builtin_cpu_supports in main in a source file compiled without any -march (i.e. baseline). As @Devoirs points out, you can't safely run any code compiled with -march=xyz on a machine that might be older than xyz. See does gcc's __builtin_cpu_supports check for OS support? . But in theory you'd have to check every single feature, not just SIMD ones like AVX. – Drais
__builtin_cpu_is("ivybridge") is only true on IvB, not IvB-and-later (false on Skylake for example). gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html. Enumerating all known CPU models now would create a binary that refuses to work on next year's new CPU, so that's terrible, ruling out that idea. – Drais