How to verify at runtime that architecture matches -march=?

We compile our code with g++ -march=ivybridge -mtune=skylake. If somebody runs it on an older or incompatible architecture, I want the app to inform them and exit gracefully. How do I do this? What about AMD processors? Is there some sort of parity of architectures/instructions?

Burgee answered 18/9, 2020 at 9:18 Comment(6)
Do you mean cpuid? - Guadalupe
You would need to have a separate runner, compiled for any architecture, that either reports an error or runs your specifically compiled program. This seems like a safe generic solution. - Devoirs
/proc/cpuinfo might have some useful info. - Dionysian
On x86, you can maybe use __builtin_cpu_supports in main in a source file compiled without any -march (i.e. baseline). As @Devoirs points out, you can't safely run any code compiled with -march=xyz on a machine that might be older than xyz. See "does gcc's __builtin_cpu_supports check for OS support?". But in theory you'd have to check every single feature, not just SIMD ones like AVX. - Drais
__builtin_cpu_is("ivybridge") is only true on IvB, not IvB-and-later (false on Skylake for example). gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html. Enumerating all known CPU models now would create a binary that refuses to work on next year's new CPU, so that's terrible, ruling out that idea. - Drais
(Also, note that Haswell is the ISA that introduced AVX2 and FMA for Intel, and BMI1 / BMI2. IvyBridge is missing a lot of really useful stuff. So you might want some multiversioned functions, depending on your application, if you can't target an AVX2 + FMA + BMI2 baseline.) - Drais

This is surprisingly difficult, as well as being compiler-specific. I'll cover techniques usable with gcc, then clang, then alternatives. Bear in mind that the information presented is correct as of late 2023; it could change in the future.

Determining the target processor

Note: you can skip this if you fully control the build system, since you can just pass e.g. -march=skylake -DMARCH=skylake, but it's worth reading anyway since some techniques will be reused later.
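If you do control the build, the sketch below shows one way to turn a build-system definition such as -DMARCH=skylake into a string at compile time. STRINGIFY, STRINGIFY_IMPL and the "unknown" fallback are illustrative choices, not anything gcc provides.

```cpp
// Stringify the value of a macro passed by the build system, e.g.
//   g++ -march=skylake -DMARCH=skylake ...
// Two levels are needed: # does not expand its operand, so the outer macro
// expands MARCH first and the inner one quotes the result.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

#ifndef MARCH
#define MARCH unknown  // illustrative fallback if the build forgot -DMARCH=...
#endif

char const* target_processor() {
    return STRINGIFY(MARCH);
}
```

Because the value is macro-expanded before it is quoted, a value that happens to be a predefined macro name would be replaced. Passing a ready-made string literal instead (e.g. -DMARCH_NAME='"skylake"' from a shell) avoids that.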

gcc (in ix86_target_macros_internal) sets preprocessor macros corresponding to the processor; so if you have a list of supported processors (e.g. extracted from processor_names) you can test each of these macros (using #ifdef or, inside a macro, using https://mcmap.net/q/1045110/-test-if-preprocessor-symbol-is-defined-inside-macro). Schematically:

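char const* target_processor() {
#if defined(__skylake__)
    return "skylake";
#elif defined(__skylake_avx512__)
    return "skylake-avx512";
// #elif ... one branch per processor listed in processor_names
#else
#error "no known processor macro is defined; update target_processor()"
#endif
}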

Note that you will probably want to ensure that a compile error occurs if none or more than one of the macros is set, since in that case you would want to update your code.

Determining the runtime processor

gcc provides __builtin_cpu_is for exact processor detection. Note that the argument must be a string literal, since internally this is converted to checking integer fields on the __cpu_model global, which is set by cpu_indicator_init called from __builtin_cpu_init. So again you can list known processors:

char const* runtime_processor() {
    // ...
    if (__builtin_cpu_is("skylake"))
        return "skylake";
    if (__builtin_cpu_is("skylake-avx512"))
        return "skylake-avx512";
    // ...
    return "unknown";
}

Note that you will need to handle the case of unknown processors, since your code could be run on a newer processor than gcc knows about!
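On the AMD part of the question: __builtin_cpu_is also accepts the vendor names "intel" and "amd", in both gcc and clang. The sketch below uses them. runtime_vendor is a hypothetical name. The vendor alone says nothing about which ISA extensions are present, so the feature checks described next are what actually matter.

```cpp
// Vendor detection via __builtin_cpu_is; compile this file without -march.
char const* runtime_vendor() {
    // The runtime normally initializes __cpu_model from a constructor before
    // main; calling this explicitly is harmless, and required if the code
    // runs earlier (e.g. in an ifunc resolver).
    __builtin_cpu_init();
    if (__builtin_cpu_is("intel"))
        return "intel";
    if (__builtin_cpu_is("amd"))
        return "amd";
    return "other";  // some other x86 vendor, or a VM reporting something unusual
}
```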

Determining feature support

Even if you know the target and runtime processors, that doesn't always mean you know that the processor will be able to run your code; features are added to server and client lines at different times, they can be removed (e.g. 3dNow), they depend on vendor (Intel vs. AMD), and some are dependent on OS support (e.g. AVX via OSXSAVE, detected at runtime via XGETBV: does gcc's __builtin_cpu_supports check for OS support?). So it is better to detect whether the ISA features that -march enables are present at runtime, using __builtin_cpu_supports.
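To make the OS-support point concrete, the sketch below writes out by hand roughly the check that __builtin_cpu_supports("avx") performs. It uses the <cpuid.h> helpers that ship with gcc and clang. avx_usable is a hypothetical name, and the XCR0 bit layout comes from Intel's documentation rather than from this answer. In real code you would just call __builtin_cpu_supports.

```cpp
#include <cpuid.h>   // __get_cpuid, bit_AVX, bit_OSXSAVE (gcc and clang)
#include <cstdint>

// AVX is only usable if the CPU reports it AND the OS has enabled saving of
// the YMM register state, which is visible in XCR0 once OSXSAVE is set.
bool avx_usable() {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if (!(ecx & bit_AVX) || !(ecx & bit_OSXSAVE))
        return false;
    // XGETBV with ECX = 0 reads XCR0 into EDX:EAX. It is only executed after
    // OSXSAVE is known to be set. Inline asm avoids needing -mxsave for the
    // _xgetbv intrinsic.
    std::uint32_t xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    // Bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state; both must be set.
    return (xcr0_lo & 0x6u) == 0x6u;
}
```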

You can use the processor_features enumeration to iterate over known features, use the macros set by ix86_target_macros_internal to detect which are in use, and the ISA_NAMES_TABLE table to map between these and feature name strings. Schematically:

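#include <cstdio>
#include <cstdlib>

// Illustrative helper: report the missing feature and exit gracefully.
[[noreturn]] static void error(char const* msg) {
    std::fprintf(stderr, "Unsupported CPU: %s\n", msg);
    std::exit(EXIT_FAILURE);
}

void check_features() {
    // One block per ISA macro that ix86_target_macros_internal may define.
#ifdef __F16C__
    if (!__builtin_cpu_supports("f16c"))
        error("f16c not present");
#endif
#ifdef __RDSEED__
    if (!__builtin_cpu_supports("rdseed"))
        error("rdseed not present");
#endif
    // ...
}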

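By constructing this code, you can be sure that the runtime environment actually does support all the features that you told gcc are available (via the -march= flag). Bear in mind that the preprocessor tests see the flags of the translation unit they are compiled in, so this file must itself be built with your -march. Keep it small and call check_features() first thing in main, since gcc is free to use any enabled instruction anywhere in that file.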

clang

Unfortunately, things are a bit trickier with clang. Firstly, it does not set __processor__ macros, so if you want to inform the user of the target processor you will need to pass this in via the build system.

Detecting the runtime processor should work: __builtin_cpu_is works much the same as on gcc, with the recognised names listed in X86TargetParser.def.

Secondly, clang is a long way behind in support for the ISA features that gcc knows about. At present, its __builtin_cpu_supports doesn't expose anything after avx512vp2intersect in the processor_features enumeration (i.e. 3dNow, ADX and the features listed after them). You could examine the __cpu_model and __cpu_features2 globals directly, although that won't work if you're using -rtlib=compiler-rt, since compiler-rt only sets a limited and outdated set of flags.

Hopefully this will be fixed soon, but in the meantime you might need to write the checks yourself by consulting the cpuid flags; see get_available_features for how libgcc does this, and getAvailableFeatures for the corresponding compiler-rt code.
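The sketch below shows such a hand-written check, using the <cpuid.h> header that ships with both gcc and clang. has_bmi2, has_adx and leaf7_ebx are hypothetical names. The bit positions come from that header's bit_BMI2 and bit_ADX macros, which describe CPUID leaf 7, sub-leaf 0, register EBX.

```cpp
#include <cpuid.h>  // __get_cpuid_count and bit_* masks (gcc and clang)

// Returns EBX of CPUID.(EAX=7, ECX=0), or 0 if this CPU doesn't implement
// leaf 7 (__get_cpuid_count checks the maximum supported leaf first).
static unsigned leaf7_ebx() {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return ebx;
}

bool has_bmi2() { return (leaf7_ebx() & bit_BMI2) != 0; }
bool has_adx()  { return (leaf7_ebx() & bit_ADX) != 0; }
```

Features that need OS cooperation (AVX, AVX-512, AMX) also need the XCR0 check, as __builtin_cpu_supports does. Purely integer extensions such as BMI2 and ADX don't.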

A just-barely viable technique might be to compile separate TUs with different -march= flags and use that to detect statically which ISA features clang expects to be available for each processor; however that would not work for conditionally-present features or for newer processors that your version of clang does not know about.

Microarchitecture levels

A better alternative may be to set your target architecture to one of the four microarchitecture levels (x86-64-v1 through x86-64-v4) defined by the x86-64 psABI. In your case that would be -march=x86-64-v2, which is roughly Nehalem-level (SSE4.2, SSSE3, POPCNT) and does not include AVX; v3 corresponds to Haswell, and v4 to Skylake-AVX512 / Cannon Lake. Then use -mtune, plus ifunc multiversioning (__attribute__ ((target)) / [[gnu::target]]), to support intermediate and specialized processors in performance-critical code.

The advantage of this approach is that you can write __builtin_cpu_supports("x86-64-v2") (since gcc 12) and know that you have at least a processor supporting that microarchitecture level. A downside is that support for this has not yet been released in clang, but it will be in clang 18.

Hybrid architectures

You might wonder how this works on a processor (e.g. Alder Lake, Raptor Lake) whose performance and efficiency cores are different microarchitectures, with silicon that supports different instruction sets. If the hardware exposed different feature sets per core, a process could start on a P-core, detect a feature, and then be rescheduled (or have some of its threads scheduled) on an E-core that lacks it. That is why CPUs don't do that: current software isn't ready for it, and there are no plans for that to change.

Alder Lake disabled AVX-512 on the P cores and added AVX2 and BMI2 support to the E cores, bringing all cores to x86-64-v3 plus various other extensions, but not AVX-512 (see Do efficiency cores support the same instructions as performance cores? The answer is yes). Appropriate tuning choices can still differ between cores, since their performance characteristics are different (see Agner Fog's blog).

gcc -mtune=alderlake is (I think) for the P cores. -mtune=gracemont is for the E cores. (Or for CPUs that only have E cores.) As -march settings, they enable the same set of extensions.

Early Alder Lake systems allowed AVX-512 to be enabled if the E cores were disabled in the BIOS, or on models with no E cores at all. Unfortunately, Intel changed its mind: newer microcode doesn't allow AVX-512 (which some BIOS vendors worked around), and newer steppings of the CPU physically fuse it off, so even an old microcode version can't enable it.

Agner Fog's answer on How to detect P/E-Core in Intel Alder Lake CPU? says that both P and E cores report the same family/model numbers via cpuid (except on old Alder Lake with AVX-512 enabled). However, some other cpuid leaves return different data, and according to Intel's documentation for hybrid CPUs, there is a leaf for detecting the core type.

(Unless even this difference is hidden for compatibility with badly designed DRM, which detects the P and E cores as different systems trying to play a game with the same key. Some BIOSes have a legacy game compatibility feature; it might work by interfering with these detection mechanisms.)
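The sketch below shows that core-type query. It assumes the leaf layout from Intel's SDM, which is not spelled out in this answer: CPUID leaf 07H EDX bit 15 flags a hybrid part, and leaf 1AH EAX[31:24] gives the core type. CoreType and current_core_type are hypothetical names. The result only describes whichever core the thread happened to be running on when it executed cpuid, so treat it as informational and don't use it to choose code paths.

```cpp
#include <cpuid.h>

enum class CoreType { NotHybrid, Efficiency, Performance, Unknown };

// Assumed per Intel's SDM:
//   CPUID.(EAX=07H,ECX=0):EDX[15]  = 1 on hybrid parts
//   CPUID.(EAX=1AH,ECX=0):EAX[31:24] = type of the core executing CPUID:
//     0x20 = Intel Atom (E-core), 0x40 = Intel Core (P-core)
CoreType current_core_type() {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x07, 0, &eax, &ebx, &ecx, &edx) || !(edx & (1u << 15)))
        return CoreType::NotHybrid;
    if (!__get_cpuid_count(0x1A, 0, &eax, &ebx, &ecx, &edx))
        return CoreType::Unknown;
    switch (eax >> 24) {
    case 0x20: return CoreType::Efficiency;
    case 0x40: return CoreType::Performance;
    default:   return CoreType::Unknown;
    }
}
```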

Hermaphroditus answered 26/12, 2023 at 0:39 Comment(8)
For now at least, heterogeneous CPUs like Alder Lake support the same ISA extensions across E and P cores. (This is why Alder Lake P-cores are crippled, no AVX-512 support, except with BIOS hacks in systems with the E cores disabled, and Intel's trying to kill that for some unknown reason, probably to make people pay more for Xeon workstation CPUs if they want a desktop they can use to tune code for Sapphire Rapids servers. If they just want AVX-512, joke's on them thanks to Zen 4.) Different performance characteristics, and maybe detect as -march=gracemont vs. -march=alderlake, though. - Drais
Heh, you edited about that just as I commented. The software ecosystem doesn't have a solution for having P cores with more features. A high-performance thread that happens to start on an E-core could end up not using AVX-512 if the E-core doesn't report it. But if AVX-512 is available, glibc memcpy will want to use it in all processes because masking is nice even with 256-bit vectors. AVX10 will define a 256-bit-only version of AVX-512, finally allowing the great new instructions like vpermb and vpternlogd to be used on mainstream systems, which often were already only using 256-bit. - Drais
(So having all CPUs report a feature the E cores don't have, and having the OS migrate on illegal-instruction faults, will often end with nothing able to run on E cores, until / unless a new strategy for CPU feature detection is developed. Hrm, function pointers that are ABI-visible, which the OS can update when migrating a process between E and P cores? But only if it can tell that it's not already in the middle of an AVX-512 version of something. With Intel introducing AVX10, it looks like that's the plan, instead of ever having heterogeneous ISA features, only performance.) - Drais
@PeterCordes thanks, I realized I don't know a huge amount about this area! Edited again. - Hermaphroditus
I think you misunderstood. There's never been a way to boot an Alder Lake so it reported AVX-512 on P cores while there were E-core also enabled. That discussion was for the hypothetical case of a CPU that did, and why it wouldn't work well with the software ecosystem and thus isn't done. AVX-512 on Alder Lake was only possible on CPUs with no E-cores, or by disabling the E cores in the BIOS. Intel had planned to support this, and the engineers that designed Alder Lake were reportedly unhappy when Intel pulled the plug on AVX-512, fusing it off in later steppings and blocking it in microcode. - Drais
anandtech.com/show/17047/… and phoronix.com/review/alder-lake-avx512 (2021 Nov) / pcgamer.com/intel-kills-alder-lake-avx-512-support-for-good (2022 Mar) / en.wikipedia.org/wiki/AVX-512#endnote_adl-avx512-note - Drais
I edited to clarify the phrasing of your last section. I ended up expanding it significantly with stuff that may only be tangentially related to the actual question, but -march=alderlake vs. -march=gracemont do exist as separate options. (There are E-core-only CPUs like i3-N305, which Intel calls "Alder Lake-N" - Why performance for this index-of-max function over many arrays of 256 bytes is so slow on Intel i3-N305 compared to AMD Ryzen 7 3800X? is an example of something it's bad at, and where GCC and clang -mtune=gracemont doesn't help.) - Drais
@PeterCordes That's great, I was considering suggesting you go in and edit the last section. Thanks! - Hermaphroditus

© 2022 - 2024 — McMap. All rights reserved.