How to verify at runtime that architecture matches -march=?

This is surprisingly difficult, and also compiler-specific. I'll cover techniques usable with gcc, then clang, then alternatives. Bear in mind that the information presented is correct as of late 2023; it could change in the future.

Determining the target processor

Note: you can skip this if you fully control the build system, since you can just pass e.g. -march=skylake -DMARCH=skylake, but it's worth reading anyway since some techniques will be reused later.
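If you do pass the name in from the build system, the macro has to be turned into a string before it can be printed or compared. A minimal sketch, assuming the build passes -DMARCH=skylake as above; STRINGIFY, STRINGIFY_EXPANDED and build_march are illustrative names, and the "unknown" fallback is my own addition:

```c
/* Two-level expansion so that MARCH is replaced by its value before '#' applies. */
#define STRINGIFY_EXPANDED(x) #x
#define STRINGIFY(x) STRINGIFY_EXPANDED(x)

/* Assumption: the build system passes -DMARCH=<name>; fall back if it did not. */
#ifndef MARCH
#define MARCH unknown
#endif

/* Returns the -march name recorded at build time, e.g. "skylake". */
const char *build_march(void) {
    return STRINGIFY(MARCH);
}
```

This only records what the build system claimed; it does not inspect what the compiler actually targeted, which is what the macro-based approach below does.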

gcc (in ix86_target_macros_internal) sets preprocessor macros corresponding to the target processor, so if you have a list of supported processors (e.g. extracted from processor_names) you can test each of these macros (using #ifdef or, inside a macro, using the technique described at https://mcmap.net/q/1045110/-test-if-preprocessor-symbol-is-defined-inside-macro). Schematically:

char const* target_processor() {
    // ...
#if __skylake__
    return "skylake";
#endif
#if __skylake_avx512__
    return "skylake-avx512";
#endif
    // ...
}

Note that you will probably want to ensure that a compile error occurs if none or more than one of the macros is set, since in that case you would want to update your code.
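One way to get that compile error is to count the macros in the preprocessor. This is only a sketch for the two processors used above: the TARGET_IS_* names, target_processor_checked and the REQUIRE_KNOWN_TARGET switch are all hypothetical, and the switch is opt-in so that the file still builds without a matching -march.

```c
/* 1 if the compiler defined the macro for that processor, else 0. */
#if defined(__skylake__)
#define TARGET_IS_SKYLAKE 1
#else
#define TARGET_IS_SKYLAKE 0
#endif

#if defined(__skylake_avx512__)
#define TARGET_IS_SKYLAKE_AVX512 1
#else
#define TARGET_IS_SKYLAKE_AVX512 0
#endif

/* Extend this sum whenever a processor is added to the list. */
#define TARGET_MACRO_COUNT (TARGET_IS_SKYLAKE + TARGET_IS_SKYLAKE_AVX512)

/* Hypothetical opt-in: build with -DREQUIRE_KNOWN_TARGET to turn an
   unrecognised or ambiguous target into a hard error. */
#if defined(REQUIRE_KNOWN_TARGET) && TARGET_MACRO_COUNT != 1
#error "expected exactly one known processor macro; update the processor list"
#endif

/* Name of the processor the compiler targeted, or "unknown". */
const char *target_processor_checked(void) {
    if (TARGET_IS_SKYLAKE) return "skylake";
    if (TARGET_IS_SKYLAKE_AVX512) return "skylake-avx512";
    return "unknown";
}
```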

Determining the runtime processor

gcc provides __builtin_cpu_is for exact processor detection. Note that the argument must be a string literal, since internally this is converted to checking integer fields on the __cpu_model global, which is set by cpu_indicator_init called from __builtin_cpu_init. So again you can list known processors:

char const* runtime_processor() {
    // ...
    if (__builtin_cpu_is("skylake"))
        return "skylake";
    if (__builtin_cpu_is("skylake-avx512"))
        return "skylake-avx512";
    // ...
    return "unknown";
}

Note that you will need to handle the case of unknown processors, since your code could be run on a newer processor than gcc knows about!
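Putting the two halves together, the comparison itself needs a third outcome for the unknown case. A sketch, assuming the same two-processor list; detect_runtime_processor, compare_processors and the march_check enum are illustrative names, and the guard simply makes the function return "unknown" on non-x86 or non-GNU compilers:

```c
#include <string.h>

enum march_check { MARCH_MATCH, MARCH_MISMATCH, MARCH_UNKNOWN };

/* Same shape as runtime_processor() above, with only two processors listed. */
const char *detect_runtime_processor(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Safe to call more than once; it fills in the CPU model data that
       __builtin_cpu_is reads. */
    __builtin_cpu_init();
    if (__builtin_cpu_is("skylake")) return "skylake";
    if (__builtin_cpu_is("skylake-avx512")) return "skylake-avx512";
#endif
    return "unknown";
}

/* An unrecognised name on either side is reported separately from a mismatch. */
enum march_check compare_processors(const char *target, const char *runtime) {
    if (strcmp(target, "unknown") == 0 || strcmp(runtime, "unknown") == 0)
        return MARCH_UNKNOWN;
    return strcmp(target, runtime) == 0 ? MARCH_MATCH : MARCH_MISMATCH;
}
```

An exact-name match is a strict test: a newer processor will usually be reported as a mismatch or as unknown even though it may run the code perfectly well.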

Determining feature support

Even if you know the target and runtime processors, that doesn't always tell you whether the processor can run your code: features are added to server and client lines at different times, they can be removed (e.g. 3dNow), they depend on the vendor (Intel vs. AMD), and some depend on OS support (e.g. AVX requires OSXSAVE, detected at runtime via XGETBV; see "Does gcc's __builtin_cpu_supports check for OS support?"). So it is better to use __builtin_cpu_supports to check at runtime that the ISA features enabled by -march are actually present.

You can use the processor_features enumeration to iterate over known features, the macros set by ix86_target_macros_internal to detect which of them are in use, and ISA_NAMES_TABLE to map between these and the feature name strings. Schematically:

void check_features() {
    // ...
#if __F16C__
    if (!__builtin_cpu_supports("f16c"))
        error("f16c not present");
#endif
#if __RDSEED__
    if (!__builtin_cpu_supports("rdseed"))
        error("rdseed not present");
#endif
    // ...
}

By constructing this code, you can be sure that the runtime environment actually does support all the features that you told gcc are available (via the -march= flag).
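A compilable version of the same idea, restricted to a handful of long-standing feature names that both gcc and clang accept. REQUIRE_FEATURE and count_missing_features are names I made up for this sketch, and the list is far from complete; a real version would be generated from ISA_NAMES_TABLE.

```c
#include <stdio.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* The feature name must be a string literal, so this is a macro rather than
   a function. It updates the local variable `missing`. */
#define REQUIRE_FEATURE(name)                          \
    do {                                               \
        if (!__builtin_cpu_supports(name)) {           \
            fprintf(stderr, "%s not present\n", name); \
            ++missing;                                 \
        }                                              \
    } while (0)
#endif

/* Returns how many features the build assumed but the running CPU lacks. */
int count_missing_features(void) {
    int missing = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
#ifdef __SSE2__
    REQUIRE_FEATURE("sse2");
#endif
#ifdef __SSE4_2__
    REQUIRE_FEATURE("sse4.2");
#endif
#ifdef __AVX__
    REQUIRE_FEATURE("avx");
#endif
#ifdef __AVX2__
    REQUIRE_FEATURE("avx2");
#endif
#ifdef __BMI2__
    REQUIRE_FEATURE("bmi2");
#endif
#endif
    return missing;
}
```

Ideally this runs from code built without the -march flags in question (for example a small separate translation unit), since otherwise the check itself could contain an unsupported instruction.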

clang

Unfortunately, things are a bit trickier with clang. Firstly, it does not set __processor__ macros, so if you want to inform the user of the target processor you will need to pass this in via the build system.

Detecting the runtime processor should still work, taking the list of names from X86TargetParser.def; __builtin_cpu_is works much the same as on gcc.

Secondly, clang is a long way behind in support for the ISA features that gcc knows about - at present, it doesn't expose support via __builtin_cpu_supports for anything past avx512vp2intersect (e.g., 3dNow, ADX and later instructions). You could examine the __cpu_model and __cpu_features2 globals directly, although that won't work if you're using -rtlib=compiler-rt, since compiler-rt only sets a limited and outdated set of flags.

Hopefully this will be fixed soon, but in the meantime you might need to write the checks yourself consulting cpuid flags; see get_available_features for how libgcc does this, and getAvailableFeatures for the corresponding compiler-rt code.
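As an illustration of what writing the checks yourself looks like, here is a sketch that reads two of the missing flags straight from cpuid using the <cpuid.h> header shipped with both gcc and clang. The function names are mine, and the bit positions are taken from Intel's CPUID documentation rather than from either runtime library. Features that need extra register state (AVX and friends) also require the XGETBV check mentioned earlier, which this sketch leaves out.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Returns bit `bit` of EBX from CPUID leaf 7, subleaf 0, or 0 if that leaf
   is not available (or this is not an x86 build). */
static int cpuid_leaf7_ebx_bit(unsigned int bit) {
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> bit) & 1u);
#else
    (void)bit;
    return 0;
#endif
}

/* CPUID.(EAX=07H,ECX=0):EBX bit 18 = RDSEED */
int cpuid_has_rdseed(void) { return cpuid_leaf7_ebx_bit(18); }

/* CPUID.(EAX=07H,ECX=0):EBX bit 19 = ADX */
int cpuid_has_adx(void) { return cpuid_leaf7_ebx_bit(19); }
```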

A just-barely viable technique might be to compile separate TUs with different -march= flags and use that to detect statically which ISA features clang expects to be available for each processor; however that would not work for conditionally-present features or for newer processors that your version of clang does not know about.

Microarchitecture levels

A better alternative may be to set your target architecture to one of the four microarchitecture levels defined by the x86-64 psABI (the baseline, spelled -march=x86-64, and x86-64-v2 through x86-64-v4), i.e. in your case -march=x86-64-v2 (v3 is Haswell, and v4 is Skylake-AVX512 / Cannon Lake), and to use -mtune and ifunc multiversion support (__attribute__ ((target)) / [[gnu::target]]) to support intermediate and specialized processors in performance-critical code.
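To show the shape of per-function targeting, here is a sketch that uses __attribute__((target)) with a hand-written runtime dispatch. Note that this is manual dispatch rather than true ifunc multiversioning (where the compiler or an ifunc resolver picks the version at load time); I use it here because it does not depend on ifunc support in the C library. The function names are illustrative.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Built with AVX2 enabled for this function only, whatever -march the rest
   of the file uses; the compiler may vectorise the loop with AVX2. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline version, built with the file-wide -march. */
static long sum_baseline(const int *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Picks the specialised version only when the running CPU supports it. */
long sum(const int *v, size_t n) {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
#endif
    return sum_baseline(v, n);
}
```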

The advantage of this approach is that you can write __builtin_cpu_supports("x86-64-v2") (since gcc 12) and know that you have at least a processor supporting that microarchitecture level. A downside is that support for this has not yet been released in clang, but it will be in clang 18.
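A sketch of that check, guarded by the compiler versions stated above (gcc 12, clang 18), since older compilers reject the level name at compile time. The version guard and the -1 "cannot tell" return value are my own additions, and cpu_meets_x86_64_v2 is an illustrative name.

```c
/* Returns 1 if the running CPU meets x86-64-v2, 0 if it does not, and -1 if
   this compiler or target cannot answer the question. */
int cpu_meets_x86_64_v2(void) {
#if (defined(__x86_64__) || defined(__i386__)) &&                      \
    ((defined(__clang__) && __clang_major__ >= 18) ||                  \
     (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 12))
    __builtin_cpu_init();
    return __builtin_cpu_supports("x86-64-v2") ? 1 : 0;
#else
    return -1;
#endif
}
```

A program built with -march=x86-64-v2 would call this early in main and refuse to continue on 0; how to treat -1 is a policy decision.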

Hybrid architectures

There could be concern over how this works on a processor (e.g. Alder Lake, Raptor Lake) whose performance and efficiency cores are built on different microarchitectures, since their silicon supports different instruction sets. If the hardware exposed different feature sets, a process that started on a P-core and was then rescheduled (or had some threads scheduled) onto an E-core could break, which is why CPUs don't do that. Current software isn't ready for it (and there are no plans for that to change).

Alder Lake disabled AVX-512 on the P cores and added AVX2+BMI2 support to the E cores, bringing all cores to x86-64-v3 plus various other extensions, but not AVX-512. So the answer to "Do efficiency cores support the same instructions as performance cores?" is yes. Appropriate tuning choices can still differ between cores, since their performance characteristics are different. (See Agner Fog's blog.)

gcc -mtune=alderlake is (I think) for the P cores. -mtune=gracemont is for the E cores. (Or for CPUs that only have E cores.) As -march settings, they enable the same set of extensions.

Early Alder Lake systems allowed AVX-512 to be enabled if E cores were disabled in the BIOS, or not present at all on that model, but unfortunately Intel changed their mind on that, with newer microcode not allowing AVX-512 (which some BIOS vendors worked around), and newer steppings of the CPU physically fusing off AVX-512 so even an old microcode version couldn't enable it.

Agner Fog's answer on "How to detect P/E-Core in Intel Alder Lake CPU?" says both P and E cores report the same family/model numbers via cpuid (except on old Alder Lake with AVX-512 enabled), but that some other cpuid leaves have different data; according to Intel's documentation for hybrid CPUs, there is a leaf for detecting the core type.
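For reference, a sketch of reading that leaf with <cpuid.h>. The bit positions (hybrid flag in leaf 7 EDX bit 15, core type in leaf 0x1A EAX bits 31:24, with 0x20 for Atom and 0x40 for Core) are my reading of Intel's CPUID documentation, not something taken from the linked answer, and the function names are illustrative. The result describes whichever core the calling thread is on at that instant, so pin the thread first if you need a stable answer.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID_LEAVES 1
#else
#define HAVE_X86_CPUID_LEAVES 0
#endif

/* Returns 1 if CPUID marks the part as hybrid, else 0. */
int cpu_is_hybrid(void) {
#if HAVE_X86_CPUID_LEAVES
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 15) & 1u);
#else
    return 0;
#endif
}

/* Returns the core-type byte for the current core (0x20 = Atom / E-core,
   0x40 = Core / P-core), or 0 on non-hybrid parts or if the leaf is missing. */
unsigned int current_core_type(void) {
#if HAVE_X86_CPUID_LEAVES
    unsigned int eax, ebx, ecx, edx;
    if (!cpu_is_hybrid())
        return 0;
    if (!__get_cpuid_count(0x1A, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return eax >> 24;
#else
    return 0;
#endif
}
```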

(Unless even this difference is disabled for compatibility with bad DRM, which otherwise detects the P and E cores as different systems trying to play a game on the same key. Some BIOSes have a legacy game compatibility feature; it might work by interfering with detection mechanisms.)