SFINAE on assembly?

Is it possible to use metaprogramming tricks to allow SFINAE on assembly blocks? For example, to detect whether an instruction like "CPUID" is available on a processor (this is not valid code, but it illustrates what I would like to achieve):

// This should work if `CPUID` is a valid instruction on the target architecture
template <
  class... T,
  class = decltype(sizeof...(T), asm volatile("CPUID":::))
>
bool f(T...) {
    return true;
}

// This should fail because `BLAH` is not an instruction
template <
  class... T,
  class = decltype(sizeof...(T), asm volatile("BLAH":::))
>
bool f(T...) {
    return true;
}
Deem answered 29/1, 2018 at 13:11 Comment(6)
Props for asking, even if it's impossible. - Hem
Isn't this achievable with conditional compilation based on predefined compiler-specific macros such as __SSE4_2__? - Brierwood
I don't even have an idea how this might be implemented. An asm block basically says to the C++ front end "ignore this bit, it goes directly to the back end". SFINAE lives entirely in the front end, and eliminated code doesn't make it to the back end. - Dvandva
Are you asking about runtime CPU dispatching, or compile-time target options? Only the latter is even plausible for SFINAE, because as MSalters points out, the whole point of SFINAE is compile-time-only decisions. - Forde
It's a good question. However, as stated it's a little misleading. What you can find (statically) is whether that asm instruction is available on an architecture. CPUID is always available on x64 architectures and never on ARM ones, for example. movaps is available on x86 and never on ARM; however, there are many x86 processors which don't support it, so you must check at runtime. Then you may have compiler macro support (e.g. __SSE4_2__), but this only parrots back at you what you told your build system in the first place. Fun. - Ailanthus
@MikeVine: On i386 (32-bit x86) you can usefully detect whether CPUID is supported, but usually you'll just assume it's supported, along with cmov and 686 features. But if you care about your code running on ancient CPUs, CPUID will fault there. See wiki.osdev.org/CPUID#Checking_CPUID_availability for a detection sequence, and the notes in sandpile.org/x86/cpuid.htm. Basically, checking which bits in FLAGS stay set after writing can detect 386 vs. 486 vs. 586, and specifically support for CPUID, which appeared in 486-SL and Pentium. - Forde

What the question asks for cannot be achieved as formulated, for the reasons listed below. However, a generalized version of the idea might make sense as a feature in some future revision of the language.

The reasons why it will not work:

  1. asm blocks are opaque to the C++ compiler, and their syntax is compiler-specific. As far as I know, MSVC's __asm blocks have no clobber lists like those supported by GCC and the Intel compiler. Moreover, Microsoft's x86_64 compiler does not support inline assembly at all and expects people to use intrinsics instead. That said, relying on the presence of intrinsic functions might be a way to offer compile-time CPU dispatching; this idea could be worth exploring.

  2. asm blocks are specific to the target architecture, and there are other ways to detect the target architecture at compile time.

  3. The very notion of an instruction being present or absent is vague. Who decides whether a given asm expression is valid: the assembler that translates its text into machine code, or the target processor that runs the resulting code? Both choices are problematic.

    • As an example, "MOV" is a common mnemonic on many architectures. But is it the same instruction in all cases? The semantics bound to the mnemonic are unlikely to match between unrelated architectures.
    • Assembling successfully does not mean that the code will execute fine. For example, on the Intel 64 architecture a correctly encoded instruction may still fault with #UD (the invalid-opcode exception), because its behavior depends on the runtime values of the CR0 and CR4 registers, which are controlled by the operating system. An assembler will process it just fine in any case, so one has to run the code. But what if we cross-compile and cannot run it because the target processor does not match the host processor?
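
To illustrate point 2, here is a minimal sketch of how overload selection could be tied to the target architecture instead of to an asm block. The tags x86_tag and other_tag, the alias target_arch and the trait has_cpuid are names made up for this illustration; the predefined macros are the ones GCC, Clang and MSVC define for x86 targets. It only expresses "CPUID is part of this instruction set", not "the CPU at hand will execute it":

```cpp
#include <type_traits>

// Hypothetical architecture tags, selected from predefined compiler macros.
struct x86_tag {};
struct other_tag {};

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
using target_arch = x86_tag;
#else
using target_arch = other_tag;
#endif

// Hypothetical trait: does the instruction set of Arch define CPUID?
template <class Arch> struct has_cpuid : std::false_type {};
template <> struct has_cpuid<x86_tag> : std::true_type {};

// Participates in overload resolution when the target defines CPUID.
template <class... T, class Arch = target_arch,
          std::enable_if_t<has_cpuid<Arch>::value, int> = 0>
bool f(T...) { return true; }

// Participates in overload resolution otherwise.
template <class... T, class Arch = target_arch,
          std::enable_if_t<!has_cpuid<Arch>::value, int> = 0>
bool f(T...) { return false; }
```

The condition goes through the defaulted template parameter Arch so that it stays dependent and the substitution failure is a soft one. A call such as f(1, 'a') then picks whichever overload matches the target.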

As it stands, there is no way to know the outcome of an opaque block without executing it. So the compiler would have to call an arbitrary external program and use the value it returns during template instantiation. Such a program could then do processor or instruction sensing and return its findings to guide further compilation.
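
Today the closest approximation is a probe that the build system runs before the real compilation. The following is only a sketch under assumptions: cpu_has_popcnt, probe_flag and the macro name HAVE_POPCNT are invented for this example, and __builtin_cpu_supports is a GCC/Clang builtin that exists for x86 targets only. It also shows the cross-compilation problem, because the probe describes the machine it runs on, not the target:

```cpp
#include <string>

// Hypothetical probe: does the CPU running this code support POPCNT?
// Assumption: __builtin_cpu_supports is available (GCC/Clang, x86 only);
// on any other compiler or target this sketch conservatively answers "no".
inline bool cpu_has_popcnt() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("popcnt") != 0;
#else
    return false;
#endif
}

// Hypothetical flag that a build script could pass to the real compilation.
inline std::string probe_flag() {
    return cpu_has_popcnt() ? "-DHAVE_POPCNT=1" : "-DHAVE_POPCNT=0";
}
```

A build script could print probe_flag() from a small main and append the result to the compiler command line, where ordinary preprocessor or SFINAE logic can pick it up.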

This is abstract enough to be a language feature, since it makes no assumptions about the nature of the external program. There are still portability (including cross-compilation) and security (running an external program is risky) issues. All in all, it looks better to me to rely on the existing macro definitions that the environment passes to the compiler.
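
As a rough sketch of that macro-based approach combined with intrinsics: the helper name popcount32 is made up, and __POPCNT__ is assumed to be defined the GCC/Clang way (for example by -mpopcnt), much like the __SSE4_2__ macro mentioned in the comments. MSVC does not define it, so the portable fallback would be used there:

```cpp
#include <cstdint>
#if defined(__POPCNT__)
#include <immintrin.h>  // provides _mm_popcnt_u32 when POPCNT is enabled
#endif

// Hypothetical helper: counts the set bits of x.
inline unsigned popcount32(std::uint32_t x) {
#if defined(__POPCNT__)
    // The build flags promised POPCNT, so use the intrinsic directly.
    return static_cast<unsigned>(_mm_popcnt_u32(x));
#else
    // Portable fallback: clear the lowest set bit until none remain.
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
#endif
}
```

As noted in the comments, the macro only repeats what the build flags promised; it says nothing about the CPU the binary eventually runs on.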

Robbinrobbins answered 6/2, 2018 at 14:39 Comment(0)
