How to combine constexpr and vectorized code?

I am working on a C++ intrinsics wrapper for x64 (SSE) and NEON, and I want its functions to be constexpr. My motivation is similar to the question "Constexpr and SSE intrinsics", but neither #pragma omp simd nor intrinsics are reliably accepted inside a constexpr function (GCC rejects both, see the errors in the comments below). The following code is just a demonstration (auto-vectorization is good enough for addition).

struct FA{
    float c[4];
};

inline constexpr FA add(FA a, FA b){
    FA result{};
    #pragma omp simd            // clang error: statement not allowed in constexpr function
    for(int i = 0; i < 4; i++){ // GCC error: uninitialized variable 'i' in 'constexpr' function
        result.c[i] = b.c[i] + a.c[i];
    }
    return result;
}

#include <immintrin.h> // __m128, _mm_add_ps

struct FA2{
    __m128 c;
};

inline constexpr FA2 add2(FA2 a, FA2 b){
    FA2 result{};
    result.c = _mm_add_ps(a.c, b.c); // GCC error: call to non-'constexpr' function '__m128 _mm_add_ps(__m128, __m128)'
    return result;                   // fine with clang
}

I have to provide reference C++ code for portability anyway. Is there a code-efficient way to make the compiler use the reference code at compile time and the intrinsic version at run time?

f(){
    if(/* evaluated at compile time */){
        // constexpr version (reference code)
    }else{
        // intrinsic version
    }
}

It should work on all compilers that support OpenMP, intrinsics and C++20.

#include <type_traits>

struct FA{
    float c[4];
};

// Just for the sake of the example. Makes for nice-looking assembly.
extern FA add_parallel(FA a, FA b);

constexpr FA add(FA a, FA b) {
    if (std::is_constant_evaluated()) {
        // do it in a constexpr-friendly manner
        FA result{};
        for(int i = 0; i < 4; i++) {
            result.c[i] = b.c[i] + a.c[i];
        }
        return result;
    } else {
        // can be anything that's not constexpr-friendly.
        return add_parallel(a, b);
    }
}

constexpr FA at_compile_time = add(FA{1,2,3,4}, FA{5,6,7,8});

FA at_runtime(FA a) {
    return add(a, at_compile_time);
}
