UPD: I missed that you mentioned pragmas. Option 1 is essentially fast math for a single operation, as far as I understand. I'm not sure about subnormal flushing; I hope it isn't affected.
I didn't find per-function options, but I did find two pragmas that can help.
Let's say we want a dot product.
Option 1:
#include <cstddef>

float innerProductF32(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    for (std::size_t i = 0; i != size; ++i) {
#pragma float_control(precise, off)
        res += a[i] * b[i];
    }
    return res;
}
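If you'd rather relax the whole function instead of a single statement, Clang also accepts the pragma at file scope together with a push/pop form, so the relaxed state doesn't leak into the rest of the translation unit. A minimal sketch (the function name is just for illustration):

#include <cstddef>

#pragma float_control(push)
#pragma float_control(precise, off)
// Whole function compiled with precise FP off: the reduction may be
// reassociated, vectorized, and contracted into FMA.
float innerProductF32Relaxed(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    for (std::size_t i = 0; i != size; ++i) {
        res += a[i] * b[i];
    }
    return res;
}
#pragma float_control(pop)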
Option 2:
#include <cstddef>

float innerProductF32(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    _Pragma("clang loop vectorize(enable) interleave(enable)")
    for (std::size_t i = 0; i != size; ++i) {
        res += a[i] * b[i];
    }
    return res;
}
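_Pragma(...) is just the operator spelling of the directive; you can also write it as a plain #pragma and, optionally, give explicit width hints. A sketch of that variant, where the function name and the widths 8/2 are illustrative guesses, not tuned values:

#include <cstddef>

float innerProductF32Hinted(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    // Specifying explicit widths implicitly enables vectorization/interleaving.
#pragma clang loop vectorize_width(8) interleave_count(2)
    for (std::size_t i = 0; i != size; ++i) {
        res += a[i] * b[i];
    }
    return res;
}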
The second one is less powerful: it does not generate FMA instructions. But maybe FMA isn't what you want anyway.