UPD: I missed that you mentioned pragmas. Option 1 is essentially fast math for a single operation, as far as I understand. I'm not sure about subnormal flushing; I hope it isn't affected.
I didn't find per-function options, but I did find two pragmas that can help.
Let's say we want a dot product.
Option 1:
#include <cstddef>

float innerProductF32(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    for (std::size_t i = 0; i != size; ++i) {
#pragma float_control(precise, off)
        res += a[i] * b[i];
    }
    return res;
}
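If you'd rather relax the whole function instead of a single statement, Clang also accepts the pragma at file scope together with a push/pop form, so the relaxed state doesn't leak into the rest of the translation unit. A minimal sketch (the function name is just for illustration):

#include <cstddef>

#pragma float_control(push)
#pragma float_control(precise, off)
// Whole function compiled with precise FP off: the reduction may be
// reassociated, vectorized, and contracted into FMA.
float innerProductF32Relaxed(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    for (std::size_t i = 0; i != size; ++i) {
        res += a[i] * b[i];
    }
    return res;
}
#pragma float_control(pop)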
Option 2:
#include <cstddef>

float innerProductF32(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    _Pragma("clang loop vectorize(enable) interleave(enable)")
    for (std::size_t i = 0; i != size; ++i) {
        res += a[i] * b[i];
    }
    return res;
}
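_Pragma(...) is just the operator spelling of the directive; you can also write it as a plain #pragma and, optionally, give explicit width hints. A sketch of that variant, where the function name and the widths 8/2 are illustrative guesses, not tuned values:

#include <cstddef>

float innerProductF32Hinted(const float* a, const float* b, std::size_t size) {
    float res = 0.f;
    // Specifying explicit widths implicitly enables vectorization/interleaving.
#pragma clang loop vectorize_width(8) interleave_count(2)
    for (std::size_t i = 0; i != size; ++i) {
        res += a[i] * b[i];
    }
    return res;
}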
The second one is less powerful: it does not generate FMA instructions. But maybe FMA isn't what you want anyway.