Fused multiply add and default rounding modes

With GCC 5.3, the following code compiled with -O3 -mfma

float mul_add(float a, float b, float c) {
  return a*b + c;
}

produces the following assembly

vfmadd132ss     %xmm1, %xmm2, %xmm0
ret

I noticed GCC already doing this with -O3 in GCC 4.8.

Clang 3.7 with -O3 -mfma produces

vmulss  %xmm1, %xmm0, %xmm0
vaddss  %xmm2, %xmm0, %xmm0
retq

but Clang 3.7 with -Ofast -mfma produces the same code as GCC does with -O3.

I am surprised that GCC does this with -O3, because this answer says:

The compiler is not allowed to fuse a separated add and multiply unless you allow for a relaxed floating-point model.

This is because an FMA has only one rounding, while an ADD + MUL has two. So the compiler will violate strict IEEE floating-point behaviour by fusing.

However, this link says:

Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision.

So now I am confused and concerned.

  1. Is GCC justified in using FMA with -O3?
  2. Does fusing violate strict IEEE floating-point behaviour?
  3. If fusing does violate IEEE floating-point behaviour, and since GCC defines __STDC_IEC_559__, isn't this a contradiction?

Since FMA can be emulated in software, it seems to me there should be two compiler switches for FMA: one to tell the compiler to use FMA in calculations and one to tell the compiler that the hardware has FMA.


Apparently this can be controlled with the option -ffp-contract. With GCC the default is -ffp-contract=fast, and with Clang it is not. The other options, -ffp-contract=on and -ffp-contract=off, do not produce the FMA instruction.

For example Clang 3.7 with -O3 -mfma -ffp-contract=fast produces vfmadd132ss.


I checked some permutations of #pragma STDC FP_CONTRACT set to ON and OFF with -ffp-contract set to on, off, and fast. In all cases I also used -O3 -mfma.

With GCC the answer is simple. #pragma STDC FP_CONTRACT ON or OFF makes no difference. Only -ffp-contract matters.

GCC uses FMA with

  1. -ffp-contract=fast (default).

Clang uses FMA

  1. with -ffp-contract=fast.
  2. with -ffp-contract=on (default) and #pragma STDC FP_CONTRACT ON (default is OFF).

In other words, with Clang you can get FMA with #pragma STDC FP_CONTRACT ON (since -ffp-contract=on is the default) or with -ffp-contract=fast. -ffast-math (and hence -Ofast) sets -ffp-contract=fast.


I looked into MSVC and ICC.

MSVC uses the FMA instruction with /O2 /arch:AVX2 /fp:fast. With MSVC, /fp:precise is the default.

ICC uses FMA with -O3 -march=core-avx2 (actually -O1 is sufficient). This is because ICC uses -fp-model fast by default. But ICC uses FMA even with -fp-model precise. To disable FMA with ICC, use -fp-model strict or -no-fma.

So by default GCC and ICC use fma when fma is enabled (with -mfma for GCC/Clang or -march=core-avx2 with ICC) but Clang and MSVC do not.

Carri answered 23/12, 2015 at 12:57 Comment(6)
Might be a compiler bug. Consider reporting it. - Mccarver
I'm pretty sure what gcc is doing is ok. After reading the FLT_EVAL_METHOD doc about contracting FP expressions, I'm surprised clang doesn't do this. I'm not posting this as an answer, since it's not based on any real standards documentation, just my understanding of how I think things should work / should have been designed, given the material in the question. - Misreckon
@FUZxxl, do you think the floating-point tag would be more appropriate than ieee-754? (If so, feel free to change it.) I feel like I should be using the floating-point tag as well. - Carri
"Does fusing violate strict IEEE floating-point behavior?" --> IMO, yes. Use double fma(double x, double y, double z); instead, as that is a function call that in an optimized compiler will call the expected assembly code. This does not violate "IEEE floating-point behaviour". - Dearing
gcc.gnu.org/bugzilla/show_bug.cgi?id=37845 - Carri
Does this answer your question? Difference in gcc -ffp-contract options - Thinkable

It doesn't violate IEEE-754, because IEEE-754 defers to languages on this point:

A language standard should also define, and require implementations to provide, attributes that allow and disallow value-changing optimizations, separately or collectively, for a block. These optimizations might include, but are not limited to:

...

― Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition.

In standard C, the STDC FP_CONTRACT pragma provides the means to control this value-changing optimization. So GCC is licensed to perform the fusion by default, so long as it allows you to disable the optimization by setting STDC FP_CONTRACT OFF. Not supporting that means not adhering to the C standard.

Mariel answered 15/1, 2016 at 19:3 Comment(7)
What do you mean by "Not supporting that means not adhering to the C standard"? Incidentally, GCC seems to ignore STDC FP_CONTRACT. Instead it only uses -ffp-contract. Clang recognizes both. - Carri
I mean that FP_CONTRACT is part of the C standard. To ignore it is to not conform. - Mariel
Oh, I didn't realize you were referring to GCC not supporting FP_CONTRACT (or any compiler which does not support it). Now I understand. - Carri
So this answer is wrong then: "the compiler will violate strict IEEE floating-point behavior by fusing"? That's what threw me off. - Carri
The standard defers to languages to set policy for this, so if an implementation doesn't adhere to the language standard, it's definitely at least violating the spirit of IEEE 754. - Mariel
If GCC did recognize FP_CONTRACT, it would be free to have ON as the default, so then that answer would be wrong. And in any case GCC supports -ffp-contract, which effectively does the same thing. Let me put it a different way. Clang recognizes FP_CONTRACT and defaults to OFF. Would Clang violate IEEE if it defaulted to FP_CONTRACT ON? - Carri
The default can be either ON or OFF. But you need to support the pragma to conform to the standard. - Mariel

When you quoted that fused multiply-add is allowed, you left out the important condition "unless pragma FP_CONTRACT is off". This is a newish feature in C (I think introduced in C99), and it was made absolutely necessary by PowerPC, which had fused multiply-add from the start; in fact, x*y was equivalent to fma (x, y, 0) and x+y was equivalent to fma (1.0, x, y).

FP_CONTRACT is what controls fused multiply/add, not FLT_EVAL_METHOD. Although if FLT_EVAL_METHOD allows higher precision, then contracting is always legal; just pretend that the operations were performed with very high precision and then rounded.

The fma function is useful when you want precision rather than speed. It calculates the contracted result correctly, if slowly, even when FMA isn't available in hardware, and it should be inlined when it is.

Injustice answered 23/12, 2015 at 13:40 Comment(3)
I think this to some degree answers my first question about whether GCC is justified in using fma with -O3. But it's still not clear whether it's IEEE compliant. Since GCC defines __STDC_IEC_559__, I can assume it's IEEE compliant, but other people claim fma breaks this (which would argue GCC is not justified in doing this when __STDC_IEC_559__ is defined). So I am still confused. - Carri
@Zboson: I noticed that stuff about the pragma in the doc I linked you, but didn't know how new or widely supported it was. That's why I didn't mention it earlier. - Misreckon
@PeterCordes, that's okay, GCC does not seem to care about that pragma anyway, so it's a moot issue. And in any case it says nothing about being IEEE compliant. GCC defines __STDC_IEC_559__ and at the same time uses -ffp-contract=fast, so I still want to know if this is a contradiction. - Carri
