Consider the following functions:
#include <limits>

static inline float Eps(const float x) {
    const float eps = std::numeric_limits<float>::epsilon();
    return (1.0f + eps) * x - x;
}

float Eps1() {
    return Eps(0xFFFFFFp-24f);
}

float Eps2() {
    const float eps = std::numeric_limits<float>::epsilon();
    const float x = 0xFFFFFFp-24f;
    return (1.0f + eps) * x - x;
}
At -O2 with -std=c++20, both of these functions compile down to a single movss followed by a ret with clang 16.0.0 targeting x86, and to a mov followed by a bx with gcc 11.2.1 targeting ARM. The assembly generated for ARM is consistent with a returned value of ~5.96e-8, but the assembly generated for x86 is not: Eps1() (using the inline function) returns ~1.19e-7, while Eps2() returns ~5.96e-8. [Compiler Explorer / Godbolt]
.LCPI0_0:
.long 0x33ffffff # float 1.19209282E-7
Eps1(): # @Eps1()
movss xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
ret
.LCPI1_0:
.long 0x33800000 # float 5.96046448E-8
Eps2(): # @Eps2()
movss xmm0, dword ptr [rip + .LCPI1_0] # xmm0 = mem[0],zero,zero,zero
ret
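As a quick check of the two constants (this decoding is my own addition, using C++20's std::bit_cast, and was not part of the Compiler Explorer comparison), they come out to exactly 2^-23 - 2^-47 and 2^-24 respectively:

#include <bit>
#include <cstdint>
#include <cstdio>

int main() {
    // 0x33ffffff decodes to 2^-23 - 2^-47 (~1.19209282e-7),
    // 0x33800000 decodes to 2^-24         (~5.96046448e-8).
    std::printf("%.9g\n", std::bit_cast<float>(std::uint32_t{0x33ffffff}));
    std::printf("%.9g\n", std::bit_cast<float>(std::uint32_t{0x33800000}));
}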
I can sort of understand the compiler choosing either option. With x = 0xFFFFFFp-24f (i.e. the next representable value below 1.0f), both compilers consistently round (1.0f + eps) * x to 1.0f, which means that (1.0f + eps) * x - x gives the smaller value. However, the machine precision at 1.0f is twice that at 0xFFFFFFp-24f, so something like a multiply-add instruction that preserves extra precision in the intermediate product would carry a value of roughly 1.0 + 0.5 * eps into the subtraction, which yields the larger value.
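To illustrate that point, here is a small sketch I put together (my own check, not part of the original comparison) contrasting the separately rounded computation with std::fma, which performs the multiply and add with a single rounding; the volatile temporary is just there to stop the compiler from contracting the first computation itself:

#include <cmath>
#include <cstdio>
#include <limits>

int main() {
    const float eps = std::numeric_limits<float>::epsilon();
    const float x = 0xFFFFFFp-24f;

    // Round the product to float before subtracting (the volatile blocks any
    // contraction into an FMA): (1.0f + eps) * x rounds to exactly 1.0f,
    // so the subtraction leaves 2^-24 (~5.96e-8).
    volatile float product = (1.0f + eps) * x;
    const float rounded = product - x;

    // Fused multiply-add: the exact product 1 + 2^-24 - 2^-47 is carried
    // into the subtraction, which leaves 2^-23 - 2^-47 (~1.19e-7).
    const float fused = std::fma(1.0f + eps, x, -x);

    std::printf("rounded to float first: %.9g\n", rounded);
    std::printf("fused multiply-add:     %.9g\n", fused);
}

The fused result matches the larger constant clang emits for Eps1(), while the separately rounded result matches the constant it emits for Eps2().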
The thing I don't understand is why the answer changes depending on whether the math is wrapped in an inline function or written out directly. Is there somewhere in the standard that rationalizes this, is this undefined behavior, or is this a Clang bug?