Note: this question is about CPU instructions, not high-level languages (where you are at the mercy of the compiler).
From a popular answer:
> The same floating-point operations, run on the same hardware, always produces the same result.
Can we make a stronger guarantee though, on x86-64? What if the hardware is a bit different? Are CPU instructions reproducible within the same family of CPUs? Where is the boundary of reproducibility?
What if the hardware is a bit different?
... then that's not the same hardware – Gulgee
`rsqrtss`) can vary quite a bit depending on µarch, even for processors from the same vendor. – Tamatamable
`fsin`
that aren't required to be "correctly rounded" (precise to the last mantissa bit) the way IEEE basic ops are? (+ - * / and sqrt). Most x86-64 math libraries don't use 387 instructions because they're not fast. Outside of x87, the only SSE/AVX instructions that leave room for implementation-dependent results are, I think, `rsqrtss/ps` and `rcpss/ps`, like @Tamatamable mentioned. And with AVX-512, `vrsqrt14ps/pd` and `vrcp14ps/pd`. (And probably the Xeon Phi 28-bit versions and Xeon Phi `vexp2ps/pd`) – Functionalism
`lscpu`
) and the CPU family numbers are the same, can we expect reproducible results? – Dunkirk
`6`
for Intel CPUs from PPro to current, other than Pentium 4. Sandybridge is basically a new microarchitecture family, but it inherits a lot from P6 and they didn't bump the family number. The "model" changes every microarchitecture but not "family". With the same CPUID feature flags, Haswell through current Alder Lake haven't added any FP-related stuff (if we skip Ice Lake / Tiger Lake that have AVX-512 even on client CPUs, or look at "Celeron" versions of those), although there are other features that would make `lscpu` output different. – Functionalism
`fsincos`
etc. microcode might have changed at any point between those, or `rsqrtps`, but Haswell didn't have AVX-512 at all, Skylake and later do but they leave it disabled. If there is an increase in precision of `rsqrtps` to make it the same as `vrsqrt14ps`, I'd expect it between Haswell and some later CPU, perhaps Skylake (1st gen with AVX-512). That's an interesting question I don't know the answer to. I'd encourage you to flesh out your question with an awareness of this kind of detail. – Functionalism
`fdiv`
bug in P5 was one of the motivations for having microcode in P6 that's loaded from the firmware and updateable, but I don't think they've made a similar mistake since.) – Functionalism
`rsqrt`
), all instructions are accurate within the limits of floating point. Addition, multiplication, division, and square root all produce the correctly rounded result and do so fully deterministically. They have done so for ages (not counting CPU bugs) and will continue to do so in future generations of x86 chips. – Ancelin
`DE UE PE`
(Denormal, Underflow, and Precision exceptions) in MXCSR after `7.0064923216e-39f / 5005000.0f` (with `divss` in asm) which produces the binary32 bit-pattern `0x00000001`. (The first number is `min_subnormal * 5000000`). vs. only DE with `x / 5000000.0` – Functionalism
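To make the arithmetic behind that last comment concrete, here is a rough sketch in Python that models the two divisions with exact rationals. It does not execute `divss` or read MXCSR. It only predicts the rounded binary32 bit pattern and which of the DE/UE/PE flags one would expect, assuming default settings (round-to-nearest-even, exceptions masked, DAZ/FTZ off). The names `to_f32`, `model_divss`, `ULP` and `MIN_NORMAL` are made up for this illustration, and the model only covers positive operands with a tiny (subnormal-range) quotient.

```python
import struct
from fractions import Fraction

ULP = Fraction(1, 2**149)         # smallest binary32 subnormal (bit pattern 0x00000001)
MIN_NORMAL = Fraction(1, 2**126)  # smallest normal binary32 value

def to_f32(x: float) -> Fraction:
    """Round a Python float to binary32 and return its exact value as a Fraction."""
    return Fraction(struct.unpack("<f", struct.pack("<f", x))[0])

def model_divss(dividend: float, divisor: float):
    """Predict (bit pattern, flags) for a positive binary32 division with a tiny result.

    This is a model of the expected behaviour, not a measurement on real hardware.
    """
    a, b = to_f32(dividend), to_f32(divisor)
    exact = a / b
    assert 0 < exact < MIN_NORMAL, "this sketch only handles tiny positive results"
    steps = exact / ULP     # exact quotient, in units of the smallest subnormal
    bits = round(steps)     # Fraction rounds half to even; for tiny positive
                            # values the step count equals the bit pattern
    flags = set()
    if any(0 < v < MIN_NORMAL for v in (a, b)):
        flags.add("DE")             # a source operand is subnormal
    if steps.denominator != 1:
        flags.update({"UE", "PE"})  # result is tiny and had to be rounded
    return bits, flags

for divisor in (5005000.0, 5000000.0):
    bits, flags = model_divss(7.0064923216e-39, divisor)
    print(f"/ {divisor}: bits=0x{bits:08x} flags={sorted(flags)}")
```

Under this model both quotients come out as `0x00000001`, but only the division by `5005000.0` is inexact (the exact quotient is 1000/1001 of the smallest subnormal), which is why it would raise UE and PE in addition to DE, while the exact division by `5000000.0` raises only DE.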