The err
and res
requested in the question are provided by the Fast2Sum algorithm in Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al, Birkhäuser, 2009, page 126, section 4.3.1, “The Fast2Sum Algorithm.” The book credits it to Dekker in 1971 with earlier appearance of its operations by Kahan in 1965:
Given a floating-point format with a base less than or equal to 3, with subnormal numbers, and numbers a
and b
representable in that format with |a
| ≥ |b
|, then, using round-to-nearest:
s = a+b;
z = s-a;
t = b-z;
computes s
and t
such that s
is the floating-point number nearest to a
+b
and s
+t
= a
+b
exactly. (Hence s
and t
are the res
and err
requested in the question.)
|a
| ≥ |b
| is more than sufficient; the algorithm only requires the floating-point exponent of a
be at least the exponent of b
, but merely comparing the values may be easier. So a complete implementation needs something like if (fabs(b) > fabs(a)) swap(&a, &b);
before the above code.
There is a proof in the book. (There is an erratum for the proof; it assumes, without loss of generality, that a
> 0. That may be corrected in the second edition.)
This does not provide the suggested general add3
function, just the specific case. add3
is provided by the CorrectRoundedSum3
function by Boldo and Melquiond on page 201, section 6.3.4. It manipulates the encoding of floating-point numbers, which raises performance and portability issues. That manipulation is limited to an increment or decrement, so the standard C nexttoward
function might serve in its place, although that is not necessarily any better for performance.
sse
tag and a Wikipedia link to SIMD FMA extensions in AMD/Intel CPUs, but your example above is for the scalarfma()
function in C/C++ ? – Matsadd3
", notfma
. – Reckonsse
tag ? – Matssse
was inappropriate, I removed it. – Reckonadd3
for any three representable numbers, or do you just want to compute theerr
andres
shown? – Farron