add3 instruction for a+b+c with one single rounding

Background

It is well known that the exact product of two floating point numbers is not always a floating point number, but the error exact(a*b) - float(a*b) is. Some codes for exact multiplication exploit this by returning the two numbers

res = a * b
err = fma(a, b, -res)

This makes use of the fused-multiply-add instruction which returns the expression (a*b)+c with one single rounding.

Question

Now, I would like to do the same for sums, i.e.

res = a + b
err = add3(a, b, -res)

add3 is supposed to return the expression (a+b)+c with one single rounding.

I wasn't able to find hints that add3 actually exists in the real world except in this article.

Is there a CPU instruction set that contains add3? Are there languages implementing it?

The err and res requested in the question are provided by the Fast2Sum algorithm in Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al, Birkhäuser, 2009, page 126, section 4.3.1, “The Fast2Sum Algorithm.” The book credits it to Dekker in 1971 with earlier appearance of its operations by Kahan in 1965:

Given a floating-point format with a base less than or equal to 3, with subnormal numbers, and numbers a and b representable in that format with |a| ≥ |b|, then, using round-to-nearest:

s = a+b;
z = s-a;
t = b-z;

computes s and t such that s is the floating-point number nearest to a+b and s+t = a+b exactly. (Hence s and t are the res and err requested in the question.)

|a| ≥ |b| is more than sufficient; the algorithm only requires the floating-point exponent of a be at least the exponent of b, but merely comparing the values may be easier. So a complete implementation needs something like if (fabs(b) > fabs(a)) swap(&a, &b); before the above code.

There is a proof in the book. (There is an erratum for the proof; it assumes, without loss of generality, that a > 0. That may be corrected in the second edition.)

This does not provide the suggested general add3 function, just the specific case. add3 is provided by the CorrectRoundedSum3 function by Boldo and Melquiond on page 201, section 6.3.4. It manipulates the encoding of floating-point numbers, which raises performance and portability issues. That manipulation is limited to an increment or decrement, so the standard C nexttoward function might serve in its place, although that is not necessarily any better for performance.

Background

Question

Recommended topics

Hot tags