add3 instruction for a+b+c with one single rounding
Asked Answered
P

1

6

Background

It is well known that the exact product of two floating point numbers is not always a floating point number, but the error exact(a*b) - float(a*b) is. Some codes for exact multiplication exploit this by returning the two numbers

res = a * b
err = fma(a, b, -res)

This makes use of the fused-multiply-add instruction which returns the expression (a*b)+c with one single rounding.

Question

Now, I would like to do the same for sums, i.e.

res = a + b
err = add3(a, b, -res)

add3 is supposed to return the expression (a+b)+c with one single rounding.

I wasn't able to find hints that add3 actually exists in the real world except in this article.

Is there a CPU instruction set that contains add3? Are there languages implementing it?

Presber answered 21/2, 2018 at 12:4 Comment(9)
Are you talking about SIMD or scalar operations here ? You have an sse tag and a Wikipedia link to SIMD FMA extensions in AMD/Intel CPUs, but your example above is for the scalar fma() function in C/C++ ?Mats
The question is about "add3", not fma.Reckon
Fine - but is the context scalar or SIMD, and are you looking for a specific architecture (e.g. x86) or is it a more general question ? And why the sse tag ?Mats
I was just wondering if such an instruction exists anywhere at all or if it's just a numerical analyst's dream right now. The flag sse was inappropriate, I removed it.Reckon
OK - thanks for the clarification - I'm not aware of any such instruction, but it's somewhat clearer now what you're looking for.Mats
Ping @EricPostpischil...Mats
Not available as an instruction on any major processor architecture (yet), but this publication shows how it can be emulated quite easily: Sylvie Boldo and Guillaume Melquiond, "Emulation of FMA and Correctly Rounded Sums: Proved Algorithms Using Rounding to Odd", IEEE Transactions on Computers, Vol. 57, No. 4, April 2008, pp. 462-470Trimerous
@PaulR: Eh, nearly six-year ping response.Farron
Do you need a fully general add3 for any three representable numbers, or do you just want to compute the err and res shown?Farron
F
2

The err and res requested in the question are provided by the Fast2Sum algorithm in Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al, Birkhäuser, 2009, page 126, section 4.3.1, “The Fast2Sum Algorithm.” The book credits it to Dekker in 1971 with earlier appearance of its operations by Kahan in 1965:

Given a floating-point format with a base less than or equal to 3, with subnormal numbers, and numbers a and b representable in that format with |a| ≥ |b|, then, using round-to-nearest:

s = a+b;
z = s-a;
t = b-z;

computes s and t such that s is the floating-point number nearest to a+b and s+t = a+b exactly. (Hence s and t are the res and err requested in the question.)

|a| ≥ |b| is more than sufficient; the algorithm only requires the floating-point exponent of a be at least the exponent of b, but merely comparing the values may be easier. So a complete implementation needs something like if (fabs(b) > fabs(a)) swap(&a, &b); before the above code.

There is a proof in the book. (There is an erratum for the proof; it assumes, without loss of generality, that a > 0. That may be corrected in the second edition.)

This does not provide the suggested general add3 function, just the specific case. add3 is provided by the CorrectRoundedSum3 function by Boldo and Melquiond on page 201, section 6.3.4. It manipulates the encoding of floating-point numbers, which raises performance and portability issues. That manipulation is limited to an increment or decrement, so the standard C nexttoward function might serve in its place, although that is not necessarily any better for performance.

Farron answered 1/12, 2023 at 21:41 Comment(1)
This is Algorithm 1.1 in the article I linked. The idea of an add3 instruction would be to merge the last two steps into a single operation add3(a, b, -s).Reckon

© 2022 - 2024 — McMap. All rights reserved.