double-double implementation resilient to FPU rounding mode
Context: double-double arithmetic

“Double-double” is a representation of numbers as the sum of two double-precision numbers without overlap in the significands. This representation takes advantage of existing double-precision hardware implementations for “near quadruple-precision” computations.

One typical low-level C function in a double-double implementation may take two double-precision numbers a and b with |a| ≥ |b| and compute the double-double number (s, e) that represents their sum:

s = a + b;
e = b - (s - a);

(Adapted from this article.)

These implementations typically assume round-to-nearest-even mode.

In the above computation, (s, e) is a normalized double-double only because of this assumption. Without it, take a == 0x1.0p60 and b == 1 in round-upward mode: s is computed as 0x1.0000000000001p60 and e as -255, a bit above -0x0.0000000000001p60 (which is -256). Their sum is equal to the mathematical sum of a and b, but their significands overlap.

Take a == 0x1.0p120 instead, and the mathematical sum of a and b and that of s and e do not even coincide any more.

Question

Is there a way to build a double-double-like library with the same properties that a typical double-double library has in round-to-nearest-even (that is, relatively fast and relatively accurate), but that works whatever the rounding mode happens to be?

Does such a library already exist?

More general context: correctly rounded elementary functions

Implementations of the double-double sort are used for intermediate computations in the implementation of libraries of correctly rounded elementary functions. As a result, libraries implemented this way tend to fail spectacularly when a function is called while the FPU is not in round-to-nearest-even mode. Changing the rounding mode inside the function is not very palatable, for performance reasons and because a signal arriving while the function is executing would leave the FPU in round-to-nearest-even mode. The simplest way I see to have fast, correctly rounded elementary functions that work in any rounding mode would be if one could somehow rely on a double-double-kind of arithmetic that worked in any rounding mode.

Lornalorne answered 6/4, 2013 at 16:35 Comment(1)
I am not aware of any such code that works correctly independent of rounding mode. There is one paper that describes how to implement double-native arithmetic when the underlying machine arithmetic truncates (rounds to zero) instead of using round-to-nearest: Hong Diep Nguyen, Stef Graillat and Jean-Luc Lamotte. Extended precision with a rounding mode toward zero environment. Application to the Cell processor. Int. J. Reliability and Safety, Vol. 3, Nos. 1/2/3, 2009. www-anp.lip6.fr/~graillat/papers/IJRS.pdf – Lecher

The article referred to by njuffa offers the function below, with very similar notations to those of my question, except that what is denoted fl (a+b) there is simply denoted a+b in my question:

Two-Sum-toward-zero2(a, b)

if |a| < |b|
  swap(a, b)
s = fl(a + b)
d = fl(s - a)
e = fl(b - d)
if |2 * b| < |d|
  s = a, e = b
return (s, e)

This is a very neat fix for this particular elementary computation when in round-toward-zero mode. It gives hope that something similar is possible for implementing a correctly rounded elementary function: at the very least by testing the rounding mode early and picking a separate algorithm for each mode, or perhaps by writing very careful code that works under all rounding modes.

Lornalorne answered 6/4, 2013 at 16:35 Comment(8)
Depending on the context of your question, one solution may be to use a platform that does not implement dynamic rounding modes, but instead encodes the rounding mode directly into each floating-point instruction. An example would be the DEC Alpha architecture (which also offered support for dynamic rounding), or GPUs. For example, the CUDA environment for NVIDIA GPUs provides intrinsics for all basic math operations for each of the original four IEEE rounding modes. This is very handy when coding interval arithmetic routines, for example. – Lecher
@Lecher I dream of a libm that (1) is fast, (2) works regardless of rounding mode, (3) is correctly rounded, but implicitly also (4) runs on my hardware. I know where to get (1)&(3)&(4) or (1)&(2)&(4) or (2)&(3)&(4). Architectures with selectable rounding modes are nice, and I was slightly disappointed that IA64 (which also features them) did not take. The hint about CUDA is very good news indeed. – Lornalorne
@PascalCuoq, the link to the article is broken. – Vessel
@Zboson Updated. If even academics cannot keep URLs of their articles working, … – Lornalorne
@PascalCuoq, thanks, that link I find a bit easier to read than some of the others. BTW, why did you make your answer community wiki? I just realized you answered your own question. Is this why you made it community wiki? Is this considered good etiquette if you answer your own question? – Vessel
@Zboson It is not necessary to make "community wiki" an answer you write to your own question, but in this case all I did was write up something that Norbert had said in a comment, to clarify for every future reader that his comments had solved my problem. This reminds me that he did it again and that I need to apply the same procedure for this newer question: #29579320 – Lornalorne
I guess Norbert is @njuffa? There are a lot of good answers by him on this subject! – Vessel
@Zboson Yes sorry I meant njuffa. – Lornalorne
