float128 and double-double arithmetic
I've seen on Wikipedia that one way to implement quad precision is to use double-double arithmetic, even if it's not exactly the same precision in terms of bits: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format

In this case, we use two doubles to store the value, so we make two operations to compute the result, one for each double of the result.

In this case, can we have round-off errors on each double, or is there a mechanism that avoids this?

Breazeale answered 27/7, 2015 at 7:38 Comment(1)
I'm afraid your question is both unclear and too broad. Download lipforge.ens-lyon.fr/frs/download.php/162/… and read the definitions of Add22 and Mul22 in crlibm_private.h; that will give you an idea of how these things work. – Sapindaceous

“In this case, we use two doubles to store the value, so we need to make two operations each time.”

This is not how double-double arithmetic works. You should expect one double-double operation to be implemented in anywhere from 6 to 20 double operations, depending on the actual operation being implemented, the availability of a fused multiply-add (FMA) instruction, the assumption that one operand is larger than the other, and so on.

For instance, here is one implementation of a double-double multiplication for when an FMA instruction is not available, taken from CRlibm:

#define Mul22(zh,zl,xh,xl,yh,yl)                        \
{                                                       \
  double mh, ml;                                        \
                                                        \
  const double c = 134217729.;                          \
  double up, u1, u2, vp, v1, v2;                        \
                                                        \
  up = (xh)*c;        vp = (yh)*c;                      \
  u1 = ((xh)-up)+up;  v1 = ((yh)-vp)+vp;                \
  u2 = (xh)-u1;       v2 = (yh)-v1;                     \
                                                        \
  mh = (xh)*(yh);                                       \
  ml = (((u1*v1-mh)+(u1*v2))+(u2*v1))+(u2*v2);          \
                                                        \
  ml += (xh)*(yl) + (xl)*(yh);                          \
  *zh = mh+ml;                                          \
  *zl = mh - (*zh) + ml;                                \
}

The first 8 operations alone split each double operand exactly into two halves, so that each half of one operand can be multiplied by each half of the other and the result obtained exactly as a double. The products u1*v1, u1*v2, … do exactly that.

The values obtained in mh and ml can overlap, so the last 3 operations renormalize the result into the non-overlapping sum of two floating-point numbers.

In this case, can we have round-off errors on each double, or is there a mechanism that avoids this?

As the comment says:

/*
 * computes double-double multiplication: zh+zl = (xh+xl) *  (yh+yl)
 * relative error is smaller than 2^-102
 */

You can read about all the mechanisms used to achieve these results in the Handbook of Floating-Point Arithmetic.

Sapindaceous answered 27/7, 2015 at 8:9 Comment(3)
OK, so double-double arithmetic is based on compensated algorithms, so of course you need more than two ops. That's what I hadn't found. Thank you. Just a small correction: if you use FastTwoSum you have 3 ops (plus an absolute value and a branch) instead of 6 ops (for this part of "6 to 20 double operations"). – Breazeale
@RomainPicot It seems to me that the one that takes only 3 operations (not counting the conditional, if we assume that the order of the arguments is known) is Add12, which is not a double-double operation but a convenient auxiliary function for writing them. Anyway, counting the number of operations regardless of the actual operation being implemented and the hypotheses available is not an exact science :) – Sapindaceous
You're right. And even if there are fewer ops, you're slower in most cases because of the branch. (I don't remember whether this comes from the Handbook of Floating-Point Arithmetic or The Art of Computer Programming.) The most important thing for me was to see that compensated algorithms are used :-) – Breazeale
