Does a floating-point reciprocal always round-trip?
Asked Answered
N

1

9

For IEEE-754 arithmetic, is there a guarantee of 0 or 1 units in the last place accuracy for reciprocals? From that, is there a guaranteed error-bound on the reciprocal of a reciprocal?

Nieves answered 19/6, 2017 at 6:11 Comment(1)
This question is vaguely related to the question of implementing single-precision multiplication by a constant by a double-precision multiplication by the reciprocal followed by rounding to single-precision, also answered by Mark: #19590473Fogdog
W
16

[Everything below assumes a fixed IEEE 754 binary format, with some form of round-to-nearest as the rounding-mode.]

Since reciprocal (computed as 1/x) is a basic arithmetic operation, 1 is exactly representable, and the arithmetic operations are guaranteed correctly rounded by the standard, the reciprocal result is guaranteed to be within 0.5 units in the last place of the true value. (This applies to any of the basic arithmetic operations specified in the standard.)

The reciprocal of the reciprocal of a value x is not guaranteed to be equal to x, in general. A quick example with the IEEE 754 binary64 format:

>>> x = 1.8
>>> 1.0 / (1.0 / x)
1.7999999999999998
>>> x == 1.0 / (1.0 / x)
False

However, assuming that overflow and underflow are avoided, and that x is finite, nonzero, and exactly representable, the following results are true:

  1. The value of 1.0 / (1.0 / x) will differ from x by no more than 1 unit in the last place.

  2. If the significand of x (normalised to lie in the range [1.0, 2.0) as usual) is smaller than sqrt(2), then the reciprocal does roundtrip: 1.0 / (1.0 / x) == x.

Sketch of proof: without loss of generality we can assume that x is positive, and scale x by a power of two so that it lies in the range [1.0, 2.0). The results above are clearly true in the case that x is an exact power of two, so let's assume it's not (this will be useful in the second step below). The proof below is given for precision specific to the IEEE 754 binary64 format, but it adapts directly to any IEEE 754 binary format.

Write 1 / x for the true value of the reciprocal, before rounding, and let y be the (unique, as it turns out) nearest representable binary64 float to 1 / x. Then:

  • since y is the closest float to 1 / x, and both y and 1/x are in the interval [0.5, 1.0], where the spacing between successive floats is exactly 2^-53, we have |y - 1/x| <= 2^-54. In fact, we can do a bit better:

  • we actually have a strict inequality above: |y - 1/x| < 2^-54. If |y - 1/x| were exactly equal to 2^-54, then 1/x would be exactly representable in arbitrary-precision binary floating-point (since both y and 2^-54 are). But the only binary floats x for which 1/x is exactly representable at some precision are powers of two, and we already excluded this case.

  • if x < sqrt(2) then 1 / x > x / 2, hence (rounding both to the nearest representable float), we have y >= x / 2, so x / y <= 2.

  • now x - 1/y = (y - 1/x) x/y, and from the bounds on |y - 1/x| and x/y (still assuming that x < sqrt(2)) we get |x - 1/y| < 2^-53. It follows that x is the closest representable float to 1/y, 1/y rounds to x and the roundtrip succeeds. This completes the proof of part 2.

  • in the general case x < 2, we have x / y < 4, so |x - 1/y| < 2^-52. That makes 1/y at most 1 ulp away from x, which completes the proof of part 1.

Here's a demonstration of the sqrt(2) threshold: using Python, we take a million random floats in the range [1.0, 2.0), and identify those that don't roundtrip through the reciprocal. All of the samples smaller than sqrt(2) pass the roundtrip.

>>> import random
>>> samples = [random.uniform(1.0, 2.0) for _ in range(10**6)]
>>> bad = [x for x in samples if 1.0 / (1.0 / x) != x]
>>> len(bad)
171279
>>> min(bad)
1.4150519879892107
>>> import math
>>> math.sqrt(2)
1.4142135623730951

In comments, @Kelly Bundy finds that the IEEE 754 binary64 value x closest to √2 for which 1.0 / (1.0 / x) != x is 0x1.6a09e6964b6acp+0, or approximately 1.4142135731630487, roughly 48.6 million ulps away from √2.

And a demonstration that the maximum error is no more than 1 ulp, in general (for the binary64 format, in the interval [1.0, 2.0), 1 unit in the last place is 2^-52):

>>> samples = [random.uniform(1.0, 2.0) for _ in range(10**6)]
>>> max(abs(x - 1.0 / (1.0 / x)) for x in samples)
2.220446049250313e-16
>>> 2**-52
2.220446049250313e-16

Here's an example with the IEEE 754 binary64 format showing that the restriction about avoiding underflow is necessary:

>>> x = 1.3e308
>>> x_roundtrip = 1.0 / (1.0 / x)
>>> x.hex()
'0x1.72409614c1e6ap+1023'
>>> x_roundtrip.hex()
'0x1.72409614c1e6cp+1023'

Here the x_roundtrip result differs from the original by two units in the last place, because 1 / x was smaller than the smallest normal representable float, and so was not represented to the same precision as x.

Final note: since IEEE 754-2008 covers decimal floating-point types as well, I should mention that the above proof carries over almost verbatim to the decimal case, establishing that for floats with significand smaller than sqrt(10), roundtrip occurs, while for general decimal floats (again avoiding overflow and underflow) we're never off by more than one unit in the last place. However, there's some number-theoretic finesse required to show that the key inequality |x - 1/y| < 1/2 10^(1-p) is always strict: one ends up having to show that the quantity 1 + 16 10^(2p-1) is never a square number (which is true, but it's probably outside the ambit of this site to include the proof here).

>>> from decimal import Decimal, getcontext
>>> import random
>>> getcontext().prec = 6
>>> samples = [+Decimal(random.uniform(1.0, 10.0)) for _ in range(10**6)]
>>> bad = [x for x in samples if 1 / (1 / x) != x]
>>> min(bad)
Decimal('3.16782')
Wetnurse answered 19/6, 2017 at 8:50 Comment(3)
Hmmm... I had to look up "ambit".Counterword
The violator with normalized significand closest to sqrt(2) that I found so far is 6369051721119404 (that is 1.4142135731630487 × 2^52) from my new question. Trying to understand your sqrt proof part now...Pediment
Actually I now think among floats, 1.4142135731630487 is the violator closest to sqrt(2). Brute force search.Pediment

© 2022 - 2024 — McMap. All rights reserved.