I'll provide a more-or-less real example of legitimate, meaningful and useful testing for float equality.
#include <stdio.h>
#include <math.h>

/* let's try to numerically solve a simple equation F(x)=0 */
double F(double x) {
    return 2 * cos(x) - pow(1.2, x);
}

/* a well-known, simple & slow but extremely smart method to do this */
double bisection(double range_start, double range_end) {
    double a = range_start;
    double d = range_end - range_start;
    int counter = 0;
    while (a != a + d) // <-- WHOA!!
    {
        d /= 2.0;
        if (F(a) * F(a + d) > 0) /* test for same sign */
            a = a + d;
        ++counter;
    }
    printf("%d iterations done\n", counter);
    return a;
}

int main() {
    /* we must be sure that the root can be found in [0.0, 2.0] */
    printf("F(0.0)=%.17f, F(2.0)=%.17f\n", F(0.0), F(2.0));
    double x = bisection(0.0, 2.0);
    printf("the root is near %.17f, F(%.17f)=%.17f\n", x, x, F(x));
}
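A build note: the program uses <math.h>, so on typical Unix toolchains it has to be linked against libm. Assuming the file is saved as bisection.c (my name for it, pick any), something like this should do:

    cc bisection.c -o bisection -lm
    ./bisection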
I'd rather not explain the bisection method itself; instead, I want to emphasize the stopping condition. It has exactly the form under discussion: (a == a+d), where both sides are floats: a is our current approximation of the equation's root, and d is our current precision. Given the precondition of the algorithm, namely that there must be a root between range_start and range_end, we guarantee on every iteration that the root stays between a and a+d, while d is halved every step, shrinking the bounds.
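If you want to watch that invariant at work, here is a minimal traced variant of the same loop (bisection_traced is my own name for it; it simply prints the bracket every iteration so you can see d shrink until a+d collapses onto a):

#include <stdio.h>
#include <math.h>

double F(double x) { return 2 * cos(x) - pow(1.2, x); }

/* same algorithm as above, but it prints the bracket [a, a+d] each step */
double bisection_traced(double range_start, double range_end) {
    double a = range_start;
    double d = range_end - range_start;
    while (a != a + d) {
        d /= 2.0;
        if (F(a) * F(a + d) > 0)   /* same sign: the root is in the other half */
            a = a + d;
        printf("a=%.17f  a+d=%.17f  d=%.3e\n", a, a + d, d);
    }
    return a;
}

int main() {
    bisection_traced(0.0, 2.0);
}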
And then, after a number of iterations, d becomes so small that, during the addition with a, it gets rounded to zero! That is, a+d turns out to be closer to a than to any other float, so the FPU rounds it to the closest representable value: to a itself. A calculation on a hypothetical machine can illustrate this; let it have a 4-digit decimal mantissa and some large exponent range. What result should such a machine give for 2.131e+02 + 7.000e-3? The exact answer is 213.107, but our machine can't represent that number; it has to round. And 213.107 is much closer to 213.1 than to 213.2, so the rounded result becomes 2.131e+02: the little summand vanished, rounded away to zero. Exactly the same is guaranteed to happen at some iteration of our algorithm, and at that point we can't continue any more. We have found the root to the maximum possible precision.
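The same absorption is easy to observe with ordinary binary doubles; a minimal sketch, with values chosen arbitrarily to echo the decimal example:

#include <stdio.h>
#include <math.h>

int main(void) {
    double a = 213.1;     /* arbitrary value */
    double d = 1e-14;     /* smaller than half a ulp of a */
    printf("ulp(a)     = %g\n", nextafter(a, INFINITY) - a);  /* about 2.8e-14 */
    printf("a + d == a -> %d\n", a + d == a);                 /* prints 1: d was absorbed */
    return 0;
}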
Addendum
No, you can't just use "some small number" in the stopping condition. For any choice of that number, some inputs will deem your choice too large, causing loss of precision, and there will be inputs which deem your choice too small, causing excess iterations or even an infinite loop. Imagine that our F can change, and suddenly the solutions can be both huge (1.0042e+50) and tiny (1.0098e-70). A detailed discussion follows.
Calculus has no notion of a "small number": for any real number, you can find infinitely many even smaller ones. The problem is that among those "even smaller" ones there might be a root of our equation. Even worse, some equations will have distinct roots (e.g. 2.51e-8 and 1.38e-8), both of which will get approximated by the same answer if our stopping condition looks like d < 1e-6. Whichever "small number" you choose, many roots which would have been found correctly, to the maximum precision, with a == a+d will get spoiled by the "epsilon" being too large.
It's true, however, that a float's exponent has a finite range, so one actually can find the smallest nonzero positive FP number; in IEEE 754 single precision, it's the 1e-45 denorm. But it's useless! Any positive nonzero single-precision value already satisfies d >= 1e-45, so the test is effectively d != 0: while (d >= 1e-45) {…} buys you nothing and will keep looping for as long as d stays positive and nonzero.
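For illustration, C11's <float.h> exposes that smallest positive single-precision value as FLT_TRUE_MIN; a tiny check (assuming a C11 compiler) shows why comparing against 1e-45 buys nothing over comparing against zero:

#include <stdio.h>
#include <float.h>

int main(void) {
    printf("FLT_TRUE_MIN = %g\n", FLT_TRUE_MIN);   /* about 1.4e-45 */
    float d = FLT_TRUE_MIN;                        /* the smallest positive float there is */
    printf("d >= 1e-45 -> %d\n", d >= 1e-45);      /* 1: even the tiniest float passes */
    return 0;
}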
At the same time, any choice of the "small number" for a d < eps stopping condition will be too small for many equations. When the root has a high enough exponent, the result of subtracting two neighbouring numbers will easily exceed our "epsilon". For example, with a 6-digit decimal mantissa, 7.00023e+8 - 7.00022e+8 = 0.00001e+8 = 1.00000e+3 = 1000, meaning that the smallest possible difference between numbers with exponent +8 and a 6-digit mantissa is... 1000! It will never get below, say, 1e-4: for numbers with a relatively high exponent we simply don't have enough precision to ever see a difference of 1e-4. This means eps = 1e-4 will be too small!
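The binary counterpart for doubles, as a minimal sketch with a made-up root magnitude of 1e20: the two closest distinct doubles up there already differ by thousands, so a condition like (b-a) < 1e-4 can never fire.

#include <stdio.h>
#include <math.h>

int main(void) {
    double b = 1e20;                 /* pretend the root lives around here */
    double a = nextafter(b, 0.0);    /* the neighbouring double just below b */
    printf("b - a = %g\n", b - a);   /* 16384: the gap between adjacent doubles near 1e20 */
    printf("(b - a) < 1e-4 -> %d\n", (b - a) < 1e-4);   /* 0, and it can never become 1 */
    return 0;
}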
My implementation above took this last problem into account; you can see that d is halved each step, instead of being recalculated as the difference of a and b, whose exponents may be huge. For reals it doesn't matter; for floats it does! The algorithm will get into infinite loops with (b-a) < eps on equations with large enough roots; the previous paragraph shows why. d < eps won't get stuck, but even then, needless iterations will be performed while d shrinks far below the precision of a, which again shows the chosen eps as too small. But a == a+d will stop at exactly the achievable precision.
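To see the wasted work concretely, here is a small sketch that merely counts how many times d gets halved under the two stopping rules, for an approximation a near 1.0 and a deliberately tiny eps; the helper names are made up:

#include <stdio.h>

/* halve d until it is absorbed by a, as the answer's loop does */
static int halvings_until_absorbed(double a, double d) {
    int n = 0;
    while (a != a + d) { d /= 2.0; ++n; }
    return n;
}

/* halve d until it drops below a fixed eps */
static int halvings_until_eps(double d, double eps) {
    int n = 0;
    while (d >= eps) { d /= 2.0; ++n; }
    return n;
}

int main(void) {
    /* with a near 1.0 the absorption rule stops after a few dozen halvings,
       while eps = 1e-300 keeps the loop busy for roughly a thousand steps */
    printf("a != a+d : %d halvings\n", halvings_until_absorbed(1.0, 2.0));
    printf("d >= eps : %d halvings\n", halvings_until_eps(2.0, 1e-300));
    return 0;
}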
Thus, as shown, any choice of eps in while (d < eps) {…} will be both too large and too small, if we allow F to vary.
... This kind of reasoning may seem overly theoretical and needlessly deep, but it is meant to illustrate, once again, the trickiness of floats. One should be aware of their finite precision when writing arithmetic around them.