If the result of an individual floating-point addition, subtraction, multiply, or divide, is immediately stored to a float
, there will be no accuracy improvement using double
for intermediate values. In cases where operations are chained together, however, accuracy will often be improved by using a higher-precision intermediate type, provided that one is consistent in using them. In Turbo Pascal circa 1986 code like:
Function TriangleArea(A: Single, B:Single, C:Single): Single
Begin
Var S: Extended; (* S stands for Semi-perimeter *)
S := (A+B+C) * 0.5;
TriangleArea := Sqrt((S-A)*(S-B)*(S-C)*S)
End;
would extend all operands of floating-point operations to type Extended (80-bit float), and then convert them back to single- or double-precision when storing to variables of those types. Very nice semantics for numerical processing. Turbo C of that area behaved similarly, but rather unhelpfully failed to provide any numeric type capable of holding intermediate results; the failure of languages to provide a variable type which could hold intermediate results led to people's unfairly criticizing the concept of a higher-precision intermediate result type, when the real problem was that languages failed to support it properly.
Anyway, if one were to write the above method into a modern language like C#:
public static float triangleArea(float a, float b, float c)
{
double s = (a + b + c) * 0.5;
return (double)(Math.Sqrt((s - a) * (s - b) * (s - c) * s));
}
the code would work well if the compiler happens to promote the operands of the addition to double
before performing the computation, but that's something it may or may not do. If the compiler performs the calculation as float
, precision may be horrid. When using the above formula to compute the area of an isosceles triangle with long sides of 16777215 and a short side of 4, for example, eager promotion will yield a correct result of 3.355443E+7 while performing the math as float
will, depending upon the order of the operands, yield 5.033165E+7 [more than 50% too big] or 16777214.0 [more than 50% too small].
Note that even though code like the above will work perfectly on some environments, but yield completely bogus results on others, compilers will generally not give any warning about the situation.
Although individual operations on float
which are going to be immediately stored to float
can be done just as accurately with type float
as they could be with type double
, eagerly promoting operands will often help considerably when operations are combined. In some cases, rearranging operations may avoid problems caused by loss of promotion (e.g. the above formula uses five additions, four multiplications, and a square root; rewriting the formula as:
Math.Sqrt((a+b+c)*(b-a+c)*(a-b+c)*(a-c+b))*0.25
increases the number of additions to eight, but will work correctly even if they are performed at single precision.