How does floating point error propagate when doing mathematical operations in C++?
Let's say that we have declared the following variables

float a = 1.2291;

float b = 3.99;

float variables have precision 6, which (if I understand correctly) means that the difference between the number that the computer actually stores and the actual number that you want will be less than 10^-6

that means that both a and b have some error that is less than 10^-6

so inside the computer a could actually be 1.229100000012123 and b could be 3.9900000191919

now let's say that you have the following code

float c = 0;
for(int i = 0; i < 1000; i++)
      c += a + b;

my question is,

will c's final result have a precision error that is less than 10^-6 as well or not?

and if the answer is negative, how can we actually know this precision error and what exactly happens if you apply any kind of operations, as many times you wish and in any order?

Canyon answered 6/9, 2014 at 21:4 Comment(5)
Read this, it will answer all your questions and more: docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html (Wraf)
the article seems informative, I will study it because floating point errors have given me a headache many times, thanks! (Canyon)
Your definition of precision isn't correct. Precision of six decimal digits means that the number will be accurate to that many digits, no more, regardless of its magnitude. (Mehala)
And since IEEE floats have a binary mantissa, you should think of the precision in binary as well (so for a 32-bit IEEE float it's 24 bits of precision: 23 bits from the mantissa plus the implicit leading 1). (Homburg)
You have too many misconceptions about floating-point numbers to correct them all in a single answer. First, floats are binary floating-point. Second, you can get a priori bounds on how wrong your calculation is by conducting a careful analysis of the arithmetic you're performing, and there's no easier way than that. In particular, you can't just assume that your result is within 1e-6 of what you want. Third, the implementation replaces your decimal constants with the closest number of the appropriate floating-point type at compilation time. (Quicken)

float variables have precision 6, which (if I understand correctly) means that the difference between the number that the computer actually stores and the actual number that you want will be less than 10^-6

that means that both a and b have some error that is less than 10^-6

The 10^-6 figure is a rough measure of the relative accuracy when representing arbitrary constants as floats. Not all numbers will be represented with an absolute error of 10^-6. The number 8765432.1, for instance, can be expected to be represented approximately to the unit. If you are at least a little bit lucky, you will get 8765432 when representing it as a float. On the other hand, 1E-15f can be expected to be represented with an absolute error of at most about 10^-21.
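
For illustration, here is a quick check of that scaling (a sketch assuming IEEE 754 binary32 floats and a printf that performs exact binary-to-decimal conversion, as glibc's does):

#include <cstdio>

int main() {
    // A float carries roughly 7 significant decimal digits, wherever
    // the decimal point happens to sit:
    float big = 8765432.1f;  // magnitude ~1e7, so absolute error around 1
    float tiny = 1e-15f;     // magnitude ~1e-15, so absolute error ~1e-21
    printf("%.1f\n", big);   // prints 8765432.0: the .1 was rounded away
    printf("%.25e\n", tiny); // shows only a tiny deviation from 1e-15
    return 0;
}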

so inside the computer a could actually be 1.229100000012123 and b could be 3.9900000191919

No, sorry, the way it works is not that you write the entire number and add six zeroes for the possible error. The error can be estimated by counting six zeroes from the leading digit, not from the last digit. Here, you could expect 1.22910012123 or 3.990000191919.

(Actually you would get exactly 1.2290999889373779296875 and 3.9900000095367431640625. Don't forget that representation error can be negative as well as positive, as it is for the first number.)
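
You can reproduce those exact stored values yourself; a small check (again assuming a runtime that does exact binary-to-decimal conversion when asked for enough digits):

#include <cstdio>

int main() {
    float a = 1.2291f;
    float b = 3.99f;
    // Requesting more digits than a float holds reveals the exact stored
    // values; note a's representation error is negative, b's is positive.
    printf("a = %.22f\n", a);  // 1.2290999889373779296875
    printf("b = %.22f\n", b);  // 3.9900000095367431640625
    return 0;
}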

now let's say that you have the following code […]

my question is,

will c's final result have a precision error that is less than 10^-6 as well or not?

No. The total absolute error will be the sum of all the representation errors for a and b for each of the thousand times you used them, plus the errors of the 2000 additions you did. That's 4000 different sources of error! Many of them will be identical, and some of them will happen to compensate each other, but the end result will probably not have 10^-6 relative accuracy; more like 10^-5 relative accuracy (a rough estimate, not a careful count).
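
One way to watch this accumulation happen (a quick experiment, not part of the original argument) is to run the loop in float alongside a double accumulator fed the same float inputs, so the gap measures only the rounding of the loop's additions:

#include <cstdio>

int main() {
    float a = 1.2291f;
    float b = 3.99f;
    float c = 0;
    double ref = 0;  // same inputs, but far less rounding per addition
    for (int i = 0; i < 1000; i++) {
        c += a + b;
        ref += (double)a + (double)b;
    }
    // The gap is the error contributed by the 2000 float additions alone;
    // expect a relative error well above 10^-6, closer to 10^-5.
    printf("float sum:  %.10f\n", c);
    printf("double ref: %.10f\n", ref);
    printf("relative error: %.3g\n", (ref - (double)c) / ref);
    return 0;
}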

Belisle answered 6/9, 2014 at 23:38 Comment(0)

This is a very good question, one that has been addressed for decades by many authorities; the analysis of numerical error is a computer-science discipline in itself. From What Every Computer Scientist Should Know About Floating-Point Arithmetic:

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on those aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with numerous examples of how computer builders can better support floating-point.

(Emphasis mine)

Enure answered 6/9, 2014 at 21:25 Comment(2)
Very little of “What Every Computer Scientist Should Know About Floating-Point Arithmetic” is about numerical analysis, which is, as you rightly point out, a discipline in itself, and predates any computer floating-point standard by a good 3000 years, if Wikipedia is to be believed: en.wikipedia.org/wiki/Numerical_analysis (Belisle)
The other reference you provide, csee.wvu.edu/swarch/SARATool/docs/…, is about the propagation of failures in distributed systems. No relationship to the subject at hand, except that it has the words “error” and “propagation” in the title. (Belisle)

The short answer is that you cannot easily determine the precision of a long chain of floating point operations.

The precision of an operation like "c += a + b" depends not only on the raw precision of the floating-point implementation (which these days is almost always IEEE 754), but also on the actual values of a, b, and c.
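
To see how much the operand values matter, consider subtracting two nearly equal numbers (an illustrative sketch, not taken from the answer): the subtraction below is itself exact, but it exposes the representation error of its input, leaving a result with a large relative error.

#include <cstdio>

int main() {
    float x = 1.000001f;  // actually stored as ~1.00000095367...
    float y = 1.0f;       // stored exactly
    // The subtraction is exact here, but the result inherits x's
    // representation error, so it is off by roughly 5% relative
    // to the intended value of 0.000001.
    float diff = x - y;
    printf("diff = %.10f\n", diff);  // ~0.0000009537, not 0.0000010000
    return 0;
}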

Further to that, the compiler may choose to optimize the code in different ways, which can result in unexpected issues: transforming it to "c += a; c += b;", or rewriting the loop as "tmp = a*1000; tmp += b*1000; c += tmp;", or some other variant that the compiler determines will execute faster while giving the same mathematical result.
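
Such reorderings matter because floating-point addition is not associative, which is also why compilers only perform them under relaxed floating-point modes (see the comment about fast-math below). A small demonstration, assuming default round-to-nearest:

#include <cstdio>

int main() {
    float a = 1e8f, b = -1e8f, c = 1.0f;
    // Grouping changes the result: in the second form, adding 1 to -1e8
    // is absorbed by rounding before the cancellation happens.
    float left  = (a + b) + c;  // (0) + 1  == 1
    float right = a + (b + c);  // 1e8 + fl(-1e8 + 1) == 0
    printf("(a+b)+c = %g\n", left);
    printf("a+(b+c) = %g\n", right);
    return 0;
}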

Bottom line is that analysis of precision is not possible by inspecting source code alone.

For that reason, many people simply use a higher-precision type such as double or long double and then check that precision issues are gone for all practical purposes.

If that does not suffice, then the fallback is to implement all the logic in integers and avoid floats.
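
A minimal sketch of that integer fallback, using pennies as the indivisible unit (a hypothetical example, echoing the fixed-point comment below):

#include <cstdio>
#include <cstdint>

int main() {
    // Represent money as integer pennies: 0.10 + 0.20 is then exact,
    // unlike the same sum performed in binary floating point.
    int64_t price = 10;           // $0.10
    int64_t tax = 20;             // $0.20
    int64_t total = price + tax;  // exactly 30 pennies
    printf("total = $%lld.%02lld\n",
           (long long)(total / 100), (long long)(total % 100));
    return 0;
}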

Eleonoreeleoptene answered 6/9, 2014 at 21:35 Comment(3)
Transformations like the ones you describe (a + (b + c) -> (a + b) + c) are only allowed if you compile with fast-math optimisation; i.e., the compiler is not allowed to screw with the order of float operations when you compile in normal floating-point mode (this is true for GCC and for MSVC). (Homburg)
Your "fallback" option almost always makes everything worse. (Quicken)
In very specific contexts, fixed-point arithmetic can be the right way to go. Money is a decent example - you have an indivisible 'atom', let's say pennies, that is less than the standard unit but not by powers of 2. Quickly in Bash: echo $((0.10)) --> 0.10000000000000001. Fixed point solves this handily - deal only in pennies. In general, you're competing with IEEE and all its edge cases just to achieve the same error. (FYI, same method: $((0.1+$((0.2-0.2)))) --> 0.10...1, $(($((0.1+0.2))-0.2)) --> 0.10...3.) SO regarding fast floats. (Ovate)
