C fundamentals: double variable not equal to double expression?

I am working with an array of doubles called indata (on the heap, allocated with malloc), and a local double called sum.

I wrote two different functions to compare values in indata, and obtained different results. Eventually I determined that the discrepancy was due to one function using an expression in a conditional test, and the other function using a local variable in the same conditional test. I expected these to be equivalent.

My function A uses:

    if (indata[i]+indata[j] > max) hi++;

and my function B uses:

    sum = indata[i]+indata[j];
    if (sum>max) hi++;

After going through the same data set and max, I end up with different values of hi depending on which function I use. I believe function B is correct, and function A is misleading. Similarly, when I try the snippet below,

    sum = indata[i]+indata[j];
    if ((indata[i]+indata[j]) != sum) { /* ... */ }

that conditional will evaluate to true.

While I understand that floating-point numbers do not necessarily provide an exact representation, why does that inexact representation change when evaluated as an expression vs stored in a variable? Is it recommended best practice to always evaluate a double expression like this prior to a conditional? Thanks!

Marhtamari answered 4/6, 2016 at 5:24 Comment(10)
It's basically because computers can't represent numbers with total precision. Read about floating point. – Syllogism
@iharob He acknowledged that in his last paragraph. But it doesn't explain why the result differs depending on whether you assign it to a variable. – Antidote
Is this compiled for x86 or x86-64? – Astraphobia
@iharob I think the main question is why it changes "when evaluated as an expression vs stored in a variable". It shouldn't be different when stored in a variable (assuming the variable is of the same type). The OP obviously did read about floating-point representations. – Forehanded
When the assignment takes place in B, the value is required to be rounded to the nearest double value (typically 64 bits). In function A, the conditional expression may be evaluated at a higher precision (e.g. 80 bits). – Turret
@user3386109: If the code is indeed compiled for x86 rather than x86-64, so that it uses x87 rather than SSE floating-point instructions, then that is indeed the most likely explanation. – Astraphobia
For me, the biggest obstacle to answering the question is that none of the variables are declared, so I can only guess whether they are all double. – Warden
You could force the compiler to use SSE math (-mfpmath=sse) in 32-bit code, if you know it is supported by the target operating system and the processor in question, to remove the extra rounding. – Gd
See exploringbinary.com/when-doubles-dont-behave-like-doubles – Imbecility
Thanks for the link, which provided good information. Another good source for those interested is link. The article goes into considerable detail and includes two good discussions of an example very similar to mine. – Marhtamari

I suspect you're using 32-bit x86, the only common architecture subject to excess precision. In C, expressions of type float and double are actually evaluated as float_t or double_t (declared in <math.h>), whose relationship to float and double is reflected in the FLT_EVAL_METHOD macro from <float.h>. In the case of 32-bit x86 using the x87 FPU, both are defined as long double (FLT_EVAL_METHOD is 2), because the FPU is not actually capable of performing arithmetic at single or double precision. (It has precision-control mode bits intended to allow that, but they only shorten the significand while keeping the extended exponent range, so the behavior is slightly wrong and can't be relied on.)

Assigning to an object of type float or double is one way to force rounding and get rid of the excess precision, but you can also just add a gratuitous cast to (double) if you prefer to leave it as an expression without assignments.

Note that forcing rounding to the desired precision is not equivalent to performing the arithmetic at the desired precision. Instead of one rounding step (during the arithmetic) you now have two (during the arithmetic, and again to drop the unwanted precision), and in cases where the first rounding produces an exact midpoint, the second rounding can go in the 'wrong' direction. This issue is generally called double rounding, and it makes excess precision significantly worse than nominal precision for certain types of calculations.

Alcuin answered 4/6, 2016 at 5:33 Comment(3)
Thanks for the explanation. I am running the code on an i7-3770 CPU running 64-bit Windows 7. However, my compiler is MinGW, which is a 32-bit application. I will investigate compiler settings. I understand that there is a hardware level of precision higher than double, and will be more careful with using expressions vs variables. FYI, type casting the expression actually doesn't work in this case: same behavior as without it. – Marhtamari
Try the cast with -std=c99 or -fexcess-precision=standard. In some non-standards-conforming modes GCC gets the behavior wrong.Alcuin
Thank you - either of those compiler command line options fixes the problem, so that this code behaves correctly: if ((double)(indata[i]+indata[j]) > max) hi++;Marhtamari
