Best practices for float multiplication in C++ or C?
V

3

6

I need to perform a simple multiplication: 400 * 256.3. The result is 102520. Straightforward and simple. But implementing this multiplication in C++ (or C) is a little tricky and confusing to me.

I understand that a floating-point number is not stored in the computer exactly as written. I wrote the code below to illustrate the situation; the output is attached too.

So, if I do the multiplication using a float variable, I am subject to rounding error. Using a double variable would have avoided the problem. But let's say I have very limited resources on an embedded system and have to choose the smallest variable type I can: how can I perform the multiplication using a float variable without being susceptible to rounding error?

I know that floating-point math as done by the computer is not broken at all, but I am curious about best practices for performing it. 256.3 is just a value for illustration. I will not know which floating-point value I will get at runtime, only that it will be a floating-point value.

#include <stdio.h>

int main()
{
    //perform 400 * 256.3
    //result should be 102520

    float floatResult = 0.00f;
    int intResult = 0;
    double doubleResult = 0.00;

    //float = int * float
    floatResult = 400 * 256.3f;
    printf("400 * 256.3f = (float)->%f\n", floatResult);

    //float = float * float
    floatResult = 400.00f * 256.3f;
    printf("400.00f * 256.3f = (float)->%f\n", floatResult);

    printf("\n");

    //int = int * float
    intResult = 400 * 256.3f;
    printf("400 * 256.3f = (int)->%d\n", intResult);

    //int = float * float;
    intResult = 400.00f * 256.3f;
    printf("400.00f * 256.3f = (int)->%d\n", intResult);

    printf("\n");

    //double = double * double
    doubleResult = 400.00 * 256.3;
    printf("400.00 * 256.3 = (double)->%f\n", doubleResult);

    //int = double * double;
    intResult = 400.00 * 256.3;
    printf("400.00 * 256.3 = (int)->%d\n", intResult);

    printf("\n");

    //double = int * double
    doubleResult = 400 * 256.3;
    printf("400 * 256.3 = (double)->%f\n", doubleResult);

    //int = int * double
    intResult = 400 * 256.3;
    printf("400 * 256.3 = (int)->%d\n", intResult);

    printf("\n");

    //will double give me rounding error?
    if (((400.00 * 256.3) - 102520) != 0) {
        printf("Double give me rounding error!\n");
    }

    //will float give me rounding error?
    if (((400.00f * 256.3f) - 102520) != 0) {
        printf("Float give me rounding error!\n");
    }

    return 0;
}

Output from the code above

Vatic answered 14/11, 2016 at 9:39 Comment(10)
The answer is simple: you can't. The rounding problem is "built in" on computers using the IEEE floating-point format, which is just about all of them. - Burgee
Possible duplicate of "Is floating point math broken?" - Pyoid
The value 256.3f is also lying to you and you cannot avoid that. - Remy
You can use an alternate data type, e.g. a fixed-precision type or a rational/fractions library. - Darla
@Olaf I mean to say C++ or C. The slash is a symbol for "or". I know how floating point is implemented on computers and I know it is not broken. I am just curious whether there is a way to use a float variable without being exposed to rounding error. - Vatic
@Remy Yeah, I understand that. If I cannot avoid it, is there any best practice for working with floating-point numbers that you would recommend? - Vatic
@LJSeng Nothing specific. Follow best practices. - Remy
You might know what the decimal input/output means, but you apparently don't understand it; otherwise you would not have asked. Your question is definitely a dup, as you might have noticed if you had read the linked question and its answer(s). If that is an MCU, you should avoid floating point altogether. - Pyoid
Avoid float unless you know you really need it. Use double instead. - Ingate
Always amazing to see how programmers can be terrorized by floating-point truncation and "inexact" arithmetic, while not caring at all about the exactness of their input data. Why worry if 256.3 only has four exact digits, while a single-precision float supports about 7 of them? - Diadromous
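
To illustrate the comment that 256.3f is "lying": the sketch below, assuming float is an IEEE 754 binary32 type, prints the value the literal actually stores and measures how far it is from the intended decimal. The helper names stored_error and show_stored_value are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: difference between the value a float really holds
   and the decimal value the programmer intended (evaluated in double). */
static double stored_error(float stored, double intended)
{
    return (double)stored - intended;
}

/* Prints the stored value with enough digits to expose the approximation.
   Assumption: float is IEEE 754 binary32, so the nearest representable
   value to 256.3 is 256.29998779296875. */
static void show_stored_value(void)
{
    float f = 256.3f;
    printf("256.3f is stored as %.17g\n", (double)f);
    printf("error vs. 256.3:    %g\n", stored_error(f, 256.3));
}
```

Calling show_stored_value() should show that the stored value sits slightly below 256.3, which is why the float product later lands just under 102520.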
J
6

If you have a fixed number of decimal digits (1 in the case of 256.3) as well as a bounded range of the results, you can use integer multiplication, and adjust for the shift in decimal digits through integer division:

int result = (400 * 2563) / 10;

Rounding errors are inherent to floating-point arithmetic, except in the few cases where all operands and the result can be represented exactly. Choosing float or double only influences when the error occurs, not whether it does.
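
A minimal sketch of the scaled-integer approach for values that arrive at runtime, assuming every value carries exactly one decimal digit and is already stored in tenths (so 256.3 is held as 2563). The helper names mul_tenths and tenths_to_units are hypothetical, and the 64-bit intermediate is an assumption made to rule out overflow of the product.

```c
#include <stdint.h>

/* Hypothetical helper: multiply a whole-number factor by a value stored
   in tenths (256.3 is stored as 2563). The result is also in tenths.
   The 64-bit intermediate keeps the product of two 32-bit inputs exact. */
static int64_t mul_tenths(int32_t factor, int32_t value_tenths)
{
    return (int64_t)factor * value_tenths;
}

/* Convert tenths back to whole units, rounding half away from zero
   instead of truncating as plain integer division would. */
static int64_t tenths_to_units(int64_t tenths)
{
    return (tenths >= 0) ? (tenths + 5) / 10 : (tenths - 5) / 10;
}
```

For the example in the question, tenths_to_units(mul_tenths(400, 2563)) evaluates to 102520 with no floating-point step involved. Whether this is cheaper than float on a given embedded target depends on the hardware.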

Jejune answered 14/11, 2016 at 9:50 Comment(2)
Thank you very much for the suggestion! But what if I do not know the operand values until runtime? - Vatic
What do you know about the float inputs? What is their domain? Why are rounding errors a problem? - Jejune
F
5

First of all, understand that type double has all the same problems as type float. Neither type has infinite precision, so both types are susceptible to precision loss and other problems.

As to what you can do: there are many different problems that come up, depending on what you're doing, and many techniques to overcome them. Many, many words have been written on these techniques; I suggest doing a web search on "avoiding floating point error". But the basic points are:

  • Know that floating-point results are usually not exact
  • Don't compare floating-point numbers for exact equality
  • When you need an equality test, check that the difference lies within an appropriate "epsilon" range instead
  • After calculation, it is often appropriate to explicitly round the final value to the desired precision (especially when printing it out)
  • Beware of algorithms which cause the precision loss to increase with each step

See also https://www.eskimo.com/~scs/cclass/handouts/sciprog.html .
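
The "epsilon" comparison from the list above can be sketched as follows. This is one common formulation, not the only one: the helper name nearly_equal and the choice of combining a relative and an absolute tolerance are assumptions, and suitable tolerance values depend on the application.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if a and b differ by at most abs_tol, or by
   at most rel_tol times the larger of their magnitudes. The absolute
   tolerance covers values near zero, where a relative test breaks down. */
static bool nearly_equal(float a, float b, float rel_tol, float abs_tol)
{
    float diff = fabsf(a - b);
    if (diff <= abs_tol) {
        return true;
    }
    float mag_a = fabsf(a);
    float mag_b = fabsf(b);
    float largest = (mag_a > mag_b) ? mag_a : mag_b;
    return diff <= largest * rel_tol;
}
```

With this helper, nearly_equal(400.0f * 256.3f, 102520.0f, 1e-6f, 0.0f) is true even though the exact comparison in the question's code reports a rounding error for float.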

Forte answered 14/11, 2016 at 10:57 Comment(1)
Thank you very much for your suggestion! I appreciate it! :) - Vatic
S
2

A key weakness in how the problem is demonstrated is the conversion to int intResult. The posted problem is about multiplying and comparing, but the code only shows issues surrounding int conversion, which truncates toward zero.

If code needs to convert a floating-point value to the nearest whole number, use rint(), round(), nearbyint() or lround(), not integer assignment.

Sendoff answered 14/11, 2016 at 19:1 Comment(0)
