float to double assignment

Asked 11/11, 2010 at 12:36 Answered 11/11, 2010 at 21:53

Solved java floating-accuracy approximation double-precision

Consider the following code snippet

float num = 281.583f;
int amount = (int) Math.round(num*100f);
float rounded = amount/100.0f;
double dblPrecision = rounded;
double dblPrecision2 = num;
System.out.println("num : " + num + " amount: " + amount + " rounded: " + rounded + " dbl: " + dblPrecision + " dbl2: " + dblPrecision2);

The output I get is

num : 281.583 amount: 28158 rounded: 281.58 dbl: 281.5799865722656 dbl2: 281.5830078125

Why is there the approximation when a float number is assigned to a double variable?

Mooneye answered 11/11, 2010 at 12:36 Comment(0)

The approximation actually takes place when the decimal fraction is converted to a float. It might surprise you, but 281.583 can't be represented exactly as a floating-point number on a computer. That is because floating-point numbers are stored as a sum of binary fractions. 0.5, 0.25 and 0.125 can be converted precisely, but 0.583 cannot.

The fractional part of a float (or double) is represented as Σ( Bi / 2^i ), where Bi is the i-th bit (0 or 1). For example, 0.625 = 1/2 + 1/8. The problem is that not every decimal fraction can be written as a finite sum of binary fractions.

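Here is how the fractional part 0.583 is converted (the first line gives the column headings). At each step the previous fraction is doubled, the integer part becomes the bit, and only the fraction is carried forward.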

 i | *2 and trim | Bit value | (2^-i)*bit
   | 0.583       |           |
 1 | 1.166       | 1         | 0.5
 2 | 0.332       | 0         | 0
 3 | 0.664       | 0         | 0
 4 | 1.328       | 1         | 0.0625
 5 | 0.656       | 0         | 0
 6 | 1.312       | 1         | 0.015625
 7 | 0.624       | 0         | 0
 8 | 1.248       | 1         | 0.00390625
 9 | 0.496       | 0         | 0
10 | 0.992       | 0         | 0
11 | 1.984       | 1         | 0.000488281
12 | 1.968       | 1         | 0.000244141
13 | 1.936       | 1         | 0.00012207
14 | 1.872       | 1         | 6.10352E-05
15 | 1.744       | 1         | 3.05176E-05
16 | 1.488       | 1         | 1.52588E-05
17 | 0.976       | 0         | 0
18 | 1.952       | 1         | 3.8147E-06
19 | 1.904       | 1         | 1.90735E-06
   |             | SUM =     | 0.582998276
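
The doubling procedure in the table can be sketched in Java roughly as follows. This is only an illustration: the class and method names (BinaryFractionDemo, fractionBits, sumOfBits) are made up for this example, and BigDecimal is used so that the decimal arithmetic matches the table exactly rather than suffering from the very rounding being discussed.

```java
import java.math.BigDecimal;

class BinaryFractionDemo {
    private static final BigDecimal TWO = BigDecimal.valueOf(2);

    /** Returns the first `bits` binary digits of a decimal fraction in [0, 1). */
    static String fractionBits(BigDecimal fraction, int bits) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bits; i++) {
            fraction = fraction.multiply(TWO);              // "*2"
            if (fraction.compareTo(BigDecimal.ONE) >= 0) {
                out.append('1');
                fraction = fraction.subtract(BigDecimal.ONE); // "trim"
            } else {
                out.append('0');
            }
        }
        return out.toString();
    }

    /** Adds up 2^-i for every bit i that is set (i starts at 1). */
    static BigDecimal sumOfBits(String bits) {
        BigDecimal sum = BigDecimal.ZERO;
        BigDecimal weight = BigDecimal.ONE;
        for (int i = 0; i < bits.length(); i++) {
            weight = weight.divide(TWO); // exact, halving always terminates
            if (bits.charAt(i) == '1') {
                sum = sum.add(weight);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String bits = fractionBits(new BigDecimal("0.583"), 19);
        System.out.println(bits);            // 1001010100111111011
        System.out.println(sumOfBits(bits)); // 0.5829982757568359375
    }
}
```

After 19 bits the sum is still short of 0.583, and adding more bits only gets closer without ever reaching it.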

Emeryemesis answered 11/11, 2010 at 12:42 Comment(2)

Yes, this is an issue for all doubles and floats. It doesn't have anything to do with the assignment. – Thrust 11/11, 2010 at 12:48

@Erick Robertson That is what I am trying to explain. The loss of precision only became visible after the assignment to double. – Emeryemesis 11/11, 2010 at 12:52

Because floats are binary fractions and thus can only represent your decimal number approximately. The approximation happens when the literal 281.583f in the source code is parsed into an IEEE 754 float value.

With the floats themselves, this is glossed over because println prints

as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type float.

In many cases, that means the decimal value of the literal will be printed. However, when you assign the value to a double, the "adjacent values of type double" are usually much, much closer than those of type float, so you get to see the true value of your approximated float.
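
A small sketch of that printing behaviour, assuming nothing beyond the standard library (the class and helper names here are invented for the example). The same stored value is printed once under the float rule and once under the double rule, and the two neighbouring floats are printed as well to show how coarse the float grid is at this magnitude.

```java
class FloatPrintingDemo {
    /** Text produced when the value is printed as a float. */
    static String asFloatText(float f) {
        return Float.toString(f);
    }

    /** Text produced when the same value is widened and printed as a double. */
    static String asDoubleText(float f) {
        return Double.toString((double) f);
    }

    public static void main(String[] args) {
        float f = 281.583f;
        System.out.println(asFloatText(f));   // 281.583
        System.out.println(asDoubleText(f));  // 281.5830078125

        // The adjacent float values, shown with double's printing rule.
        System.out.println((double) Math.nextDown(f));
        System.out.println((double) Math.nextUp(f));

        // "281.583" is enough to identify this float uniquely,
        // so parsing the short text gives back the same value.
        System.out.println(Float.parseFloat("281.583") == f); // true
    }
}
```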

For more details, read The Floating-Point Guide.

Revert answered 11/11, 2010 at 12:59 Comment(0)

The approximation is there the whole time. The double simply has enough extra bits that printing it reveals digits the float printout leaves off.

281.583, for example is in binary (to a lot of digits, but less than double precision): 100011001.1001_0101_0011_1111_0111_1100_1110_1101_1001...

A float has 24 significant bits (23 stored explicitly) while a double has 53 (52 stored explicitly). Rounded to 24 significant bits, the value becomes 100011001.1001_0101_0100_000, which is exactly 281.5830078125 in decimal.

As a reference, single precision holds about 7 significant decimal digits and double precision about 16. That count includes the digits before the decimal point, so your 6-digit number is only about one digit short of the limit of float precision.
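
To get a feel for those precisions, the spacing between neighbouring representable values near 281.583 can be queried with Math.ulp. This is just a sketch; the class and method names are placeholders for illustration.

```java
class UlpDemo {
    /** Gap between a float and the next representable float of larger magnitude. */
    static float floatSpacing(float f) {
        return Math.ulp(f);
    }

    /** Gap between a double and the next representable double of larger magnitude. */
    static double doubleSpacing(double d) {
        return Math.ulp(d);
    }

    public static void main(String[] args) {
        // 281.583 lies between 2^8 and 2^9, so the spacing is
        // 2^(8-23) for float and 2^(8-52) for double.
        System.out.println(floatSpacing(281.583f));
        System.out.println(doubleSpacing(281.583));
        System.out.println(floatSpacing(281.583f) == Math.scalb(1.0f, -15)); // true
        System.out.println(doubleSpacing(281.583) == Math.scalb(1.0, -44));  // true
    }
}
```

A float near this value can only move in steps of about 0.00003, which is why the third decimal place is already close to the edge of what it can hold.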

Openeyed answered 11/11, 2010 at 12:58 Comment(0)

As I understand it your concern is, why this code...

float f = 281.583f;
System.out.println(f);
System.out.println((double) f);

...prints

281.583
281.5830078125

(hey, double provides more precision!)

Here is why...

Feed 438ccaa0 (the hex-format of the bits representing 281.583f, as given by Integer.toHexString(Float.floatToRawIntBits(281.583f))) into the form here. What you'll see is that the float is actually represented as 281.58301. (@Michael Borgwardt answers why it's not printed like that.)

So 281.583 is being printed for 281.58301 when represented as a float. But when you convert 281.58301 to a double, you can actually get closer to 281.58301 than 281.583!

Looking at the computations of the above webpage, you can get as close as 281.58300781250000 which is why you see the value 281.5830078125 being printed.
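
If you would rather not rely on a web form, the same inspection can be done locally. The following is a rough sketch (class and method names are made up) that dumps the raw bits, splits out the exponent and mantissa fields, and prints the exact decimal value of the stored float via BigDecimal.

```java
import java.math.BigDecimal;

class FloatBitsDemo {
    /** Hex form of the 32 raw IEEE 754 bits of a float. */
    static String rawHex(float f) {
        return Integer.toHexString(Float.floatToRawIntBits(f));
    }

    /** Unbiased binary exponent of a normal float. */
    static int exponent(float f) {
        return ((Float.floatToRawIntBits(f) >>> 23) & 0xff) - 127;
    }

    /** Exact decimal expansion of the value the float actually stores. */
    static String exactDecimal(float f) {
        return new BigDecimal((double) f).toString();
    }

    public static void main(String[] args) {
        float f = 281.583f;
        System.out.println(rawHex(f));       // 438ccaa0
        System.out.println(exponent(f));     // 8
        System.out.println(exactDecimal(f)); // 281.5830078125

        // Going the other way: rebuild the float from its bit pattern.
        float back = Float.intBitsToFloat(0x438ccaa0);
        System.out.println(back == f);       // true
    }
}
```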

Josiejosler answered 11/11, 2010 at 13:44 Comment(4)

converting a float to a double does not "get you closer"; you get the same approximation (see my answer). – Favian 12/11, 2010 at 2:1

@Rick, read carefully. It gets you closer than the value printed for the float-representation. – Josiejosler 12/11, 2010 at 8:9

But you wrote "But when you convert 281.58301 to a double, you can actually get closer to 281.58301 than 281.583". Printing, not conversion, is the key. If you print each with "printf("%.17f")" instead, both give the same answer -- 281.58300781250000000 . – Favian 12/11, 2010 at 13:31

BTW, I was assuming you meant the float to double conversion in the question -- of 281.583. Were you making a different point, about assigning a different number, 281.58301, directly to a double? If so, that gets you 281.58301000000000158252078108489513397216796875. – Favian 12/11, 2010 at 13:52

In short, don't use float unless you really have to. You will lose precision and are likely to save very little. Use double and you will save yourself a lot of grief.

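double num = 281.583;
long amount = (long) (num*100);  // truncates toward zero; Math.round(num*100) would round to nearest
double rounded = (double) amount/100;
double dblPrecision = rounded;
double dblPrecision2 = num;
System.out.println("num : " + num + " amount: " + amount + " rounded: " + rounded + " dbl: " + dblPrecision + " dbl2: " + dblPrecision2);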

prints

num : 281.583 amount: 28158 rounded: 281.58 dbl: 281.58 dbl2: 281.583
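
Note that the clean output does not mean the double holds 281.583 exactly. A quick way to check, sketched here with an invented helper name, is to compare the stored value against the decimal text using BigDecimal.

```java
import java.math.BigDecimal;

class DoubleStillApproximate {
    /** True only if the double stores exactly the decimal written in `decimalText`. */
    static boolean isExact(double value, String decimalText) {
        return new BigDecimal(value).compareTo(new BigDecimal(decimalText)) == 0;
    }

    public static void main(String[] args) {
        // Full decimal expansion of what the double really stores.
        System.out.println(new BigDecimal(281.583));

        System.out.println(isExact(281.583, "281.583")); // false
        System.out.println(isExact(281.5, "281.5"));     // true, 0.5 is a binary fraction
    }
}
```

The double is still an approximation. It is just close enough that the shortest text identifying it is the literal you typed.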

Turtledove answered 11/11, 2010 at 17:31 Comment(0)

The floats and doubles actually have the same value internally; they are just printed differently. Add these lines to your program to view them in hex:

System.out.printf("num:           %a\n",num);
System.out.printf("dblPrecision2: %a\n",dblPrecision2);

System.out.printf("rounded:       %a\n",rounded);
System.out.printf("dblPrecision:  %a\n",dblPrecision);

This prints

num:           0x1.19954p8
dblPrecision2: 0x1.19954p8
rounded:       0x1.19947ap8
dblPrecision:  0x1.19947ap8

num = dblPrecision2 and rounded = dblPrecision.

Now 0x1.19954p8 = 100011001.100101010100 = 281.5830078125, and 0x1.19947ap8 = 100011001.1001010001111010 = 281.579986572265625. All that is happening is that they are getting rounded differently upon printing (floats are rounded to a smaller number of digits than doubles).
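
The same check can be made without printf. The sketch below (class and helper names are only for illustration) shows that widening a float to a double keeps the value bit for bit, whereas parsing the decimal literal straight into a double lands on a different, closer approximation.

```java
class WideningDemo {
    static String floatHex(float f) {
        return Float.toHexString(f);
    }

    static String widenedHex(float f) {
        return Double.toHexString((double) f);
    }

    public static void main(String[] args) {
        float num = 281.583f;
        double widened = num;    // float -> double: value unchanged
        double parsed = 281.583; // decimal literal -> double: a different approximation

        System.out.println(floatHex(num));               // 0x1.19954p8
        System.out.println(widenedHex(num));             // 0x1.19954p8
        System.out.println(Double.toHexString(parsed));
        System.out.println(widened == parsed);           // false
        System.out.println((float) widened == num);      // true

        float rounded = 28158 / 100.0f;
        System.out.println(floatHex(rounded));           // 0x1.19947ap8
    }
}
```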

Favian answered 11/11, 2010 at 21:53 Comment(0)
