Is it 52 or 53 bits of floating point precision?

I keep seeing this nonsense about 53 bits of precision in 64-bit IEEE floating-point representation. Would someone please explain to me how in the world a bit that is stuck at 1 contributes ANYTHING to the numeric precision? If you had a floating-point unit with bit 0 stuck at 1, you would of course know that it produces one less bit of precision than normal. Where are those sensibilities on this?

Further, just the exponent, the scaling factor without the mantissa, completely specifies exactly where the leading bit of the number is, so no leading bit is ever used. The 53rd bit is about as real as the 19th hole. It is merely a (useful) crutch to aid the human mind and the logic for accessing such values in binary. To claim otherwise is double counting.

Either all the books and articles claiming this 53rd bit nonsense are wrong, or I am an idiot. But a stuck bit is a stuck bit. Let's hear the arguments to the contrary.

Athletic answered 23/8, 2013 at 18:27 Comment(4)
Based on your discussion with @Eric Postpischil in the comments on his answer, it’s not so much that you’re wrong as it is that you think “bits of precision” means something rather different from what everyone else (including the authors of IEEE-754) thinks it means.Polydeuces
It is perfectly reasonable to ask how a bit that is always 1 can affect precision, and your comment doesn't illuminate anything on that score. My last comment to Eric shows that 53 bits of precision are indeed available, in spite of the always-on bit. Your downvote of my question is out of line, please correct it.Athletic
I didn’t downvote your question; your quibble is with someone else. Users with sufficient rep to downvote are free to exercise that privilege, and they’re not required to explain themselves. Similarly, one is free to comment without voting.Polydeuces
Please accept my sincerest apologies, I made an unwarranted assumption... On the topic at hand, since the difference between 15.65 and 15.95 digits can be crucial to some numerical applications, finding out the truth sometimes means calling out accepted doctrine and following up on it.Athletic

The mathematical significand1 of an IEEE-754 64-bit binary floating-point object has 53 bits. It is encoded with the combination of a 52-bit field exclusively for the significand and some information from the exponent field that indicates whether the 53rd bit is 0 or 1.

Since the main significand field is 52 bits, some people refer to the significand as 52 bits, but this is sloppy terminology. The significand field does not contain all the information about the significand, and the complete significand is 53 bits.

It is not true that the leading bit of the significand is never used (as anything other than 1). When the encoding of the exponent is zero, the leading bit of the significand is 0 instead of the more frequent 1.
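
To make the encoding concrete, here is a minimal C sketch (assuming the usual case where double is IEEE-754 binary64 and shares its byte order with uint64_t; the decode helper is just for illustration) that pulls out the fields and reconstructs the full 53-bit significand:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static void decode(double x) {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);                      /* reinterpret the 64 bits */
        unsigned long long expo = (bits >> 52) & 0x7FF;      /* 11-bit exponent field   */
        unsigned long long frac = bits & 0xFFFFFFFFFFFFFULL; /* 52-bit trailing field   */
        /* The 53rd bit is 1 for normal numbers (exponent field nonzero)
           and 0 for subnormals (exponent field zero). */
        unsigned long long sig53 = (expo != 0 ? (1ULL << 52) : 0) | frac;
        printf("%-8g exponent field = %4llu, 53-bit significand = 0x%014llx\n",
               x, expo, sig53);
    }

    int main(void) {
        decode(1.0);      /* normal: full significand 0x10000000000000 */
        decode(0.75);     /* normal: leading 1 plus fraction bits      */
        decode(5e-324);   /* smallest subnormal: leading bit is 0      */
        return 0;
    }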


1 “Significand” is the preferred term, not “mantissa.” A significand is linear, a mantissa is logarithmic.

Ermin answered 23/8, 2013 at 18:36 Comment(10)
Floating point can represent only 2^52 different mantissas. There is no encoding of the 53rd bit; it is implied for all values without underflow, except for the single value of 0.0. This can be seen in any diagram of the representation. 53 bits minus one stuck bit = 52 bits of actual precision.Athletic
@ChrisCochran: Per IEEE 754-2008 clause 3.4, if E (the encoding of the exponent) is zero and T (the trailing significand field) is not zero, the value of the floating-point number is (–1)^s × 2^emin × (0 + 2^(1–p) × T), where s is the sign bit, emin is the minimum exponent (–1022 for 64-bit binary), and p is the precision in bits (53 for 64-bit binary, per table 3.5). Thus, there are numbers other than zero for which the leading bit is zero (the leading bit is the “0” in the above expression).Ermin
@ChrisCochran: Regarding “all values without underflow”. Values do not have underflow; underflow is something that happens during an operation, when a tiny non-zero result is detected according to certain rules. You might mean “subnormal numbers”. However, subnormal numbers are numbers and are represented in floating-point.Ermin
Additionally, consider that the difference between adjacent representable values in 64-bit floating-point is the same as the difference between adjacent representable values in 53-bit integers, when the floating-point values are in the same range. Given 2^52 in 64-bit floating-point, the next representable value is 2^52+1. Given 2^52 in a 53-bit integer, the next representable value is 2^52+1. Thus, when the leading bit is in the same place in floating-point as it is in the most significant bit of a 53-bit integer, floating-point has the same precision as 53-bit integers.Ermin
Yes, but all such values are underflow values, having leading zeros and an explicit leading bit, instead of an implied 53rd bit, so no 53rd bit exists there either. The normal situation has only 52 bits of freedom, underflow values don't add to that.Athletic
@Chris: Again, they are subnormal, not underflow. These terms have precise technical meanings, defined in IEEE 754-2008. In fact, the significand has 53 bits of freedom because, in the normal case, when the leading bit of the significand would be zero, you simply decrease the exponent, which has the effect of setting the formerly 1 bit to zero and giving you a new bit on the low end. You are arguing about words, but mathematically you can do everything with the floating-point values that you can do with 53 bits of precision. The mathematical effect is that of 53 bits of precision.Ermin
The mathematical effect is of interest, not the words or other hearsay, i.e. what precision is really available. log10(2^52) is 15.65, and 10^15.65 = 4.47e+15. log10(2^53) is 15.95, and 10^15.95 = 8.91e+15. Experimenting finds that 9.00e+15 ± 1.0 is exactly correct, but 9.10e+15 ± 1.0 is not exactly right, off by one (a runnable version of this experiment appears after these comments). This experiment shows that 53 bits are in effect mathematically, despite only 52 bits in the mantissa. Otherwise we would see the cutoff point around 4.48e+15. So the answer is: the scaling factor itself contributes 1 extra bit beyond the 52-bit mantissa.Athletic
@ChrisCochran: That interpretation is incorrect; the extra bit really does come from the implicit bit, not from the scaling factor. Consider the Intel 80-bit format, which does not have an implicit bit; it has a 64-bit significand field, and numbers in that format have 64 bits of precision. Scaling alone does not magically grant an extra bit of precision; the implicit bit is necessary.Polydeuces
That seems to shed more light on this. The math above proves the effect of 53 bits is present in fact, and since the scaling factor multiplies the significand, the leading one must be included in it, implicit or not. No other interpretation appears possible. I am convinced, 53 bits it is. The notion of being an idiot remains open to conjecture however.Athletic
For example, all integers that fit in 53 bits plus a sign bit can be represented exactly. There are 2^52 multiples of 2^-52 that are exact floating point numbers between 1 and 2, but there are another 2^52 multiples of 2^-52 from 0 to 1 as well.Indies
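
Following up on the experiment in the comments above, here is a minimal C sketch (assuming double is IEEE-754 binary64, as on virtually all current hardware) that reproduces it:

    #include <stdio.h>

    int main(void) {
        double below = 9.0e15;   /* just under 2^53 (= 9007199254740992) */
        double above = 9.1e15;   /* just over 2^53                       */

        /* Below 2^53 adjacent doubles are 1 apart, so adding 1 is exact. */
        printf("(9.0e15 + 1) - 9.0e15 = %.1f\n", (below + 1.0) - below);  /* 1.0 */

        /* Above 2^53 adjacent doubles are 2 apart; adding 1 must round.  */
        printf("(9.1e15 + 1) - 9.1e15 = %.1f\n", (above + 1.0) - above);  /* 0.0 or 2.0 */

        /* 2^53 + 1 is the first positive integer with no exact representation. */
        double p53 = 9007199254740992.0;   /* 2^53 exactly */
        printf("2^53 + 1 == 2^53 ? %s\n", (p53 + 1.0 == p53) ? "yes" : "no");
        return 0;
    }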

The key concept here is "normalization". In general scientific notation, every value has many representations. That makes arithmetic, especially comparisons, more difficult than necessary. The common solution is to require the most significant digit of the significand to be non-zero. For example, the first floating point system I worked with was base 16, and the leading digit of the significand was in the range 1 through F.

That has a special effect for binary floating point. In binary, the only non-zero digit is 1, so the most significant bit of a normalized significand is always 1. There is no point wasting one of the limited number of bits in the physical representation on a bit whose value is already known.

Normal numbers in IEEE 754 64-bit binary have a 53-bit significand whose implicit leading bit is known to be 1, with the remaining 52 bits stored in the physical representation.

There being no such thing as a free lunch, there is a cost to this. The cost is a limit on how small a number can be stored with a given exponent. For most exponents that does not matter: the number just gets stored with a smaller exponent, still with a leading 1 bit that does not need to be stored.

It would be a real limitation for a zero exponent field, because there is no smaller exponent to use. IEEE 754 binary floating point solves that by storing very small magnitude numbers, those with a zero exponent field, differently. They have at most 52 significant bits, all stored explicitly, with leading zeros permitted. That allows very small magnitude numbers to be represented as non-zero numbers, at the cost of reduced precision.

Infinities and NaNs are stored differently, with the all-ones exponent field.
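
A minimal C sketch (C99, for the %a hex-float format and hex float literals; exact output varies by implementation, e.g. glibc prints normals as 0x1.xxxp... and subnormals with a leading 0 digit) that makes gradual underflow visible. The literal 0x1p-1074 is the smallest subnormal, the same value as C11's DBL_TRUE_MIN:

    #include <stdio.h>
    #include <float.h>

    int main(void) {
        printf("smallest normal   : %a (%g)\n", DBL_MIN,       DBL_MIN);
        printf("half of that      : %a (%g)\n", DBL_MIN / 2.0, DBL_MIN / 2.0); /* subnormal */
        printf("smallest subnormal: %a (%g)\n", 0x1p-1074,     0x1p-1074);
        return 0;
    }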

Come answered 23/8, 2013 at 20:21 Comment(0)

It's not stuck. The exponent moves the "stuck" bit around, so it's not trapped at a fixed position.

In fact that hidden bit is always the most significant bit after normalization. Since it's always set in a normalized value, there's no point in storing it explicitly. Leaving it out adds one more bit of precision, which is a good thing.

So instead of storing 1.xxx...xxx × 2^exp we store only the xxx...xxx part and set the integer bit back to 1 before operating on the value. The hidden bit thus affects the result's value rather than just lying there meaninglessly. It's the same as when we normalize a range of decimal values to [0, 1): the integer part is always zero and can be omitted (in some cultures), and the freed space can then be used for another digit of precision.
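
frexp() from the C standard library exposes exactly this normalized form, splitting x into m × 2^e with 0.5 <= m < 1, i.e. a significand written as 0.1xxx... in binary. A minimal sketch:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double values[] = { 1.0, 6.5, 0.15625 };
        for (int i = 0; i < 3; i++) {
            int e;
            /* values[i] == m * 2^e, with the leading significand bit
               sitting just below the binary point. */
            double m = frexp(values[i], &e);
            printf("%g = %g * 2^%d\n", values[i], m, e);
        }
        return 0;
    }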

Nonentity answered 18/4, 2015 at 7:17 Comment(0)
