Why does the bit-width of the mantissa of a floating point represent twice as many numbers compared to an int?
Asked Answered
G

1

6

I am told that a double in C++ has a mantissa that is capable of safely and accurately representing [-(253 − 1), 253 − 1] integers.

How is this possible when the mantissa is only 52 bits? Why is it that an int16 is only capable of having a range of [-32,768, +32,767], or [-215, 215-1], when the same thing could be used for int16 to allow twice as many representable numbers?

Gottwald answered 19/9, 2022 at 17:34 Comment(8)
One of the bits is explicitPadang
[-(2^{53} − 1), 2^{53} − 1] integers take 54 bits (53 bits + sign).Blither
@Padang That page didn't explain what is meant by an "explicit bit".Gottwald
"Why is it that an int16 is only capable of having a range of $[-32,768, +32,767]$ when the same thing could be used for int16 to allow twice as many representable numbers?" - Would it be fair to rephrase your question as "why do int types use a simpler binary format than float/double?"Rikki
@Gottwald From that link: The format is written with the significand having an implicit integer bit of value 1 (except for special data, see the exponent encoding below). With the 52 bits of the fraction (F) significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 log10(2) ≈ 15.955). The bits are laid out as follows:Padang
That's what I got from reading thisGottwald
@273K Source for your claim? double (in most languages, using the standard IEEE representation, can safely store integers in -2^53 - 1 ... 2^53 - 1. Check the documentation of your favourite language or try it out yourself.Spall
@Gottwald "that is capable of safely and accurately representing [-(2^53 − 1), 2^53 − 1] integers." is more like "that is capable of safely and accurately representing [-(2^53), 2^53] integers. The -1 not needed.Legnica
S
7

The format of the double (64 bits) is as follows:

1 bit: sign
11 bits: exponent
52 bits: mantissa

We only need to look at the positive integers we can represent with the mantissa, since the sign bit takes care of the negative integers for us.

Naively, with 52 bits, we can store unsigned integers from 0 to 2^52 - 1. With a sign bit, this lets us store from -2^52 - 1 to 2^52 - 1.

However, we have a little trick we can use. We say that the first digit of our integer is always a 1, which gives us an extra bit to work with.

To see why this works, let's dig a little deeper.

Every positive integer will have at least one 1 in its binary representation. So, we shift the mantissa left or right until we get a 1 at the start, using the exponent. An example might help here:

9, represented as an unsigned int: 000...0001001 (dots representing more 0s).

Written another way: 1.001 * 2^3. (1.001 being in binary, not decimal.)

And, we'll agree to never use a 0 as the first bit. So even though we could write 9 as 0.1001 * 2^4 or 0.01001 * 2^5, we won't. We'll agree that when we write the numbers out in this format, we'll always make sure we use the exponent to "shift" bits over until we start with a 1.

So, the information we need to store to get 9 is as follows:

e: 3
i: 1.001

But if i always starts with 1, why bother writing it out every time? Let's just keep the following instead:

e: 3
i: 001

Using precisely this information, we can reconstruct the number as: 1.i * 2^e == 9.

When we get up to larger numbers, our "i" will be bigger, maybe up to 52 bits used, but we're actually storing 53 bits of information because of the leading 1 we always have.

Final Note: This is not quite what is stored in the exponent and mantissa of a double, I've simplified things to help explain, but hopefully this helps people understand where the missing bit is from. Also, this does not cover 0, which gets a special representation (the trick used above will not work for 0, since the usual representation of 0 doesn't have any 1s in it).

Spall answered 19/9, 2022 at 18:9 Comment(1)
Consider adding where value zero (either sign) comed in to complete the explanation.Brittan

© 2022 - 2024 — McMap. All rights reserved.