A haskell floating point calculation anomaly?
Asked Answered
V

2

7

2022 Update: This bug was filed as a GHC ticket and is now fixed: https://gitlab.haskell.org/ghc/ghc/issues/17231 so this is no longer an issue.

Using ghci 8.6.5

I want to calculate the square root of an Integer input, then round it to the bottom and return an Integer.

square :: Integer -> Integer
square m = floor $ sqrt $ fromInteger m

It works. The problem is, for this specific big number as input:

4141414141414141*4141414141414141

I get a wrong result.

Putting my function aside, I test the case in ghci:

> sqrt $ fromInteger $ 4141414141414141*4141414141414141
4.1414141414141405e15

wrong... right?

BUT SIMPLY

> sqrt $ 4141414141414141*4141414141414141
4.141414141414141e15

which is more like what I expect from the calculation...

In my function I have to make some type conversion, and I reckon fromIntegral is the way to go. So, using that, my function gives a wrong result for the 4141...41 input.

I can't figure out what ghci does implicitly in terms of type conversion, right before running sqrt. Because ghci's conversion allows for a correct calculation.

Why I say this is an anomaly: the problem does not occur with other numbers like 5151515151515151 or 3131313131313131 or 4242424242424242 ...

Is this a Haskell bug?

Vaporimeter answered 20/9, 2019 at 21:13 Comment(10)
It looks like floating point overflow.Graupel
related: #588504Graupel
I don't think this is a duplicate. This seems like a clear bug: fromInteger $ 4141414141414141*4141414141414141 produces 1.7151311090705025e31, but 1.7151311090705027e31 is a valid Double and is closer to the correct integer 17151311090705026668707274767881. So fromInteger is to blame here, not rounding!Magnusson
For exact bignum square roots (i.e. staying within Integer and never passing through Double), you might like arithmoi.Magnusson
in ghci 7.8.3 I get same correct result for both calculations.Overrun
IMO, all languages should care of ISO/IEC 10967-2 (Language Independent Arithmetic (LIA), Part II, 1994) which convers float<->int conversions including case of unbounded integer. See open-std.org/JTC1/SC22/WG11/docs/n462.pdf §5.4.3. At least, level of conformance to 10967 should be reported.Presurmise
I filed a GHC bug here: gitlab.haskell.org/ghc/ghc/issues/17231York
Thank you all for the input. Learned a lot, I appreciate it!Vaporimeter
You can use (read . show) instead of fromInteger to circumvent the rounding issue. This is how dhall implements its Integer/toDouble operation.Wulfila
sjakobi, now that's a neat workaround. thank you.Vaporimeter
Y
8

TLDR

It comes down to how one converts an Integer value to a Double that is not exactly representable. Note that this can happen not just because Integer is too big (or too small), but Float and Double values by design "skip around" integral values as their magnitude gets larger. So, not every integral value in the range is exactly representable either. In this case, an implementation has to pick a value based on the rounding-mode. Unfortunately, there are multiple candidates; and what you are observing is that the candidate picked by Haskell gives you a worse numeric result.

Expected Result

Most languages, including Python, use what's known as "round-to-nearest-ties-to-even" rounding mechanism; which is the default IEEE754 rounding mode and is typically what you would get unless you explicitly set a rounding mode when issuing a floating-point related instruction in a compliant processor. Using Python as the "reference" here, we get:

>>> float(long(4141414141414141)*long(4141414141414141))
1.7151311090705027e+31

I haven't tried in other languages that support so called big-integers, but I'd expect most of them would give you this result.

How Haskell converts Integer to Double

Haskell, however, uses what's known as truncation, or round-towards-zero. So you get:

*Main> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
1.7151311090705025e31

Turns out this is a "worse" approximation in this case (cf. to the Python produced value above), and you get the unexpected result in your original example.

The call to sqrt is really red-herring at this point.

Show me the code

It all originates from this code: (https://hackage.haskell.org/package/integer-gmp-1.0.2.0/docs/src/GHC.Integer.Type.html#doubleFromInteger)

doubleFromInteger :: Integer -> Double#
doubleFromInteger (S# m#) = int2Double# m#
doubleFromInteger (Jp# bn@(BN# bn#))
    = c_mpn_get_d bn# (sizeofBigNat# bn) 0#
doubleFromInteger (Jn# bn@(BN# bn#))
    = c_mpn_get_d bn# (negateInt# (sizeofBigNat# bn)) 0#

which in turn calls: (https://github.com/ghc/ghc/blob/master/libraries/integer-gmp/cbits/wrappers.c#L183-L190):

/* Convert bignum to a `double`, truncating if necessary
 * (i.e. rounding towards zero).
 *
 * sign of mp_size_t argument controls sign of converted double
 */
HsDouble
integer_gmp_mpn_get_d (const mp_limb_t sp[], const mp_size_t sn,
                       const HsInt exponent)
{
...

which purposefully says the conversion is done rounding-toward zero.

So, that explains the behavior you get.

Why does Haskell do this?

None of this explains why Haskell uses round-towards-zero for integer-to-double conversion. I'd strongly argue that it should use the default rounding mode, i.e., round-nearest-ties-to-even. I can't find any mention whether this was a conscious choice, and it at least disagrees with what Python does. (Not that I'd consider Python the gold standard, but it does tend to get these things right.)

My best guess is it was just coded that way, without a conscious choice; but perhaps other people familiar with the history of numeric programming in Haskell can remember better.

What to do

Interestingly, I found the following discussion dating all the way back to 2008 as a Python bug: https://bugs.python.org/issue3166. Apparently, Python used to do the wrong thing here as well, but they fixed the behavior. It's hard to track the exact history, but it appears Haskell and Python both made the same mistake; Python recovered, but it went unnoticed in Haskell. If this was a conscious choice, I'd like to know why.

So, that's where it stands. I'd recommend opening a GHC ticket so it can be at least documented properly that this is the "chosen" behavior; or better, fix it so that it uses the default rounding mode instead.

Update:

GHC ticket opened: https://gitlab.haskell.org/ghc/ghc/issues/17231

2022 Update:

This is now fixed in GHC; at least as of GHC 9.2.2; but possibly earlier:

GHCi, version 9.2.2: https://www.haskell.org/ghc/  :? for help
Prelude> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
1.7151311090705027e31
York answered 21/9, 2019 at 6:23 Comment(0)
M
6

Not all Integers are exactly representable as Doubles. For those that aren't, fromInteger is in the bad position of needing to make a choice: which Double should it return? I can't find anything in the Report which discusses what to do here, wow!

One obvious solution would be to return a Double that has no fractional part and which represents the integer with the smallest absolute difference from the original of any Double that exists. Unfortunately this appears not to be the decision made by GHC's fromInteger.

Instead, GHC's choice is to return the Double with the largest magnitude that does not exceed the magnitude of the original number. So:

> 17151311090705026844052714160127 :: Double
1.7151311090705025e31
> 17151311090705026844052714160128 :: Double
1.7151311090705027e31

(Don't be fooled by how short the displayed number is in the second one: the Double there is the exact representation of the integer on the line above it; the digits stop there because there are enough to uniquely identify a single Double.)

Why does this matter for you? Well, the true answer to 4141414141414141*4141414141414141 is:

> 4141414141414141*4141414141414141
17151311090705026668707274767881

If fromInteger converted this to the nearest Double, as in plan (1) above, it would choose 1.7151311090705027e31. But since it returns the largest Double less than the input as in plan (2) above, and 17151311090705026844052714160128 is technically bigger, it returns the less accurate representation 1.7151311090705025e31.

Meanwhile, 4141414141414141 itself is exactly representable as a Double, so if you first convert to Double, then square, you get Double's semantics of choosing the representation that is closest to the correct answer, hence plan (1) instead of plan (2).

This explains the discrepancy in sqrt's output: doing your computations in Integer first and getting an exact answer, then converting to Double at the last second, paradoxically is less accurate than converting to Double immediately and doing your computations with rounding the whole way, because of how fromInteger does its conversion! Ouch.

I suspect a patch to modify fromInteger to do something better would be looked on favorably by GHCHQ; in any case I know I would look favorably on it!

Magnusson answered 20/9, 2019 at 23:10 Comment(2)
Your analysis is supported by z3. Funny this never came up before!York
ghci 7.8.3. returns same correct result in both cases. (Windows7 64 bit)Overrun

© 2022 - 2024 — McMap. All rights reserved.