Converting an arbitrary-precision rational number (OCaml, zarith) to an approximate floating number
Asked Answered
P

2

9

I'm using the Zarith library to do arbitrary-precision rational arithmetic. Suppose I have a rational number q of type Q.t that is the ratio of two big integers (Q is Zarith's arbitrary-precision rational number module). Sometimes, I would like to print this number as a floating point number for readability, and sometimes I need to convert this number floating-point for later non-arbitrary-precision calculations. Is there a way to convert q to a floating-point number up to a certain level of precision?

The way I am converting q to floating-point now has no guarantees, and can create undefined floating-point numbers (Z is the arbitrary-precision integer module):

let to_float q =
  let n, d = num q, den q in
  (* check if d is zero and raise an error if it is *)
  let nf, df = Z.to_float n, Z.to_float d in
  nf /. df

Is there a better way to handle this, where I can obtain a floating-point that most accurately approximates any q?

Edit

I quickly wrote up Mark Dickinson's answer in OCaml, if anyone is interested. It can probably (definitely) be improved and cleaned up. I'll edit if I do so or if anyone has any advice for improvements. But for now this has solved my problem!

let to_float q = 
  let n, d = num q, den q in
  let n_sign = Z.sign n in
  let d_sign = Z.sign d in (* always >= 0 *)
  if d_sign = 0 then raise Division_by_zero;
  let n = Z.abs n in
  if n_sign = 0 then 0. else
    let shift = (Z.numbits n) - (Z.numbits d) - 55 in
    let is_subnormal = shift < -1076 in
    let shift = if is_subnormal then -1076 else shift in
    let d = if shift >= 0 then Z.shift_left d shift else d in
    let n = if shift < 0 then Z.shift_left n (-shift)
      else n in
    let quotient, remainder = Z.div_rem n d in
    let quotient = if (Z.compare remainder (Z.zero)) = 0 && Z.is_even quotient then
        Z.add Z.one quotient else quotient in
    let quotient = if not is_subnormal then quotient else
        let round_select = Z.to_int @@ Z.rem quotient @@ Z.of_int 8 in
        Z.add quotient [|Z.zero;Z.minus_one;Z.of_int (-2);Z.one;Z.zero
                        ;Z.minus_one;Z.of_int 2;Z.one|].(round_select)
    in
    let unsigned_res = ldexp (Z.to_float quotient) shift in                                                                                                             
    if n_sign = 1 then unsigned_res else -.unsigned_res

I'll look into writing an interface for GMP's mpq_get_d function later, but I'm not entirely sure how to do that. The only way I see how to do that is to convert q : Q.t to a string and pass that to:

int mpq_set_str (mpq_t rop, const char *str, int base)

Does anyone know how to then pass rop to mpq_get_d in OCaml or have a reference that describes how to do this? I looked through chapter 19 of RWO and didn't see a situation like this.

Pali answered 10/11, 2015 at 6:1 Comment(4)
Please define "better": what do you care about? Speed? Accuracy? Something else? If you're interested in getting accurate results at the expense of speed, then you can use integer arithmetic operations to find the closest floating-point number to the actual value of q. This would also have the advantage of still giving good results when the numerator and denominator are outside the usual range of float, but the value of q is within range. Your current approach could have an error of up to 2 ulps, which may not matter for your applications.Renitarenitent
@MarkDickinson: Accuracy is most important for me. What I am currently doing cannot handle big integers in the numerator and denominator, but where q is within the usual floating-point range. It will just be undef in this case.Pali
@Pali there is a wrapper around Zarith, and it seems they use the same algo. See github.com/janestreet/bignum/blob/master/src/bignum0.ml#L584Campstool
this has been implemented in the mean time: github.com/ocaml/Zarith/issues/7Gorge
R
5

If you have access to

  • an integer log2 operation, and
  • the ability to left shift an integer by a given number of bits

then it's relatively easy to roll your own correctly rounded conversion. In a nutshell, the method looks like this:

  1. Reduce to the case n > 0, d > 0; filter out obvious underflow/overflow
  2. Choose an integer shift so that 2^-shift*n/d lies between 2^54 and 2^56.
  3. Use integer arithmetic to compute x = 2^-shift*n/d, rounded to the nearest integer using the round-to-odd rounding method.
  4. Convert x to the nearest IEEE 754 double-precision value dx, with the usual round-ties-to-even rounding mode.
  5. Return ldexp(dx, shift).

I'm afraid I'm not fluent in OCaml, but the following Python code illustrates the idea for positive inputs. I leave it to you to make the obvious modifications for negative inputs and for division by zero. You might also want to make an early return for cases of extreme overflow and underflow: those are easy to detect by looking for extra large or small values of shift below.

from math import ldexp

def to_float(numerator, denominator):
    """
    Convert numerator / denominator to float, correctly rounded.

    For simplicity, assume both inputs are positive.
    """
    # Shift satisfies 2**54 < (numerator / denominator) / 2**shift < 2**56
    shift = numerator.bit_length() - denominator.bit_length() - 55

    # Divide the fraction by 2**shift.
    if shift >= 0:
        denominator <<= shift
    else:
        numerator <<= -shift

    # Convert to the nearest integer, using round-to-odd.
    q, r = divmod(numerator, denominator)
    if r != 0 and q % 2 == 0:
        q += 1

    # Now convert to the nearest float and shift back.
    return ldexp(float(q), shift)

Some notes:

  • the bit_length method on a positive integer n gives the number of bits required to represent n, or in other words 1 + floor(log2(n)).
  • divmod is a Python function that computes the quotient and remainder of an integer division at the same time.
  • the quantity q fits (easily) in a 64-bit integer
  • we're rounding twice: once when converting the shifted numerator / denominator to the nearest integer, and again when rounding that integer to a float. The first round uses the round-to-odd method; this ensures that the second round (implicit in the conversion from int to float) gives the same result as if we'd rounded the fraction directly to a float.
  • the above algorithm doesn't correctly handle fractions whose converted float value is subnormal: in that case, the ldexp operation introduces potentially a third rounding. It's possible to deal with this, with some care. See below for some code.

The above is in fact a simplified version of the algorithm that Python uses when dividing one (big) integer by another to get a floating-point result. You can see the source here. The comment at the beginning of the long_true_divide function gives an overview of the method.

For completeness, here's a variant that also deals correctly with subnormal results.

def to_float(numerator, denominator):
    """
    Convert numerator / denominator to float, correctly rounded.

    For simplicity, assume both inputs are positive.
    """
    # Choose shift so that 2**54 < numerator / denominator / 2**shift < 2**56
    shift = numerator.bit_length() - denominator.bit_length() - 55

    # The 'treat_as_subnormal' flag catches all cases of subnormal results,
    # along with some cases where the result is not subnormal but *is* still
    # smaller than 2**-1021. In all these cases, it's sufficient to find the
    # closest integer multiple of 2**-1074. We first round to the nearest
    # multiple of 2**-1076 using round-to-odd.
    treat_as_subnormal = shift < -1076
    if treat_as_subnormal:
        shift = -1076

    # Divide the fraction by 2**shift.
    if shift >= 0:
        denominator <<= shift
    else:
        numerator <<= -shift

    # Convert to the nearest integer, using round-to-odd.
    q, r = divmod(numerator, denominator)
    if r != 0 and q % 2 == 0:
        q += 1

    # Now convert to the nearest float and shift back.
    if treat_as_subnormal:
        # Round to the nearest multiple of 4, rounding ties to
        # the nearest multiple of 8. This avoids double rounding
        # from the ldexp call below.
        q += [0, -1, -2, 1, 0, -1, 2, 1][q%8]

    return ldexp(float(q), shift)
Renitarenitent answered 10/11, 2015 at 20:54 Comment(7)
I like the round-to-odd trick. But you are aware that ldexp might itself be inexact in case of underflow. So how do you avoid double rounding for fractions that would underflow?Cruzcruzado
@aka.nice: For results that are going to underflow, you simply want to round to the nearest integer multiple of 2**-1074, and a variant of the code above will do that. The notes in the Python code go into this in some detail.Renitarenitent
Nice! That sounds a little bit simpler than Squeak/Pharo Smalltalk version then, but also because divmod rounds quotient to nearest integer, in Smalltalk I only had the truncated quotient. I prefer to read the shorter version though smallissimo.blogspot.fr/2011/09/reviewing-fraction-asfloat.htmlCruzcruzado
Nice article! Thanks for that link. For completeness, I've added a variant that deals correctly with the subnormal case.Renitarenitent
And as a result of this question and answer, I realised that there's a bug in Python's implementation: bugs.python.org/issue25604Renitarenitent
This algorithm makes sense, but I feel it would be less tricky to just do the rounding with integer arithmetic, making sure the integer results ends up with exactly 53 bits, then converting to float. This way we wouldn't have to deal with double rounding at all (except for denormals?). Is there a reason why this doesn't work?Percival
@zale: Yes, that also works, and in fact that's what I did in the CPython source; it has the additional advantage that you don't have to worry about someone changing the FPU rounding mode. The code ends up a touch more complicated, though: you need to first figure out how many bits there are in q so you know how many to round away (should be either 2 or 3, but you have to figure out which), then do the rounding. I wanted to keep the answer in this question simple.Renitarenitent
Z
2

This isn't a complete answer, but in looking around I found that Zarith uses GMP internally. There is a GMP function named mpq_get_d that converts a rational to a double. If it's not available directly in Zarith, it should be possible (given some time) to add an interface for it.

Zooplasty answered 10/11, 2015 at 6:44 Comment(4)
It isn't yet available in Zarith, so it seems I will need to wait or try adding an interface to it myself. Is there no other alternative?Pali
I read the source of mpq_get_d and it looks pretty hairy. In particular, it uses other parts of GMP (not just simple operations). If I wanted to do conversions to double (which it seems like everybody would at some point :-), I'd probably (try to) hack a call to mpq_get_d into Zarith. I have no idea if somebody has already done this, or reimplemented it in OCaml (say).Zooplasty
Speaking as a Zarith contributor, adding bindings for mpq_get_d is clearly the right way to do this. Each Z.t in the definition of Q.t is either an unboxed integer or an allocated representation. The stub for mpq_get_d should create the arguments for mpq_get_d from these two Z.t and call it.Hedley
Thanks for pointing this out. @PascalCuoq: This does seem ideal, but it would take me some time to figure it out.Pali

© 2022 - 2024 — McMap. All rights reserved.