I'm using the Zarith library to do arbitrary-precision rational arithmetic. Suppose I have a rational number q
of type Q.t
that is the ratio of two big integers (Q
is Zarith's arbitrary-precision rational number module). Sometimes, I would like to print this number as a floating point number for readability, and sometimes I need to convert this number floating-point for later non-arbitrary-precision calculations. Is there a way to convert q
to a floating-point number up to a certain level of precision?
The way I am converting q
to floating-point now has no guarantees, and can create undefined floating-point numbers (Z
is the arbitrary-precision integer module):
let to_float q =
let n, d = num q, den q in
(* check if d is zero and raise an error if it is *)
let nf, df = Z.to_float n, Z.to_float d in
nf /. df
Is there a better way to handle this, where I can obtain a floating-point that most accurately approximates any q
?
Edit
I quickly wrote up Mark Dickinson's answer in OCaml, if anyone is interested. It can probably (definitely) be improved and cleaned up. I'll edit if I do so or if anyone has any advice for improvements. But for now this has solved my problem!
let to_float q =
let n, d = num q, den q in
let n_sign = Z.sign n in
let d_sign = Z.sign d in (* always >= 0 *)
if d_sign = 0 then raise Division_by_zero;
let n = Z.abs n in
if n_sign = 0 then 0. else
let shift = (Z.numbits n) - (Z.numbits d) - 55 in
let is_subnormal = shift < -1076 in
let shift = if is_subnormal then -1076 else shift in
let d = if shift >= 0 then Z.shift_left d shift else d in
let n = if shift < 0 then Z.shift_left n (-shift)
else n in
let quotient, remainder = Z.div_rem n d in
let quotient = if (Z.compare remainder (Z.zero)) = 0 && Z.is_even quotient then
Z.add Z.one quotient else quotient in
let quotient = if not is_subnormal then quotient else
let round_select = Z.to_int @@ Z.rem quotient @@ Z.of_int 8 in
Z.add quotient [|Z.zero;Z.minus_one;Z.of_int (-2);Z.one;Z.zero
;Z.minus_one;Z.of_int 2;Z.one|].(round_select)
in
let unsigned_res = ldexp (Z.to_float quotient) shift in
if n_sign = 1 then unsigned_res else -.unsigned_res
I'll look into writing an interface for GMP's mpq_get_d
function later, but I'm not entirely sure how to do that. The only way I see how to do that is to convert q : Q.t
to a string and pass that to:
int mpq_set_str (mpq_t rop, const char *str, int base)
Does anyone know how to then pass rop
to mpq_get_d
in OCaml or have a reference that describes how to do this? I looked through chapter 19 of RWO and didn't see a situation like this.
q
. This would also have the advantage of still giving good results when the numerator and denominator are outside the usual range of float, but the value ofq
is within range. Your current approach could have an error of up to 2 ulps, which may not matter for your applications. – Renitarenitentq
is within the usual floating-point range. It will just beundef
in this case. – Pali