Convert mantissa and exponent into double

In a very high-performance app, we find that the CPU can do long arithmetic significantly faster than double arithmetic. In our system it was determined that we never need more than 9 decimal places of precision, so we use longs for all fractional arithmetic, with 9 decimal places implied.

However, in certain parts of the system it is more convenient due to readability to work with doubles. So we have to convert between the long value that assumes 9 decimal places into double.

We find that simply taking the long and dividing by 10^9, or multiplying by 1/10^9, gives imprecise representations in a double.

To solve that, we are using Math.Round(value, 9) to give the precise values.

However, Math.Round() is horrifically slow for performance.

So our idea at the moment is to convert the mantissa and exponent directly into the binary format of a double, since that way there will be no need for rounding.

We have learned online how to examine bits of a double to get the mantissa and exponent but it's confusing to figure out how to reverse that to take a mantissa and exponent and fabricate a double by using the bits.

Any suggestions?

[Test]
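public void ChangeBitsInDouble()
{
    var original = 1.0D;
    // Reinterpret the double's 64 bits as a long (no unsafe pointer needed).
    long bits = BitConverter.DoubleToInt64Bits(original);
    var negative = (bits < 0);                     // sign bit; not used below
    var exponent = (int) ((bits >> 52) & 0x7ffL);  // 11-bit biased exponent
    var mantissa = bits & 0xfffffffffffffL;        // 52-bit fraction field
    if( exponent == 0)
    {
        // Subnormal: no implicit leading 1; the effective exponent is 1.
        exponent++;
    }
    else
    {
        // Normal: restore the implicit leading 1 bit.
        mantissa = mantissa | (1L << 52);
    }
    // Remove the bias (1023) and the 52 fraction bits so that
    // value = mantissa * 2^exponent.
    exponent -= 1075;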

    if( mantissa == 0)
    {
        return;
    }

    while ((mantissa & 1) == 0)
    {
        mantissa >>= 1;
        exponent++;
    }

    Console.WriteLine("Mantissa " + mantissa + ", exponent " + exponent);

}
Larentia answered 19/1, 2012 at 13:26 Comment(2)
Are you sure the value you have is exactly representable in a double? - Renner
Maybe this will help; I don't want to read it all just to help you :P steve.hollasch.net/cgindex/coding/ieeefloat.html - Enki

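You shouldn't use a scale factor of 10^9; you should use 2^30 instead. A double stores a binary exponent, so scaling by a power of two only changes the exponent and needs no rounding.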
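
A minimal sketch of the idea, assuming the scaled values stay below 2^53 in magnitude (the names `FixedPoint30`, `Scale`, `ToFixed`, and `ToDouble` are illustrative and not from the question). Going from long to double is exact under that assumption. Going from double to long still truncates anything finer than 2^-30, so a double that came from a decimal literal will generally not survive double -> long -> double unchanged.

```csharp
using System;

public static class FixedPoint30
{
    // Hypothetical scale factor: 30 fractional bits, i.e. 2^30.
    public const long Scale = 1L << 30;

    // double -> fixed point: truncates any fraction finer than 2^-30.
    public static long ToFixed(double value) => (long)(value * Scale);

    // fixed point -> double: dividing by a power of two only adjusts the
    // binary exponent, so this is exact while |fixedValue| < 2^53.
    public static double ToDouble(long fixedValue) => (double)fixedValue / Scale;

    public static void Main()
    {
        // long -> double -> long reproduces the stored value exactly.
        long stored = 48806447543L; // arbitrary example value
        long back = ToFixed(ToDouble(stored));
        Console.WriteLine(back == stored); // True

        // double -> long -> double drops the bits below 2^-30, so this
        // comparison is expected to fail for most decimal literals.
        double price = 45.454945768;
        double roundTrip = ToDouble(ToFixed(price));
        Console.WriteLine(roundTrip == price);
    }
}
```

The resolution of 2^-30 (about 9.3e-10) is slightly finer than 10^-9, but the stored longs no longer read as decimal digits.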

Druce answered 19/1, 2012 at 13:43 Comment(4)
Thanks, we were just realizing that the code above isn't complete, and learning that the exponent stored in a double's bits is a binary exponent rather than a decimal one. So your answer makes a WORLD of good sense. - Larentia
This will make our long representation unreadable, but it will greatly improve performance, and conversion back to double will be lightning fast. So it's an excellent tradeoff. Thanks! - Larentia
Well, it seems logical, but in testing, after dividing by the factor to convert back to double, it comes out imprecise: var convert = 1<<30; double price = 45.454945768D; long result1 = (long) (price*convert); double result2 = ((double)result1)/convert; Assert.AreEqual(result2,price); - Larentia
Ahh, more testing... With 2^30, a long converted to double and back to long produces the exact long value. However, a double converted to long and back to double doesn't match the original double. Is there more to your suggestion to cover that issue? - Larentia

As you've already realised as per the other answer, doubles work by floating-point binary rather than floating-point decimal, and therefore the initial approach doesn't work.
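
To make the binary-exponent point concrete, here is a minimal sketch of going the other way, from a mantissa and a binary exponent to a double. It assumes a positive mantissa below 2^53 and a result in the normal range; `DoubleBits` and `Compose` are hypothetical names, and rounding, subnormals, infinities and NaN are deliberately not handled. Note that the exponent is a power of two, so a decimal scale such as 10^-9 cannot simply be dropped into the exponent field.

```csharp
using System;

public static class DoubleBits
{
    // Builds the double equal to mantissa * 2^exponent (negated if requested).
    // Sketch only: requires 0 < mantissa < 2^53 and a normal-range result.
    public static double Compose(bool negative, long mantissa, int exponent)
    {
        if (mantissa <= 0 || mantissa >= (1L << 53))
            throw new ArgumentOutOfRangeException(nameof(mantissa));

        // Normalize: shift left until bit 52 (the implicit leading 1) is set.
        while ((mantissa & (1L << 52)) == 0)
        {
            mantissa <<= 1;
            exponent--;
        }

        // Re-apply the bias (1023) plus the 52 fraction bits.
        long biased = exponent + 1075L;
        if (biased < 1 || biased > 2046)
            throw new ArgumentOutOfRangeException(nameof(exponent));

        // Pack the exponent field and the 52-bit fraction (leading 1 dropped).
        long fractionMask = (1L << 52) - 1;
        long bits = (biased << 52) | (mantissa & fractionMask);
        if (negative)
            bits |= long.MinValue; // set the sign bit

        return BitConverter.Int64BitsToDouble(bits);
    }

    public static void Main()
    {
        Console.WriteLine(Compose(false, 1, 0) == 1.0);   // True
        Console.WriteLine(Compose(false, 3, -1) == 1.5);  // True
        Console.WriteLine(Compose(true, 5, 2) == -20.0);  // True
    }
}
```

This is the inverse of the decomposition in the question: it accepts the odd mantissa and the exponent that code prints, and rebuilds the original value.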

It's also unclear whether a deliberately simplified formula could work, because you haven't said what maximum range you need; without a known bound, rounding becomes inevitable.

The problem of doing so quickly but precisely is well-studied and often supported by CPU instructions. Your only chance of beating the built-in conversions is either:

  1. You hit a mathematical breakthrough that's worthy of some serious papers being written about it.
  2. You exclude enough cases that won't occur in your own examples that while the built-ins are better generally yours is optimised for your own use.

Unless the range of values you use is very limited, the potential for short-cutting on conversion between double-precision IEEE 754 and long integer becomes smaller and smaller.

If you're at the point where you have to cover most of the cases IEEE 754 covers, or even a sizable proportion of them, then you'll end up making things slower.

I'd recommend either staying with what you have, sticking with long even in the places where double would be more convenient, or, if necessary, using decimal. You can create a decimal from a long easily with:

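private static decimal DivideByBillion (long l)
{
  // decimal(lo, mid, hi, isNegative, scale): a 96-bit magnitude split into
  // three 32-bit parts, a sign flag, and a power-of-ten scale (9 => / 10^9).
  bool negative = l < 0;
  if(negative)
    l = -l; // work with the magnitude; the sign is passed separately
  return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, negative, 9);
}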

Now, decimal is much slower to use in arithmetic than double (precisely because it implements an approach similar to yours in the opening question, but with a varying exponent and a larger mantissa). But if you just need a convenient way to obtain a value for display or rendering to string, then hand-hacking the conversion to decimal has advantages over hand-hacking the conversion to double, so it could be worth looking at.

Rappel answered 19/1, 2012 at 15:27 Comment(1)
Performance is still very important in the code where doubles are used. It's just that these are user plugins, and users expect to use floating-point numbers. Using decimal is certainly practical for users but disappointing for the performance of anything they do with them. Since all our input comes from doubles (parsed from strings or recorded as binary), I'm thinking of extracting the mantissa, doing the math with that, and then simply replacing the mantissa for the output. That will hopefully avoid rounding and loss of precision. - Larentia
