How can I convert between a double-double and a decimal string?
Asked Answered
S

1

3

One way of increasing precision beyond that of a double (e.g. if my application is doing something space-related that needs to represent accurate positions over distances of many light-years) is to use a double-double, a structure composed of two doubles which represents the sum of the two. Algorithms are known for the various arithmetic operations on such a structure, e.g. double-double + double-double, double × double-double, etc, e.g. as given in this paper.

(Note that this is not the same format as the IEEE 754-2008 binary128, a.k.a. quad-precision and conversion to/from double-double and binary128 is not guaranteed to round-trip.)

An obvious way to represent such a quantity as a string would then be to use strings representing each individual component of the double, e.g. "1.0+1.0e-200". My question is, is there a known way to convert to and from strings that represent the value as a single decimal? I.e. given the string "0.3" then provide the double-double closest to this representation, or go in the reverse direction. One naïve way would be to use successive multiplications/divisions by 10, but that is insufficient for doubles so I'm somewhat sceptical that they would work here.

Sham answered 21/1, 2020 at 21:54 Comment(5)
Java has a java.math.BigDecimal class (documentation here) which might be useful.Tisiphone
I found this question on stack overflow which, in the second paragraph, says to pick two constants A and B, and then to initialize your two doubles U and V as functions of the original number, and A and B.Actinia
Even if a large part of intermediate bits must be all 1s or all 0s, the range between most significant bit and least significant can be considerable. This range will drive the complexity of print to decimal.Caber
Thus, the decimal might require a considerable amount of digits. For this reason, i would stick to a pair of double print.Caber
why not use hex strings (in mantisa/exponent form)? there are no rounding errors on converting them between binary and string representation, they can be used for "fast" computation on strings directly (so you can use it as arbitrary size)... see How do I convert a very long binary number to decimal? ... only the decadic print will be "slow". However in most cases is enough to do this: Is it possible to make realistic n-body solar system simulation in matter of size and mass?Propitious
P
1

such technique as summing 2 floating point variables just effectively doubles the mantissa bitwidth so its enough to just store/load bigger mantissa.

Standard IEEE 754 double has 52+1 bit mantissa leading to

log10(2^53) = 15.95 = ~16 [dec digits]

so when you add 2 such variables then:

log10(2^(53+53)) = 31.9 = ~32 [dec digits]

so just store/load 32 digit mantissa to/from string. The exponent of the 2 variables will differ by +/- 53 so its enough to store just one of them.

To further improve performance and precision you can use hex strings. Its much faster and there is no rounding as you can directly convert between the mantissa bits and hex string characters.

any 4 bits form a single hexadecimal digit so

(53+53) / 4 = 26.5 = ~27 [hex digits]

As you can see its also more storage efficient the only problem is the exponent delimiter as hexa digits contain E so you need to distinct the digit and exponent separator by upper/lower casing or use different character or use just sign for example:

1.23456789ABCDEFe10  
1.23456789ABCDEFe+10
1.23456789ABCDEF|+10
1.23456789ABCDEF+10

I usually use the first version. Also you need to take in mind the exponent is bit shift of mantissa so resulting number is:

mantisa<<exponent = mantisa * (2^exponent)

Now during loading/storing from/to string you just load 53+53 bit integer number then separate it to 2 mantissas and reconstruct the floating point values at bit level ... Its important that your mantissas are aligned so exp1+53 = exp2 give or take 1 ...

All this can be done on integer arithmetics.

If your exponent is exp10 then you will inflict heavy rounding on the number during both storage and loading to/from string as your mantissa will usually missing many zero bits before or after the decimal point making transformation between decadic and binary/hexadecimal very hard and inaccurate (especially if you limit your computation just to 64/80/128/160 bits of mantissa).

Here an C++ example of just that (printing 32bit float in decadic on integer arithmetics only):

//---------------------------------------------------------------------------
AnsiString f32_prn(float fx)    // scientific format integers only
    {
    const int ms=10+5;  // mantisa digits
    const int es=2;     // exponent digits
    const int eb=100000;// 10^(es+3)
    const int sz=ms+es+5;

    char txt[sz],c;
    int i=0,i0,i1,m,n,exp,e2,e10;
    DWORD x,y,man;
    for (i0=0;i0<sz;i0++) txt[i0]=' ';
    // float -> DWORD
    x=((DWORD*)(&fx))[0];
    // sign
    if (x>=0x80000000){ txt[i]='-'; i++; x&=0x7FFFFFFF; }
     else             { txt[i]='+'; i++; }
    // exp
    exp=((x>>23)&255)-127;
    // man
    man=x&0x007FFFFF;
    if ((exp!=-127)&&(exp!=+128)) man|=0x00800000;  // not zero or denormalized or Inf/NaN
    // special cases
    if ((man==0)&&(exp==-127)){ txt[i]='0'; i++; txt[i]=0; return txt; }    // +/- zero
    if ((man==0)&&(exp==+128)){ txt[i]='I'; i++;
                                txt[i]='N'; i++;
                                txt[i]='F'; i++; txt[i]=0; return txt; }    // +/- Infinity
    if ((man!=0)&&(exp==+128)){ txt[i]='N'; i++;
                                txt[i]='A'; i++;
                                txt[i]='N'; i++; txt[i]=0; return txt; }    // +/- Not a number
    // align man,exp to 4bit
    e2=(1+(exp&3))&3;
    man<<=e2;
    exp-=e2+23; // exp of lsb of mantisa
    e10=0;      // decimal digits to add/remove
    m=0;        // mantisa digits
    n=ms;       // max mantisa digits
    // integer part
    if (exp>=-28)
        {
        x=man; y=0; e2=exp;
        // shift x to integer part <<
        if (x) for (;e2>0;)
            {
            while (x>0x0FFFFFFF){ y/=10; y+=((x%10)<<28)/10; x/=10; e10++; }
            e2-=4; x<<=4; y<<=4;
            x+=(y>>28)&15; y&=0x0FFFFFFF;
            }
        // shift x to integer part >>
        for (;e2<0;e2+=4) x>>=4;
        // no exponent?
        if ((e10>0)&&(e10<=es+3)) n++;  // no '.'
        // print
        for (i0=i;x;)
            {
            if (m<n){ txt[i]='0'+(x%10); i++; m++; if ((m==n)&&(x<eb)) m+=es+1; } else e10++;
            x/=10;
            }
        // reverse digits
        for (i1=i-1;i0<i1;i0++,i1--){ c=txt[i0]; txt[i0]=txt[i1]; txt[i1]=c; }
        }
    // fractional part
    if (exp<0)
        {
        x=man; y=0; e2=exp;
        // shift x to fractional part <<
        if (x) for (;e2<-28;)
            {
            while ((x<=0x19999999)&&(y<=0x19999999)){ y*=10; x*=10; x+=(y>>28)&15; y&=0x0FFFFFFF; e10--; }
            y>>=4; y&=0x00FFFFFF; y|=(x&15)<<24;
            x>>=4; x&=0x0FFFFFFF; e2+=4;
            }
        // shift x to fractional part <<
        for (;e2>-28;e2-=4) x<<=4;
        // print
        x&=0x0FFFFFFF;
        if ((m)&&(!e10)) n+=es+2;   // no exponent means more digits for mantisa
        if (x)
            {
            if (m){ txt[i]='.'; i++; }
            for (i0=i;x;)
                {
                y*=10; x*=10;
                x+=(y>>28)&15;
                if (m<n)
                    {
                    i0=((x>>28)&15);
                    if (!m)
                        {
                        if (i0)
                            {
                            txt[i]='0'+i0; i++; m++;
                            txt[i]='.';    i++;
                            }
                        e10--;
                        if (!e10) n+=es+2;  // no exponent means more digits for mantisa
                        }
                    else { txt[i]='0'+i0; i++; m++; }
                    } else break;
                y&=0x0FFFFFFF;
                x&=0x0FFFFFFF;
                }
            }
        }
    else{
        // no fractional part
        if ((e10>0)&&(e10<sz-i))
         for (;e10;e10--){ txt[i]='0'+i0; i++; m++; }
        }
    // exponent
    if (e10)
        {
        if (e10>0)  // move . after first digit
            {
            for (i0=i;i0>2;i0--) txt[i0]=txt[i0-1];
            txt[2]='.'; i++; e10+=i-3;
            }
        // sign
        txt[i]='E'; i++;
        if (e10<0.0){ txt[i]='-'; i++; e10=-e10; }
         else       { txt[i]='+'; i++; }
        // print
        for (i0=i;e10;){ txt[i]='0'+(e10%10); e10/=10; i++; }
        // reverse digits
        for (i1=i-1;i0<i1;i0++,i1--){ c=txt[i0]; txt[i0]=txt[i1]; txt[i1]=c; }
        }

    txt[i]=0;
    return txt;
    }
//---------------------------------------------------------------------------

Just change the AnsiString return type into any string type or char* you got at your disposal ...

As you can see its a lot of code with a lot of hacks and internally a lot more than 24bit of mantissa is used to lower the rounding errors inflicted by decadic exponent.

So I strongly advice to use binary exponent (exp2) and hexa digits for mantissa it will simplify your problem a lot and get rid of the rounding entirely. The only problem is when you want print or input decadic number in such case you have no choice but to round ... Luckily you can use hexa output and convert it to decadic on strings... Or construct the print from single variable prints ...

for more info see related QAs:

Propitious answered 22/1, 2020 at 14:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.