C - Serialization of floating-point numbers (floats, doubles)

Asked 23/11, 2009 at 21:32 Answered 27/8, 2016 at 5:44

Solved c floating-point floating-point-conversion

How can I convert a floating-point number into a sequence of bytes so that it can be persisted in a file? The algorithm must be fast and highly portable, and it must also support the opposite operation, deserialization. Ideally it should need only a very small excess of bits per value (persistent space).

Barrator answered 23/11, 2009 at 21:32 Comment(4)

Which systems do you want to be portable to? – Ruffian 23/11, 2009 at 21:55

It must be independent of the underlying architecture, e.g. it can be ARM-7, PowerPC, Microblaze, OpenRISC or just x86. – Barrator 23/11, 2009 at 22:5

Is this homework? From your comments it sure appears so. – Canst 23/11, 2009 at 22:9

Some questions are interesting even in case they happen to have emerged from homework. Simply banning all facts and topics on this site if they ever were the subject of anybody's homework would mean to erase half of it, I assume... – Oribelle 10/4, 2017 at 9:2

Assuming you're using mainstream compilers, floating point values in C and C++ obey the IEEE standard and, when written in binary form to a file, can be recovered on any other platform, provided that you write and read using the same byte endianness. So my suggestion is: pick an endianness of choice, and before writing or after reading, check whether that endianness is the same as the current platform's; if not, just swap the bytes.
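A minimal sketch of that suggestion, assuming the host double is an 8-byte IEEE 754 value and that floating-point and integer values share the same byte order on the host. The helper names (host_is_little_endian, double_to_le_bytes, double_from_le_bytes) are made up for illustration, and little-endian is picked arbitrarily as the on-disk order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compile-time guard (assumption): the host double occupies 8 bytes. */
typedef char double_must_be_8_bytes[sizeof(double) == 8 ? 1 : -1];

/* Returns 1 on a little-endian host, 0 otherwise (checked at run time). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copy src[8] to dst[8], reversing the byte order only on big-endian hosts. */
static void copy_le(unsigned char dst[8], const unsigned char src[8])
{
    int swap = !host_is_little_endian();
    size_t i;

    for (i = 0; i < 8; i++)
        dst[i] = swap ? src[7 - i] : src[i];
}

/* Store x in out[8] in little-endian order (the order chosen for the file). */
static void double_to_le_bytes(double x, unsigned char out[8])
{
    unsigned char raw[8];

    memcpy(raw, &x, 8);
    copy_le(out, raw);
}

/* Inverse of double_to_le_bytes. */
static double double_from_le_bytes(const unsigned char in[8])
{
    unsigned char raw[8];
    double x;

    copy_le(raw, in);
    memcpy(&x, raw, 8);
    return x;
}
```

The resulting 8 bytes can be written with fwrite or putc on a stream opened in binary mode; the reader applies the same swap rule in reverse.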

Gemagemara answered 23/11, 2009 at 22:10 Comment(4)

according to the C99 spec, annex F, conforming implementations should define __STDC_IEC_559__, which in principle could be used as a compile-time check, but is useless in practice as there are issues with gcc ( gcc.gnu.org/c99status.html , scroll down to 'Further Issues') – Dionisio 23/11, 2009 at 22:37
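If __STDC_IEC_559__ cannot be relied on, a rough alternative is to test the <float.h> parameters directly. The sketch below is only a plausibility check under that assumption: matching radix, mantissa width and exponent range suggests IEEE 754 binary64 but does not prove the bit layout. The macro and function names are invented for this example.

```c
#include <float.h>
#include <limits.h>

/* 1 when the double type has the same radix, precision and exponent
 * range as IEEE 754 binary64; 0 otherwise. Evaluated by the preprocessor. */
#if FLT_RADIX == 2 && DBL_MANT_DIG == 53 && \
    DBL_MIN_EXP == -1021 && DBL_MAX_EXP == 1024
#define DOUBLE_HAS_BINARY64_PARAMS 1
#else
#define DOUBLE_HAS_BINARY64_PARAMS 0
#endif

/* Run-time wrapper: also checks that a double occupies exactly 64 bits. */
static int double_looks_like_binary64(void)
{
    return DOUBLE_HAS_BINARY64_PARAMS && sizeof(double) * CHAR_BIT == 64;
}
```

A serializer could use this to choose between a plain byte copy and a slower arithmetic conversion.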

Compilers don't necessarily dictate the IEEE floating point format. There are still computers which use other formats unfortunately (VAX/Alpha, IBM). But +1 ensuring you have the endianness right. – Tortoiseshell 26/11, 2009 at 2:26

Right, but they have to know the format used by the platform to support it in the RTL. Also, many platforms (these days especially embedded) don't have a math coprocessor, so they do dictate the format in the accompanying emulation lib. So I thought it'd be easier to refer to the compiler. – Gemagemara 26/11, 2009 at 23:24

Wouldn't it make sense to treat the platforms that don't support the IEEE standard as exceptions and, when the (rare) version for them is needed, do the necessary conversions only there? Here's a good article about the differences: codeproject.com/KB/applications/libnumber.aspx – Gemagemara 26/11, 2009 at 23:26

You could always convert to IEEE-754 format in a fixed byte order (either little endian or big endian). For most machines, that would require either nothing at all or a simple byte swap to serialize and deserialize. A machine that doesn't support IEEE-754 natively will need a converter written, but doing that with ldexp and frexp (standard C library functions) and bit shuffling is not too tough.
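As a rough illustration of the frexp/ldexp approach, here is a sketch that builds the 64-bit IEEE 754 pattern arithmetically, so it does not depend on how the host stores a double. It is deliberately incomplete: it handles zero and finite values in the normal binary64 range only, and NaN, infinities, subnormals and negative zero are left out. The function names are hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Encode a finite, normal-range double as IEEE 754 binary64 bits.
 * Zero maps to 0. NaN, infinities and subnormals are NOT handled. */
static uint64_t encode_binary64(double x)
{
    uint64_t sign = 0, frac;
    double m;
    int e;

    if (x == 0.0)
        return 0;
    if (x < 0.0) {
        sign = 1;
        x = -x;
    }
    m = frexp(x, &e);                 /* x = m * 2^e with 0.5 <= m < 1 */
    /* IEEE form is 1.f * 2^(e-1): drop the implicit leading 1, then
     * scale the fraction to a 52-bit integer. */
    frac = (uint64_t)ldexp(2.0 * m - 1.0, 52);
    return (sign << 63) | ((uint64_t)(e - 1 + 1023) << 52) | frac;
}

/* Decode bits produced by encode_binary64 (same restrictions). */
static double decode_binary64(uint64_t bits)
{
    int biased = (int)((bits >> 52) & 0x7FF);
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    double m;

    if (biased == 0 && frac == 0)
        return 0.0;
    m = ldexp(1.0 + ldexp((double)frac, -52), biased - 1023);
    return (bits >> 63) ? -m : m;
}
```

The 64-bit result can then be written byte by byte in whichever fixed order the file format specifies.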

Giffer answered 23/11, 2009 at 22:29 Comment(2)

The problem comes with FP standards that lack some of the "features" of IEEE. Namely the VAX and IBM floating point formats...You're in for a world of hurt w.r.t. corner cases. Thankfully, people have written excellent converters which handle these cases gracefully (I'm looking at you USGS! I owe you a beer). – Tortoiseshell 26/11, 2009 at 2:28

An ANSI compliant frexp function hides most of that for you. Of course, you may end up with cases where serialization and deserialization gives you a (close but) different value. – Giffer 30/11, 2009 at 18:35

This might give you a good start - it packs a floating point value into an int and long long pair, which you can then serialise in the usual way.

#include <math.h>    /* frexp, ldexp, fabs */
#include <stdlib.h>  /* llabs */

#define FRAC_MAX 9223372036854775807LL /* 2**63 - 1 */

struct dbl_packed
{
    int exp;
    long long frac;
};

void pack(double x, struct dbl_packed *r)
{
    double xf = fabs(frexp(x, &r->exp)) - 0.5;

    if (xf < 0.0)
    {
        r->frac = 0;
        return;
    }

    r->frac = 1 + (long long)(xf * 2.0 * (FRAC_MAX - 1));

    if (x < 0.0)
        r->frac = -r->frac;
}

double unpack(const struct dbl_packed *p)
{
    double xf, x;

    if (p->frac == 0)
        return 0.0;

    xf = ((double)(llabs(p->frac) - 1) / (FRAC_MAX - 1)) / 2.0;

    x = ldexp(xf + 0.5, p->exp);

    if (p->frac < 0)
        x = -x;

    return x;
}
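"Serialise in the usual way" could look something like the sketch below, which stores an exponent and a fraction (the two fields of the packed struct) as 12 bytes in big-endian order using shifts only. It assumes the exponent fits in 32 bits and the fraction in 64 bits; put_packed and get_packed are names invented for this example.

```c
#include <stdint.h>

/* Store a 32-bit exponent and a 64-bit fraction into out[12],
 * most significant byte first. Shifts make this independent of
 * the host byte order. */
static void put_packed(int32_t exp, int64_t frac, unsigned char out[12])
{
    uint32_t e = (uint32_t)exp;
    uint64_t f = (uint64_t)frac;
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)(e >> (24 - 8 * i));
    for (i = 0; i < 8; i++)
        out[4 + i] = (unsigned char)(f >> (56 - 8 * i));
}

/* Inverse of put_packed. */
static void get_packed(const unsigned char in[12], int32_t *exp, int64_t *frac)
{
    uint32_t e = 0;
    uint64_t f = 0;
    int i;

    for (i = 0; i < 4; i++)
        e = (e << 8) | in[i];
    for (i = 0; i < 8; i++)
        f = (f << 8) | in[4 + i];
    /* Converting an out-of-range unsigned value to a signed type is
     * implementation-defined before C23; common compilers wrap as
     * two's complement, which is what this sketch relies on. */
    *exp = (int32_t)e;
    *frac = (int64_t)f;
}
```

That is 12 bytes per value, so this scheme trades four extra bytes per double for independence from the host floating-point format.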

Brahmanism answered 24/11, 2009 at 0:6 Comment(0)

What do you mean, "portable"?

For portability, remember to keep the numbers within the limits defined in the Standard: use a single number outside these limits, and there goes all portability down the drain.

double planck_time = 5.39124E-44; /* second */

5.2.4.2.2 Characteristics of floating types <float.h>

[...]
10   The values given in the following list shall be replaced by constant
     expressions with implementation-defined values [...]
11   The values given in the following list shall be replaced by constant
     expressions with implementation-defined values [...]
12   The values given in the following list shall be replaced by constant
     expressions with implementation-defined (positive) values [...]
[...]

Note the phrase "implementation-defined" in all these clauses.
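To make the point concrete: the Standard only guarantees that DBL_MIN is at most 1E-37 and DBL_MAX at least 1E+37, so a value like the Planck time above is representable on many hosts but not on every conforming one. The following sketch checks a value against both ranges; the function names are made up for this example.

```c
#include <float.h>

/* 1 if |x| lies in the magnitude range every conforming implementation
 * must support for double (1E-37 .. 1E+37), or x is zero. */
static int in_guaranteed_range(double x)
{
    double a = x < 0.0 ? -x : x;

    return a == 0.0 || (a >= 1e-37 && a <= 1e37);
}

/* 1 if |x| is a normalised value on THIS host (implementation-defined
 * limits from <float.h>), or x is zero. */
static int in_host_range(double x)
{
    double a = x < 0.0 ? -x : x;

    return a == 0.0 || (a >= DBL_MIN && a <= DBL_MAX);
}
```

On a host with IEEE 754 doubles, in_host_range(5.39124E-44) is true while in_guaranteed_range(5.39124E-44) is false.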

Artie answered 23/11, 2009 at 22:10 Comment(0)

Converting to an ASCII representation would be the simplest, but if you need to deal with a colossal number of floats, then of course you should go binary. But this can be a tricky issue if you care about portability. Floating point numbers are represented differently on different machines.

If you don't want to use a canned library, then your float-binary serializer/deserializer will simply have to have "a contract" on where each bit lands and what it represents.
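One widely used contract of that kind is the IEEE 754 binary64 layout: 1 sign bit, 11 exponent bits biased by 1023, and 52 fraction bits. As a sketch, assuming that layout is the one you settle on, the fields can be separated and recombined with plain integer operations (the struct and function names here are hypothetical):

```c
#include <stdint.h>

/* The three fields of an IEEE 754 binary64 value. */
struct binary64_fields {
    unsigned sign;      /* bit 63 */
    unsigned exponent;  /* bits 62..52, biased by 1023 */
    uint64_t fraction;  /* bits 51..0 */
};

/* Split a 64-bit pattern into its fields. */
static struct binary64_fields split_binary64(uint64_t bits)
{
    struct binary64_fields f;

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}

/* Recombine the fields into a 64-bit pattern. */
static uint64_t join_binary64(struct binary64_fields f)
{
    return ((uint64_t)(f.sign & 1) << 63)
         | ((uint64_t)(f.exponent & 0x7FF) << 52)
         | (f.fraction & ((UINT64_C(1) << 52) - 1));
}
```

For example, the pattern 0xC004000000000000 (the value -2.5) splits into sign 1, exponent 0x400 and fraction 0x4000000000000.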

Here's a fun website to help with that: link.

Pollak answered 23/11, 2009 at 21:55 Comment(0)

sprintf, fprintf ? you don't get any more portable than that.

Flanker answered 23/11, 2009 at 21:34 Comment(7)

It is not an effective solution; it requires much more persistent space than the same numbers take when represented in RAM. – Barrator 23/11, 2009 at 21:39

What's not effective about it; there are potentially serious complications with trying to save the floating point numbers directly, doing it as strings is pretty much standard operating procedure. – Madancy 23/11, 2009 at 21:42

I would prefer something like an excess bit indicating the endianness, an excess bit or two indicating the number of bytes per floating-point number, and maybe some excess bits to indicate the mantissa/exponent type (e.g. IEEE754-2008). – Barrator 23/11, 2009 at 21:50

Well, why don't you just do that then? – Rightward 23/11, 2009 at 21:57

It may require more space, but it's both human readable and machine readable, endian-agnostic, and theoretically limitless with regards to the precision required. – Transcend 24/11, 2009 at 11:22

More importantly @dreamlax, it is Floating Point Format agnostic. – Tortoiseshell 26/11, 2009 at 2:24

This looks good at first, but has an implication that can be serious (or not, depending on use): You cannot always store a float in decimal format. That means that storing in ascii decimal-format, no matter how many decimal digits you add, you cannot guarantee that the numbers read back will be the same as the numbers that were stored. – Metathesis 4/4, 2013 at 13:45
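On the round-trip concern, two text formats are worth knowing. C99's "%a" conversion prints a hexadecimal floating-point string, which is exact for radix-2 doubles. Printing 17 significant decimal digits is also enough to recover an IEEE 754 double, provided the library's printf and strtod round correctly. A small sketch, with helper names invented for illustration and the default "C" locale assumed:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hexadecimal floating-point text (C99 "%a"): exact for radix-2 doubles.
 * Returns 0 on success, -1 if buf is too small or formatting failed. */
static int double_to_hex_text(double x, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%a", x);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Decimal text with 17 significant digits. Assumption: IEEE 754
 * binary64 doubles and a correctly rounding C library. */
static int double_to_dec_text(double x, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%.17g", x);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* strtod accepts both the decimal and the hexadecimal form. */
static double text_to_double(const char *s)
{
    return strtod(s, NULL);
}
```

Both forms cost more bytes than the 8-byte binary value, so they address the exactness objection but not the space objection.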

What level of portability do you require? If the file is to be read on a computer with the same OS that it was generated on, then using a binary file and just saving and restoring the bit pattern should work. Otherwise, as boytheo said, ASCII is your friend.

Canst answered 23/11, 2009 at 21:50 Comment(0)

This version has an overhead of only one byte per floating-point value, used to indicate the endianness. I think it is still not very portable, however.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#define LITEND      'L'
#define BIGEND      'B'

typedef short               INT16;
typedef int                 INT32;
typedef double              vec1_t;

 typedef struct {
    FILE            *fp;
} WFILE, RFILE;

#define w_byte(c, p)    putc((c), (p)->fp)
#define r_byte(p)       getc((p)->fp)

static void w_vec1(vec1_t v1_Val, WFILE *p)
{
    INT32   i;
    char    *pc_Val;

    pc_Val = (char *)&v1_Val;

    w_byte(LITEND, p);
    for (i = 0; i<sizeof(vec1_t); i++)
    {
        w_byte(pc_Val[i], p);
    }
}


static vec1_t r_vec1(RFILE *p)
{
    INT32   i;
    vec1_t  v1_Val = 0.0;   /* returned as-is if the type byte is not LITEND */
    char    c_Type,
            *pc_Val;

    pc_Val = (char *)&v1_Val;

    c_Type = r_byte(p);
    if (c_Type==LITEND)
    {
        for (i = 0; i<sizeof(vec1_t); i++)
        {
            pc_Val[i] = r_byte(p);
        }
    }
    return v1_Val;
}

int main(void)
{
    WFILE   x_FileW,
            *px_FileW = &x_FileW;
    RFILE   x_FileR,
            *px_FileR = &x_FileR;

    vec1_t  v1_Val;
    INT32   l_Val;
    char    *pc_Val = (char *)&v1_Val;
    INT32   i;

    px_FileW->fp = fopen("test.bin", "wb");  /* binary mode: no newline translation */
    v1_Val = 1234567890.0987654321;
    printf("v1_Val before write = %.20f \n", v1_Val);
    w_vec1(v1_Val, px_FileW);
    fclose(px_FileW->fp);

    px_FileR->fp = fopen("test.bin", "rb");  /* binary mode: no newline translation */
    v1_Val = r_vec1(px_FileR);
    printf("v1_Val after read = %.20f \n", v1_Val);
    fclose(px_FileR->fp);
    return 0;
}

Barrator answered 24/11, 2009 at 10:50 Comment(1)

It is portable only to machines sharing the same floating point format. Having been down this road, I will give you the following advice: Standardize on Little Endian IEEE-754 and make everybody else convert to/from that if necessary. You will be MUCH happier in the end. You will have portability through a rigid standard. – Tortoiseshell 26/11, 2009 at 2:31
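A sketch of what standardizing on little-endian IEEE-754 might look like on a host whose double already is a 64-bit IEEE 754 value: the bits are copied into an integer and written with shifts, so no per-value endianness byte and no run-time endianness test are needed. The function names are hypothetical, and the sketch assumes floating-point and integer values share the host byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Compile-time guard (assumption): double and uint64_t have equal size. */
typedef char double_must_be_64_bits[sizeof(double) == sizeof(uint64_t) ? 1 : -1];

/* Write x as 8 bytes, least significant byte first.
 * Returns 0 on success, -1 on a write error. */
static int write_double_le(FILE *fp, double x)
{
    uint64_t bits;
    int i;

    memcpy(&bits, &x, sizeof bits);
    for (i = 0; i < 8; i++)
        if (fputc((int)((bits >> (8 * i)) & 0xFF), fp) == EOF)
            return -1;
    return 0;
}

/* Read 8 little-endian bytes into *out.
 * Returns 0 on success, -1 on EOF or a read error. */
static int read_double_le(FILE *fp, double *out)
{
    uint64_t bits = 0;
    int i, c;

    for (i = 0; i < 8; i++) {
        if ((c = fgetc(fp)) == EOF)
            return -1;
        bits |= (uint64_t)c << (8 * i);
    }
    memcpy(out, &bits, sizeof bits);
    return 0;
}
```

The stream should be opened in binary mode ("wb" / "rb"). Hosts with a different floating-point format would replace the memcpy step with a real conversion.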

Here we go.

Portable IEEE 754 serialisation / deserialisation that should work regardless of the machine's internal floating point representation.

https://github.com/MalcolmMcLean/ieee754

#include <stdio.h>
#include <math.h>   /* ldexp, sqrt, INFINITY */
#include <float.h>  /* DBL_MAX */

/*
* read a double from a stream in ieee754 format regardless of host
*  encoding.
*  fp - the stream
*  bigendian - set if big bytes first, clear for little bytes
*              first
*
*/
double freadieee754(FILE *fp, int bigendian)
{
    unsigned char buff[8];
    int i;
    double fnorm = 0.0;
    unsigned char temp;
    int sign;
    int exponent;
    double bitval;
    int maski, mask;
    int expbits = 11;
    int significandbits = 52;
    int shift;
    double answer;

    /* read the data */
    for (i = 0; i < 8; i++)
        buff[i] = fgetc(fp);
    /* just reverse if not big-endian*/
    if (!bigendian)
    {
        for (i = 0; i < 4; i++)
        {
            temp = buff[i];
            buff[i] = buff[8 - i - 1];
            buff[8 - i - 1] = temp;
        }
    }
    sign = buff[0] & 0x80 ? -1 : 1;
    /* exponent in raw format */
    exponent = ((buff[0] & 0x7F) << 4) | ((buff[1] & 0xF0) >> 4);

    /* read in the mantissa. Top bit is 0.5, each successive bit is half the previous */
    bitval = 0.5;
    maski = 1;
    mask = 0x08;
    for (i = 0; i < significandbits; i++)
    {
        if (buff[maski] & mask)
            fnorm += bitval;

        bitval /= 2.0;
        mask >>= 1;
        if (mask == 0)
        {
            mask = 0x80;
            maski++;
        }
    }
    /* handle zero specially */
    if (exponent == 0 && fnorm == 0)
        return 0.0;

    shift = exponent - ((1 << (expbits - 1)) - 1); /* exponent = shift + bias */
    /* nans have exp 1024 and non-zero mantissa */
    if (shift == 1024 && fnorm != 0)
        return sqrt(-1.0);
    /*infinity*/
    if (shift == 1024 && fnorm == 0)
    {

#ifdef INFINITY
        return sign == 1 ? INFINITY : -INFINITY;
#endif
        return  (sign * 1.0) / 0.0;
    }
    if (shift > -1023)
    {
        answer = ldexp(fnorm + 1.0, shift);
        return answer * sign;
    }
    else
    {
        /* denormalised numbers */
        if (fnorm == 0.0)
            return 0.0;
        shift = -1022;
        while (fnorm < 1.0)
        {
            fnorm *= 2;
            shift--;
        }
        answer = ldexp(fnorm, shift);
        return answer * sign;
    }
}


/*
* write a double to a stream in ieee754 format regardless of host
*  encoding.
*  x - number to write
*  fp - the stream
*  bigendian - set to write big bytes first, else write little bytes
*              first
*  Returns: 0 on success, nonzero on error
*  Notes: different NaN types and negative zero not preserved.
*         if the number is too big to represent it will become infinity
*         if it is too small to represent it will become zero.
*/
int fwriteieee754(double x, FILE *fp, int bigendian)
{
    int shift;
    unsigned long sign, exp, hibits, hilong, lowlong;
    double fnorm, significand;
    int expbits = 11;
    int significandbits = 52;

    /* zero (can't handle signed zero) */
    if (x == 0)
    {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if (x > DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if (x < -DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1UL << 31); /* unsigned shift: 1 << 31 overflows a 32-bit int */
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test, but
    *there is no portable isnan() */
    if (x != x)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if (x < 0) { sign = 1; fnorm = -x; }
    else { sign = 0; fnorm = x; }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while (fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if (shift < -1022)
    {
        while (shift < -1022) { fnorm /= 2.0; shift++; }
        shift = -1023;
    }
    /* out of range. Set to infinity */
    else if (shift > 1023)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (sign << 31);
        lowlong = 0;
        goto writedata;
    }
    else
        fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    significand = ldexp(fnorm, significandbits); /* fnorm * 2^52, an exact scaling */


    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1); /* shift + bias */

    /* put the data into two longs (for convenience) */
    hibits = (long)(significand / 4294967296);
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    lowlong = (unsigned long)(significand - hibits * 4294967296);

writedata:
    /* write the bytes out to the stream */
    if (bigendian)
    {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    }
    else
    {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}

Religieuse answered 27/8, 2016 at 5:44 Comment(1)

Addressed. Code now in. (Link also has single precision but it follows straightforwardly). – Religieuse 27/8, 2016 at 12:13

-1

fwrite(), fread()? You will likely want binary, and you cannot pack the bytes any tighter unless you want to sacrifice precision, which you would do in the program and then fwrite()/fread() anyway: float a; double b; a=(float)b; fwrite(&a,1,sizeof(a),fp);

If you are carrying different floating point formats around, they may not convert in a straight binary sense, so you may have to pick apart the bits and perform the math (this to the power of that, plus this, etc.). IEEE 754 is a dreadful standard to use, but it is widespread, so it would minimize the effort.
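For what it's worth, the narrowing idea could be sketched as below. It halves the storage per value on typical hosts at the cost of precision, and the bytes are in the host's native format, so this is not portable across floating-point formats or byte orders. The function names are invented for this example.

```c
#include <stdio.h>

/* Narrow each double to float and write it in the host's native format.
 * Returns the number of values written (n on success). */
static size_t write_as_floats(const double *src, size_t n, FILE *fp)
{
    size_t i;

    for (i = 0; i < n; i++) {
        float f = (float)src[i];   /* precision is lost here */

        if (fwrite(&f, sizeof f, 1, fp) != 1)
            break;
    }
    return i;
}

/* Read n floats written by write_as_floats and widen them to double.
 * Returns the number of values read. */
static size_t read_floats(double *dst, size_t n, FILE *fp)
{
    size_t i;

    for (i = 0; i < n; i++) {
        float f;

        if (fread(&f, sizeof f, 1, fp) != 1)
            break;
        dst[i] = f;
    }
    return i;
}
```

A value such as 0.5 survives the round trip unchanged, while 0.1 comes back as the nearest float, which differs from the original double.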

Outstretched answered 26/11, 2009 at 2:12 Comment(2)

The question clearly asks about a portable method, which this is obviously not. – Emlynn 1/12, 2014 at 21:21

"floating point" is by definition not portable, there are numerous formats and the specific format was not specified. C isnt very portable either, the question was flawed at best. – Outstretched 1/12, 2014 at 21:55
