Are IEEE float and double guaranteed to be the same size on any OS?

Asked 11/6, 2014 at 7:32 Answered 10/5, 2016 at 6:34

Solved c++ floating-point posix portability

I'm working on an OS-portable database system. I want our database files to be portable across operating systems so that customers can move them to other kinds of OS at their discretion. Because of this use case I need my data types to be consistent across operating systems, and I'm wondering whether IEEE floats and doubles are guaranteed to be the same byte size on any OS?

Hercule answered 11/6, 2014 at 7:32 Comment(7)

As well as size you need to be aware of endianness too. – Excommunication 11/6, 2014 at 7:34

IEEE 754 data types are platform-agnostic by definition. But the C++ float and double types are not guaranteed to be IEEE 754 binary32 and binary64. I assume you're more interested in the latter? – Outcrop 11/6, 2014 at 7:37

You might want to add the case of CHAR_BIT != 8 to your question. Most answers here will probably tell you that float is guaranteed to be 32-bit long and double is guaranteed to be 64-bit long. But what if, for example, CHAR_BIT is defined as 16? – Husband 11/6, 2014 at 7:42

@PaulR Thanks, Paul, I'm handling that by swapping bytes inside the storage engine and always making sure that data is stored in a little endian byte order. – Hercule 11/6, 2014 at 7:52

This pertains more to your specific scenario, but you may be best off with storing your values using arbitrary precision floating point numbers, which aren't hard to implement yourself. Simply multiply the floating point number by a power of two such that it can fit as the biggest possible integer inside your allotted storage, then store the integer along with the power of two you multiplied it by. Doing it this way will guarantee that the file is readable by any architecture. I could write a simple example if you desire. – Rhetic 11/6, 2014 at 8:20

@Rhetic Thanks, I will have a look at that. – Hercule 11/6, 2014 at 8:22

I thought it would be a fun exercise to implement a basic version of what I talked about. A more robust implementation would permit for larger exponents and the use of long double, but that should be left to someone who has serious needs for such details. If all you need is to be able to store the entirety of a double though, this should be sufficient. pastebin.com/6UVTi55d – Rhetic 11/6, 2014 at 9:30
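The integer-plus-power-of-two idea from the comments above can be sketched roughly as follows. This is not the code behind the pastebin link; it is a minimal illustration that assumes a radix-2 double with at most 63 significand bits and a finite input. The names PortableDouble, encode and decode are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Assumption: double is a radix-2 type whose significand fits in int64_t.
static_assert(std::numeric_limits<double>::radix == 2, "sketch assumes a binary double");
static_assert(std::numeric_limits<double>::digits <= 63, "significand must fit in int64_t");

// Hypothetical on-disk form: value == mantissa * 2^exponent.
struct PortableDouble {
    std::int64_t mantissa;
    std::int32_t exponent;
};

// Splits a finite double into an integer mantissa and a power-of-two exponent.
// Infinities, NaNs and the sign of -0.0 are not handled by this sketch.
PortableDouble encode(double value) {
    const int digits = std::numeric_limits<double>::digits;
    int exp = 0;
    // frexp: value == frac * 2^exp with 0.5 <= |frac| < 1 (or frac == 0).
    const double frac = std::frexp(value, &exp);
    // Scaling by 2^digits turns the fraction into an exact integer.
    const auto mant = static_cast<std::int64_t>(std::ldexp(frac, digits));
    return {mant, static_cast<std::int32_t>(exp - digits)};
}

// Rebuilds the double from the stored integer pair.
double decode(PortableDouble p) {
    return std::ldexp(static_cast<double>(p.mantissa), p.exponent);
}
```

The two integers still have to be written with a fixed width and byte order, but the reader no longer needs to share the writer's floating-point representation.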

C++ says almost nothing about the representation of floating point types.

[basic.fundamental]/8 says (Emphasis mine):

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.

If you just write C++ code using float, double and long double, you have virtually no guarantees, apart from those given in the documentation for your particular compiler, and those that can be implied from std::numeric_limits.

On the other hand, IEEE 754 provides exact definitions of the behaviour and binary representation of its floating point types. These definitions are not quite enough to guarantee identical behaviour on all IEEE 754 platforms, since (for example) IEEE 754 sometimes allows multiple operations to be folded together when the result would be more precise than performing the two operations separately. This is likely to be unimportant to your specific case, since you just want the files to be portable, and probably do not care quite as much about identical queries creating identical changes to the files on different platforms as you do about identical files being loaded in identical ways on different platforms.

So the question is: "how do I get a portable IEEE 754 implementation for C++?".

The answer to this question is somewhat tricky. Most C++ compilers for reasonable platforms will provide at least float and double that approximately match IEEE 754's binary32 and binary64 specifications (although you will need to read the documentation for each individual compiler to be sure).
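Rather than relying on documentation alone, a build can be made to fail when the assumptions do not hold. The following is a minimal sketch of such compile-time checks; the particular set of checks is an assumption about what a file format would care about, not an exhaustive test of IEEE 754 conformance.

```cpp
#include <climits>
#include <limits>

// Each check stops compilation on a platform that does not match the assumption.
static_assert(CHAR_BIT == 8, "bytes are assumed to be 8 bits wide");

// is_iec559 reports whether the implementation claims IEC 559 (IEEE 754) conformance.
static_assert(std::numeric_limits<float>::is_iec559, "float is assumed to be IEEE 754");
static_assert(std::numeric_limits<double>::is_iec559, "double is assumed to be IEEE 754");

// Storage sizes expected for binary32 and binary64.
static_assert(sizeof(float) * CHAR_BIT == 32, "float is assumed to occupy 32 bits");
static_assert(sizeof(double) * CHAR_BIT == 64, "double is assumed to occupy 64 bits");

// Significand widths (including the implicit leading bit) of binary32 and binary64.
static_assert(std::numeric_limits<float>::digits == 24, "float is assumed to be binary32");
static_assert(std::numeric_limits<double>::digits == 53, "double is assumed to be binary64");
```

Placing these in a header that the storage engine includes keeps the assumptions visible next to the code that depends on them.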

Alternatively, you can use a software floating point implementation or wrapper such as FLIP, libgcc's soft-float, SoftFloat, or STREFLOP. These libraries sometimes still make assumptions about the implementation that are not completely portable according to the C++ standard, so use at your own risk.

Drenthe answered 11/6, 2014 at 7:59 Comment(3)

Does this mean that 1.1+0.9=28.7 is allowed by the standard? – Elixir 27/2, 2019 at 9:50

@jinawee: C++14, [expr.add]/3 states "The result of the binary + operator is the sum of the operands.". [numeric.limits] somewhat constrains the results, but as far as I can see there is nothing stopping 1.1+0.9=28.7, as long as appropriate values are listed for numeric_limits<double>::round_error() and numeric_limits<double>::round_style, beyond the fact that it would most likely be considered a very low quality implementation. – Drenthe 27/2, 2019 at 11:34

C++ does not constrain this behaviour, because it is expected that any reasonable implementation will also implement ISO/IEC 10967 (Language Independent Arithmetic) and/or IEEE 754 (IEEE Floating Point), or otherwise have some reasonable behaviour. – Drenthe 27/2, 2019 at 11:36

--cut-- Nevermind https://mcmap.net/q/25072/-are-ieee-float-and-double-guaranteed-to-be-the-same-size-on-any-os provides a better explanation for the float sizes.

If, however, you're thinking about storing these floats in binary data files, make sure you don't mess up the byte order (endianness). If you're dumping raw floats, some systems store the bytes in a different order, so reinterpreting the 4 bytes you just read as a float might give some surprising results.
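One way to avoid depending on the host byte order is to go through an integer of the same width and write its bytes explicitly. Below is a minimal sketch for a 64-bit double stored little-endian; store_le and load_le are hypothetical names, and the code assumes that double is IEEE 754 binary64 and that the host uses the same byte order for floating-point and integer values (true on mainstream platforms, but not guaranteed by the C++ standard).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: double is a 64-bit IEEE 754 type on this platform.
static_assert(std::numeric_limits<double>::is_iec559 &&
                  sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit IEEE 754 double");

// Serialises a double as 8 bytes, least significant byte first.
std::array<unsigned char, 8> store_le(double value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the object representation
    std::array<unsigned char, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // Shifting operates on the integer's value, so it is independent of host byte order.
        out[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFFu);
    }
    return out;
}

// Rebuilds the double from 8 little-endian bytes.
double load_le(const std::array<unsigned char, 8>& in) {
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    }
    double value = 0.0;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Using memcpy rather than a pointer cast avoids aliasing problems, and the shifts make the file layout identical on big- and little-endian hosts.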

Dowdy answered 11/6, 2014 at 7:39 Comment(3)

Would you have a quote from the C or C++ standards to back the statement about the sizes? Or are you referring only to IEEE 754? If so, it might be worth clarifying. I suspect OP thinks the C and/or C++ standards mandate IEEE 754 floating point. – Judd 11/6, 2014 at 7:52

Thanks. I'm handling byte ordering issues by always storing data in little endian byte order. – Hercule 11/6, 2014 at 7:56

"If you're however thinking about storing these floats in binary data files," then you should probably write your own size- and endianness-independent class to ensure guaranteed representation, since C++ does not require these attributes to be at all portable for its built-in types, and so storing them as binary is usually just asking for trouble. – Stockbreeder 26/2, 2016 at 20:10

std::numeric_limits<T>::is_iec559

Determines if a given type follows IEC 559, which is another name for IEEE 754.

This serves as further evidence that IEEE is optional, and offers a way for you to check if it is used or not.

C++11 N3337 standard draft 18.3.2.4 numeric_limits members:

static constexpr bool is_iec559;

56 True if and only if the type adheres to IEC 559 standard. (217)

57 Meaningful for all floating point types.

(217) International Electrotechnical Commission standard 559 is the same as IEEE 754.

Sample code:

#include <iostream>
#include <limits>

int main() {
    std::cout << std::numeric_limits<float>::is_iec559 << std::endl;
    std::cout << std::numeric_limits<double>::is_iec559 << std::endl;
    std::cout << std::numeric_limits<long double>::is_iec559 << std::endl;
}

Outputs:

1
1
1

on Ubuntu 16.04 x86-64.

__STDC_IEC_559__ is an analogous macro for C: https://mcmap.net/q/25070/-is-it-safe-to-assume-floating-point-is-represented-using-ieee754-floats-in-c

Rationale

This is an interesting article that describes the rationale behind not fixing sizes, and how to get around it: http://yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html

Southsouthwest answered 10/5, 2016 at 6:34 Comment(0)

They are. "float" will be 32 bits, "double" will be 64 bits. The byte ordering might be different; it's exactly the same as with 32 bit and 64 bit integers.

If you need extended precision: That may or may not be available as "long double". And extended precision uses 80 bits, but "long double" may have additional padding bits.

Cabezon answered 11/6, 2014 at 7:37 Comment(3)

Note that there also exists quad precision (128 bits) which may be used instead of x86's 80-bit precision on some platforms. – Whinchat 11/6, 2014 at 7:44

I don't think these sizes are fixed in the C or C++ standards. But do you have a quote to back this answer up? – Judd 11/6, 2014 at 7:51

I guess you are referring to IEEE 754, not the C or C++ standards. Sorry for the confusion. – Judd 11/6, 2014 at 7:57
