Is a hardcoded float precise if it can be represented exactly in IEEE 754 binary format?

For example, 0, 0.5, 0.15625, 1, 2, 3... are values that can be represented exactly in IEEE 754 binary format. Are their hardcoded versions precise?

For example, does

float a=0;
if(a==0){
    return true;
}

always return true? Another example:

float a=0.5;
float b=0.25;
float c=0.125;

Is a * b always equal to 0.125, and is a * b == c always true? And one more example:

int a=123;
float b=0.5;

Is a * b always 61.5? Or, in general, is multiplying an integer by an IEEE 754 binary float precise?

Or, a more general question: if the values are hardcoded, and both the values and the result can be represented in IEEE 754 binary format (e.g. 0.5 - 0.125), is the result precise?
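
To make these cases concrete, here is a small self-contained test (assuming an implementation that uses IEEE 754 binary floats; the expected outputs are what I would expect, not verified on every platform):

#include <cstdio>

int main() {
    float a = 0.5f, b = 0.25f, c = 0.125f;
    printf("%d\n", a * b == c);               // expect 1: 0.5 * 0.25 = 0.125, all powers of two

    int   i = 123;
    float h = 0.5f;
    printf("%d\n", i * h == 61.5f);           // expect 1: 61.5 is 111101.1 in binary, 7 significant bits

    printf("%d\n", 0.5f - 0.125f == 0.375f);  // expect 1: operands and result have finite binary expansions
    return 0;
}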

Acetate asked 11/11, 2015 at 7:44 Comment(16)
No. no, no, no and no. – Dagenham
I don't think 'precise' is the correct term when referring to whether or not a floating-point representation can exactly represent some number? – Hawkweed
Agreed with @Dagenham. I work at a financial institution, and half my time is spent fixing archaic code that relies on floating-point calculation. – Revisionism
Then you should be looking to use some decimal-radix type and throwing out floating point completely. – Zoophilous
Yes, floating point is not some vague thing that gives approximate answers. The issues with floating point are about the representation of decimal numbers and about precision. There is nothing "fuzzy" about the actual operations themselves. (Just be really, really sure that you are doing it right, though. I'd be very cautious anyway, as code that works today might not work tomorrow when someone changes 0.25 somewhere to 0.1, not realizing it matters.) – Marquettamarquette
@Dagenham How so? Why shouldn't 0.125 always be expressed as 2^-3? The hassle begins only as soon as you throw in a number which is not exactly representable. And 5 * 0.1 might not be 0.5. But, as said, as long as you stay within the range of exactly representable numbers, it should be safe. The problem is recognizing when you leave this range, and thus it is better to stay on the safe side. – Porcupine
It's a bit complicated, but yes, some numbers, such as 0.5 and 0.25, and all small integers, can be exactly represented using binary floating-point numbers. It's just that not all of them can. For details, this text is usually recommended: docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html – Brookebrooker
@ThomasPadron-McCarthy And exactly these are what the question is about. – Porcupine
Always helpful: The Floating-Point Guide - What Every Programmer Should Know ... – Frightened
@glglgl: Yes, I agree. And, just to be clear, this question is not a duplicate of the usual "why doesn't floating point work". – Brookebrooker
a=123 and b=0.25, so "is a * b always 61.5?" No, because it's 30.75. And there's your answer: human error guarantees that some value that you think should be exactly representable in IEEE 754 actually won't be. So you should never write code that requires values to be exactly representable in IEEE 754. – Emblements
It is a limitation of the 23-bit mantissa, which reduces the pool of numbers that can be represented exactly in 32-bit floating point. (In other words, you don't have all 32 bits with which to describe all 32-bit numbers...) – Frightened
Actually, with every deploy some function gets rid of floating points. – Revisionism
@amuse As you can see, this is a topic where many people have dangerously incomplete knowledge but think they're experts (two deleted, heavily downvoted answers already, and the comments everywhere...). I'm not saying anyone here is right or wrong, just that out of many conflicting statements only one can be true... and it's the same over the whole internet. So please take everything with a grain of salt; don't just immediately believe what someone wrote. – Winnick
@DavidC.Rankin - on the contrary, that oft-cited paper is almost never helpful... – Electrotechnics
@Dagenham You are so wrong. – Sinter

There is no inherent fuzziness in floating-point numbers. It's just that some, but not all, real numbers can't be exactly represented.

Compare with a fixed-width decimal representation, let's say with three digits. The integer 1 can be represented, using 1.00, and 1/10 can be represented, using 0.10, but 1/3 can only be approximated, using 0.33.

If we instead use binary digits, the integer 1 would be represented as 1.00 (binary digits), 1/2 as 0.10, 1/4 as 0.01, but 1/3 can (again) only be approximated.

There are some things to remember, though:

  • It's not the same numbers as with decimal digits. 1/10 can be written exactly as 0.1 using decimal digits, but not using binary digits, no matter how many you use (short of infinity); see the snippet just after this list.
  • In practice, it is difficult to keep track of which numbers can be
    represented and which can't. 0.5 can, but 0.4 can't. So when you need exact numbers, such as (often) when working with money, you shouldn't use floating-point numbers.
  • According to some sources, some processors do strange things internally when performing floating-point calculations on numbers that can't be exactly represented, causing results to vary in a way that is, in practice, unpredictable. (The classic example is the x87 FPU, which computes with 80-bit extended precision internally, so a result can depend on when an intermediate value happens to be rounded to 64 bits.)
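
To see the difference in running code, here is a minimal C++ sketch (assuming IEEE 754 doubles, which virtually all current hardware uses):

#include <cstdio>

int main() {
    // 0.5, 0.25 and 0.75 all have finite binary expansions, so this holds exactly:
    printf("%d\n", 0.5 + 0.25 == 0.75);   // 1

    // 0.1 has no finite binary expansion; the stored value only approximates it:
    printf("%.17g\n", 0.1);                // 0.10000000000000001
    printf("%d\n", 0.1 + 0.2 == 0.3);      // 0: each literal is rounded, and the rounding errors differ
    return 0;
}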

(My opinion is that it's actually a reasonable first approximation to say that yes, floating-point numbers are inherently fuzzy, so unless you are sure your particular application can handle that, stay away from them.)

For more details than you probably need or want, read the famous What Every Computer Scientist Should Know About Floating-Point Arithmetic. Also, this somewhat more accessible website: The Floating-Point Guide.

Brookebrooker answered 11/11, 2015 at 8:12 Comment(12)
Indeed, it is a fixed-width representation, but a binary one, not a decimal one. – Porcupine
This is incorrect. Floating-point math, even between only perfectly representable numbers, is not accurate and will not yield the exact same results on different processors. This phenomenon, by the way, is why client/server real-time syncing in game development needs a bunch of resync packets and tolerances. – Archipelago
@Archipelago Sorry, but you are just wrong. If all the mentioned processors use IEEE 754 AND the numbers are finite in binary representation, there is just no reason for them to produce different results. As soon as rounding is involved, there MIGHT be some differences, but not earlier. – Porcupine
@Archipelago What has net serialization to do with math? What has any of this to do with realtime? It sounds very much like you just don't want to be wrong. And we're talking about IEEE 754 only here. While it has some things where one possibility out of multiple can be chosen, there's nothing undefined or random in it. – Winnick
@Winnick As soon as you have a math operation where even one temporary result is a number not finitely representable in floating point, it will be inaccurate. That's almost unavoidable, given how multiplication and division are implemented. It's completely unavoidable when you use something like pow. – Archipelago
@Archipelago Let me ask again, what has FP math to do with FP serialization and realtime scenarios? If this is clear, we can continue. – Winnick
@Winnick The OP is asking about the results of FP math. So the answer is either incorrect or not addressing the question. Bad either way. – Archipelago
@Archipelago And your comment was about something completely unrelated. That's why I'm asking. – Winnick
@Winnick In game development (my example) FP math is heavily used to calculate vectors, positions and coordinates. That gets inaccurate real quick, and errors might compound, resulting in desync. Maybe it wasn't a good example, but the point stands. – Archipelago
And about "as soon as you have a math operation where even one temporary result is a number not finitely representable in floating point, it will be inaccurate": OK, but unrepresentable numbers are not what everyone here is talking about. – Winnick
@Winnick The answer is not addressing those in particular. The numbers in the question are not all finitely representable, at least from what I see. – Archipelago
Thanks for the many useful comments. I'm trying to incorporate some of them in my answer. – Brookebrooker

No, but as Thomas Padron-McCarthy says, some numbers can be exactly represented in binary; it's just that not all of them can.

This is the way I explain it to the non-developers I work with (like Mahmut Ali, I also work on a very old financial package): imagine having a very large cake that is cut into 256 slices. You can give 1 person the whole cake, or 2 people half of the slices each, but as soon as you decide to split it between 3 people you can't: it's either 85 or 86 slices each, and you can't split the cake any further. The same goes for floating point: some numbers can be represented exactly, while others can only be closely approximated.
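
The same idea in code (a hypothetical C++ illustration, assuming IEEE 754 single precision):

#include <cstdio>

int main() {
    // 1/4 of the cake "splits evenly": 0.25 is a power of two and exactly representable.
    printf("%.17g\n", (double)0.25f);          // 0.25

    // 1/3 never splits evenly in binary, so float stores the nearest representable value:
    printf("%.17g\n", (double)(1.0f / 3.0f));  // about 0.3333333432674408, not 1/3
    return 0;
}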

Followthrough answered 11/11, 2015 at 8:13 Comment(0)

C++ does not require a binary floating-point representation. Built-in integers are required to have a binary representation, commonly two's complement, but one's complement and sign-and-magnitude are also supported. Floating point, however, can be e.g. decimal.

This leaves open the question of whether C++ floating point can have a radix that does not have 2 as a prime factor, unlike 2 and 10, which both do. Are such radixes permitted? I don't know; the last time I tried to check that, I failed.

However, assuming that the radix must be 2 or 10, all your examples involve values that are small integers or fractions whose denominators are powers of 2, and such values can be exactly represented in either radix.

This means that the single answer to most of your questions is “yes”. The exception is the question “is integer multiplied by IEEE 754 binary float [exact]”. If the exact result needs more precision than is available, then it can't be exact, but otherwise it is.
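
For instance (a sketch assuming IEEE 754 single precision, with its 24-bit significand; the value 100000001 is just an illustrative pick):

#include <cstdio>

int main() {
    int   a = 123;
    float b = 0.5f;
    printf("%d\n", a * b == 61.5f);   // 1: the exact product needs only 7 significant bits

    // 100000001 needs 27 significant bits, so it is already rounded when converted to float:
    int   big = 100000001;
    float f   = (float)big * 0.5f;
    printf("%.1f\n", (double)f);      // 50000000.0 under round-to-nearest, not 50000000.5
    return 0;
}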

See the classic “What Every Computer Scientist Should Know About Floating-Point Arithmetic” for background info about floating point representation & properties in general.


If a value can be exactly represented in 32-bit or 64-bit IEEE 754, that doesn't mean it can be exactly represented with some other floating-point representation. That's because different 32-bit representations and different 64-bit representations use different numbers of bits for the mantissa and have different exponent ranges. So a number that can be exactly represented in one way can be beyond the precision or range of some other representation.
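
A concrete illustration (assuming IEEE 754 binary32 and binary64): the integer 2^24 + 1 = 16777217 fits in a double's 53-bit significand but not in a float's 24-bit one.

#include <cstdio>

int main() {
    double d = 16777217.0;    // 2^24 + 1: exactly representable in binary64
    float  f = 16777217.0f;   // rounds to 16777216 in binary32
    printf("%.1f\n", d);           // 16777217.0
    printf("%.1f\n", (double)f);   // 16777216.0
    return 0;
}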


You can use std::numeric_limits<T>::is_iec559 (where e.g. T is double) to check whether your implementation claims to be IEEE 754 compatible. However, when floating-point optimizations are turned on, at least the g++ compiler(1) erroneously claims to be IEEE 754 while not treating e.g. NaN values correctly according to that standard. In effect, is_iec559 only tells you whether the number representation is IEEE 754, not whether the semantics conform.
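
A minimal check (the optimization flag in question is typically g++'s -ffast-math; even with it, is_iec559 may still read true):

#include <iostream>
#include <limits>

int main() {
    std::cout << std::boolalpha
              << std::numeric_limits<float>::is_iec559  << '\n'   // true if float claims IEEE 754
              << std::numeric_limits<double>::is_iec559 << '\n'   // true if double claims IEEE 754
              << std::numeric_limits<double>::radix     << '\n';  // 2 on essentially all current hardware
    return 0;
}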


(1) Essentially, instead of providing different types for different semantics, gcc and g++ try to accommodate different semantics via compiler options. And with separate compilation of parts of a program, that can't conform to the C++ standard.

Christoffer answered 11/11, 2015 at 8:16 Comment(5)
Right, but the question focuses explicitly on the IEEE 754 representation. – Porcupine
In C (C11, specifically), if I have a union of a float and an int, and I already know that sizeof(float) == sizeof(int), what happens if I write to the float, then read from the int? Are you saying I won't get a binary number back? C11 explicitly allows type punning here. See the accepted answer on this question: #11373703 – Permute
@George: Type punning has no relevance. So I'm not sure what you're asking; it doesn't make sense to me, but I guess it's about possible decimal floating-point representations. In C++, numeric_limits<T>::radix gives you the radix for a specific floating-point type T. In C++11 it's specified as equivalent to FLT_RADIX in C, and that implies that for C++11 and earlier it's the same for all floating-point types in a given implementation, as it is in C. It's there because it's needed. There is a proposal for a guaranteed decimal type, and then the equivalence will have to be dropped. – Christoffer
@Cheers and hth. - Alf, the type-punning part was relevant because I wanted to make clear that accessing the binary representation of a floating-point number through the other element of a union was legal in the example I was describing, so I wouldn't get yelled at by people for invoking undefined behavior. Also, even a guaranteed decimal type will still have some sort of binary representation. – Permute
@George: You are correct that C++ can only be effectively used on a binary machine, due to the requirement of binary built-in integral types. You are, I believe, wrong in assuming that readers are not aware of that. However, on the third & gripping hand, it's not as obvious as it might seem. When Donald Knuth defined the MIX assembly language in his classic The Art of Computer Programming, he defined it so that one couldn't readily tell whether the underlying hardware was binary or decimal. C++ could be effectively used on such a machine. Except that such a machine has no advantage today. ;-) – Christoffer

In principle, this should be possible, as long as you restrict yourself to exactly this class of numbers with a finite base-2 representation.

But it is dangerous: what if someone takes your code and changes your 0.5 to 0.4, or your .0625 to .065, for whatever reason? Then your code is broken. And no, even excessive comments won't help: someone will always ignore them.
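
A sketch of how such an edit breaks things (assuming IEEE 754 floats):

#include <cstdio>

int main() {
    float half = 0.5f;
    printf("%d\n", half + half == 1.0f);   // 1: 0.5 has a finite binary expansion

    float f = 0.4f;                  // stores the float nearest to 0.4, not 0.4 itself
    printf("%d\n", f == 0.4);        // 0: the float and double approximations of 0.4 differ
    printf("%.17g\n", (double)f);    // roughly 0.40000000596046448
    return 0;
}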

Porcupine answered 11/11, 2015 at 8:11 Comment(0)
