Fixed point vs Floating point number
E

7

167

I can't make sense of fixed-point and floating-point numbers because the definitions I find through Google are hard to read. None of them gives a simple enough explanation of what these numbers really are. Can I get a plain definition with an example?

Eyeshade answered 23/9, 2011 at 5:53 Comment(4)
I am also having trouble finding a good definition. The main thing I'm looking for is something to help me understand these results: 3.11 + 42.0 = 45.110001 (not 45.11), 3.12 + 42.0 = 45.119999 (not 45.12), 3.15 + 42.0 = 45.150002 (not 45.15).Corking
I think this is a relevant discussion, especially for describing the traps with floating-point numbers, the better precision of fixed point, and why you'd never want to describe money as a float: #6320709Swordfish
Random link where the topics are mentioned "OpenGL® ES OpenGL ES includes profiles for floating-point and fixed-point systems and the EGL™ specification..."Shaftesbury
Why have you not accepted an answer? What is missing for you? Do you want to place a bounty to fill in the missing knowledge you need?Icono
E
217

A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point) and a specific number of bits reserved for the fractional part (the part to the right of the decimal point). No matter how large or small your number is, it will always use the same number of bits for each portion. For example, if your fixed point format was in decimal IIIII.FFFFF then the largest number you could represent would be 99999.99999 and the smallest non-zero number would be 00000.00001. Every bit of code that processes such numbers has to have built-in knowledge of where the decimal point is.

A floating point number does not reserve a specific number of bits for the integer part or the fractional part. Instead it reserves a certain number of bits for the number (called the mantissa or significand) and a certain number of bits to say where within that number the decimal place sits (called the exponent). So a floating point number that took up 10 digits with 2 digits reserved for the exponent might represent a largest value of 9.9999999e+50 and a smallest non-zero value of 0.0000001e-49.
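The two decimal formats described above can be sketched in a few lines of Python. This is only an illustration, not a real number format: the function names are made up, and the mapping of the stored two-digit exponent 00..99 onto -49..50 (an offset of 49) is an assumption chosen to match the limits quoted in the answer.

```python
from decimal import Decimal

FIXED_SCALE = 10 ** 5         # IIIII.FFFFF: five digits after the point
FIXED_MAX_RAW = 10 ** 10 - 1  # ten decimal digits in total


def fixed_to_decimal(raw):
    """Interpret a raw 10-digit integer as the fixed-point value IIIII.FFFFF."""
    if not 0 <= raw <= FIXED_MAX_RAW:
        raise ValueError("raw value does not fit in 10 digits")
    return Decimal(raw) / FIXED_SCALE


def toy_float_to_decimal(significand, stored_exponent):
    """Interpret 8 significand digits D.DDDDDDD and a stored exponent 00..99.

    Assumption: the stored exponent maps to -49..50 by subtracting 49.
    """
    if not 0 <= significand <= 99_999_999:
        raise ValueError("significand must have at most 8 digits")
    if not 0 <= stored_exponent <= 99:
        raise ValueError("stored exponent must have at most 2 digits")
    exponent = stored_exponent - 49
    # The point sits after the first of the 8 digits, hence the extra -7.
    return Decimal(significand).scaleb(exponent - 7)


print(fixed_to_decimal(9_999_999_999))       # largest fixed-point value
print(fixed_to_decimal(1))                   # smallest non-zero fixed-point value
print(toy_float_to_decimal(99_999_999, 99))  # largest toy float
print(toy_float_to_decimal(1, 0))            # smallest non-zero toy float
```

Both formats use ten digits, but the fixed-point one always spends five of them on the fraction, while the toy float spends two digits on saying where the point goes.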

Entirely answered 23/9, 2011 at 6:6 Comment(11)
Well you're forgetting that floating point numbers are almost always signed, so the minimum value would really be -9.9999999e+50.Tipster
Also there is exponent biasing so that you can represent vastly more discrete values between 0 and 1 than you can between 1,000,000 and 1,000,001. And there are lots of complications with floating point operations which come up, like ensuring you don't report 0 as the difference between two very similar numbers when the bias isn't enough to increase precision.Tipster
@BrianGordon: I didn't forget sign bits; I intentionally ignored them in order to have a simple description and not have to worry about the difference between minimum/maximum and smallest/largest. I also intentionally left out exponent biasing (which has nothing to do with the number of discrete values between any two numbers), NaNs, infinities, normalizing, gradual underflow, signed zeroes, the fact that most floats are binary (which allows the first bit of the mantissa to be left out), and a number of other aspects that are unnecessary to illustrate the concept.Entirely
Oh, I was confusing exponent biasing with normalizingTipster
Why is the cutoff for the minimum number of fixed-point 00000.00001? I'm expecting to see 00000.00000 instead. Also, do you have any references that go into more detail about fixed-point numbers?Vulture
@NickMiller: Sorry for the confusion, but I was talking about non-zero numbers. My example format can represent 0, 0.00001, 0.00002, ..., 99999.99998, 99999.99999.Entirely
Can anyone explain how he got 9.9999999e+50 and 0.0000001e-49 with a floating-point of 10 bits?Mcmorris
@zer0uno: The key is that the answer said 10 digits, rather than 10 bits, as many folks familiar with computer arithmetic might expect. After discussing binary representations, the answer switched to examples that were all in decimal. So 9.9999999e+50 is 1 digit for the 9, 7 for the 9999999, and 2 for the 50, for a total of (1 + 7 + 2 =) 10.Drippy
I'm confused...if you have 2 digits for the exponent, then why is it not e+99 and e-99 as the limit for your examples?Icono
@CharlieParker, 2 digits is enough for 100 different exponents (-49 to 50). If you had 2 digits plus a sign bit, you could have 200 different exponents (-99 to 100), but I intentionally excluded sign bits from my description to simplify things.Entirely
@Entirely, I understand the smallest, but why is the largest number not 99999999e+50? Why did you put a . in to make the number 9.9999999?Jemmie
E
50

A fixed point number just means that there are a fixed number of digits after the decimal point. A floating point number allows for a varying number of digits after the decimal point.

For example, if you have a way of storing numbers that requires exactly four digits after the decimal point, then it is fixed point. Without that restriction it is floating point.

Often, when fixed point is used, the programmer actually uses an integer and then makes the assumption that some of the digits are beyond the decimal point. For example, I might want to keep two digits of precision, so a value of 100 actually means 1.00, 101 means 1.01, 12345 means 123.45, etc.
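A minimal Python sketch of that convention, assuming non-negative values and exactly two digits after the point. The helper names to_fixed and to_text are hypothetical, and extra fractional digits are simply truncated here.

```python
SCALE = 100  # two digits after the decimal point


def to_fixed(text):
    """Parse a non-negative decimal string such as '123.45' into hundredths."""
    if text.startswith("-"):
        raise ValueError("this sketch only handles non-negative values")
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]  # pad or truncate to exactly two digits
    return int(whole) * SCALE + int(frac)


def to_text(raw):
    """Format an integer count of hundredths back into a decimal string."""
    whole, frac = divmod(raw, SCALE)
    return f"{whole}.{frac:02d}"


total = to_fixed("1.01") + to_fixed("123.45")  # plain integer addition
print(total, to_text(total))                   # 12446 124.46
```

Because the stored values are ordinary integers, sums such as 0.10 + 0.20 come out exactly as 0.30, which is not the case for binary floating point.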

Floating point numbers are more general purpose because they can represent very small or very large numbers in the same way, but there is a small penalty in having to have extra storage for where the decimal place goes.

Embarkment answered 23/9, 2011 at 6:4 Comment(2)
Floating point usage becomes a problem when you start doing calculations, for instance when you add a really small and a really large floating point number together. The summed result loses accuracy since it has to represent a number with two extremes, and the lower decimals get trimmed off (and aren't rounded).Swordfish
also the potential use of exponent values in a floating point can cause issues with computer systems expecting standard numbersSwordfish
S
10

From my understanding, fixed-point arithmetic is done using integers, where the decimal part is stored in a fixed number of bits, or the number is multiplied by a power of ten that matches the number of decimal digits of precision needed.

For example, if the number 12.34 needs to be stored and we only need two digits of precision after the decimal point, the number is multiplied by 100 to get 1234. When performing math on this number, we apply the same scaling to every operand. Adding 5620 (that is, 56.20) to this number would yield 6854 in data, or 68.54.

If we want to extract the decimal part of a fixed-point number, we use the modulo (%) operator.

12.34 (Python):

whole = 1234 // 100  # integer division gives the whole part: 12
frac = 1234 % 100    # remainder gives the decimal part in hundredths: 34
print(f"{whole}.{frac:02d}")  # zero-padded so 1205 would print "12.05"; prints "12.34" here

Floating point numbers are a completely different story in programming. The current standard for floating point numbers (IEEE 754, single precision) uses 23 bits for the data of the number (the significand), 8 bits for the exponent, and 1 bit for the sign. See this Wikipedia link for more information on this.
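To make the 1 + 8 + 23 bit layout concrete, here is a small Python sketch that splits a value into those three fields after rounding it to single precision. The function name float32_fields is made up for this example; struct is the standard library module.

```python
import struct


def float32_fields(value):
    """Return the (sign, exponent, fraction) bit fields of a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF      # 23 bits, the leading 1 is implied
    return sign, exponent, fraction


print(float32_fields(1.0))      # (0, 127, 0)
print(float32_fields(-2.0))     # (1, 128, 0)
print(float32_fields(0.15625))  # (0, 124, 2097152)
```

The last line reads as 1.25 × 2^(124 - 127): a fraction field of 2097152 out of 2^23 is 0.25, added to the implied leading 1.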

Serg answered 21/2, 2017 at 20:13 Comment(0)
A
4

The term ‘fixed point’ refers to the corresponding manner in which numbers are represented, with a fixed number of digits after, and sometimes before, the decimal point. With floating-point representation, the placement of the decimal point can ‘float’ relative to the significant digits of the number. For example, a fixed-point representation with a uniform decimal point placement convention can represent the numbers 123.45, 1234.56, 12345.67, etc, whereas a floating-point representation could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, etc.

Arabic answered 18/6, 2014 at 20:4 Comment(0)
G
3

There's very little mention of what I consider the defining feature of fixed point numbers. The key difference is that floating-point numbers have a constant relative (percent) error caused by rounding or truncating. Fixed-point numbers have constant absolute error.

With 64-bit floats, you can be sure that the answer to x+y will never be off by more than 1 bit, but how big is a bit? Well, it depends on x and y. A 64-bit float keeps 52 bits after the leading 1, so if the exponent is equal to 10, the last bit is worth 2^(10-52) = 2^-42, but if the exponent is 0, the last bit is worth 2^-52. The absolute size of the error grows in step with the size of the number.

With fixed point numbers, a bit always represents the same amount. For example, if we have 32 bits before the decimal point and 32 after, that means truncation errors will always change the answer by 2^-32 at most. This is great if you're working with numbers that are all about equal to 1, which gain a lot of precision, but bad if you're working with numbers that have different units--who cares if you calculate a distance of a googol meters, then end up with an error of 2^-32 meters?
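The contrast between constant absolute error and constant relative error can be checked directly. The sketch below assumes Python 3.9 or later for math.ulp, which returns the value of the last bit of a 64-bit float; FIXED_STEP and the two helper names are made up for the illustration, and the 32.32 format is the one from the paragraph above.

```python
import math

FIXED_STEP = 2.0 ** -32  # value of the last bit in a 32.32 fixed-point format


def float_step(x):
    """Absolute size of the last bit of the 64-bit float x."""
    return math.ulp(x)


def fixed_step(x):
    """Absolute size of the last bit in 32.32 fixed point: the same for every x."""
    return FIXED_STEP


for x in (1.0, 1024.0, 1e15):
    print(x, float_step(x), float_step(x) / x, fixed_step(x))
```

For the floats, the second column grows with x while the third (relative) column stays at or below 2^-52. For the fixed-point format, the last column never changes.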

In general, floating-point lets you represent much larger numbers, but the cost is higher (absolute) error for medium-sized numbers. Fixed points get better accuracy if you know how big of a number you'll have to represent ahead of time, so that you can put the decimal exactly where you want it for maximum accuracy. But if you don't know what units you're working with, floats are a better choice, because they represent a wide range with an accuracy that's good enough.

Godunov answered 19/10, 2022 at 20:34 Comment(0)
T
-2

Fixed-point numbers don't just have a fixed number of digits after the point; mathematically, those digits stand for negative powers of the base. This works very well for mechanical calculators:

e.g., the price of something is USD 23.37 (Q = 2 digits after the point). The machine knows where the point is supposed to be!

Tomfoolery answered 26/1, 2022 at 14:1 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Scheme
K
-7

Take the number 123.456789

  • As an integer, this number would be 123
  • As a fixed point (2), this number would be 123.46 (Assuming you rounded it up)
  • As a floating point, this number would be 123.456789

Floating point lets you represent almost every number with a great deal of precision. Fixed point is less precise, but simpler for the computer.

Kansas answered 23/9, 2011 at 6:4 Comment(3)
The precision with which you can write a number is not related to whether it's written in floating point, integer or fixed point. It's related to the number of significant digits you have available. For example, INT_MAX is a number that can be exactly represented as an int, but not as a float, because a float doesn't have the 31 bits of precision needed to exactly represent it.Shaun
Fixed is the most precise as long as it's sized to handle the number in question. When you do math with fixed point numbers, rounding occurs when a calculation has a remainder exceeding the decimal limit. With a floating point you can get a very inaccurate value if you add a very small number to a very large one. When that occurs, digits are lost without rounding.Swordfish
This is misleading and wrong. Fixed point means that the number of digits after the decimal point is fixed. It says nothing about how much precision it has.Cufic
