Is floating-point math consistent in C#? Can it be?

Asked 13/7, 2011 at 17:29 Answered 2/8, 2011 at 14:32

Solved c#.net floating-point precision ieee-754

161

No, this is not another "Why is (1/3.0)*3 != 1" question.

I've been reading about floating-points a lot lately; specifically, how the same calculation might give different results on different architectures or optimization settings.

This is a problem for video games which store replays, or are peer-to-peer networked (as opposed to server-client), which rely on all clients generating exactly the same results every time they run the program - a small discrepancy in one floating-point calculation can lead to a drastically different game-state on different machines (or even on the same machine!)

This happens even amongst processors that "follow" IEEE-754, primarily because some processors (namely x86) use double extended precision. That is, they use 80-bit registers to do all the calculations, then truncate to 64- or 32-bits, leading to different rounding results than machines which use 64- or 32- bits for the calculations.

I've seen several solutions to this problem online, but all for C++, not C#:

Disable double extended-precision mode (so that all double calculations use IEEE-754 64-bits) using _controlfp_s (Windows), _FPU_SETCW (Linux?), or fpsetprec (BSD).
Always run the same compiler with the same optimization settings, and require all users to have the same CPU architecture (no cross-platform play). Because my "compiler" is actually the JIT, which may optimize differently every time the program is run, I don't think this is possible.
Use fixed-point arithmetic, and avoid float and double altogether. decimal would work for this purpose, but would be much slower, and none of the System.Math library functions support it.

So, is this even a problem in C#? What if I only intend to support Windows (not Mono)?

If it is, is there any way to force my program to run at normal double-precision?

If not, are there any libraries that would help keep floating-point calculations consistent?

Krieger answered 13/7, 2011 at 17:29 Comment(11)

I've seen this question, but every answer either repeats the problem with no solution, or says "ignore it," which is not an option. I asked a similar question on gamedev, but (because of the audience) most of the answers seem to be geared towards C++. – Krieger 13/7, 2011 at 17:30

not an answer, but I'm sure you in most domains you could design your system in such a way that all shared state is deterministic, and there is no significant performance degradation because of that – Teishateixeira 13/7, 2011 at 18:28

Are you aware of floating point emulation? If absolute portability of floating point operations is a concern, maybe a library dealing with floating point emulation could help. I'm not aware of any, though, and some performance may have to be sacrificed in the process. – Sherburne 13/7, 2011 at 18:54

@Peter do you know of any fast floating point emulation for .net? – Latonialatoniah 13/7, 2011 at 21:55

Here are some informations about System.Decimal - maybe it is interesting: csharpindepth.com/Articles/General/Decimal.aspx – Variegate 21/7, 2011 at 13:41

What floating point operations and math functions do you intend to use for your project? I'm currently working on a 64-bit floating point library (see my answer) and would like to know what features you would like to see in my library. – Sherburne 22/7, 2011 at 1:35

Isn't decimal essentially a type of floating-point emulation? – Ss 23/7, 2011 at 2:6

@Ss Yes it is. But it's not a good fit for games. – Latonialatoniah 23/7, 2011 at 10:33

Does Java suffer from this problem? – Democracy 1/8, 2011 at 20:40

@Josh: Java has the strictfp keyword, which forces all calculations to be done in the stated size (float or double) rather than an extended size. However, Java still has many problems with IEE-754 support. Very (very, very) few programming languages support IEE-754 well. – Salo 2/8, 2011 at 10:3

Related: x86 assembly language gives you deterministic FP (except maybe for rsqrtps), but the trick is getting the same or similar sources to always compile the same. It's problematic even in C/C++ if you allow different compilers. Related: Does any floating point-intensive code produce bit-exact results in any x86-based architecture?. But at least with an ahead-of-time compiled language, you can usually get determinism if you avoid any FP library functions that are allowed to be different on different platforms. – Eventful 30/4, 2018 at 9:12

I know of no way to way to make normal floating points deterministic in .net. The JITter is allowed to create code that behaves differently on different platforms(or between different versions of .net). So using normal floats in deterministic .net code is not possible.

The workarounds I considered:

Implement FixedPoint32 in C#. While this is not too hard(I have a half finished implementation) the very small range of values makes it annoying to use. You have to be careful at all times so you neither overflow, nor lose too much precision. In the end I found this not easier than using integers directly.
Implement FixedPoint64 in C#. I found this rather hard to do. For some operations intermediate integers of 128bit would be useful. But .net doesn't offer such a type.
Implement a custom 32 bit floatingpoint. The lack of a BitScanReverse intrinsic causes a few annoyances when implementing this. But currently I think this is the most promising path.
Use native code for the math operations. Incurs the overhead of a delegate call on every math operation.

I've just started a software implementation of 32 bit floating point math. It can do about 70million additions/multiplications per second on my 2.66GHz i3. https://github.com/CodesInChaos/SoftFloat . Obviously it's still very incomplete and buggy.

Latonialatoniah answered 13/7, 2011 at 21:33 Comment(5)

there's an "unlimited" sized integer available BigInteger though not as fast as native int or long it is there so .NET does offer such a type (created for F# I believe but can be used in C#) – Catherinacatherine 12/12, 2012 at 8:41

If you're going to do any of these, you might as well try decimal first, as it's much simpler to do. Only if it's too slow for the task at hand would other approaches be worth thinking about. – Decile 15/4, 2015 at 12:50

I have learned about one special case where floating points are deterministic. Explanation I got is: For multiplication/division, if one of the FP numbers is power of two number (2^x), significant/mantissa won't change during calculation. Only exponent will change (point will move). So rounding will never happen. Result will be deterministic. – Ratite 8/9, 2016 at 11:12

Example: A number like 2^32 is represented as (exponent: 32, mantissa: 1). If we multiply this with another float (exp, man), the result is (exp + 32, man * 1). For division, the result is (expo - 32, man * 1). Multiplying the mantissa by 1 doesn't change the mantissa, so it doesn't matter how many bits it has. – Ratite 8/9, 2016 at 11:13

Apologies for the downvote. I mis-clicked (if that is a word) on my phone, and now I can’t change it. – Juggins 12/7, 2018 at 13:32

The C# specification (§4.1.6 Floating point types) specifically allows floating point computations to be done using precision higher than that of the result. So, no, I don't think you can make those calculations deterministic directly in .Net. Others suggested various workarounds, so you could try them.

Encyclopedia answered 13/7, 2011 at 21:44 Comment(3)

I just realized that the C# specification doesn't really matter if one distributes compiled assemblies. It only matters if one wants source compatibility. What really matters is the CLR specification. But I'm pretty sure it's guarantees are just as weak as the C# guarantees. – Latonialatoniah 2/8, 2011 at 17:54

Wouldn't casting to double each time after an operation strip away the unwanted bits, yielding consistent results? – Akins 27/2, 2016 at 22:58

@IllidanS4 I don't think that would guarantee consistent results. – Encyclopedia 27/2, 2016 at 23:5

The following page may be useful in the case where you need absolute portability of such operations. It discusses software for testing implementations of the IEEE 754 standard, including software for emulating floating point operations. Most information is probably specific to C or C++, however.

http://www.math.utah.edu/~beebe/software/ieee/

A note on fixed point

Binary fixed point numbers can also work well as a substitute for floating point, as is evident from the four basic arithmetic operations:

Addition and subtraction are trivial. They work the same way as integers. Just add or subtract!
To multiply two fixed point numbers, multiply the two numbers then shift right the defined number of fractional bits.
To divide two fixed point numbers, shift the dividend left the defined number of fractional bits, then divide by the divisor.
Chapter four of Hattangady (2007) has additional guidance on implementing binary fixed point numbers (S.K. Hattangady, "Development of a Block Floating Point Interval ALU for DSP and Control Applications", Master's thesis, North Carolina State University, 2007).

Binary fixed point numbers can be implemented on any integer data type such as int, long, and BigInteger, and the non-CLS-compliant types uint and ulong.

As suggested in another answer, you can use lookup tables, where each element in the table is a binary fixed point number, to help implement complex functions such as sine, cosine, square root, and so on. If the lookup table is less granular than the fixed point number, it is suggested to round the input by adding one half of the granularity of the lookup table to the input:

// Assume each number has a 12 bit fractional part. (1/4096)
// Each entry in the lookup table corresponds to a fixed point number
//  with an 8-bit fractional part (1/256)
input+=(1<<3); // Add 2^3 for rounding purposes
input>>=4; // Shift right by 4 (to get 8-bit fractional part)
// --- clamp or restrict input here --
// Look up value.
return lookupTable[input];

Sherburne answered 13/7, 2011 at 19:41 Comment(5)

You should upload this to an open-source code project site, like sourceforge or github. This makes it easier to find, easier to contribute to, easier to put on your resume etc. Also, a few source-code tips (feel free to ignore): Use const instead of static for constants, so the compiler can optimize them; prefer member functions to static functions (so we can call, ex. myDouble.LeadingZeros() instead of IntDouble.LeadingZeros(myDouble)); try to avoid single-letter variable names (MultiplyAnyLength, for example, has 9, making it very hard to follow) – Krieger 22/7, 2011 at 2:12

Be careful using unchecked and non-CLS-compliant types like ulong, uint, etc. for speed purposes - because they are so rarely used, the JIT does not optimize them as aggressively, so using them can actually be slower than using normal types like long and int. Also, C# has operator overloading, which this project would benefit greatly from. Finally, are there any associated unit-tests? Besides those small things, amazing job Peter, this is ridiculously impressive! – Krieger 22/7, 2011 at 2:15

Thank you for the comments. I do perform unit tests on the code. They are rather extensive, though, much too extensive to release for now. I even write unit-testing helper routines to make writing multiple tests easier. I don't use overloaded operators for now because I have plans of translating the code to Java when I'm done. – Sherburne 22/7, 2011 at 2:20

The funny thing is when I posted on your blog I didn't notice that blog was yours. I had just decided to try google+ and in its C# spark it suggested that blog entry. So I thought "What a remarkable coincidence for us two to start writing such a thing at the same time". But of course we had the same trigger :) – Latonialatoniah 23/7, 2011 at 18:30

Why bother porting this to Java? Java already has guaranteed deterministic floating point math via strictfp. – Koo 25/2, 2013 at 19:44

Is this a problem for C#?

Yes. Different architectures are the least of your worries, different framerates etc. can lead to deviations due to inaccuracies in float representations - even if they are the same inaccuracies (e.g. same architecture, except a slower GPU on one machine).

Can I use System.Decimal?

There is no reason you can't, however it's dog slow.

Is there a way to force my program to run in double precision?

Yes. Host the CLR runtime yourself; and compile in all the nessecary calls/flags (that change the behaviour of floating point arithmetic) into the C++ application before calling CorBindToRuntimeEx.

Are there any libraries that would help keep floating point calculations consistent?

Not that I know of.

Is there another way to solve this?

I have tackled this problem before, the idea is to use QNumbers. They are a form of reals that are fixed-point; but not fixed point in base-10 (decimal) - rather base-2 (binary); because of this the mathematical primitives on them (add, sub, mul, div) are much faster than the naive base-10 fixed points; especially if n is the same for both values (which in your case it would be). Furthermore because they are integral they have well-defined results on every platform.

Keep in mind that framerate can still affect these, but it is not as bad and is easily rectified using syncronisation points.

Can I use more mathematical functions with QNumbers?

Yes, round-trip a decimal to do this. Furthermore, you should really be using lookup tables for the trig (sin, cos) functions; as those can really give different results on different platforms - and if you code them correctly they can use QNumbers directly.

Tortile answered 1/8, 2011 at 13:5 Comment(6)

Not sure what you're talking about with framerates being issue. Clearly you'd want to have a fixed update-rate (see for example here) - whether or not that's the same as the display-framerate is irrelevant. As long as the inaccuracies are the same on all machines, we're good. I don't understand your third answer at all. – Krieger 1/8, 2011 at 16:40

@BlueRaja: The answer "Is there a way to force my program to run in double precision?" would either amount to reimplementing the entire Common Language Runtime, which would be extremely complicated, or using native calls to a C++ DLL from the C# application, as hinted in user shelleybutterfly's answer. Think of "QNumbers" merely as binary fixed point numbers, as hinted in my answer (I had not until now seen binary fixed point numbers being called "QNumbers".) – Sherburne 1/8, 2011 at 21:9

@Pieter O. You do not need to reimplement the runtime. The server I work on at my company hosts the CLR runtime as a native C++ application (so does SQL Server). I suggest you google CorBindToRuntimeEx. – Tortile 3/8, 2011 at 10:2

@BlueRaja it depends on the game in question. Applying fixed framerate steps to all games is not a viable option - because the AOE algorithm introduces artificial latency; which is unnacceptable in e.g. a FPS. – Tortile 3/8, 2011 at 10:8

@Jonathan: This is only an issue in peer-to-peer games which send the input only - for these, you have to have a fixed update-rate. Most FPS's do not work like this, but the few that do necessarily have a fixed update-rate. See this question. – Krieger 3/8, 2011 at 16:50

@BlueRaja thanks for that. Quite informative - I still think fixed point can be important (but less so) in fixed time step scenarios, because after all you are at the mercy of the OS scheduler (but again, it's not that bad). – Tortile 11/8, 2011 at 0:35

According to this slightly old MSDN blog entry the JIT will not use SSE/SSE2 for floating point, it's all x87. Because of that, as you mentioned you have to worry about modes and flags, and in C# that's not possible to control. So using normal floating point operations will not guarantee the exact same result on every machine for your program.

To get precise reproducibility of double precision you are going to have to do software floating point (or fixed point) emulation. I don't know of C# libraries to do this.

Depending on the operations you need, you might be able to get away with single precision. Here's the idea:

store all values you care about in single precision
to perform an operation:
- expand inputs to double precision
- do operation in double precision
- convert result back to single precision

The big issue with x87 is that calculations might be done in 53-bit or 64-bit accuracy depending on the precision flag and whether the register spilled to memory. But for many operations, performing the operation in high precision and rounding back to lower precision will guarantee the correct answer, which implies that the answer will be guaranteed to be the same on all systems. Whether you get the extra precision won't matter, since you have enough precision to guarantee the right answer in either case.

Operations that should work in this scheme: addition, subtraction, multiplication, division, sqrt. Things like sin, exp, etc. won't work (results will usually match but there is no guarantee). "When is double rounding innocuous?" ACM Reference (paid reg. req.)

Hope this helps!

Goodrich answered 19/7, 2011 at 2:18 Comment(1)

It's also a problem that .NET 5, or 6, or 42, might not use the x87 calculation mode anymore. There's nothing in the standard that requires it to. – Permanganate 15/2, 2012 at 3:9

As already stated by other answers: Yes, this is a problem in C# - even when staying pure Windows.

As for a solution: You can reduce (and with some effort/performance hit) avoid the problem completely if you use built-in BigInteger class and scaling all calculations to a defined precision by using a common denominator for any calculation/storage of such numbers.

As requested by OP - regarding performance:

System.Decimal represents number with 1 bit for a sign and 96 bit Integer and a "scale" (representing where the decimal point is). For all calculations you make it must operate on this data structure and can't use any floating point instructions built into the CPU.

The BigInteger "solution" does something similar - only that you can define how much digits you need/want... perhaps you want only 80 bits or 240 bits of precision.

The slowness comes always from having to simulate all operations on these number via integer-only instructions without using the CPU/FPU-built-in instructions which in turn leads to much more instructions per mathematical operation.

To lessen the performance hit there are several strategies - like QNumbers (see answer from Jonathan Dickinson - Is floating-point math consistent in C#? Can it be?) and/or caching (for example trig calculations...) etc.

Continuous answered 19/7, 2011 at 19:54 Comment(5)

Note that BigInteger is only available in .Net 4.0. – Encyclopedia 20/7, 2011 at 17:42

My guess is that the performance hit of BigInteger exceeds even the performance hit by Decimal. – Latonialatoniah 23/7, 2011 at 20:13

A couple of times in the answers here there is reference to the performance hit of using Decimal (@Jonathan Dickinson - 'dog slow') or BigInteger (@CodeInChaos comment above) - can someone please provide a little explanation on these performance hits and as to whether/why they are really show-stoppers to providing a solution. – Ashok 1/8, 2011 at 15:34

@Continuous - thank you for the edit - interesting reading, however, could you please just also give a ball-park guesstimate as to the performance hit of not-using 'float' are we talking 10% slower or 10 times slower - I just want to get a feel for the order of magnitude implied. – Ashok 1/8, 2011 at 16:9

it more likeley in the area of 1:5 than "only 10%" – Continuous 1/8, 2011 at 16:20

Well, here would be my first attempt on how to do this:

Create an ATL.dll project that has a simple object in it to be used for your critical floating point operations. make sure to compile it with flags that disable using any non xx87 hardware to do floating point.
Create functions that call floating point operations and return the results; start simple and then if it's working for you, you can always increase the complexity to meet your performance needs later if necessary.
Put the control_fp calls around the actual math to ensure that it's done the same way on all machines.
Reference your new library and test to make sure it works as expected.

(I believe you can just compile to a 32-bit .dll and then use it with either x86 or AnyCpu [or likely only targeting x86 on a 64-bit system; see comment below].)

Then, assuming it works, should you want to use Mono I imagine you should be able to replicate the library on other x86 platforms in a similar manner (not COM of course; although, perhaps, with wine? a little out of my area once we go there though...).

Assuming you can make it work, you should be able to set up custom functions that can do multiple operations at once to fix any performance issues, and you'll have floating point math that allows you to have consistent results across platforms with a minimal amount of code written in C++, and leaving the rest of your code in C#.

Dantzler answered 23/7, 2011 at 5:10 Comment(1)

"compile to a 32-bit .dll and then use ... AnyCpu" I think this will only work when running on a 32 bit system. On a 64bit system only a program targetting x86 will be able to load the 32 bit dll. – Latonialatoniah 23/7, 2011 at 20:16

I'm not a game developer, though I do have a lot of experience with computationally difficult problems ... so, I'll do my best.

The strategy I would adopt is essentially this:

Use a slower (if necessary; if there's a faster way, great!), but predictable method to get reproducible results
Use double for everything else (eg, rendering)

The short'n long of this is: you need to find a balance. If you're spending 30ms rendering (~33fps) and only 1ms doing collision detection (or insert some other highly sensitive operation) -- even if you triple the time it takes to do the critical arithmetic, the impact it has on your framerate is you drop from 33.3fps to 30.3fps.

I suggest you profile everything, account for how much time is spent doing each of the noticeably expensive calculations, then repeat the measurements with 1 or more methods of resolving this problem and see what the impact is.

Mesencephalon answered 2/8, 2011 at 14:32 Comment(0)

Checking the links in the other answers make it clear you'll never have a guarantee of whether floating point is "correctly" implemented or whether you'll always receive a certain precision for a given calculation, but perhaps you could make a best effort by (1) truncating all calculations to a common minimum (eg, if different implementations will give you 32 to 80 bits of precision, always truncating every operation to 30 or 31 bits), (2) have a table of a few test cases at startup (borderline cases of add, subtract, multiply, divide, sqrt, cosine, etc.) and if the implementation calculates values matching the table then not bother making any adjustments.

Hooky answered 1/8, 2011 at 18:41 Comment(2)

always truncating every operation to 30 or 31 bits - this is exactly what the float datatype does on x86 machines - however this will cause slightly different results from machines which do all their calculations using only 32-bits, and these small changes will propagate over time. Hence, the question. – Krieger 1/8, 2011 at 19:10

If "N bits of precision" means any calculation is accurate to that many bits, and machine A is accurate to 32 bits while machine B is accurate to 48 bits, then the first 32 bits of any calc by both machines should be identical. Wouldn't truncating to 32 bits or less after every operation keep both machines exactly in sync? If not, what's an example? – Bandwidth 3/8, 2011 at 3:25

-3

Your question in quite difficult and technical stuff O_o. However I may have an idea.

You sure know that the CPU makes some adjustment after any floating operations. And CPU offer several different instructions which make different rounding operation.

So for an expression, your compiler will choose a set of instructions which lead you to a result. But any other instruction workflow, even if they intend to compute the same expression, can provide another result.

The 'mistakes' made by a rounding adjustment will grow at each further instructions.

As an exemple we can say that at an assembly level: a * b * c is not equivalent to a * c * b.

I'm not entirely sure of that, you will need to ask for someone who know CPU architecture a lot more than me : p

However to answer your question: in C or C++ you can solve your problem because you have some control on the machine code generate by your compiler, however in .NET you don't have any. So as long as your machine code can be different, you'll never be sure about the exact result.

I'm curious in which way this can be a problem because variation seems very minimal, but if you need really accurate operation the only solution I can think about will be to increase the size of your floating registers. Use double precision or even long double if you can (not sure that's possible using CLI).

I hope I've been clear enough, I'm not perfect in English (...at all : s)

Abercrombie answered 13/7, 2011 at 21:25 Comment(7)

Imagine a P2P shooter. You shoot at a guy, you hit him and he dies, but it's very close, you almost missed. On the other guy's PC uses slightly different calculations and it computes that you miss. Do you see the problem now? In this case, increasing the size of registers will not help (at least not completely). Using the exact same calculation on each computer will. – Encyclopedia 13/7, 2011 at 21:35

In this scenario one usually doesn't care about how close the result is to the actual result(as long as it's reasonable), but what matters is that it's exactly the same for all users. – Latonialatoniah 13/7, 2011 at 21:54

You right, I didn't though about this kind of scenario. However I'm agree with @CodeInChaos on this one. I didn't found that really smart to take an important decision twice. This is more a software architecture issue. One program, the shooter's application for exemple, should made the calculation and send result to the others. You will never have errors in this way. You have a hit or not, but only one take the descision. Like say @Teishateixeira – Abercrombie 13/7, 2011 at 23:3

@Aesgar: Yes, that is how most shooters work; that "authority" is called the server, and we call the overall architecture a "client/server" architecture. However, there is another kind of architecture: peer-to-peer. In P2P, there is no server; rather, all clients must verify all actions with each other before anything happens. This increases the lag, making it not acceptable for shooters, but vastly decreases the network traffic, making it perfect for games where a small lag (~250ms) is acceptable, but syncing the entire game state is not. Namely, RTS games like C&C and Starcraft use P2P. – Krieger 14/7, 2011 at 6:20

That'll be a weird game if only the server can shoot :D. P2P is a communication protocole but it'll not stop the message: 'This one take my bullet.' If only one program test it, you could even have time to improve more sophisticate physics. And more important it'll ignore your bug due to hardware limitation ! – Abercrombie 14/7, 2011 at 12:24

This kind of network code relies on the code behaving in exactly the same way on all participating computers if the input is exactly the same. It then only sends the input. The nice thing about this is that the traffic is very low even when you have very many objects. Thus it is used in almost all RTS games I've seen. An old article on this: gamasutra.com/view/feature/3094/… – Latonialatoniah 14/7, 2011 at 13:37

In a p2p game you don't have a trusted machine to rely on. If you allow one station to decide whether his bullet hit or not you open up the possibility of a client cheating. Furthermore, the links can't even handle the amount of data that sometimes results--the games work by sending the orders rather than the results. I play RTS games and many times I've seen so much junk flying around there's no way it could be sent over normal household uplinks. – Haight 17/7, 2011 at 6:0

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags