Evil in the Python decimal / float

I have a large amount of Python code that handles numbers to 4 decimal places of precision, and I am stuck on Python 2.4 for many reasons. The code does very simplistic math (it's a credit management system that mostly takes or adds credits).

It has intermingled usage of float and Decimal (MySQLdb returns Decimal objects for SQL DECIMAL types). After several strange bugs came up in use, I found the root cause of all of them to be a few places in the code where floats and Decimals are compared.

I narrowed it down to cases like this:

>>> from decimal import Decimal
>>> max(Decimal('0.06'), 0.6)
Decimal("0.06")

Now my fear is that I might not be able to catch all such cases in the code. (A normal programmer will keep writing x > 0 instead of x > Decimal('0.0000'), and that is very hard to avoid.)

I have come up with a patch (inspired by the improvements to the decimal package in Python 2.7).

import decimal
from decimal import Decimal

def _convert_other(other):
    """Convert other to Decimal.

    Verifies that it's ok to use in an implicit construction.
    """
    if isinstance(other, Decimal):
        return other
    if isinstance(other, (int, long)):
        return Decimal(other)
    # Our small patch begins
    if isinstance(other, float):
        return Decimal(str(other))
    # Our small patch ends
    return NotImplemented

decimal._convert_other = _convert_other

I apply it in a library that is loaded very early, and it changes the decimal package's behavior by converting floats to Decimal before comparisons (so we never fall back to Python's default object-to-object comparison).
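
For illustration, here is roughly how I expect comparisons to behave once the patch has been applied (the module name early_patches is hypothetical, and the exact route through Python 2.4's comparison machinery is my assumption):

>>> import early_patches            # hypothetical module that applies the patch above
>>> from decimal import Decimal
>>> Decimal('0.06') < 0.6           # should now go through Decimal(str(0.6))
True
>>> Decimal('0.06') > 0.6
False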

I specifically used "str" instead of "repr" because it fixes some of float's rounding cases, e.g.:

>>> Decimal(str(0.6))
Decimal("0.6")
>>> Decimal(repr(0.6))
Decimal("0.59999999999999998")

Now my question is: am I missing anything here? Is this fairly safe, or am I breaking something? (I am guessing the authors of the package had very strong reasons to avoid floats so thoroughly.)

Larina answered 15/11, 2010 at 8:9 Comment(0)

I think you want raise NotImplementedError() instead of return NotImplemented, to start.

What you're doing is called "monkey patching", and it is OK to do so long as you know what you're doing, are aware of the fallout, and are OK with that fallout. Generally you limit this to fixing a bug, or some other change where you know your alteration of the behavior is still correct and backwards compatible.

In this case, because you're patching the decimal module itself, you can change behavior outside of the cases where you use it. If another library uses decimal and somehow relies on the default behavior, it might cause subtle bugs. The trouble is that you don't really know unless you audit all your code, including any dependencies, and find all the call sites.

Basically - do it at your own risk.

Personally I find it more reassuring to fix all my code, add tests, and make it harder to do the wrong thing (e.g., use wrapper classes or helper functions). Another approach would be to instrument your code with your patch to find all the call sites, then go back and fix them.
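
For example, here is a minimal sketch of the helper-function idea (the name to_dec and the 4-place quantization are illustrative assumptions, not anything from the decimal module):

from decimal import Decimal

def to_dec(value, places=Decimal('0.0001')):
    """Coerce ints, longs, floats and Decimals to a Decimal with 4 places."""
    if isinstance(value, Decimal):
        d = value
    elif isinstance(value, float):
        d = Decimal(str(value))   # str() avoids the long repr() tail on 2.x
    else:
        d = Decimal(value)        # ints, longs and numeric strings
    return d.quantize(places)

# Comparisons then stay Decimal-to-Decimal:
# to_dec(0.6) > to_dec(Decimal('0.06'))  ->  True

That way a float never reaches Python's default object-to-object comparison in the first place.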

Edit - I guess I should add that the probable reason they avoided floats is that floats can't represent all decimal numbers exactly, which is important if you're dealing with money.
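
For example (output shown the way Python 2.4 prints it):

>>> 0.1 + 0.2
0.30000000000000004
>>> from decimal import Decimal
>>> Decimal('0.10') + Decimal('0.20')
Decimal("0.30")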

Tympanites answered 15/11, 2010 at 8:24 Comment(3)
Just a note that the "return NotImplemented" line is from the decimal.py module itself; the two lines I added are between the comments. I agree with your approach; however, with this implementation Python allows logically insane comparisons between objects that we assume are both numbers. Hmm, another idea might be to raise an error instead of converting implicitly, but regardless, I think I need to do something ...Larina
return NotImplemented is correct and is the proper, documentation specified thing to return for an unsupported comparison. It allows python to try to find another way to do things.Continuance
+1 for using the term "monkey patching", which led me to wikipedia the term, to find it comes from "guerrilla patching", like in guerrilla warfare =).Castle

There are very good reasons to avoid floats. With floats, you cannot reliably do comparisons such as ==, >, <, etc. because of floating-point noise. With any floating-point operation you accumulate noise. It starts as very small digits appearing at the very end, e.g. 1.000...002, but it can eventually accumulate into something like 1.0000000453436.

Using str() may work for you if you don't do many floating-point computations, but if you do a lot of them, the noise will eventually be big enough that str() gives you the wrong answer.
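
For instance, even a single subtraction can leave noise that str() no longer hides (the exact digits vary, but on a typical IEEE-754 build you get something like this):

>>> from decimal import Decimal
>>> leftover = (0.1 + 0.2) - 0.3             # mathematically exactly 0
>>> leftover == 0
False
>>> Decimal(str(leftover)) == Decimal('0')   # the noise survives str()
False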

In sum: if (1) you don't do many floating-point computations, or (2) you don't need to do comparisons like ==, >, <, etc., then you might be OK.

If you want to be sure, then remove all floating point code.

Tartar answered 15/11, 2010 at 14:6 Comment(2)
There are very good reasons to avoid floats in accounting programs like the one in the question. Floats work perfectly fine for their intended purpose of representing approximate quantities.Toein
@Dan, yes, the premise of my answer is that you can't do == with floats. If you are representing approximate quantities, then you are not using == since equality is not approximate.Tartar

First, floating-point numbers are not "evil". The inaccuracy you are seeing is the result of machine error (that is, a system of finite precision trying to represent values of infinite precision).

Something that you could try is round(x, 2) so that your monetary/credit calculations are rounded to two decimal places (since that is the extent of precision you will ever require).

>>> round(0.6, 2)
0.6
>>> round(0.5999998, 2)
0.6
Disaster answered 7/6, 2022 at 3:8 Comment(2)
The original question isn't related to float precision - it's related to the fact that in ancient Python versions, comparisons between float and Decimal values would silently give "wrong" results - i.e., results not based on the numeric values at all. See the OP's example of max(Decimal('0.06'), 0.6) giving 0.06, even though 0.6 is ten times larger. That's long since fixed.Coccidioidomycosis
Even so, numeric representations and the conversions between them are intrinsically linked to each representation's precision.Disaster
