Comparing floats in a pandas column

I have the following dataframe:

       actual_credit    min_required_credit
   0   0.3              0.4
   1   0.5              0.2
   2   0.4              0.4
   3   0.2              0.3

I need to add a column indicating where actual_credit >= min_required_credit. The result would be:

       actual_credit    min_required_credit   result
   0   0.3              0.4                   False
   1   0.5              0.2                   True
   2   0.4              0.4                   True
   3   0.2              0.3                   False

I am doing the following:

df['result'] = abs(df['actual_credit']) >= abs(df['min_required_credit'])

However, the third row (0.4 and 0.4) consistently results in False. After researching this issue in various places, including "What is the best way to compare floats for almost-equality in Python?", I still can't get this to work. Whenever the two columns hold identical values, the result is False, which is not correct.

I am using Python 3.3.
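A minimal sketch of what may be happening, assuming the credit values are produced by float arithmetic rather than typed in as literals (the `0.1 + 0.2` below is a hypothetical stand-in for that arithmetic): pandas prints both values as 0.3, but the stored values differ in the last bits, so `>=` returns False.

```python
import pandas as pd

# Hypothetical data: min_required_credit comes from float arithmetic, so it is
# stored as 0.30000000000000004 even though pandas displays it as 0.3.
df = pd.DataFrame({
    "actual_credit": [0.3],
    "min_required_credit": [0.1 + 0.2],
})

print(df)                                        # both columns display as 0.3
print(float(df["min_required_credit"].iloc[0]))  # 0.30000000000000004

# The comparison sees the stored values, not the displayed ones.
df["result"] = df["actual_credit"] >= df["min_required_credit"]
print(df["result"].tolist())                     # [False]
```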

Lovato asked 10/11, 2015 at 9:15

Due to imprecise float comparison, you can OR your comparison with np.isclose. isclose takes relative and absolute tolerance parameters (rtol and atol), so the following should work:

import numpy as np
df['result'] = df['actual_credit'].ge(df['min_required_credit']) | np.isclose(df['actual_credit'], df['min_required_credit'])
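A self-contained sketch of this approach on the OP's data. It assumes row 2's actual_credit is the result of some upstream arithmetic; `0.7 - 0.3` is a hypothetical stand-in that is stored one ULP below 0.4. np.isclose treats a and b as close when |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08.

```python
import numpy as np
import pandas as pd

# OP's data; row 2's actual_credit is computed as 0.7 - 0.3 (hypothetical
# stand-in for upstream arithmetic), which is stored as 0.39999999999999997.
df = pd.DataFrame({
    "actual_credit": [0.3, 0.5, 0.7 - 0.3, 0.2],
    "min_required_credit": [0.4, 0.2, 0.4, 0.3],
})

naive = df["actual_credit"] >= df["min_required_credit"]

# ge() covers the clearly-greater case; np.isclose() catches pairs that differ
# only by rounding error (default rtol=1e-05, atol=1e-08).
df["result"] = df["actual_credit"].ge(df["min_required_credit"]) | np.isclose(
    df["actual_credit"], df["min_required_credit"]
)

print(naive.tolist())         # [False, True, False, False]
print(df["result"].tolist())  # [False, True, True, False]
```

The tolerances can be passed explicitly (e.g. `np.isclose(a, b, rtol=..., atol=...)`) if the defaults are too loose or too strict for the data's scale.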
Weinreb answered 10/11, 2015 at 10:5

@EdChum's answer works great, but using the pandas.DataFrame.round function is another clean option that works well without the use of numpy.

import pandas as pd

df = pd.DataFrame(  # adding a small difference at the thousandths place to reproduce the issue
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])

df['result'] = df['actual_credit'].round(1) >= df['min_required_credit'].round(1)
print(df)
   actual_credit  min_required_credit  result
0            0.3                0.400   False
1            0.5                0.200    True
2            0.4                0.401    True
3            0.2                0.300   False

You might consider applying round() to the whole DataFrame to edit it permanently, depending on whether you need that precision. In this example, the OP seems to suggest the extra digits are just noise that causes confusion.

df = pd.DataFrame(  # adding a small difference at the thousandths place to reproduce the issue
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])
df = df.round(1)
df['result'] = df['actual_credit'] >= df['min_required_credit']
print(df)
   actual_credit  min_required_credit  result
0            0.3                  0.4   False
1            0.5                  0.2    True
2            0.4                  0.4    True
3            0.2                  0.3   False
Outfall answered 30/4, 2021 at 15:47
Yes, but caution should be applied when you use .round(2) and have, for example, 0.005001 and 0.0049999. You get BIGGER differences afterwards. – Brazil
@LittleBobbyTables The number of decimal places to round to should be used with discretion based on your dataset. In the example you provide (0.005001 and 0.0049999), I would be inclined to use .round(3) instead to ensure data are rounded to the nearest thousandth. – Outfall
What @LittleBobbyTables said. Rounding isn't the correct method here. – Tindall
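
A small sketch of the pitfall raised in the first comment, using its two values: they differ by only about 1.1e-06, but rounding to two decimals puts them on opposite sides of the 0.005 boundary, so the rounded values end up 0.01 apart. Rounding to three decimals happens to merge them, but any fixed number of decimals can split two near-equal values that straddle a rounding boundary.

```python
import pandas as pd

# The two values from the first comment; they differ by about 1.1e-06.
s = pd.Series([0.005001, 0.0049999])

# Rounding to 2 decimals sends them to opposite sides of 0.005,
# so the rounded values are 0.01 apart.
rounded = s.round(2)
print(rounded.tolist())     # [0.01, 0.0]

# Rounding to 3 decimals (as suggested in the reply) maps both to 0.005.
print(s.round(3).tolist())  # [0.005, 0.005]
```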

In general, NumPy comparison functions work well with pd.Series and allow element-wise comparisons: isclose, greater, greater_equal, less, less_equal, etc. (allclose also accepts Series, but reduces the comparison to a single boolean).

In your case greater_equal would do:

import numpy as np
df['result'] = np.greater_equal(df['actual_credit'], df['min_required_credit'])

or, alternatively, as proposed above, using Series.ge (or le, gt, etc.):

df['result'] = df['actual_credit'].ge(df['min_required_credit'])
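A short sketch, on made-up values, of how these functions behave on two Series: np.greater_equal is element-wise and gives the same boolean Series as Series.ge (so on its own it does not fix the rounding issue), isclose is element-wise as well, and allclose collapses everything into one boolean.

```python
import numpy as np
import pandas as pd

x = pd.Series([0.3, 0.5, 0.4])
y = pd.Series([0.4, 0.2, 0.4])

# np.greater_equal on two Series is element-wise and returns a boolean
# Series, matching Series.ge and the >= operator.
ge_np = np.greater_equal(x, y)
print(ge_np.tolist())         # [False, True, True]
print(ge_np.equals(x.ge(y)))  # True

# isclose is element-wise too; allclose reduces to a single boolean.
print(np.isclose(x, y).tolist())  # [False, False, True]
print(np.allclose(x, y))          # False
```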

The risk of OR-ing ge with isclose (as in the answer above) is that, for example, checking 3.999999999999 >= 4.0 returns True with the default tolerances, which may not be what you want.
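A sketch of that risk and one way to narrow it: with the default tolerances the pair counts as "greater or equal", while a much tighter, purely absolute tolerance does not. The `atol=1e-15` value is an arbitrary choice for illustration, not a recommendation; the right tolerance depends on the data.

```python
import numpy as np
import pandas as pd

actual = pd.Series([3.999999999999])
required = pd.Series([4.0])

# Default tolerances (rtol=1e-05, atol=1e-08) treat the pair as equal.
loose = actual.ge(required) | np.isclose(actual, required)

# Purely absolute, much tighter tolerance (1e-15 chosen arbitrarily here):
# a difference of about 1e-12 no longer counts as equal.
strict = actual.ge(required) | np.isclose(actual, required, rtol=0.0, atol=1e-15)

print(loose.tolist(), strict.tolist())  # [True] [False]
```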

Eunuchize answered 25/3, 2020 at 18:56

Use pandas.DataFrame.abs() instead of the built-in abs():

df['result'] = df['actual_credit'].abs() >= df['min_required_credit'].abs()
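A sketch of why this swap does not change the outcome, consistent with the OP's reply: the built-in abs() on a Series dispatches to Series.__abs__, so both spellings compute the same element-wise absolute values, and neither absorbs rounding error. The `0.7 - 0.3` value is a hypothetical near-equal input stored as 0.39999999999999997.

```python
import pandas as pd

# Hypothetical near-equal row: 0.7 - 0.3 is stored just below 0.4.
actual = pd.Series([0.7 - 0.3])
required = pd.Series([0.4])

# Built-in abs() and Series.abs() give identical element-wise results.
builtin = abs(actual) >= abs(required)
method = actual.abs() >= required.abs()

print(builtin.tolist(), method.tolist())  # [False] [False]
```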
Repeal answered 10/11, 2015 at 9:29
Thanks for your suggestion; unfortunately this doesn't work for me. I think my data may be represented differently in memory from what I see on screen, which is why the results come out strangely. EdChum's suggestion works. Thanks. – Lovato

© 2022 - 2024 — McMap. All rights reserved.