Can I use pandas.DataFrame.isin() with a numeric tolerance parameter?

I reviewed the posts listed below beforehand. Is there a way to use DataFrame.isin() with an approximation factor or a tolerance value? If not, is there another method that can do this?

Filter dataframe rows if value in column is in a set list of values

use a list of values to select rows from a pandas dataframe

Example:

import pandas as pd

df = pd.DataFrame({'A' : [5,6,3.3,4], 'B' : [1,2,3.2, 5]})

In : df
Out:
   A    B
0  5    1
1  6    2
2  3.3  3.2
3  4    5  

Desired call (tol is the parameter I wish existed):

df[df['A'].isin([3, 6], tol=.5)]

Desired output:
   A    B
1  6    2
2  3.3  3.2
Igbo asked 20/9, 2016 at 19:07
Comment (1)
In this exact case, you can create copies of A and B that are rounded to the nearest integer, then use those to identify valid index values in the original columns. In other words, you can implement the tolerance on the data side rather than the function side. - Pocosin
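The rounding idea from the comment above could be sketched roughly as follows. This is only an illustration, and it assumes the targets are integers and the tolerance is half a unit; values exactly on a .5 boundary are rounded to the nearest even integer, so they may not behave like a symmetric tolerance. The names rounded_a and result are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

# Round a copy of the column; the original values stay untouched.
rounded_a = df['A'].round()

# Exact membership test on the rounded copy, used to index the original frame.
result = df[rounded_a.isin([3, 6])]
print(result)
```

For non-integer targets or other tolerances, the rounded copy no longer lines up with the targets, so a distance-based check is the more general route.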

You can do a similar thing with numpy's isclose:

df[np.isclose(df['A'].values[:, None], [3, 6], atol=.5).any(axis=1)]
Out: 
     A    B
1  6.0  2.0
2  3.3  3.2

np.isclose returns this:

np.isclose(df['A'].values[:, None], [3, 6], atol=.5)
Out: 
array([[False, False],
       [False,  True],
       [ True, False],
       [False, False]], dtype=bool)

It is a pairwise comparison of the elements of df['A'] with [3, 6]. That is why we need df['A'].values[:, None]: the extra axis turns the column into shape (4, 1) so that it broadcasts against the list. Since you want to know whether each value is close to any one of the entries in the list, we call .any(axis=1) at the end.
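The broadcasting step described above can be reproduced as a self-contained sketch with the imports included. One detail worth knowing: np.isclose also applies a relative tolerance (rtol, default 1e-05) scaled by the second argument, so the sketch passes rtol=0 to make atol the only tolerance. The variable names targets, col, close and result are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
targets = np.array([3, 6])

# Column vector of shape (4, 1) against targets of shape (2,)
# broadcasts to a (4, 2) boolean matrix:
# one row per value in df['A'], one column per target.
col = df['A'].to_numpy()[:, None]
close = np.isclose(col, targets, rtol=0, atol=0.5)

# A row matches if it is close to at least one target.
mask = close.any(axis=1)
result = df[mask]
print(close.shape)
print(result)
```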


For multiple columns, change the slice a little bit:

mask = np.isclose(df[['A', 'B']].values[:, :, None], [3, 6], atol=0.5).any(axis=(1, 2))
mask
Out: array([False,  True,  True, False], dtype=bool)

You can use this mask to filter the DataFrame (i.e. df[mask]).
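To make the axes in the multi-column version easier to follow, here is a sketch that reduces them one at a time. It also shows the difference between keeping a row when any column matches (what .any(axis=(1, 2)) does) and keeping it only when every column matches. The intermediate names per_cell, any_col and all_cols are assumptions made for the example, and rtol=0 is passed so that atol is the only tolerance.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
targets = [3, 6]

# Shape (4, 2, 1) against shape (2,) broadcasts to (4, 2, 2):
# axes are (row, column, target).
values = df[['A', 'B']].to_numpy()[:, :, None]
close = np.isclose(values, targets, rtol=0, atol=0.5)

per_cell = close.any(axis=2)     # (4, 2): is each cell near some target?
any_col = per_cell.any(axis=1)   # keep row if at least one column matches
all_cols = per_cell.all(axis=1)  # keep row only if every column matches

print(df[any_col])
print(df[all_cols])
```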


If you want to compare df['A'] and df['B'] (and possibly other columns) with different vectors, you can create a separate mask for each column:

mask1 = np.isclose(df['A'].values[:, None], [1, 2, 3], atol=.5).any(axis=1)
mask2 = np.isclose(df['B'].values[:, None], [4, 5], atol=.5).any(axis=1)
mask3 = ...

Then slice:

df[mask1 & mask2]  # or df[mask1 & mask2 & mask3 & ...]
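When there are many columns, the per-column masks can be built in a loop instead of by hand. The helper below is a hypothetical sketch, not part of pandas or NumPy: the name isin_tol and the dict argument mapping column names to target lists are assumptions made for illustration. It combines the masks with &, as in the snippet above, and uses rtol=0 so that atol is the only tolerance.

```python
import numpy as np
import pandas as pd

def isin_tol(df, targets_by_col, atol):
    """Boolean mask: True where every listed column is within atol
    of at least one of its own target values (hypothetical helper)."""
    mask = np.ones(len(df), dtype=bool)
    for col, targets in targets_by_col.items():
        values = df[col].to_numpy()[:, None]  # shape (n, 1) for broadcasting
        mask &= np.isclose(values, targets, rtol=0, atol=atol).any(axis=1)
    return mask

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
mask = isin_tol(df, {'A': [3, 6], 'B': [2]}, atol=0.5)
print(df[mask])
```

Each column gets its own target list, so the lists may have different lengths; replacing &= with |= would keep rows where any of the columns matches instead.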
Prevalent answered 20/9, 2016 at 19:14
Comment (2)
It was something else; I managed to fix the issue. Now I'm trying to figure out how to handle multiple columns at a time. Perhaps join/concatenate df2 and df3? df2 = df[np.isclose(df['B'].values[:, None], [0.939, 0.874, 1.0], atol=.05).any(axis=1)] and df3 = df[np.isclose(df['A'].values[:, None], [-0.12, 0.0, 0.12], atol=.05).any(axis=1)] - Igbo
Something like this: mask = np.isclose(df[['A', 'B', 'C', 'D']].values[:, :, :,:,None], [a_val, b_val, c_val, d_val], atol=0.5).any(axis=(1, 2,3,4)), where a_val corresponds to df['A'], b_val to df['B'], and so on. - Igbo
