Replace outliers with column quantile in Pandas dataframe
Asked Answered
H

1

8

I have a dataframe:

df = pd.DataFrame(np.random.randint(0,100,size=(5, 2)), columns=list('AB'))
    A   B
0  92  65
1  61  97
2  17  39
3  70  47
4  56   6

Here are 5% quantiles:

down_quantiles = df.quantile(0.05)
A    24.8
B    12.6

And here is the mask for values that are lower than quantiles:

outliers_low = (df < down_quantiles)
       A      B
0  False  False
1  False  False
2   True  False
3  False  False
4  False   True

I want to set values in df lower than quantile to its column quantile. I can do it like this:

df[outliers_low] = np.nan
df.fillna(down_quantiles, inplace=True)

    A   B
0  92.0  65.0
1  61.0  97.0
2  24.8  39.0
3  70.0  47.0
4  56.0  12.6

But certainly there should be a more elegant way. How can I do this without fillna? Thanks.

Haerle answered 20/1, 2017 at 9:20 Comment(3)
do you mind a one-liner: df[~outliers_low].fillna(down_quantiles, inplace=True)?Bile
I thought that there should be more native pandas way to do this. And Nickil Maveli answer clearly shows it.Haerle
Yes I forgot about mask, I've upvoted Nickil's answer accordinglyBile
C
12

You can use DF.mask() method. Wherever there is a presence of a True instance, the values from the other series get's replaced aligned as per matching column names by providing axis=1.

df.mask(outliers_low, down_quantiles, axis=1)  

enter image description here


Another variant would be to use DF.where() method after inverting your boolean mask using the tilde (~) symbol.

df.where(~outliers_low, down_quantiles, axis=1)

enter image description here

Checkrow answered 20/1, 2017 at 9:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.