Does the quantile() function in Pandas ignore NaN?

Asked 4/9, 2018 at 17:27 Answered 17/11, 2021 at 10:35

I have a DataFrame, dfAB:

import random

import numpy as np
import pandas as pd

A = [ random.randint(0,100) for i in range(10) ]
B = [ random.randint(0,100) for i in range(10) ]

dfAB = pd.DataFrame({ 'A': A, 'B': B })
dfAB

I want to know the 75th percentile of each column, so I call the quantile function:

dfAB.quantile(0.75)

But say I now put some NaNs into dfAB and call the function again; obviously the result is different:

dfAB.loc[5:8]=np.nan
dfAB.quantile(0.75)

When I calculated the mean of dfAB, I passed skipna to ignore NaNs, because I didn't want them affecting my stats (I have quite a few in my data, on purpose, and obviously setting them to zero doesn't help):

dfAB.mean(skipna=True)

So what I'm getting at is whether, and how, the quantile function handles NaNs.

Electrodynamics answered 4/9, 2018 at 17:27 Comment(5)

Well, if you pass skipna=True, I guess it skips them. – Ceuta 4/9, 2018 at 17:31

If you pass skipna=False to mean and the data has NaN, it will return NaN (skipna=True is the default). – Rainstorm 4/9, 2018 at 17:34

Don't ask us; we're biological units. Try it and see what happens. Load a df with half NaN values and play around for a few minutes. – Oddfellow 4/9, 2018 at 17:34

side comment on the way you generate A, B. you can just A = np.random.randint(100, size=10) – Cassicassia 4/9, 2018 at 17:40

The docs didn't mention skipna for the quantile function, which is why I asked: DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear'). @sacul kindly highlighted the correct comparator, np.nanpercentile, which I didn't know existed. Thanks all – Electrodynamics 4/9, 2018 at 17:49
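As a side note on the skipna discussion above, here is a minimal sketch of how mean treats NaN by default. The Series `s` and its values are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three real values and one NaN.
s = pd.Series([1.0, 2.0, np.nan, 3.0])

default_mean = s.mean()              # skipna defaults to True, so NaN is skipped
strict_mean = s.mean(skipna=False)   # NaN propagates into the result

print(default_mean)  # mean of 1, 2 and 3
print(strict_mean)   # nan
```

So passing skipna=True explicitly to mean is harmless but redundant.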

Yes, this appears to be how DataFrame.quantile deals with NaN values. To illustrate, you can compare the results to np.nanpercentile, which explicitly "computes the qth percentile of the data along the specified axis, while ignoring nan values" (quoted from the docs):

>>> dfAB
      A     B
0   5.0  10.0
1  43.0  67.0
2  86.0   2.0
3  61.0  83.0
4   2.0  27.0
5   NaN   NaN
6   NaN   NaN
7   NaN   NaN
8   NaN   NaN
9  27.0  70.0

>>> dfAB.quantile(0.75)
A    56.50
B    69.25
Name: 0.75, dtype: float64

>>> np.nanpercentile(dfAB, 75, axis=0)
array([56.5 , 69.25])

The two results are equivalent.
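Another way to check the same thing, sketched here with the values from the frame above typed in by hand: drop the NaNs from each column yourself and confirm the quantile does not change. The names `with_nan` and `without_nan` are just illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the frame shown above, entered manually.
dfAB = pd.DataFrame({
    "A": [5, 43, 86, 61, 2, np.nan, np.nan, np.nan, np.nan, 27],
    "B": [10, 67, 2, 83, 27, np.nan, np.nan, np.nan, np.nan, 70],
})

# Quantile computed on the frame as-is, NaNs included.
with_nan = dfAB.quantile(0.75)

# Quantile computed after explicitly dropping NaNs from each column.
without_nan = dfAB.apply(lambda col: col.dropna().quantile(0.75))

print(with_nan)
print(without_nan)
print(np.allclose(with_nan, without_nan))
```

If quantile did not skip NaNs, the two Series would differ.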

Montane answered 4/9, 2018 at 17:38 Comment(1)

For Pandas v2.0 and up the default for numeric_only is False. See docs. I expect this will change the output of the answer here. – Vogler 24/7, 2023 at 15:38

Yes. DataFrame.quantile() ignores NaN values when calculating the quantile.

To show this, we can compare it with np.nanquantile, which computes the qth quantile of the data along the specified axis while ignoring nan values (per the NumPy documentation).

>>> import random
>>> import numpy as np
>>> import pandas as pd
>>> random.seed(7)
>>> A = [ random.randint(0,100) for i in range(10) ]
>>> B = [ random.randint(0,100) for i in range(10) ]
>>> dfAB = pd.DataFrame({'A': A, 'B': B})
>>> dfAB.loc[5:8]=np.nan

>>> dfAB
      A     B
0  41.0   7.0
1  19.0  64.0
2  50.0  27.0
3  83.0   4.0
4   6.0  11.0
5   NaN   NaN
6   NaN   NaN
7   NaN   NaN
8   NaN   NaN
9  74.0  11.0

>>> dfAB.quantile(0.75)
A    68.0
B    23.0
Name: 0.75, dtype: float64

>>> np.nanquantile(dfAB, 0.75, axis=0)
array([68., 23.])
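For contrast, here is a small sketch using column A from the frame above (values typed in by hand) that puts the NaN-propagating np.quantile next to np.nanquantile and the pandas method. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Column A from the frame above, entered manually.
a = pd.Series([41, 19, 50, 83, 6, np.nan, np.nan, np.nan, np.nan, 74])

plain = np.quantile(a.to_numpy(), 0.75)         # not NaN-aware: result is nan
nan_aware = np.nanquantile(a.to_numpy(), 0.75)  # ignores NaN
pandas_result = a.quantile(0.75)                # matches the NaN-aware result

print(plain, nan_aware, pandas_result)
```

The pandas result lines up with np.nanquantile rather than np.quantile, which is the behaviour the question asks about.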

Fitment answered 17/11, 2021 at 10:35 Comment(0)
