I have thousands of series (rows of a DataFrame) that I need to apply qcut on. Periodically there will be a series (row) that has fewer values than the desired quantile (say, 1 value vs 2 quantiles):
>>> s = pd.Series([5, np.nan, np.nan])
When I apply .quantile()
to it, it has no problem breaking into 2 quantiles (of the same boundary value)
>>> s.quantile([0.5, 1])
0.5 5.0
1.0 5.0
dtype: float64
But when I apply .qcut()
with an integer value for number of quantiles an error is thrown:
>>> pd.qcut(s, 2)
...
ValueError: Bin edges must be unique: array([ 5., 5., 5.]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Even after I set the duplicates
argument, it still fails:
>>> pd.qcut(s, 2, duplicates='drop')
....
IndexError: index 0 is out of bounds for axis 0 with size 0
How do I make this work? (And equivalently, pd.qcut(s, [0, 0.5, 1], duplicates='drop')
also doesn't work.)
The desired output is to have the 5.0
assigned to a single bin and the NaN
are preserved:
0 (4.999, 5.000]
1 NaN
2 NaN
qcut
with atry-except
block to catch the faulty Series (Be specific enough to only get the ones too short) and deal with the ones too short sem-manually – Dunkle