Replace NaN values in all levels of a Pandas MultiIndex
Asked Answered
L

3

5

After reading in an excel sheet with a MultiIndex, I am getting np.nan appearing in the index because some of the values are 'N/A' and pd.read_excel thinks it's a good idea to convert them. However I want to keep them as 'N/A' to preserve the multi-index. I thought it would be easy to change them back using MultiIndex.fillna but I get this error:

index = pd.MultiIndex(levels=[[u'foo', u'bar'], [u'one', np.nan]],
           codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'first', u'second'])
df = pd.DataFrame(index=index, columns=['A', 'B'])
df

enter image description here

df.index.fillna("N/A")

Output:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-17-09e14dcdc74f> in <module>
----> 1 df.index.fillna("N/A")

/anaconda3/envs/torch/lib/python3.7/site-packages/pandas/core/indexes/multi.py in fillna(self, value, downcast)
   1456         fillna is not implemented for MultiIndex
   1457         """
-> 1458         raise NotImplementedError("isna is not defined for MultiIndex")
   1459 
   1460     @Appender(_index_shared_docs["dropna"])

NotImplementedError: isna is not defined for MultiIndex

Update:

Code updated to reflect Pandas 1.0.2. Prior to version 0.24.0 the codes attribute of pd.MultiIndex was called labels. Also, the traceback details changed from isnull is not defined to isna is not defined as above.

Labor answered 15/9, 2016 at 2:45 Comment(3)
This is a kind of solution I guess: new_index = pd.MultiIndex(levels=[index.levels[0].fillna('N/A'), index.levels[1].fillna('N/A')], labels=index.labels, names=index.names) but my actual dataframe is a 6-level MultiIndex so it's bit messy!Labor
I would suggest to load the dataframe without any index hint, massage the NA stuff, then set_index. Any reason you could not proceed in that order?Echeverria
Thanks @Boud that would also work.Labor
H
5

The accepted solution did not work for me either. It still left NA values in the index even though inspecting the df.index.levels individually did not show NA values.

Jorge's solution pointed me in the right direction but also wasn't quite right for my case. Here is my approach, including handling of the single Index case as discussed in the comments of the accepted answer.

if isinstance(df.index, pd.MultiIndex):
    df.index = pd.MultiIndex.from_frame(
        df.index.to_frame().fillna(my_fillna_value)
    )
else:
    df.index = df.index.fillna(my_fillna_value)
Hardened answered 21/8, 2020 at 18:29 Comment(1)
I tested the previously accepted solution by @piRSquared today and indeed it doesn't seem to work (see my comment). This one and Jorge's both work and I decided to accept this one because it addresses my original question. Thanks.Labor
I
4

Use set_levels

df.index.set_levels([l.fillna('N/A') for l in df.index.levels], inplace=True)
df

enter image description here

Ibbie answered 15/9, 2016 at 4:9 Comment(7)
OP has 6-levels! Update the answer to fix all levels in one shot?Orsino
@NehalJWani Updated post, fully generalized. ThanksIbbie
:-o Didn't know that set_levels supported a List in its argument!Orsino
@NehalJWani, why not? It just needs an iterable. I am guessing a dict will also work, with the keys being converted to level names. Haven't tried it though.Minion
I'm getting an error when i use this: AttributeError Traceback (most recent call last) <ipython-input-30-439020707a4b> in <module>() 1 #replace nans in index ----> 2 df.index.set_levels([l.fillna('N/A') for l in df.index.levels], inplace=True) 3 4 df AttributeError: 'Index' object has no attribute 'set_levels'Contemptible
@Contemptible your attempting to use this on a single level, otherwise known as a normal, index. In that case, you just do something like df.index = df.index.fillna('N/A')Ibbie
There are some issues with this solution (see @totalhack's answer). I assume it worked back in 2016 or I wouldn't have accepted it but anyway, now if I do this it doesn't seem to change the index. Strangely, after executing your code, the values in level 2 of the index are different though: df.index.levels[1].values gives array(['one', 'N/A'], dtype=object) but df.index[1] gives ('foo', nan).Labor
S
4

The current solution didn't work for me when having multi level columns. What i did and worked for me was the following:

df.columns = pd.MultiIndex.from_frame(df.columns.to_frame().fillna(''))
Spindle answered 5/4, 2020 at 2:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.