Getting max values from pandas multiindex dataframe

Asked 23/2, 2018 at 21:20 Answered 3/10, 2022 at 11:56

Im trying to retrieve only the max values (including the multi index values) from a pandas dataframe that has multiple indexes. The dataframe I have is generated via a groupby and column selection ('tOfmAJyI') like this:

df.groupby('id')['tOfmAJyI'].value_counts()

Out[4]: 
id     tOfmAJyI
3      mlNXN       4
       SSvEP       2
       hCIpw       2
5      SSvEP       2
       hCIpw       1
       mlNXN       1
11     mlNXN       2
       SSvEP       1
...

What I would like to achieve is to get the max values including their corresponding index values. So something like:

id     tOfmAJyI
3      mlNXN       4
5      SSvEP       2
11     mlNXN       2
...

Any ideas how I can achieve this? I was able to get the id and max value but I'm still trying to get the corresponding value of 'tOfmAJyI'.

Ceporah answered 23/2, 2018 at 21:20 Comment(1)

That was a really clever way to use value_counts(). Thanks :) – Galactopoietic 23/2, 2018 at 21:36

groupby + head

df.groupby(level=0).head(1)
Out[1882]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64

df.loc[df.groupby(level=0).idxmax()]
Out[1888]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64

Monetary answered 23/2, 2018 at 21:38 Comment(6)

Yes, first solution worked for me. 2nd one throws an error : NotImplementedError: Indexing a MultiIndex with a DataFrame key is not implemented. – Ceporah 24/2, 2018 at 20:13

@RamonAnkersmit then you can use 1st method ，multiple index always come with some funny error – Monetary 24/2, 2018 at 22:43

The first solution doesn't return MAX, it returns just the 1st value which may not always be the max in the set. – Perzan 16/7, 2021 at 11:0

I second what @GaryFrewin says, the first answer is just wrong I think – Parachronism 7/2, 2022 at 23:23

@GaryFrewin in this case , since he do value_counts which return the ordered series , so the first solution will work only for this, in general case we can add sort_values – Monetary 8/2, 2022 at 0:24

@Parachronism see above ~ – Monetary 8/2, 2022 at 0:24

I can't understand why the practical solution for this problem isn't mentioned anywhere!?

Just do this:

For DataFrame DF with Keys KEY1,KEY2 where you want the max value for every KEY1, including KEY2:

DF.groupby('KEY1').apply(lambda x: x.max())

And you'll get the maximum for each KEY1 INCLUDING the Information which KEY2 holds the maximum, relative to each KEY1.

Megacycle answered 2/7, 2020 at 8:35 Comment(1)

this does not use the multiindex, in fact it ignores it and it would not work without a .reset_index() call first – Parachronism 7/2, 2022 at 23:23

I had a similar question and I don't think currently this question has a good answer.

My solution was this, I think it's cleaner:

df.groupby(level=0).nlargest(1)

this keeps the multiindex object and doesn't need a lambda function

Parachronism answered 7/2, 2022 at 23:22 Comment(0)

If you don't have previously sorted values, I think the best general answer is this variation of the one by ffggk which avoids duplicate index.

df.groupby(level=0, group_keys=False).nlargest(1)

Example:

>> df

id  tOfmAJyI
3   mlNXN       4
    SSvEP       2
    hCIpw       2
5   SSvEP       2
    hCIpw       1
    mlNXN       1
11  mlNXN       2
    SSvEP       1
Name: val, dtype: int64


>> df.groupby(level=0, group_keys=False).nlargest(1)

id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: val, dtype: int64

Cry answered 3/10, 2022 at 11:56 Comment(0)

Recommended topics

Hot tags