Getting max values from pandas multiindex dataframe
Asked Answered
C

4

11

Im trying to retrieve only the max values (including the multi index values) from a pandas dataframe that has multiple indexes. The dataframe I have is generated via a groupby and column selection ('tOfmAJyI') like this:

df.groupby('id')['tOfmAJyI'].value_counts()

Out[4]: 
id     tOfmAJyI
3      mlNXN       4
       SSvEP       2
       hCIpw       2
5      SSvEP       2
       hCIpw       1
       mlNXN       1
11     mlNXN       2
       SSvEP       1
...

What I would like to achieve is to get the max values including their corresponding index values. So something like:

id     tOfmAJyI
3      mlNXN       4
5      SSvEP       2
11     mlNXN       2
...

Any ideas how I can achieve this? I was able to get the id and max value but I'm still trying to get the corresponding value of 'tOfmAJyI'.

Ceporah answered 23/2, 2018 at 21:20 Comment(1)
That was a really clever way to use value_counts(). Thanks :)Galactopoietic
M
11

groupby + head

df.groupby(level=0).head(1)
Out[1882]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64

Or

df.loc[df.groupby(level=0).idxmax()]
Out[1888]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64
Monetary answered 23/2, 2018 at 21:38 Comment(6)
Yes, first solution worked for me. 2nd one throws an error : NotImplementedError: Indexing a MultiIndex with a DataFrame key is not implemented.Ceporah
@RamonAnkersmit then you can use 1st method ,multiple index always come with some funny errorMonetary
The first solution doesn't return MAX, it returns just the 1st value which may not always be the max in the set.Perzan
I second what @GaryFrewin says, the first answer is just wrong I thinkParachronism
@GaryFrewin in this case , since he do value_counts which return the ordered series , so the first solution will work only for this, in general case we can add sort_valuesMonetary
@Parachronism see above ~Monetary
M
4

I can't understand why the practical solution for this problem isn't mentioned anywhere!?

Just do this:

For DataFrame DF with Keys KEY1,KEY2 where you want the max value for every KEY1, including KEY2:

DF.groupby('KEY1').apply(lambda x: x.max())

And you'll get the maximum for each KEY1 INCLUDING the Information which KEY2 holds the maximum, relative to each KEY1.

Megacycle answered 2/7, 2020 at 8:35 Comment(1)
this does not use the multiindex, in fact it ignores it and it would not work without a .reset_index() call firstParachronism
P
0

I had a similar question and I don't think currently this question has a good answer.

My solution was this, I think it's cleaner:

df.groupby(level=0).nlargest(1)

this keeps the multiindex object and doesn't need a lambda function

Parachronism answered 7/2, 2022 at 23:22 Comment(0)
C
0

If you don't have previously sorted values, I think the best general answer is this variation of the one by ffggk which avoids duplicate index.

df.groupby(level=0, group_keys=False).nlargest(1)

Example:

>> df

id  tOfmAJyI
3   mlNXN       4
    SSvEP       2
    hCIpw       2
5   SSvEP       2
    hCIpw       1
    mlNXN       1
11  mlNXN       2
    SSvEP       1
Name: val, dtype: int64


>> df.groupby(level=0, group_keys=False).nlargest(1)

id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: val, dtype: int64
Cry answered 3/10, 2022 at 11:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.