Python pandas dataframe: find max for each unique values of an another column
Asked Answered
C

2

10

I have a large dataframe (from 500k to 1M rows) which contains for example these 3 numeric columns: ID, A, B

I want to filter the results in order to obtain a table like the one in the image below, where, for each unique value of column id, i have the maximum and minimum value of A and B. How can i do?

EDIT: i have updated the image below in order to be more clear: when i get the max or min from a column i need to get also the data associated to it of the others columns

enter image description here

Catholicism answered 12/11, 2016 at 22:34 Comment(2)
what do you do if several rows have the same min or max value/Noctilucent
good question by @Boud. It didn't matter in my first answer (which interpreted the question incorrectly). My current answer is using idxmax/min which will take the first row in the event of ties. I think this becomes a much harder question without some assumption like that...Rumple
R
9

Sample data (note that you posted an image which can't be used by potential answerers without retyping, so I'm making a simple example in its place):

df=pd.DataFrame({ 'id':[1,1,1,1,2,2,2,2],
                   'a':range(8), 'b':range(8,0,-1) })

The key to this is just using idxmax and idxmin and then futzing with the indexes so that you can merge things in a readable way. Here's the whole answer and you may wish to examine intermediate dataframes to see how this is working.

df_max = df.groupby('id').idxmax()
df_max['type'] = 'max'
df_min = df.groupby('id').idxmin()
df_min['type'] = 'min'

df2 = df_max.append(df_min).set_index('type',append=True).stack().rename('index')

df3 = pd.concat([ df2.reset_index().drop('id',axis=1).set_index('index'), 
                  df.loc[df2.values] ], axis=1 )

df3.set_index(['id','level_2','type']).sort_index()

                 a  b
id level_2 type      
1  a       max   3  5
           min   0  8
   b       max   0  8
           min   3  5
2  a       max   7  1
           min   4  4
   b       max   4  4
           min   7  1

Note in particular that df2 looks like this:

id  type   
1   max   a    3
          b    0
2   max   a    7
          b    4
1   min   a    0
          b    3
2   min   a    4
          b    7

The last column there holds the index values in df that were derived with idxmax & idxmin. So basically all the information you need is in df2. The rest of it is just a matter of merging back with df and making it more readable.

Rumple answered 13/11, 2016 at 0:22 Comment(1)
many thanks for your response. This is almost what i need. I have updated the question in order to be more clearCatholicism
P
4

For anyone looking to get min and max values of a specific column where there is a unique ID, this is how I modified the above code:

df_maxA = df.groupby('id').max()['A']
df_maxA['type'] = 'max'
df_minA = df.groupby('id').max()['A']
df_minA['type'] = 'min'

df_maxB = df.groupby('id').max()['B']
df_maxB['type'] = 'max'
df_minB = df.groupby('id').max()['B']
df_minB['type'] = 'min'

Then you can merge these together to create a single dataframe.

Philpott answered 7/4, 2022 at 14:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.