pandas groupby and then select a row by value of column (min,max, for example) [duplicate]

Asked 15/3, 2019 at 15:47 Answered 20/11, 2019 at 14:57

Let's say I have a dataframe

Category Data1 column1
A 'SOMEDATA' 10
A 'SOMEDATA' 2
A 'SOMEDATA' -10
B 'SOMEDATA' 10
B 'SOMEDATA' 1
B 'SOMEDATA' -10

and so on

I'd like to select a one row in each group by column value. For example, ABS(column1)

So resulting data is

Category Data1 column1
A 'SOMEDATA' 2
B 'SOMEDATA'  1

How can I do this in python?

I couldn't figure out how to return entire row. For example,

df.groupby('Category')['column1'].min();

this would only return 'Category' min(column1) only.

Creditor answered 15/3, 2019 at 15:47 Comment(1)

Welcome to Stack Overflow! Please familiarize with these FAQs and repost your question -- (1) How to create a Minimal, Complete, and Verifiable example - stackoverflow.com/help/mcve and (2) Why is voting important? - stackoverflow.com/help/why-vote. -- Happy Coding!! – Cataract 15/3, 2019 at 15:53

Here is a solution that is more computationally efficient.

TL;DR version

df.loc[df.groupby('Category')['column1'].idxmin()]

Aalst answered 20/11, 2019 at 14:57 Comment(0)

sort then .drop_duplicates, if you want single minimum row based on absolute value.

(df.assign(to_sort = df.column1.abs()).sort_values('to_sort')
     .drop_duplicates('Category').drop(columns='to_sort'))

  Category       Data1  column1
4        B  'SOMEDATA'        1
1        A  'SOMEDATA'        2

Sort can only sort on existing columns, so we need to create the column of absolute values (with .assign). Sorting then ensures the minumum absolute value appears first, and dropping duplicates keeps the first row for each category, which is now the minumum absolute value row.

Also possible with groupby, which is better if you need to return more than one row per group:

df.assign(to_sort = df.column1.abs()).sort_values('to_sort').groupby(df.Category).head(1)

Alternatively, you can slice with the result of a groupby. This is useful in cases where you want to return all rows that match the minimum:

df[df.groupby(df.Category, group_keys=False).apply(lambda x: x.column1 == x.column1.abs().min())]

  Category       Data1  column1
1        A  'SOMEDATA'        2
4        B  'SOMEDATA'        1

Alible answered 15/3, 2019 at 16:5 Comment(0)

Recommended topics

Hot tags