Python pandas dataframe group by based on a condition
D

3

26

My question is simple: I have a DataFrame, and I group it by a column and get the size of each group like this:

df.groupby('column').size()

The problem is that I only want the groups whose size is greater than X. Can I do this with a lambda function or something similar? I have already tried this:

df.groupby('column').size() > X

but that only prints True and False values instead of the filtered sizes.

Dunaville answered 8/7, 2015 at 20:48 Comment(0)
T
29

The result of groupby(...).size() is a regular Series indexed by the group key, so just filter it with a boolean mask as usual:

 >>> import pandas as pd
 >>> df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
 >>> after = df.groupby('a').size()
 >>> after
 a
 a    3
 b    2
 c    1
 d    1
 dtype: int64

 >>> after[after > 2]
 a
 a    3
 dtype: int64
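
The comparison from the question (`size() > X`) is exactly the mask used above, so the whole thing can also be written as one chained expression. A minimal sketch, assuming the same toy data and a made-up threshold `X = 1` (both are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical data and threshold, for illustration only.
df = pd.DataFrame({"a": ["a", "b", "a", "a", "b", "c", "d"]})
X = 1

sizes = df.groupby("a").size()   # Series: group key -> number of rows
mask = sizes > X                 # boolean Series, the "True/False" output
large = sizes[mask]              # only the groups with more than X rows

# The same filter as a single chain, using a callable with .loc
large_chained = df.groupby("a").size().loc[lambda s: s > X]
```

Both `large` and `large_chained` hold the sizes of the groups that pass the threshold; the original rows are not part of the result.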
Townsfolk answered 8/7, 2015 at 20:59 Comment(0)
G
30

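Try filter, which keeps the rows of every group for which the function returns True. Use len(group) to count rows; group.size counts cells (rows times columns), so it overcounts when the DataFrame has more than one column:

df.groupby('column').filter(lambda group: len(group) > X)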
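
Note that filter returns the original rows, not the group sizes. A minimal sketch of that behaviour, assuming made-up data with an extra column `v` and a threshold `X = 1`; the `transform("size")` variant is an alternative I am adding for comparison, not part of the answer above:

```python
import pandas as pd

# Hypothetical data and threshold, for illustration only.
df = pd.DataFrame({
    "a": ["a", "b", "a", "a", "b", "c", "d"],
    "v": range(7),
})
X = 1

# Keep every row whose group has more than X rows.
kept = df.groupby("a").filter(lambda g: len(g) > X)

# Alternative without a Python-level lambda: broadcast each group's
# size back onto its rows, then use it as a boolean mask.
kept_alt = df[df.groupby("a")["a"].transform("size") > X]
```

Calling `kept.groupby("a").size()` afterwards gives the sizes of the surviving groups, if those are what you need.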
Glowworm answered 8/7, 2015 at 20:59 Comment(0)
K
1

In pandas 2.0 and later, you can use count() inside filter to achieve this. Note that count() only counts the non-null values of the chosen column:

df.groupby("group_col")\
  .filter(lambda x: x['another_col'].count() > X)\
  .groupby("group_col").size()

You can also omit the last groupby to get only the rows that belong to groups with a count greater than X.
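
Because count() skips missing values, it can disagree with a plain row count. A minimal sketch of the difference, assuming made-up data with NaNs in `another_col` and a threshold `X = 1`:

```python
import numpy as np
import pandas as pd

# Hypothetical data and threshold, for illustration only.
df = pd.DataFrame({
    "group_col": ["a", "a", "a", "b", "b", "c"],
    "another_col": [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],
})
X = 1

# count() ignores NaN: group "a" has 3 rows but only 1 non-null value.
by_count = df.groupby("group_col").filter(lambda g: g["another_col"].count() > X)

# len() counts rows regardless of missing values.
by_len = df.groupby("group_col").filter(lambda g: len(g) > X)
```

Here group "a" survives the len() filter but not the count() filter, so pick whichever matches what "size" should mean for your data.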

Kuibyshev answered 6/7, 2023 at 10:47 Comment(0)
