For example, I have the following table:
index,A,B
0,0,0
1,0,8
2,0,8
3,1,5
4,1,3
After grouping by A:
0:
index,A,B
0,0,0
1,0,8
2,0,8
1:
index,A,B
3,1,5
4,1,3
What I need is to drop the rows in each group whose value in column B is less than the maximum of column B within that group. I have trouble formulating this in English, so here is an example:
Maximum value of column B in group 0: 8
So I want to drop the row with index 0 and keep the rows with indexes 1 and 2.
Maximum value of column B in group 1: 5
So I want to drop the row with index 4 and keep the row with index 3.
I have tried to use the pandas filter function, but the problem is that it operates on all rows of a group at once:
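import pandas as pd
data = pd.DataFrame({"A": [0, 0, 0, 1, 1], "B": [0, 8, 8, 5, 3]})  # the example table
grouped = data.groupby("A")
# Does not work: filter expects a single boolean per group, not one per row
filtered = grouped.filter(lambda x: x["B"] == x["B"].max())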
So what I ideally need is a filter that iterates through all rows in a group.
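One possible way to get a row-level comparison, sketched here under the assumption that the example table is rebuilt as a DataFrame named data, is to broadcast each group's maximum back onto its rows with transform and then use an ordinary boolean mask. The names group_max and filtered are only illustrative:

```python
import pandas as pd

# Example table from above, rebuilt by hand (assumed column layout)
data = pd.DataFrame({"A": [0, 0, 0, 1, 1], "B": [0, 8, 8, 5, 3]})

# transform("max") returns a Series aligned with data's index,
# holding the maximum of B for the group each row belongs to
group_max = data.groupby("A")["B"].transform("max")

# Keep only the rows whose B equals the maximum of their group
filtered = data[data["B"] == group_max]
print(filtered)
```

With the example table this keeps the rows with indexes 1, 2 and 3. Ties are kept, so every row that reaches the group maximum survives.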
Thanks for help!
P.S. Is there also a way to only delete the rows within the groups, without returning a new DataFrame object?
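As a rough sketch of the in-place variant, assuming the example table is held in a DataFrame named data: the rows below their group maximum could be collected with a boolean mask and passed to drop with inplace=True, which modifies data and returns None. The name below_max is only illustrative:

```python
import pandas as pd

# Example table from above, rebuilt by hand (assumed column layout)
data = pd.DataFrame({"A": [0, 0, 0, 1, 1], "B": [0, 8, 8, 5, 3]})

# True for every row whose B is smaller than the maximum of its group
below_max = data["B"] < data.groupby("A")["B"].transform("max")

# Drop those rows from data itself; no new DataFrame is returned
data.drop(index=data[below_max].index, inplace=True)
print(data)
```

Whether this actually avoids an internal copy is up to pandas; the sketch only shows that the original variable is updated rather than rebound.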
df.query and pd.eval seem like good fits for this use case. For information on the pd.eval() family of functions, their features and use cases, please visit Dynamic Expression Evaluation in pandas using pd.eval(). – Chronometry