I have a dataframe that I want to group and then apply several functions to each group. Normally, I would do this with groupby().agg()
(cf. Apply multiple functions to multiple groupby columns), but the functions I'm interested in take multiple columns as input rather than a single one.
I learned that, when I have one function that has multiple columns as input, I need apply
(cf. Pandas DataFrame aggregate function using multiple columns).
But what do I use when I have multiple functions that each take multiple columns as input?
import pandas as pd
df = pd.DataFrame({'x': [2, 3, -10, -10], 'y': [10, 13, 20, 30], 'id': ['a', 'a', 'b', 'b']})
def mindist(data):  # of course these functions are more complicated in reality
    return min(data['y'] - data['x'])
def maxdist(data):
    return max(data['y'] - data['x'])
I would expect something like df.groupby('id').apply([mindist, maxdist])
    min  max
id
a     8   10
b    30   40
(achieved with pd.DataFrame({'mindist': df.groupby('id').apply(mindist), 'maxdist': df.groupby('id').apply(maxdist)})
- which obviously isn't very handy if I have a dozen functions to apply to the grouped dataframe). Initially I thought this OP had the same question, but he seems to be fine with aggregate, meaning his functions take only one column as input.
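To make the desired behaviour concrete, here is a minimal sketch of one possible workaround, not necessarily the idiomatic one: wrap the list of functions in a single function that returns a pd.Series, so that apply runs only once per group. The helper name apply_all is hypothetical (it is not part of pandas), and selecting [['x', 'y']] before apply is an assumption made so that the grouping column is not passed into the functions.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 3, -10, -10],
                   'y': [10, 13, 20, 30],
                   'id': ['a', 'a', 'b', 'b']})

def mindist(data):
    return min(data['y'] - data['x'])

def maxdist(data):
    return max(data['y'] - data['x'])

def apply_all(funcs):
    # Hypothetical helper (not a pandas API): builds one function that
    # evaluates every function in funcs on a group and returns the
    # results as a Series labelled by function name.
    def wrapper(group):
        return pd.Series({f.__name__: f(group) for f in funcs})
    return wrapper

# One apply call per group; each returned Series becomes one row.
result = df.groupby('id')[['x', 'y']].apply(apply_all([mindist, maxdist]))
print(result)
```

With the example data this should give one row per id, with 8 and 10 for a and 30 and 40 for b, under columns named after the functions. Adding a further function then only means extending the list passed to apply_all.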