Groupby sum and count on multiple columns in python
Asked Answered
C

4

14

I have a pandas dataframe that looks like this

ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5

I want to group by ID, country, month and count the IDs per month and country and sum the revenue, profit, ebit. The output for the above data would be:

 country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201409     20        10      5      1
   Canada  201411     15        10      5      1

I have tried different variations of groupby, sum and count functions of pandas but I am unable to figure out how to apply groupby sum and count all together to give the result as shown. Please share any ideas that you might have. Thanks!

Cutshall answered 13/2, 2018 at 14:3 Comment(1)
Can you post the code of the tries as well, what approach have you tried?Formularize
F
28

It can be done using pivot_table this way:

>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)
>>> df1 
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')
>>> df2

country  month 
Canada   201411    1
UK       201410    1
USA      201409    2

>>> pd.concat([df1,df2],axis=1)

                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2

UPDATE

It can be done in one-line using pivot_table and providing a dict of functions to apply to each column in the aggfunc argument:

pd.pivot_table(
   df,
   index=['country','month'],
   aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}
).rename(columns={'ID': 'count'})

                count  ebit  profit  revenue
country month                               
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19
Feebleminded answered 13/2, 2018 at 14:30 Comment(1)
would you mind looking into this too please? It just got more tricky! #48786333Cutshall
H
19

You can do the groupby, and then map the counts of each country to a new column.

g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum().reset_index()
g['count'] = g['country'].map(df['country'].value_counts())
g

Out[3]:


    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   USA      201409  19       12      5     2

Edit

To get the counts per country and month, you can do another groupby, and then join the two DataFrames together.

g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum()
j = df.groupby(['country', 'month']).size().to_frame('count')
pd.merge(g, j, left_index=True, right_index=True).reset_index()

Out[6]:

    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   UK       201411  10       5       2     1
3   USA      201409  19       12      5     2

I added another record for the UK with a different date - notice how there are now two UK entries in the merged DataFrame, with the appropriate counts.

Hyperparathyroidism answered 13/2, 2018 at 14:21 Comment(2)
Thanks for your help Ben. But this solution is not taking into account the month. I need a count value for all the IDs against a each unique combination of country and month.Cutshall
great! Thank you!Cutshall
M
6

The following solution seems the simplest.

Group by country and month:

grouped_df = df.groupby(['country', 'month'])

Apply sum to columns of interest (revenue, profit, ebit):

final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')

Assign the size of the grouped_df to a new column in 'final':

final['count'] = grouped_df.size()
print(final)

Out[256]: 
                revenue  profit  ebit  count
country month                               
Canada  201411       15      10     5      1
UK      201410       20      10     5      1
USA     201409       19      12     5      2

All done!

Moncrief answered 7/6, 2021 at 10:11 Comment(0)
P
3

The following groupby solution perhaps is the simplest and exactly what you want.

temp2 = temp1.groupby(['country', 'month'])
        .agg({'revenue': 'sum', 'profit': 'sum', 'ebit': 'sum', 'ID': 'count'})
        .reset_index()
        .rename(columns={'ID': 'count'})
Payola answered 9/11, 2022 at 18:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.