Pandas groupby cumulative sum

Asked 26/3, 2014 at 3:17 Answered 15/9, 2022 at 10:25

154

I would like to add a cumulative sum column to my Pandas dataframe so that:

name	day	no
Jack	Monday	10
Jack	Tuesday	20
Jack	Tuesday	10
Jack	Wednesday	50
Jill	Monday	40
Jill	Wednesday	110

becomes:

Jack | Monday     | 10  | 10
Jack | Tuesday    | 30  | 40
Jack | Wednesday  | 50  | 90
Jill | Monday     | 40  | 40
Jill | Wednesday  | 110 | 150

I tried various combos of df.groupby and df.agg(lambda x: cumsum(x)) to no avail.

Disapprobation answered 26/3, 2014 at 3:17 Comment(1)

To create both columns using a one-liner, use this answer. – Daciadacie 16/11, 2022 at 22:0

148

This should do it. You need groupby() twice:

df.groupby(['name', 'day']).sum() \
  .groupby(level=0).cumsum().reset_index()

Explanation:

print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                 no
name day           
Jack Monday      10
     Tuesday     30
     Wednesday   50
Jill Monday      40
     Wednesday  110

# cumulative sum of the daily totals, per name
print( df.groupby(['name', 'day']).sum() \
         .groupby(level=0).cumsum() )
                 no
name day           
Jack Monday      10
     Tuesday     40
     Wednesday   90
Jill Monday      40
     Wednesday  150

The dataframe resulting from the first sum is indexed by 'name' and by 'day'. You can see it by printing

df.groupby(['name', 'day']).sum().index

When computing the cumulative sum, you want to do so by 'name', corresponding to the first index (level 0).

Finally, use reset_index to have the names repeated.

df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()

   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   40
2  Jack  Wednesday   90
3  Jill     Monday   40
4  Jill  Wednesday  150
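
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the question's data by hand and keeps both the daily total and the running total; the variable names daily, running and result and the column name cumulative are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Step 1: total per (name, day). groupby sorts the keys by default, which
# here happens to match weekday order (Monday < Tuesday < Wednesday).
daily = df.groupby(["name", "day"])["no"].sum()

# Step 2: running total within each name (index level 0).
running = daily.groupby(level=0).cumsum()

# Keep both columns and turn the index levels back into regular columns.
result = daily.to_frame().assign(cumulative=running).reset_index()
print(result)
```

Note that the alphabetical sort only matches weekday order by coincidence for these three days; with other days you would need to sort explicitly.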

Dews answered 26/3, 2014 at 3:56 Comment(1)

What a brute method to achieve the result, wished this was simple in pandas – Tanberg 29/3, 2023 at 21:53

Modification to @Dmitry's answer. This is simpler and works in pandas 0.19.0:

print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

df['no_csum'] = df.groupby(['name'])['no'].cumsum()

print(df)
   name        day   no  no_csum
0  Jack     Monday   10       10
1  Jack    Tuesday   20       30
2  Jack    Tuesday   10       40
3  Jack  Wednesday   50       90
4  Jill     Monday   40       40
5  Jill  Wednesday  110      150

Ringmaster answered 30/3, 2018 at 16:49 Comment(1)

This works but you need to be careful with the order of the 'day' column. For example, if 'day' was in alphabetical order, 'no_csum' probably wouldn't reflect the information you actually need. – Lunitidal 26/6, 2023 at 7:12

This works in pandas 0.16.2:

In[23]: print(df)
        name          day   no
0      Jack       Monday    10
1      Jack      Tuesday    20
2      Jack      Tuesday    10
3      Jack    Wednesday    50
4      Jill       Monday    40
5      Jill    Wednesday   110
In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
In[25]: print(df)
        name          day   no  no_cumulative
0      Jack       Monday    10             10
1      Jack      Tuesday    20             30
2      Jack      Tuesday    10             40
3      Jack    Wednesday    50             90
4      Jill       Monday    40             40
5      Jill    Wednesday   110            150

Percyperdido answered 7/12, 2015 at 10:3 Comment(1)

df.groupby(['name'])['no'].cumsum() also works fine. – Getter 21/9, 2023 at 18:56

For a running total over the whole column (ignoring the groups), you can use

df['cum_no'] = df.no.cumsum()

http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.cumsum.html

Another way of doing it, this time per group:

import pandas as pd
df = pd.DataFrame({'C1' : ['a','a','a','b','b'],
           'C2' : [1,2,3,4,5]})
df['cumsum'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.cumsum())
df
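
To make the difference between a plain cumsum and a grouped one concrete, here is a small sketch on the same toy frame. The column names total_cumsum and group_cumsum are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"C1": ["a", "a", "a", "b", "b"],
                   "C2": [1, 2, 3, 4, 5]})

# Running total over the whole column: the groups in C1 are ignored.
df["total_cumsum"] = df["C2"].cumsum()

# Running total that restarts for each value of C1.
df["group_cumsum"] = df.groupby("C1")["C2"].cumsum()

print(df)
```

Only the grouped version restarts at the first 'b' row, which is what the question asks for.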

Plexiform answered 26/4, 2017 at 4:33 Comment(0)

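Instead of df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum() (see above), you could also use df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum()

When every (name, day) pair occurs only once, df.groupby(by=['name','day']).sum() just moves both columns into a MultiIndex, so set_index gives the same result. With repeated pairs, such as the two Jack/Tuesday rows in the question, set_index keeps them as separate rows instead of summing them.
as_index=False is meant to save the reset_index call afterwards. Check the output on your pandas version: cumsum is a transformation, so the result may keep the MultiIndex and still need reset_index.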
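
A quick way to see how the two approaches relate is to run both on the question's data, which contains two Jack/Tuesday rows. This is only a sketch; the names summed, indexed and cum_from_indexed are illustrative.

```python
import pandas as pd

# Sample data from the question (note the duplicate Jack/Tuesday rows).
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# groupby + sum collapses duplicate (name, day) pairs into one row each.
summed = df.groupby(["name", "day"]).sum()

# set_index only moves the columns into the index; duplicates stay separate.
indexed = df.set_index(["name", "day"])

print(len(summed), len(indexed))

# The cumulative sum per name then runs over every original row.
cum_from_indexed = indexed.groupby(level=0).cumsum()
print(cum_from_indexed)
```

So the set_index variant is a per-row running total, while the groupby-sum variant is a running total of the daily sums.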

Redaredact answered 19/7, 2017 at 10:40 Comment(0)

data.csv:

name,day,no
Jack,Monday,10
Jack,Tuesday,20
Jack,Tuesday,10
Jack,Wednesday,50
Jill,Monday,40
Jill,Wednesday,110

Code:

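import pandas as pd

df = pd.read_csv('data.csv')
print(df)
# total per name and day
df = df.groupby(['name', 'day'])['no'].sum().reset_index()
print(df)
# running total within each name; cumsum() keeps the frame's row index,
# so the result can be assigned straight back as a column
df['cumsum'] = df.groupby(['name'])['no'].cumsum()
print(df)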

Output:

   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   30
2  Jack  Wednesday   50
3  Jill     Monday   40
4  Jill  Wednesday  110
   name        day   no  cumsum
0  Jack     Monday   10      10
1  Jack    Tuesday   30      40
2  Jack  Wednesday   50      90
3  Jill     Monday   40      40
4  Jill  Wednesday  110     150

Arlon answered 4/11, 2020 at 10:56 Comment(0)

Pandas also has an API for window functions (rolling, expanding, ewm).

Specifically, what was achieved earlier with

df.groupby(['name'])['no'].apply(lambda x: x.cumsum())

df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum()

can also be written as

df.groupby(['name'])['no'].expanding().sum()

The result is indexed by name and by the original row label, so drop the name level before assigning it back as a column.

I find this more intuitive for window-related functions than groupby + level operations, although learning to use groupby is useful in general.
See the docs: https://pandas.pydata.org/docs/user_guide/window.html
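
Here is a small sketch of using the expanding window on the question's data and getting the result back into the frame. The extra droplevel step and the column name no_csum are additions for this illustration, not part of the snippet above.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Expanding sum per name: a Series indexed by (name, original row label).
expanded = df.groupby("name")["no"].expanding().sum()

# Drop the group level so the values align with df's own index again.
df["no_csum"] = expanded.droplevel(0)

print(df)
```

The expanding sum comes back as floats; cast it with astype(int) if you need integers.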

Streetlight answered 14/9, 2022 at 19:48 Comment(0)

If you want a one-liner (for example, to pass the methods into a pipeline), set the as_index parameter of groupby to False so that the aggregation step returns a dataframe, then use assign() to add the cumulative sum for each person as a new column.

These chained methods return a new dataframe, so you'll need to assign it to a variable (e.g. agg_df) to be able to use it later on.

agg_df = (
    # aggregate df by name and day
    df.groupby(['name','day'], as_index=False)['no'].sum()
    .assign(
        # assign the cumulative sum of each name as a new column
        cumulative_sum=lambda x: x.groupby('name')['no'].cumsum()
    )
)

Daciadacie answered 15/9, 2022 at 10:25 Comment(2)

How can we be sure that "cumsum" is executed in the "day" order? – Pirog 23/11, 2023 at 15:12

@JigidiSarnath you’ll have to sort the groupby result by day (before the call to cumsum) if you want the cumsum to be executed in day order. See this post for ways to sort the frame. – Daciadacie 23/11, 2023 at 16:25
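
One possible way to do that sort, sketched under the assumption that the days are plain weekday names: convert day to an ordered categorical so that sorting follows the calendar rather than the alphabet. The weekday_order list and the shuffled sample rows are made up for this example.

```python
import pandas as pd

# Rows deliberately out of weekday order.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jack", "Jill", "Jill"],
    "day": ["Wednesday", "Monday", "Tuesday", "Wednesday", "Monday"],
    "no": [50, 10, 30, 110, 40],
})

# Assumed calendar order for the day column.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
df["day"] = pd.Categorical(df["day"], categories=weekday_order, ordered=True)

agg_df = (
    # observed=True skips (name, day) combinations that never occur
    df.groupby(["name", "day"], as_index=False, observed=True)["no"].sum()
    # sort by calendar day within each name before accumulating
    .sort_values(["name", "day"])
    .assign(cumulative_sum=lambda x: x.groupby("name")["no"].cumsum())
)
print(agg_df)
```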
