Cumsum within group and reset on condition in pandas

About

Asked 10/12, 2017 at 17:42 Answered 10/12, 2017 at 18:14

pandas group-by count conditional-statements cumsum

I have a dataframe with two columns ID and Activity. The activity is either 0 or 1. I want a new column containing a increasing number since the last activity was 1. However, the count should only be within one group (ID). If the activity is 1, the counting column should be reset to 0 and the count starts again.

So, I have a dataframe containing the following:

What is want is this:

Can someone help me?

Gaygaya answered 10/12, 2017 at 17:42 Comment(3)

it is really unclear to me .. – Stralsund 10/12, 2017 at 17:56

@Wen Hope this is more clear...? – Gaygaya 10/12, 2017 at 18:6

Don't post images of data, post data. If you post an image, you're saying other people should spend time typing your data in just to reproduce your situation. If you post it as text, people can simple copy/paste. – Yerxa 10/12, 2017 at 18:11

We using a new para 'G' here

df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)

df.groupby([df.ID,df.G.cumsum()]).G.apply(lambda x : (~x).cumsum())

Out[713]: 
0     1
1     2
2     0
3     1
4     2
5     1
6     2
7     0
8     1
9     0
10    1
11    1
12    0
13    0
14    1
15    2
Name: G, dtype: int32

Data input

df=pd.DataFrame({'ID':list('AAAAABBBBBBCCCCC'),'Activeity':[0,0,1,0,0,0,0,1,0,1,0,0,1,1,0,0]})

Explanation :

Here we get the new para 'G'
df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)
df
Out[134]: 
    Activeity ID      G
0           0  A  False
1           0  A  False
2           1  A   True
3           0  A  False
4           0  A  False
5           0  B  False
6           0  B  False
7           1  B   True
8           0  B  False
9           1  B   True
10          0  B  False
11          0  C  False
12          1  C   True
13          1  C   True
14          0  C  False
15          0  C  False

Then we do cumsum for G, is to getting where is the cycle we should set the number to 0

df.G.cumsum()
Out[135]: 
0     0
1     0
2     1
3     1
4     1
5     1
6     1
7     2
8     2
9     3
10    3
11    3
12    4
13    5
14    5
15    5
Name: G, dtype: int32

Stralsund answered 10/12, 2017 at 18:14 Comment(3)

This just works! Can you please explain what logic is being applied here? It especially gets confusing when we are grouping on a cumsum in the second statement. – Imperium 14/12, 2017 at 1:57

@Imperium cumsum in groupby is to get the group ID, I will update the answer adding more explanation – Stralsund 14/12, 2017 at 2:11

Actually, you don't even need the additional column. Cumulative sum can be done right on Activeity. In lambda function switch 0 and 1 using xor. Here is the final oneliner: df.groupby(['ID', df.Activeity.cumsum()]).Activeity.apply(lambda x: (x ^ 1).cumsum()) – Beauteous 30/5, 2019 at 21:12

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags