Python pandas cumsum() reset after hitting max
Asked Answered
E

1

6

I have a pandas DataFrame with timedeltas as a cumulative sum of those deltas in a separate column expressed in milliseconds. An example is provided below:

Transaction_ID  Time            TimeDelta       CumSum[ms]
1              00:00:04.500     00:00:00.000    000
2              00:00:04.600     00:00:00.100    100
3              00:00:04.762     00:00:00.162    262
4              00:00:05.543     00:00:00.781    1043
5              00:00:09.567     00:00:04.024    5067
6              00:00:10.654     00:00:01.087    6154
7              00:00:14.300     00:00:03.646    9800
8              00:00:14.532     00:00:00.232    10032
9              00:00:16.500     00:00:01.968    12000
10             00:00:17.543     00:00:01.043    13043

I would like to be able to provide a maximum value for CumSum[ms] after which the cumulative sum would start over again at 0. For example, if the maximum value was 3000 in the above example, the results would look like so:

Transaction_ID  Time            TimeDelta       CumSum[ms]
1              00:00:04.500     00:00:00.000    000
2              00:00:04.600     00:00:00.100    100
3              00:00:04.762     00:00:00.162    262
4              00:00:05.543     00:00:00.781    1043
5              00:00:09.567     00:00:04.024    0
6              00:00:10.654     00:00:01.087    1087
7              00:00:14.300     00:00:03.646    0
8              00:00:14.532     00:00:00.232    232
9              00:00:16.500     00:00:01.968    2200
10             00:00:17.543     00:00:01.043    0

I have explored using the modulo operator, but am only successful in resetting back to zero when the resulting cumsum is equal to the limit provided (i.e. cumsum[ms] of 500 % 500 equals zero).

Thanks in advance for any thoughts you may have, and please let me know if I can provide any more information.

Estaminet answered 25/11, 2014 at 17:14 Comment(1)
One way could be to use expanding_apply - https://mcmap.net/q/1772024/-reset-cumsum-if-over-limit-pythonCasey
D
11

Here's an example of how you might do this by iterating over each row in the dataframe. I created new data for the example for simplicity:

df = pd.DataFrame({'TimeDelta': np.random.normal( 900, 60, size=100)})
print df.head()
    TimeDelta
0  971.021295
1  734.359861
2  867.000397
3  992.166539
4  853.281131

So let's do an accumulator loop with your desired 3000 max:

maxvalue = 3000

lastvalue = 0
newcum = []
for row in df.iterrows():
    thisvalue =  row[1]['TimeDelta'] + lastvalue
    if thisvalue > maxvalue:
        thisvalue = 0
    newcum.append( thisvalue )
    lastvalue = thisvalue

Then put the newcom list into the dataframe:

df['newcum'] = newcum
print df.head()
    TimeDelta       newcum
0  801.977678   801.977678
1  893.296429  1695.274107
2  935.303566  2630.577673
3  850.719497     0.000000
4  951.554206   951.554206
Duarte answered 25/11, 2014 at 17:48 Comment(3)
This works perfectly. Looks like I was over-complicating things with cumsum(). Thanks very much for chiming in.Estaminet
The advantage of the numpy accumulators that Pandas supports is how very very fast they are. The iterating is done in C, I presume. Iterating over rows in Pandas is always much slower. But sometimes it's a reasonably pragmatic thing to do.Duarte
thank you. if I have another condition > reset after hiting max+reset when ID change... what sould I do?Avail

© 2022 - 2024 — McMap. All rights reserved.