Calculating cumulative returns with pandas dataframe
Asked Answered
C

3

19

I have this dataframe

Poloniex_DOGE_BTC   Poloniex_XMR_BTC    Daily_rets  perc_ret
172 0.006085    -0.000839   0.003309    0
173 0.006229    0.002111    0.005135    0
174 0.000000    -0.001651   0.004203    0
175 0.000000    0.007743    0.005313    0
176 0.000000    -0.001013   -0.003466   0
177 0.000000    -0.000550   0.000772    0
178 0.000000    -0.009864   0.001764    0

I'm trying to make a running total of daily_rets in perc_ret

however my code just copies the values from daily_rets

df['perc_ret'] = (  df['Daily_rets'] + df['perc_ret'].shift(1) )


Poloniex_DOGE_BTC   Poloniex_XMR_BTC    Daily_rets  perc_ret
172 0.006085    -0.000839   0.003309    NaN
173 0.006229    0.002111    0.005135    0.005135
174 0.000000    -0.001651   0.004203    0.004203
175 0.000000    0.007743    0.005313    0.005313
176 0.000000    -0.001013   -0.003466   -0.003466
177 0.000000    -0.000550   0.000772    0.000772
178 0.000000    -0.009864   0.001764    0.001764
Crackerjack answered 12/2, 2016 at 14:52 Comment(0)
T
7

If performance is important, use numpy.cumprod:

np.cumprod(1 + df['Daily_rets'].values) - 1

Timings:

#7k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [191]: %timeit np.cumprod(1 + df['Daily_rets'].values) - 1
41 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [192]: %timeit (1 + df.Daily_rets).cumprod() - 1
554 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Transposition answered 12/2, 2016 at 15:8 Comment(3)
This answer is incorrect. fractional returns cannot be simply added together, as the return for tomorrow needs to take into account the return for today. See Alexander's answer below for a correct answer.Iroquoian
@Iroquoian - Agree, so changed answer.Transposition
@EndreMoen - Is possible post new question with sample data, expected output? Because I dont know what you meanTransposition
P
44

If they are daily simple returns and you want a cumulative return, surely you must want a daily compounded number?

df['perc_ret'] = (1 + df.Daily_rets).cumprod() - 1  
# Or:
# df.Daily_rets.add(1).cumprod().sub(1)

>>> df
     Poloniex_DOGE_BTC  Poloniex_XMR_BTC  Daily_rets  perc_ret
172           0.006085         -0.000839    0.003309  0.003309
173           0.006229          0.002111    0.005135  0.008461
174           0.000000         -0.001651    0.004203  0.012700
175           0.000000          0.007743    0.005313  0.018080
176           0.000000         -0.001013   -0.003466  0.014551
177           0.000000         -0.000550    0.000772  0.015335
178           0.000000         -0.009864    0.001764  0.017126

If they are log returns, then you could just use cumsum.

Prolocutor answered 12/2, 2016 at 16:38 Comment(0)
T
7

If performance is important, use numpy.cumprod:

np.cumprod(1 + df['Daily_rets'].values) - 1

Timings:

#7k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [191]: %timeit np.cumprod(1 + df['Daily_rets'].values) - 1
41 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [192]: %timeit (1 + df.Daily_rets).cumprod() - 1
554 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Transposition answered 12/2, 2016 at 15:8 Comment(3)
This answer is incorrect. fractional returns cannot be simply added together, as the return for tomorrow needs to take into account the return for today. See Alexander's answer below for a correct answer.Iroquoian
@Iroquoian - Agree, so changed answer.Transposition
@EndreMoen - Is possible post new question with sample data, expected output? Because I dont know what you meanTransposition
O
5

you just cannot simply add them all by using cumsum

for example, if you have array [1.1, 1.1], you supposed to have 2.21, not 2.2

import numpy as np

# daily return:
df['daily_return'] = df['close'].pct_change()

# calculate cumluative return
df['cumluative_return'] = np.exp(np.log1p(df['daily_return']).cumsum())
Overhaul answered 8/5, 2017 at 4:49 Comment(3)
Do we need to - 1 in the end for np.exp(np.log1p(df['daily_return']).cumsum()) ?Affiche
yes, the right formula is: np.exp(np.log1p(df['daily_return']).cumsum()) - 1Hicks
Alternatively use the np.expm1 function directly. It is more precise numericallySulemasulf

© 2022 - 2024 — McMap. All rights reserved.