It's pretty easy to write a function that computes the maximum drawdown of a time series. It takes a small bit of thinking to write it in O(n) time instead of O(n^2) time, but it's not that bad. This will work:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def max_dd(ser):
    # running maximum of the series up to each point
    max2here = pd.expanding_max(ser)
    # drawdown: how far the series sits below that running maximum
    dd2here = ser - max2here
    # the maximum drawdown is the most negative of those values
    return dd2here.min()
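(Note: pd.expanding_max has since been removed from pandas; as the comment at the bottom points out, the same running maximum can be computed with the expanding() accessor. A minimal equivalent sketch:)

def max_dd(ser):
    # same calculation, using the expanding-window accessor instead of pd.expanding_max
    max2here = ser.expanding(min_periods=1).max()
    dd2here = ser - max2here
    return dd2here.min()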
Let's set up a quick series to play with and try it out:
np.random.seed(0)
n = 100
s = pd.Series(np.random.randn(n).cumsum())
s.plot()
plt.show()
As expected, max_dd(s) winds up showing something right around -17.6. Good, great, grand. Now say I'm interested in computing the rolling drawdown of this series, i.e. for each step I want the maximum drawdown over the preceding subseries of a specified length. This is easy to do using pd.rolling_apply. It works like so:
rolling_dd = pd.rolling_apply(s, 10, max_dd, min_periods=0)
df = pd.concat([s, rolling_dd], axis=1)
df.columns = ['s', 'rol_dd_10']
df.plot()
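(Again, pd.rolling_apply no longer exists in newer pandas; if I understand the current API correctly, the equivalent call uses the rolling() accessor together with the expanding()-based max_dd sketched above, roughly:)

# same rolling calculation with the newer API; the window of 10 is just the example size used above
rolling_dd = s.rolling(10, min_periods=1).apply(max_dd, raw=False)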
This works perfectly. But it feels very slow. Is there a particularly slick algorithm in pandas or another toolkit to do this fast? I took a shot at writing something bespoke: it keeps track of all sorts of intermediate data (locations of observed maxima, locations of previously found drawdowns) to cut down on lots of redundant calculations. It does save some time, but not a whole lot, and not nearly as much as should be possible.
I think it's because of all the looping overhead in Python/Numpy/Pandas. But I'm not currently fluent enough in Cython to really know how to begin attacking this from that angle. I was hoping someone had tried this before. Or, perhaps, that someone might want to have a look at my "handmade" code and be willing to help me convert it to Cython.
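(For concreteness, here's a plain-NumPy version of the same windowed calculation — not my bespoke code, just the obvious O(n * window) loop that any faster solution ought to beat; the function name and window handling are purely illustrative:)

def rolling_max_dd_numpy(x, window):
    # for each endpoint, recompute the drawdown over the trailing window
    # using a running maximum; O(n * window) overall
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        chunk = x[max(0, i - window + 1):i + 1]
        out[i] = (chunk - np.maximum.accumulate(chunk)).min()
    return out

Something like pd.Series(rolling_max_dd_numpy(s.values, 10), index=s.index) should match the rolling_apply output above.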
Edit: For anyone who wants a review of all the functions mentioned here (and some others!), have a look at the IPython notebook at: http://nbviewer.ipython.org/gist/8one6/8506455
It shows how some of the approaches to this problem relate, checks that they give the same results, and shows their runtimes on data of various sizes.
If anyone is interested, the "bespoke" algorithm I alluded to in my post is rolling_dd_custom. I think that could be a very fast solution if implemented in Cython.
Comment: pd.expanding_max(data) can be replaced with data.expanding().max() – Merchandising