I am running a grid search to perform model selection, fitting SARIMAX(p, d, q)x(P, D, Q, s) models with the `SARIMAX` class in statsmodels. I set `d` and `D` to 1 and `s` to 7, and iterate over values of `p` in {0, 1}, `q` in {0, 1, 2}, `P` in {0, 1}, `Q` in {0, 1}, and `trend` in {None, 'c'}, which makes for a total of 48 combinations. During the model fitting phase, if any combination of parameters leads to a non-stationary or non-invertible model, I move on to the next combination.
I have a set of time series, each representing the performance of an agent over time and consisting of 83 daily measurements with a weekly seasonality. I keep the first 90% of the data for model fitting and the last 10% for forecasting/testing purposes.
What I find is that model fitting during the grid search takes a very long time (about 11 minutes) for a couple of agents, whereas the same 48 iterations take less than 10 seconds for others.
However, if, before performing the grid search, I log-transform the data of the agents whose analyses take very long, the same 48 iterations take about 15 seconds! But as much as I love the speed-up factor, the final forecast turns out to be poorer than in the case where the original (that is, not log-transformed) data was used. So I'd rather keep the data in its original format.
My questions are the following:

What causes such a slowdown for certain time series?

Is there a way to speed up the model fitting by passing certain arguments to `SARIMAX()` or `SARIMAX.fit()`? I have tried `simple_differencing=True`, which, by constructing a smaller state-space model, reduced the time from 11 minutes to 6 minutes, but that's still too long.

I'd appreciate any help.
`start_params`. For example, when trying different models it's often possible to warm start, by adding zeros for extra parameters or dropping parameters. – Tello

I tried warm starting with `start_params`, by first performing a burn-in `.fit()` and then fitting the final model using `.fit(start_params=<burn_in_model>.params)`. The problem is that I only get a speed-up if I set `enforce_stationarity` and `enforce_invertibility` to `False` for the burn-in phase (the final model fit is performed with these two parameters set to `True`). I don't know (A) whether my final estimates are biased by not enforcing stationarity and invertibility in the burn-in phase, and (B) whether the burn-in parameters lead to the best fit possible (given the data and the model). – Seventeen

`params` combines the different coefficients. (I don't remember the layout for SARIMAX.) – Tello

`simple_differencing=True`. It cut the run time down from ~11 minutes to 14 seconds! The best-fit models are largely the same and have similar AICc scores. I'll stick with this for now, but will try your suggestion when I get a bit more free time to implement it. Cheers! – Seventeen

`.fit()`. I'd get `Warning: Desired error not necessarily achieved due to precision loss.` when `method='bfgs'` is used, and the `nm` algorithm exceeds the maximum number of iterations. – Seventeen

I got `nm` to converge by increasing the max iter to a ridiculously high value. – Wallsend