How to perform multi-step out-of-time forecast which does not involve refitting the ARIMA model?

I have an existing ARIMA(p, d, q) model fitted to time-series data (e.g. data[0:100]) in Python. I would like to produce forecasts (forecast[100:120]) with this model. However, since I also have the true future data (e.g. data[100:120]), how do I make the multi-step forecast use the true future data I have instead of its own forecasted values?

In essence, when forecasting I would like forecast[101] to be computed using data[100] instead of forecast[100].
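To make the distinction concrete, here is a minimal NumPy sketch for a hypothetical AR(1) process with a known coefficient `phi` and mean `mu` (both made-up values, not from the question). The recursive multi-step forecast feeds on its own previous forecast and decays toward `mu`; the one-step-ahead forecast uses the true previous observation, so `one_step[1]` (i.e. forecast[101]) is computed from `data[100]`.

```python
import numpy as np

# Hypothetical AR(1): data[t] = mu + phi * (data[t-1] - mu) + noise
phi, mu = 0.8, 10.0
rng = np.random.default_rng(1)
n, split = 120, 100
data = np.empty(n)
data[0] = mu
for t in range(1, n):
    data[t] = mu + phi * (data[t - 1] - mu) + rng.normal()

# Recursive multi-step forecast: after the split, each step uses the previous FORECAST
multi = np.empty(n - split)
prev = data[split - 1]
for h in range(n - split):
    prev = mu + phi * (prev - mu)
    multi[h] = prev

# One-step-ahead forecast: each step uses the previous TRUE observation
one_step = mu + phi * (data[split - 1:n - 1] - mu)
```

The multi-step path is `mu + phi**h * (data[99] - mu)`, which shrinks geometrically to the long-run mean, while the one-step path keeps tracking the data.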

I would like to avoid refitting the entire ARIMA model at every time step with the updated "history".

I fit the ARIMAX model as follows:

from statsmodels.tsa.arima_model import ARIMA  # legacy (pre-0.13) statsmodels API

train, test = data[:100], data[100:]
ext_train, ext_test = external[:100], external[100:]
model = ARIMA(train, order=(p, d, q), exog=ext_train)
model_fit = model.fit(disp=False)

Now, the following code lets me predict values for the entire dataset, including the test set:

forecast = model_fit.predict(end=len(data)-1, exog=external, dynamic=False)

However, in this case the ARIMAX predictions quickly converge to the long-run mean after step 100 (as expected, since beyond step 100 the model can only feed on its own forecasts). Is there a way to provide the "future" true values to get better online predictions? Something along the lines of:

# hypothetical API: predict_fn and true= do not exist in statsmodels
forecast = model_fit.predict_fn(end=len(data)-1, exog=external, true=data, dynamic=False)

I know I can always keep refitting the ARIMAX model by doing

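import numpy as np

historical = list(train)          # copy, so train itself is not modified
historical_ext = list(ext_train)
predictions = []

for t in range(len(test)):
    # Re-estimate the model on all data seen so far (the expensive part)
    model = ARIMA(historical, order=(p, d, q), exog=historical_ext)
    model_fit = model.fit(disp=False)
    # Legacy forecast() returns (forecast, stderr, conf_int); exog must be 2-D
    output = model_fit.forecast(steps=1, exog=np.atleast_2d(ext_test[t]))[0]
    predictions.append(output)
    # Add the true observation to the history for the next iteration
    historical.append(test[t])
    historical_ext.append(ext_test[t])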

but this means training the ARIMAX model again and again, which doesn't make much sense to me. It uses a lot of computational resources and is quite impractical. It also makes the ARIMAX model hard to evaluate, because the fitted parameters change at every iteration.

Is there something incorrect about my understanding/use of the ARIMAX model?

Atalya answered 28/5, 2019 at 6:11 Comment(4)
I have the exact same question. Did you find an answer? – Eisenstein
Sadly, no. I could not find an easy way to do this. I believe the R package has some support for this, but I couldn't port everything I already had to R. – Atalya
It is correct, @john.Ludlum – Baltimore
You might find this helpful if you have not solved your problem: statsmodels.org/dev/examples/notebooks/generated/… – Mesozoic

I was struggling with this problem too. Luckily, I found a very useful discussion about it. As far as I know, this case is not supported by ARIMA in Python; you need to use SARIMAX instead.

You can find the discussion here: https://github.com/statsmodels/statsmodels/issues/2788

Insectile answered 7/4, 2020 at 9:5 Comment(1)
Wow, this is great, thanks! If only I had found this back when I was doing this. – Atalya

You are right: if you want to do online forecasting with new data, you will need to re-estimate the parameters over and over again, which is computationally inefficient. Note that for an ARIMA model it is mainly the estimation of the MA parameters that is computationally heavy, because they are estimated by numerical optimization rather than by ordinary least squares. Once you have estimated the parameters for the initial model, you already know roughly what to expect for the next models, since a single new observation won't change them much. You might therefore be able to initialize the parameter search at the previous estimates to improve computational efficiency.

There may also be a way to do the estimation more efficiently. Since you already have your old data and the model's parameters, and all you do is add one more data point, you only need to compute the terms that involve the new data point (its combinations with all the other data) and reuse the ones you already computed, instead of recomputing the theta and phi estimates from scratch. That would save quite some time. I very much like this book on the topic: Heij, Christiaan, et al. Econometric Methods with Applications in Business and Economics. Oxford University Press, 2004.
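For the AR part, this incremental idea can be illustrated with recursive least squares, a standard technique swapped in here for illustration: each new observation updates the OLS estimate of phi through a rank-one (Sherman-Morrison) update instead of a full re-solve. It covers only pure AR coefficients estimated by conditional OLS; MA parameters (theta) need numerical optimization, so this does not carry over to them directly. `ar_design` and `RecursiveAR` are hypothetical helper names, and the AR(2) coefficients are made up.

```python
import numpy as np

def ar_design(y, p):
    """Hypothetical helper: lagged design matrix for an AR(p) without intercept.
    Row i holds [y[t-1], ..., y[t-p]] for t = p + i; the target is y[t]."""
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    return X, y[p:]

class RecursiveAR:
    """Hypothetical helper: OLS estimate of AR coefficients, updated one row at a time."""

    def __init__(self, X0, y0):
        self.P = np.linalg.inv(X0.T @ X0)  # (X'X)^-1 for the initial batch
        self.phi = self.P @ X0.T @ y0      # initial OLS estimate

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)                       # gain vector
        self.phi = self.phi + k * (y - x @ self.phi)  # correct by the one-step error
        self.P = self.P - np.outer(k, Px)             # Sherman-Morrison rank-one update

# Simulated AR(2) data (illustrative coefficients 0.5 and -0.3)
rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

X, target = ar_design(y, p=2)
rls = RecursiveAR(X[:100], target[:100])  # estimate once on the first 100 rows
for x_t, y_t in zip(X[100:], target[100:]):
    rls.update(x_t, y_t)                  # then add one observation at a time

batch, *_ = np.linalg.lstsq(X, target, rcond=None)  # full re-solve, for comparison
```

After all rows are processed, `rls.phi` should match the batch estimate up to rounding, while each update costs O(p²) instead of re-solving the whole regression.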

And this lecture might give you some idea of how this could be done: lecture on ARIMA parameter estimation

You would have to implement this yourself, I'm afraid. As far as I can tell, there is nothing readily available to do this.

Hope this gives you some new ideas!

Godmother answered 13/10, 2019 at 2:24 Comment(0)

As this very good blog suggests (3 facts about time series forecasting that surprise experienced machine learning practitioners):

"You need to retrain your model every time you want to generate a new prediction." The post also gives examples that build intuition for why this happens. In short, it highlights that a time series keeps changing, which is the core challenge of time-series forecasting and the reason the model needs refitting.

Queer answered 15/10, 2019 at 9:18 Comment(2)
IMHO you should go into more detail. This is close to a link-only answer. – Alumna
I understand; I just tried not to repeat the blog, and summarized it in a very short explanation. – Queer

© 2022 - 2024 — McMap. All rights reserved.