Statsmodels ARMA training data vs test data for prediction

I'm trying to test ARMA models, and working through the examples provided here:

http://www.statsmodels.org/dev/examples/notebooks/generated/tsa_arma_0.html

I can't tell whether there is a straightforward way to train a model on a training dataset and then test it on a separate test dataset. As far as I can see, you have to fit the model on the entire dataset. You can then make in-sample predictions, which use the same data the model was trained on, or out-of-sample predictions, which have to start at the end of the training data. What I would like to do instead is fit the model on a training dataset, then run it over an entirely different dataset that was not part of the training data and get a series of one-step-ahead predictions.
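
To make the distinction concrete, here is a minimal pure-Python sketch, assuming a zero-mean AR(1) model fitted by least squares. The helper names `fit_ar1`, `one_step_ahead` and `dynamic_forecast` and the toy numbers are hypothetical and not part of statsmodels. It contrasts one-step-ahead predictions over held-out data (parameters fixed, lagged values taken from the observed test data) with a dynamic forecast (lagged values taken from earlier predictions).

```python
def fit_ar1(series):
    """Least-squares estimate of phi in y[t] = phi * y[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(series[:-1], series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


def one_step_ahead(phi, last_train_value, test):
    """Predict each test point from the observed previous value; phi stays fixed."""
    previous = [last_train_value] + list(test[:-1])
    return [phi * y for y in previous]


def dynamic_forecast(phi, last_train_value, steps):
    """Multi-step forecast: each prediction is built on the previous prediction."""
    preds, y = [], last_train_value
    for _ in range(steps):
        y = phi * y
        preds.append(y)
    return preds


# Toy data (hypothetical): the training series halves at every step.
train = [1.0, 0.5, 0.25, 0.125]
test = [0.2, 0.1, 0.4]

phi = fit_ar1(train)                                    # estimated on train only
one_step = one_step_ahead(phi, train[-1], test)         # sees test values up to t-1
dynamic = dynamic_forecast(phi, train[-1], len(test))   # never sees test values
```

Neither set of predictions uses a test observation at or after the time being predicted, and `phi` is estimated from `train` alone, so neither has look-ahead bias. The difference is that the one-step-ahead series keeps updating with each newly observed test value.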

To illustrate the issue, here is abbreviated code from the link above. The model is fit on data for 1700-2008 and then predicts 1990-2012. The problem is that 1990-2008 was already part of the data used to fit the model, so I think I'm predicting and training on the same data. I want a series of one-step-ahead predictions that don't have look-ahead bias.

import pandas
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Annual sunspot data, indexed by year
dta = sm.datasets.sunspots.load_pandas().data
dta.index = pandas.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
dta = dta.drop(columns='YEAR')

# Fit an AR(3) model on the full sample, 1700-2008
# (sm.tsa.ARMA requires statsmodels < 0.13; it was removed in 0.13)
arma_mod30 = sm.tsa.ARMA(dta, (3, 0)).fit(disp=False)
predict_sunspots = arma_mod30.predict('1990', '2012', dynamic=True)

# Plot the data from 1950 onward together with the prediction
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['1950':].plot(ax=ax)
fig = arma_mod30.plot_predict('1990', '2012', dynamic=True, ax=ax, plot_insample=False)

plt.show()


Petaliferous answered 17/5, 2017 at 17:14 Comment(0)

In the 16 months since I asked this question, I've learned a lot more about ARIMA modeling in statsmodels. I think the behavior I'm looking for isn't supported by the ARMA or ARIMA models, but it is supported by the SARIMAX model. See the code below, based on the examples from statsmodels.org. The green line represents an ARIMA(10,0,0) model, i.e. AR(10), that was trained on 1700-1990 and then dynamically predicted for 1990-2012.

https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html

import pandas
import matplotlib.pyplot as plt
import statsmodels.api as sm

dta = sm.datasets.sunspots.load_pandas().data
dta.index = pandas.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
dta = dta.drop(columns='YEAR')

# Original approach: AR(3) fit on the full sample
# (sm.tsa.ARMA requires statsmodels < 0.13; it was removed in 0.13)
arma_mod30 = sm.tsa.ARMA(dta, (3, 0)).fit(disp=False)
predict_sunspots = arma_mod30.predict('1990', '2012', dynamic=True)

fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['1950':].plot(ax=ax)
fig = arma_mod30.plot_predict('1990', '2012', dynamic=True, ax=ax, plot_insample=False)

# Fit the model on the training period only
mod = sm.tsa.statespace.SARIMAX(dta.loc[:'1990'], order=(10, 0, 0))
fit_res = mod.fit(disp=False)

# Create a new model on the full data, but instead of fitting it,
# copy the params from the first model
mod = sm.tsa.statespace.SARIMAX(dta, order=(10, 0, 0))
res = mod.filter(fit_res.params)

# Dynamic predictions starting in 1990
predict_dy = res.get_prediction(dynamic='1990', end='2012')
predict_dy = predict_dy.predicted_mean
predict_dy['1990':].plot(ax=ax)

plt.show()


Petaliferous answered 20/9, 2018 at 20:30 Comment(0)

You can slice your data into two datasets. For example, make the training data be a slice of the original data up until January 1st of last year, and make the test data a slice from January of last year until the end. Then, predict for the length of the test set from the fitted model.

Phosphatize answered 19/9, 2018 at 14:5 Comment(1)
Thanks for the answer, but that just gives an out-of-sample prediction from one point in time at the end of the training period. What I am looking to do is run the model over the test period, evaluating it over the entire test dataset in the same way that arma_mod30.predict('1990', '2012', dynamic=True) makes a 1-step prediction on each newly updated data point. I've looked at this a number of times since I first posed the question, and I currently believe this isn't supported for the statsmodels ARMA model.
Petaliferous
