Python ARIMA model, predicted values are shifted
F

3

9

I am new to the Python ARIMA implementation. I have data at 15-minute frequency covering a few months. While following the Box-Jenkins method to fit a time series model, I ran into an issue towards the end. The ACF-PACF graphs for the time series (ts) and the differenced series (ts_diff) are given. I used ARIMA(5,1,2) and then plotted the fitted values (green) against the original values (blue). As you can see from the figure, there is a clear shift (by one) in the values. What am I doing wrong?

Is the prediction bad? Any insight will be helpful.

Foregone answered 24/2, 2016 at 5:12 Comment(0)
S
2

This is a standard property of one-step ahead prediction or forecasting.

The information used for the forecast is the history up to and including the previous period. A peak, for example, at a period will affect the forecast for the next period, but cannot influence the forecast for the peak period. This makes the forecasts appear shifted in the plot.

A two-step ahead forecast would give the impression of a shift by two periods.
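To make the point above concrete, here is a minimal sketch using only the standard library. It assumes a simulated AR(1) series with a known coefficient (not the asker's data or the fitted ARIMA(5,1,2) model), and the helper names `simulate_ar1`, `h_step_forecasts` and `best_matching_lag` are made up for this illustration. It checks which lag of the actual series each forecast series most resembles.

```python
import random


def simulate_ar1(n, phi=0.9, seed=0):
    """Simulate y[t] = phi * y[t-1] + noise (illustrative data only)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def h_step_forecasts(y, phi, h):
    """Forecast for time t using only data up to t - h: phi**h * y[t - h]."""
    return [None] * h + [phi ** h * y[t - h] for t in range(h, len(y))]


def best_matching_lag(y, forecasts, max_lag=5):
    """Return the lag k for which forecasts[t] is closest (in MSE) to y[t - k]."""
    def mse(k):
        pairs = [(forecasts[t], y[t - k])
                 for t in range(max_lag, len(y))
                 if forecasts[t] is not None]
        return sum((f - a) ** 2 for f, a in pairs) / len(pairs)
    return min(range(max_lag + 1), key=mse)


y = simulate_ar1(2000)
lag_one_step = best_matching_lag(y, h_step_forecasts(y, 0.9, 1))
lag_two_step = best_matching_lag(y, h_step_forecasts(y, 0.9, 2))
print(lag_one_step, lag_two_step)
```

In this setup the forecast at time t is a scaled copy of an earlier observation, so the forecast curve tracks the actual curve delayed by the forecast horizon. That is the apparent shift in the plot; it does not by itself mean the model is wrong.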

Schleswigholstein answered 24/2, 2016 at 13:14 Comment(0)
F
1

Just to confirm, I am doing this right then? Here is the code I used.

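import pandas as pd
from statsmodels.tsa.arima_model import ARIMA  # older statsmodels API; later releases moved ARIMA to statsmodels.tsa.arima.model

# ts is the 15-minute series, indexed by login_time
model = ARIMA(ts, order=(5, 1, 2))
results = model.fit()
# typ='levels' returns the in-sample predictions on the original scale rather than as differences
results_ARIMA = results.predict(typ='levels')
concatenated = pd.concat([ts, results_ARIMA], axis=1, keys=['original', 'predicted'])
concatenated.head(10)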
                     original  predicted
login_time
1970-01-01 20:00:00         2        NaN
1970-01-01 20:15:00         6   2.000186
1970-01-01 20:30:00         9   4.552971
1970-01-01 20:45:00         7   7.118973
1970-01-01 21:00:00         1   7.099769
1970-01-01 21:15:00         4   3.624975
1970-01-01 21:30:00         0   3.867454
1970-01-01 21:45:00         4   1.618120
1970-01-01 22:00:00         9   2.997275
1970-01-01 22:15:00         8   6.300015
Foregone answered 24/2, 2016 at 14:21 Comment(0)
R
1

In the model you specify, (5, 1, 2), you set d = 1. This means the data are differenced once: the model is fitted to the changes between consecutive observations rather than to the observations themselves, so the series being modelled starts one step later than the original.

Sometimes, setting d to 1 will result in an ACF/PACF plot with fewer and/or less dramatic spikes. In such cases, predictions from the model fitted with differencing will deviate less dramatically from your observations than predictions from a model fitted without it.

First-order differencing is computed as Y'(t) = Y(t) - Y(t-1), where Y(t) is the observed value Y at time index t. The order of differencing d is the number of times this operation is applied. When you use differencing, the differenced series effectively starts d steps to the right of the original. This means you lose d data points at the left edge of your time series. This is where your observed shift comes from.
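As a small sketch of what differencing does to the length of a series, the snippet below applies it to the ten 'original' values shown in the question. It assumes the usual convention that order d means applying the first difference Y(t) - Y(t-1) d times; the helper name `difference` is made up for this illustration and is not a statsmodels function.

```python
def difference(series, d=1):
    """Apply the first difference y[t] - y[t-1] a total of d times."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# 'original' column from the question's output
ts = [2, 6, 9, 7, 1, 4, 0, 4, 9, 8]

first = difference(ts, d=1)
second = difference(ts, d=2)

print(len(ts), len(first), len(second))
print(first)
```

Each pass drops one value from the start, so a d = 1 model has nothing to predict for the very first timestamp. This matches the NaN in the first row of the 'predicted' column.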

This page may offer a more elaborate explanation (make sure to click around a bit and explore the other pages on there if you want a treatment of the whole process of fitting an ARIMA model).

Hope this helps (or at least puts your mind at ease about the shift)!

Bests,

Evert

Ravenravening answered 19/9, 2016 at 13:30 Comment(2)
Do you think it would be wise, after creating the final dataset with predicted values, to shift the predicted data back by 1 so that it falls in line with the original? Is there anything wrong with that? - Lustral
No, you don't need to do that. In statistical software in general, the 0th fitted value is a missing value. So, in the final dataset, you should remove the 0th value from both the fitted values and the original data when plotting or calculating the RMSE; otherwise the RMSE would be bigger than it should be. - Rheims
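A minimal sketch of the RMSE calculation described in the last comment, using the ten rows printed in the question. The leading NaN is represented here as `None`, and the helper name `rmse_skipping_missing` is made up for this illustration. Predictions stay aligned with their own timestamps and are not shifted.

```python
import math

# Values copied from the question's output; None stands in for the leading NaN.
original = [2, 6, 9, 7, 1, 4, 0, 4, 9, 8]
predicted = [None, 2.000186, 4.552971, 7.118973, 7.099769,
             3.624975, 3.867454, 1.618120, 2.997275, 6.300015]


def rmse_skipping_missing(actual, fitted):
    """RMSE over positions where a fitted value exists; returns (rmse, pairs used)."""
    pairs = [(a, f) for a, f in zip(actual, fitted) if f is not None]
    mean_sq = sum((a - f) ** 2 for a, f in pairs) / len(pairs)
    return math.sqrt(mean_sq), len(pairs)


rmse, n_pairs = rmse_skipping_missing(original, predicted)
print(n_pairs, round(rmse, 4))
```

Only nine of the ten rows contribute, because the first row has no fitted value to compare against.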
