Correct way to use ARMAResult.predict() function

This follows up on my earlier question, "How to get constant term in AR Model with statsmodels and Python?". I'm now trying to fit the data with an ARMA model, but again I can't find a way to interpret the model's result. Here is what I have done, based on "ARMA out-of-sample prediction with statsmodels" and the ARMAResults.predict API documentation.

import matplotlib.pyplot as plt

# Note: arma_model() is a helper that is not shown in this post

# Parameters

INPUT_DATA_POINT = 200
P = 5
Q = 0

# Read Data

data = []

# Column 5 of each CSV row holds the value to model
with open('stock_all.csv', 'r') as f:
    for line in f:
        data.append(float(line.split(',')[5]))

# Fit ARMA-model using the first piece of data

result = arma_model(data[:INPUT_DATA_POINT], P, Q)

# Predict using the model (why does fit have len(data) + 1 elements?)

fit = result.predict(0, len(data))

# Plot

plt.figure(facecolor='white')
plt.title('ARMA Model Fitted Using ' + str(INPUT_DATA_POINT) + ' Data Points, P=' + str(P) +  ' Q=' + str(Q) + '\n')
plt.plot(data, 'b-', label='data')
plt.plot(range(INPUT_DATA_POINT), result.fittedvalues, 'g--', label='fit')
plt.plot(range(len(data)), fit[:len(data)], 'r-', label='predict')
plt.legend(loc=4)
plt.show()

Here is the result, which is very strange, because it should be nearly identical to the result from my last question (linked above). I also don't quite understand why there are results for the first couple of data points, since those shouldn't be valid (there are no previous values to compute them from).

[image: plot produced by the code above]

I tried writing my own prediction code, shown below (the top part, which is identical to the code above, is omitted).

# Predict using model

start_pos = max(result.k_ar, result.k_ma)

fit = []
for t in range(start_pos, len(data)):
    value = 0
    for i in range(1, result.k_ar + 1):
        value += result.arparams[i - 1] * data[t - i]
    for i in range(1, result.k_ma + 1):
        value += result.maparams[i - 1] * data[t - i]
    fit.append(value)

# Plot

plt.figure(facecolor='white')
plt.title('ARMA Model Fitted Using ' + str(INPUT_DATA_POINT) + ' Data Points, P=' + str(P) +  ' Q=' + str(Q) + '\n')
plt.plot(data, 'b-', label='data')
plt.plot(range(INPUT_DATA_POINT), result.fittedvalues, 'r+', label='fit')
plt.plot(range(start_pos, len(data)), fit, 'r-', label='predict')
plt.legend(loc=4)
plt.show()

This is the best result I got

[image: plot produced by the code above]

Roselba asked 20/6, 2014 at 16:34

You trained the model on a subset of the data and then predicted out of sample. AR(MA) predictions quickly converge to the mean of the data, which is why your first plot looks the way it does. In your second plot you are not doing out-of-sample forecasting; you are just computing out-of-sample fitted values, where each prediction is built from the observed previous values.
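
The difference between the two plots can be reproduced with a small pure-Python sketch. This is not the statsmodels implementation; it assumes a hypothetical AR(1) model, and the values of `mean`, `phi`, and the data are made up for illustration. A multi-step forecast feeds its own output back in and decays toward the mean, while one-step-ahead predictions keep using the observed values.

```python
def multi_step_forecast(last_value, mean, phi, steps):
    """Out-of-sample forecast: each step is computed from the previous forecast."""
    forecasts = []
    prev = last_value
    for _ in range(steps):
        prev = mean + phi * (prev - mean)
        forecasts.append(prev)
    return forecasts


def one_step_predictions(observed, mean, phi):
    """Fitted-value style prediction: each step is computed from the previous observation."""
    return [mean + phi * (observed[t - 1] - mean) for t in range(1, len(observed))]


# Hypothetical parameters and data, chosen only for illustration
mean, phi = 10.0, 0.5
forecast = multi_step_forecast(last_value=18.0, mean=mean, phi=phi, steps=20)
observed = [18.0, 14.0, 6.0, 12.0]
one_step = one_step_predictions(observed, mean, phi)

print(forecast[:3], forecast[-1])  # decays from 14.0 toward the mean of 10.0
print(one_step)                    # keeps following the observed series
```

With `abs(phi) < 1` the forecast at horizon h is `mean + phi**h * (last_value - mean)`, so the gap to the mean shrinks geometrically.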

The first few observations are fit using the Kalman filter recursions (this is the distinction between full maximum likelihood estimates and conditional maximum likelihood estimates).
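
As a rough illustration of that distinction, here is a sketch for a hypothetical AR(1) model with made-up parameters. It is not the Kalman filter code that statsmodels runs. A conditional fit can only start at the second observation, because it needs a lagged value. An exact fit that assumes the process starts from its stationary distribution also has a prediction for the very first point, namely the unconditional mean.

```python
def conditional_fitted(data, mean, phi):
    """Conditional fit: no value for the first observation (no lag available)."""
    return [mean + phi * (data[t - 1] - mean) for t in range(1, len(data))]


def exact_fitted(data, mean, phi):
    """Exact fit for AR(1): the first prediction is the unconditional mean."""
    return [mean] + conditional_fitted(data, mean, phi)


# Hypothetical data and parameters, for illustration only
data = [12.0, 11.0, 9.0, 10.0]
cond = conditional_fitted(data, mean=10.0, phi=0.5)
full = exact_fitted(data, mean=10.0, phi=0.5)

print(len(cond), len(full))  # the exact fit has one value per observation
```

For higher-order models the early predictions have to be built from however many lags are available, which is the bookkeeping the filter recursions take care of.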

I would pick up a good forecasting textbook and review it to understand this behavior.

Derk answered 24/6, 2014 at 16:32 Comment(6)
Thanks. Yes, I just want an out-of-sample fitted value. Is my code correct? In particular, should the statement value = 0 be value = result.params[0] instead? If my code is correct, the first 200 data points should equal result.fittedvalues, right? But in this case they don't. Please correct me if I'm wrong. - Roselba
It looks to me like you're omitting the constant. See my code and comment about mean vs. constant in your last question. - Derk
I've tried, but I don't know how to get the constant term. Using value = result.params[0] as in my previous question doesn't work here. - Roselba
Yes, because ARMA reports the mean, not the constant. The pseudocode to get the constant from ARMA is in the other answer, and it's important to understand the distinction. I hope to expose the forecasting functionality as standalone functions sometime in the future to make this easier. - Derk
Could you please give me a link to that answer? - Roselba
Oh, I found it, sorry. It is constant = mean * (1 - arparams.sum()). I really appreciate your help. - Roselba
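
The mean-versus-constant relationship from these comments can be checked with a small sketch. It assumes the usual mean-form AR(p) equation, y_t - mean = sum_i phi_i * (y_{t-i} - mean), which rearranges to constant = mean * (1 - sum(phi)). The numbers below are made up; in the question's code, `mean` and `arparams` would stand in for `result.params[0]` and `result.arparams`, going by the comments above.

```python
def ar_constant(mean, arparams):
    """Convert the reported mean into the constant (intercept) of the AR equation."""
    return mean * (1.0 - sum(arparams))


def predict_one_step(data, mean, arparams):
    """One-step-ahead AR(p) predictions from observed lags, including the constant."""
    p = len(arparams)
    constant = ar_constant(mean, arparams)
    predictions = []
    for t in range(p, len(data)):
        value = constant  # start from the constant rather than 0
        for i in range(1, p + 1):
            value += arparams[i - 1] * data[t - i]
        predictions.append(value)
    return predictions


# Hypothetical values, for illustration only
mean = 50.0
arparams = [0.6, 0.3]
data = [48.0, 52.0, 51.0, 49.0, 50.5]

constant = ar_constant(mean, arparams)
preds = predict_one_step(data, mean, arparams)

# The same prediction written in mean form, for the first predictable point (t = 2)
mean_form = mean + arparams[0] * (data[1] - mean) + arparams[1] * (data[0] - mean)
print(constant, preds[0], mean_form)
```

Starting the accumulator at the constant, or equivalently working with mean-centered data and adding the mean back, is what separates this from a loop that starts at `value = 0`.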

Another possible and probably shorter solution:

for i in range(0,len(data)):
    fit.append(result.forecast()[0])
    numpy.append(result.data.endog.data[i])
Hymnology answered 31/3, 2015 at 9:42 Comment(0)
