Statsmodels ARIMA: how to get confidence/prediction interval?
Asked Answered
S

3

9

How to generate "lower" and "upper" predictions, not just "yhat"?

import statsmodels
from statsmodels.tsa.arima.model import ARIMA

assert statsmodels.__version__ == '0.12.0'

arima = ARIMA(df['value'], order=order)
model = arima.fit()

Now I can generate "yhat" predictions

yhat = model.forecast(123)

and get confidence intervals for model parameters (but not for predictions):

model.conf_int()

but how to generate yhat_lower and yhat_upper predictions?

Scream answered 9/10, 2020 at 10:5 Comment(0)
S
16

In general, the forecast and predict methods only produce point predictions, while the get_forecast and get_prediction methods produce full results including prediction intervals.

In your example, you can do:

forecast = model.get_forecast(123)
yhat = forecast.predicted_mean
yhat_conf_int = forecast.conf_int(alpha=0.05)

If your data is a Pandas Series, then yhat_conf_int will be a DataFrame with two columns, lower <name> and upper <name>, where <name> is the name of the Pandas Series.

If your data is a numpy array (or Python list), then yhat_conf_int will be an (n_forecasts, 2) array, where the first column is the lower part of the interval and the second column is the upper part.

Southeaster answered 9/10, 2020 at 16:31 Comment(1)
One should differ confidence intervals from prediction intervals, also a mean estimation and point prediction. These are different terms, concepts, and go under different calculations.Darlenedarline
H
2

using ARIMA you need to include seasonality and exogenous variables in the model yourself. While using SARIMA (Seasonal ARIMA) or SARIMAX (also for exogenous factors) implementation give C.I. to summary_frame:

import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd

dta = sm.datasets.sunspots.load_pandas().data[['SUNACTIVITY']]
dta.index = pd.Index(pd.date_range("1700", end="2009", freq="A"))
print(dta)
print("init data:\n")
dta.plot(figsize=(12,4));
plt.show()

##print("SARIMAX(dta, order=(2,0,0), trend='c'):\n")
result = sm.tsa.SARIMAX(dta, order=(2,0,0), trend='c').fit(disp=False)
print(">>> result.params:\n", result.params, "\n")

##print("SARIMA_model.plot_diagnostics:\n")
result.plot_diagnostics(figsize=(15,12))
plt.show()    
# summary stats of residuals
print(">>> residuals.describe:\n", result.resid.describe(), "\n")

# Out-of-sample forecasts are produced using the forecast or get_forecast methods from the results object
# The get_forecast method is more general, and also allows constructing confidence intervals.
fcast_res1 = result.get_forecast()
# specify that we want a confidence level of 90%
print(">>> forecast summary at alpha=0.01:\n", fcast_res1.summary_frame(alpha=0.10), "\n")

# plot forecast
fig, ax = plt.subplots(figsize=(15, 5))    
# Construct the forecasts
fcast = result.get_forecast('2010Q4').summary_frame()
print(fcast)
fcast['mean'].plot(ax=ax, style='k--')
ax.fill_between(fcast.index, fcast['mean_ci_lower'], fcast['mean_ci_upper'], color='k', alpha=0.1);    
fig.tight_layout()
plt.show()

docs: "The forecast above may not look very impressive, as it is almost a straight line. This is because this is a very simple, univariate forecasting model. Nonetheless, keep in mind that these simple forecasting models can be extremely competitive"

p.s. here " you can use it in a non-seasonal way by setting the seasonal terms to zero."

Hydrastine answered 7/11, 2022 at 11:49 Comment(0)
Z
0

To generate prediction intervals as opposed to confidence intervals (which you have neatly made the distinction between, and is also presented in Hyndman's blog post on the difference between prediction intervals and confidence intervals), then you can follow the guidance available in this answer.

You could also try to compute bootstrapped prediction intervals, which is laid out in this answer.

Below, is my attempt at implementing this (I'll update it when I get the chance to check it in more detail):

def bootstrap_prediction_interval(y_train: Union[list, pd.Series],
                                  y_fit: Union[list, pd.Series],
                                  y_pred_value: float,
                                  alpha: float = 0.05,
                                  nbootstrap: int = None,
                                  seed: int = None):
    """
    Bootstraps a prediction interval around an ARIMA model's predictions.
    Method presented clearly here:
        - https://stats.stackexchange.com/a/254321
    Also found through here, though less clearly:
        - https://otexts.com/fpp3/prediction-intervals.html
    Can consider this to be a time-series version of the following generalisation:
        - https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/

    :param y_train: List or Series of training univariate time-series data.
    :param y_fit: List or Series of model fitted univariate time-series data.
    :param y_pred_value: Float of the model predicted univariate time-series you want to compute P.I. for.
    :param alpha: float = 0.05, the prediction uncertainty.
    :param nbootstrap: integer = 1000, the number of bootstrap sampling of the residual forecast error.
        Rules of thumb provided here:
            - https://stats.stackexchange.com/questions/86040/rule-of-thumb-for-number-of-bootstrap-samples
    :param seed: Integer to specify if you want deterministic sampling.

    :return: A list [`lower`, `pred`, `upper`] with `pred` being the prediction
    of the model and `lower` and `upper` constituting the lower- and upper
    bounds for the prediction interval around `pred`, respectively.
    """

    # get number of samples
    n = len(y_train)

    # compute the forecast errors/resid
    fe = y_train - y_fit

    # get percentile bounds
    percentile_lower = (alpha * 100) / 2
    percentile_higher = 100 - percentile_lower

    if nbootstrap is None:
        nbootstrap = np.sqrt(n).astype(int)
    if seed is None:
        rng = np.random.default_rng()
    else:
        rng = np.random.default_rng(seed)

    # bootstrap sample from errors
    error_bootstrap = []
    for _ in range(nbootstrap):
        idx = rng.integers(low=n)
        error_bootstrap.append(fe[idx])

    # get lower and higher percentiles of sampled forecast errors
    fe_lower = np.percentile(a=error_bootstrap, q=percentile_lower)
    fe_higher = np.percentile(a=error_bootstrap, q=percentile_higher)

    # compute P.I.
    pi = [y_pred_value + fe_lower, y_pred_value, y_pred_value + fe_higher]

    return pi
Zoes answered 16/5, 2021 at 20:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.