How to solve LinAlgError & ValueError when training an ARIMA model with Python

I am trying to implement a time series model and I am getting some strange exceptions that tell me nothing. I wonder whether I am making a mistake or whether this is expected. Here are the details...

When training my model, I run a grid search to find the best (p, d, q) settings. Here is the complete code (I explain what it does below):

The reproducible code below is essentially a copy of https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/, with a few slight changes:

import warnings
import numpy as np
# Legacy ARIMA API; it was removed in statsmodels 0.13 (see the answer below for SARIMAX)
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error

# evaluate an ARIMA model for a given order (p,d,q)
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float64')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    print("Evaluating the settings: ", p, d, q)
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except Exception as exception:
                    print("Exception occured...", type(exception).__name__, "\n", exception)

    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# dataset
values = np.array([-1.45, -9.04, -3.64, -10.37, -1.36, -6.83, -6.01, -3.84, -9.92, -5.21,
                   -8.97, -6.19, -4.12, -11.03, -2.27, -4.07, -5.08, -4.57, -7.87, -2.80,
                   -4.29, -4.19, -3.76, -22.54, -5.87, -6.39, -4.19, -2.63, -8.70, -3.52, 
                   -5.76, -1.41, -6.94, -12.95, -8.64, -7.21, -4.05, -3.01])

# evaluate parameters
p_values = [7, 8, 9, 10]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(values, p_values, d_values, q_values)

Here is the output (not all of it, but enough to show the problem):

Evaluating the settings:  7 0 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 1
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 1 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 2 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.

The code simply tries every combination of the given settings, trains a model for each one, calculates its MSE (mean squared error), and then selects the best setting (the one with the lowest MSE).
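For scale, here is how the grid and the train/test split in the code above work out. This is a stdlib sketch that only re-derives the counts; it assumes the 38-value `values` array from the question, and `itertools.product` visits the orders in the same sequence as the three nested loops.

```python
from itertools import product

# Grid from the question: p in 7..10, d in 0..2, q in 0..2
p_values = [7, 8, 9, 10]
d_values = range(0, 3)
q_values = range(0, 3)
orders = list(product(p_values, d_values, q_values))

# Same split as evaluate_arima_model: the first 66% is used for training
n_obs = 38  # length of the `values` array in the question
train_size = int(n_obs * 0.66)
test_size = n_obs - train_size

# Walk-forward validation refits one model per test point
fits_per_order = test_size
total_fits = len(orders) * fits_per_order

print(len(orders), train_size, test_size, total_fits)  # 36 25 13 468
```

So every order is fitted 13 times on histories of 25 to 37 points, and any of those fits can raise.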

But during training, the code keeps throwing LinAlgError and ValueError exceptions, which mean nothing to me.

As far as I can tell, when one of these exceptions is thrown the code does not actually finish training that setting; it just jumps to the next setting to try.
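The skipping described above is what the try/except in evaluate_models does: an exception raised inside evaluate_arima_model aborts the evaluation of that order, so no MSE is recorded and the loop moves on. If every order fails, best_cfg stays None and the final line prints "Best ARIMANone MSE=inf". A minimal sketch that also records which orders failed; `grid_search` and `fake_evaluate` are hypothetical names, and `fake_evaluate` merely stands in for the real ARIMA evaluation.

```python
import math

def grid_search(evaluate, orders):
    """Return (best_order, best_score, failures) for a list of orders.

    `evaluate` is any callable that returns an error score for an order,
    or raises if the model cannot be fitted (as ARIMA.fit does here).
    """
    best_order, best_score = None, math.inf
    failures = {}
    for order in orders:
        try:
            score = evaluate(order)
        except Exception as exc:  # evaluation of this order is abandoned
            failures[order] = type(exc).__name__
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score, failures


# Hypothetical evaluator: fails for p >= 8, otherwise returns a fake score
def fake_evaluate(order):
    p, d, q = order
    if p >= 8:
        raise ValueError("not stationary")
    return float(p + d + q)


best, score, failed = grid_search(fake_evaluate, [(7, 0, 0), (7, 1, 1), (8, 0, 0)])
print(best, score, failed)  # (7, 0, 0) 7.0 {(8, 0, 0): 'ValueError'}

# If every order fails, nothing is ever selected
print(grid_search(fake_evaluate, [(8, 0, 0), (9, 0, 0)])[:2])  # (None, inf)
```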

Why do I see these exceptions? Can they be ignored? What do I need to do to solve this?

Selfheal answered 11/3, 2019 at 20:30

First, to answer your specific question: I think the "SVD did not converge" error is a bug in the ARIMA model of Statsmodels. The SARIMAX model is better supported these days (and does everything the ARIMA model does, and more), so I would recommend using it instead. To do so, replace the model creation line with:

import statsmodels.api as sm
model = sm.tsa.SARIMAX(history, trend='c', order=arima_order, enforce_stationarity=False, enforce_invertibility=False)
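The other error in the question, "The computed initial AR coefficients are not stationary", refers to the standard stationarity condition on the AR part of the model. enforce_stationarity=False tells SARIMAX not to transform the AR parameters to force that condition during estimation. The condition itself: the AR polynomial 1 - phi_1*z - ... - phi_p*z^p must have all of its roots strictly outside the unit circle. A numpy sketch of that check; `is_stationary_ar` is a hypothetical helper, not a statsmodels function.

```python
import numpy as np

def is_stationary_ar(phi):
    """Check whether AR coefficients phi_1..phi_p define a stationary process.

    The AR polynomial is 1 - phi_1*z - ... - phi_p*z**p; the process is
    stationary when every root of it lies strictly outside the unit circle.
    """
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant
    coeffs = np.r_[-phi[::-1], 1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))


print(is_stationary_ar([0.5]))       # True  (root at z = 2)
print(is_stationary_ar([1.2]))       # False (root at z = 1/1.2)
print(is_stationary_ar([0.5, 0.3]))  # True
print(is_stationary_ar([0.5, 0.6]))  # False
```

With p between 7 and 10 and only about 25 points, the initial AR estimates can easily land outside this region, which is what the old ARIMA class complained about.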

That said, I think you are still unlikely to get good results given your time series and the specifications you are trying.

In particular, your time series is very short (38 observations, of which only the first 25 are used for the initial fit), and you are only considering very long autoregressive lag lengths (p > 6). It will be difficult to estimate that many parameters from so few data points, particularly when you also add integration (d = 1 or d = 2) and moving-average components. I suggest that you reconsider which models you are evaluating.
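To make "so few data points" concrete, here is a rough rule-of-thumb calculation (not a formal criterion): divide the observations left after d differences in the smallest training window by the number of AR and MA coefficients. It assumes the 25-point initial window from the question's split and ignores the constant and the noise variance, so the real ratio is even lower. `obs_per_coefficient` is a hypothetical helper.

```python
from itertools import product

TRAIN_SIZE = 25  # smallest training window in the question: int(38 * 0.66)

def obs_per_coefficient(p, d, q, n=TRAIN_SIZE):
    """Observations left after d differences, divided by the p + q ARMA coefficients.

    Ignores the constant and the noise variance, so it overstates the ratio.
    """
    return (n - d) / (p + q)


ratios = {order: obs_per_coefficient(*order)
          for order in product([7, 8, 9, 10], range(3), range(3))}

print(round(ratios[(7, 0, 0)], 2))   # 3.57
print(round(ratios[(10, 2, 2)], 2))  # 1.92
print(min(ratios, key=ratios.get))   # (10, 2, 2)
```

Even the smallest model in the grid has fewer than four observations per coefficient, and the largest has fewer than two.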

Harlanharland answered 14/3, 2019 at 2:09. Comments (4):
Thank you for your answer. I'm studying it now before I accept it. The forecast function of the ARIMA model also returns confidence intervals, which I did not mention above for the sake of simplicity. How are confidence intervals calculated when forecasting with SARIMAX? Also, by changing the possible p, d, q values, I can still run a grid search, right? (These statistical models are not really my expertise.) - Selfheal
Yes, you can get confidence intervals from the results object (res below, i.e. model_fit in the question's code): call fcast_res = res.get_forecast(), then ci = fcast_res.conf_int(), while the point forecast itself is fcast = fcast_res.predicted_mean. And yes, you can still do the grid search. SARIMAX is simply an ARIMA model with the option of also adding (S)easonal terms and e(X)ogenous regressors if you want. - Harlanharland
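On how the intervals are calculated: as far as I know, conf_int() on a SARIMAX forecast returns a Gaussian interval, the predicted mean plus or minus z times the forecast standard error, with alpha=0.05 (a 95% interval) by default. A stdlib sketch of that arithmetic with made-up numbers; `normal_interval` is a hypothetical helper, and in practice the mean and standard error would come from the forecast results.

```python
from statistics import NormalDist

def normal_interval(mean, se, alpha=0.05):
    """Two-sided (1 - alpha) interval, assuming a Gaussian forecast error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return mean - z * se, mean + z * se


lower, upper = normal_interval(mean=-5.0, se=2.0)
print(round(lower, 2), round(upper, 2))  # -8.92 -1.08
```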
Thank you! One last question: my dataset has huge numbers (e.g. -6.1e+10), and I occasionally see "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')" during training. I can normalize the dataset with sklearn.preprocessing.StandardScaler using its fit and transform functions, but then the computed confidence intervals are on the normalized scale (very small numbers). To de-normalize the dataset (or the predictions) I can call the inverse_transform function, but how do I do the same for the confidence intervals? - Selfheal
Unfortunately, there isn't a straightforward way to transform confidence intervals. There are a number of directions you could go, but none are built in (e.g. an approximation technique called the delta method, or simulation). See for example stats.stackexchange.com/questions/1713 for a discussion of some of these issues. - Harlanharland
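For the specific transform in the comment, standardization of the kind StandardScaler performs, there is a simpler route. z = (x - mean) / std is affine and increasing (std > 0), so P(a <= z <= b) = P(mean + a*std <= x <= mean + b*std): the interval endpoints can be mapped back with the same inverse transform as the point forecasts. The extra techniques mentioned above mainly come up with nonlinear transforms such as a log. A numpy sketch with made-up data; `to_original_scale` is a hypothetical helper, and with a fitted StandardScaler, inverse_transform applied to the endpoints (reshaped to a single column) should do the same job.

```python
import numpy as np

# Made-up series on the scale mentioned in the comment (around -6e10)
data = np.array([-6.1e10, -5.8e10, -6.4e10, -6.0e10])

mean, std = data.mean(), data.std()  # StandardScaler also uses the ddof=0 std
scaled = (data - mean) / std

def to_original_scale(values_scaled):
    """Undo z = (x - mean) / std; works for point forecasts and interval ends."""
    return np.asarray(values_scaled) * std + mean


# Pretend a model fitted on `scaled` returned this interval for one forecast
ci_scaled = np.array([-1.5, 0.5])
ci_original = to_original_scale(ci_scaled)

print(np.allclose(to_original_scale(scaled), data))  # True
print(ci_original[0] < ci_original[1])               # True (order preserved)
```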
