How exactly does BIC work in the Augmented Dickey–Fuller test in Python?

This question is about the implementation of the Augmented Dickey–Fuller test in the statsmodels.tsa.stattools Python library, adfuller().

In principle, AIC and BIC compute an information criterion for each model in a set of candidate models and pick the best one (the one with the lowest information loss).

But how do they operate in the context of Augmented Dickey–Fuller?

What I don't get: with maxlag=30, BIC chose lags=5 with some value of the information criterion. With maxlag=40, BIC still chooses lags=5, but the value of the information criterion has changed! Why would the information criterion for the same number of lags differ when maxlag changes?

Sometimes this even changes the chosen model: BIC switches from lags=5 to lags=4 when maxlag is changed from 20 to 30, which makes no sense to me, since lags=4 was already available before.

Postmark answered 1/11, 2015 at 16:16 Comment(5)
This isn't really a programming question; stats.stackexchange.com might be more appropriate. – Ugh
If I remember correctly, the reason is to drop the same number of initial observations for all lags up to maxlag, to ensure that the AIC/BIC search uses the same observations for all models. If maxlag changes, then the sample changes. – Lentz
@tzaman: well, I considered the stats section, but the question is about a specific implementation of a statistical method in one of the packages, so I chose this section. – Postmark
@user333700: This fits! So, do you mean that BIC first searches for the best model using (NumOfObs - maxlag) observations and then runs the chosen one with (NumOfObs - LagsOfChosenOne) observations? You can post this as an answer. I think this is it. – Postmark
I wrote an answer. I didn't remember that it runs the chosen model with (NumOfObs - LagsOfChosenOne) observations, but I checked the code and this is what adfuller does. – Lentz

When we request automatic lag selection in adfuller, the function needs to compare all models up to the given maxlag. For this comparison we need to use the same observations for all models. Because lagged observations enter the regressor matrix, we lose observations as initial conditions, as many as the largest lag included.

As a consequence, autolag uses nobs - maxlag observations for all models. For calculating the adfuller test statistic itself, we don't need the model comparison anymore, so we can use all observations available for the chosen lag, i.e. nobs - best_lag.
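
The effect can be sketched with plain NumPy. This is not the statsmodels code, only an illustration of the idea under stated assumptions: a constant-only ADF regression, a Gaussian-likelihood BIC, and a simulated random walk as data. The helper names `adf_design`, `bic` and `bic_table` are hypothetical.

```python
import numpy as np

def adf_design(x, lag, drop):
    """ADF regression (constant only) with `lag` lagged differences,
    discarding the first `drop` differences (requires drop >= lag)."""
    dx = np.diff(x)                      # differencing itself costs one observation
    y = dx[drop:]                        # dependent variable: dx_t
    cols = [x[drop:-1]]                  # lagged level x_{t-1}
    for k in range(1, lag + 1):
        cols.append(dx[drop - k:len(dx) - k])   # lagged difference dx_{t-k}
    cols.append(np.ones(len(y)))         # constant
    return y, np.column_stack(cols)

def bic(y, X):
    """BIC of an OLS fit based on the Gaussian log-likelihood."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta) ** 2)
    llf = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1)
    return -2 * llf + k * np.log(n)

def bic_table(x, maxlag):
    # Every candidate drops `maxlag` initial differences, so all share one sample.
    return {lag: bic(*adf_design(x, lag, drop=maxlag)) for lag in range(maxlag + 1)}

# Illustrative data only: a simulated random walk.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(300))

t30 = bic_table(x, 30)
t40 = bic_table(x, 40)
print("BIC for lag 5 with maxlag=30:", t30[5])
print("BIC for lag 5 with maxlag=40:", t40[5])

# Once a lag is chosen, only that many initial differences need to be dropped.
y_final, X_final = adf_design(x, 5, drop=5)
print("rows in the final regression for lag 5:", len(y_final))
```

The lag-5 model is the same in both tables, but it is fitted on a different number of rows, so its BIC value changes with maxlag, and the ranking of neighbouring lags can change with it.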

More generally, how to treat initial conditions, and a differing number of initial conditions, is not always clear-cut. Autocorrelation and partial autocorrelation are largely based on using all available observations. Full MLE for AR and ARMA models uses the stationary model to include the initial conditions, while conditional MLE or least squares drops them as necessary.

Lentz answered 2/11, 2015 at 14:4 Comment(0)
