I am struggling to understand the concept of the p-value and the other results returned by the `adfuller` test.
The code I am using (I found it on Stack Overflow):
```python
import os

import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts

loc = r"C:\Stock Study\Stock Research\Hist Data"
os.chdir(loc)

# Load the closing prices of the two stocks
xl_file1 = pd.ExcelFile("HDFCBANK.xlsx")
xl_file2 = pd.ExcelFile("KOTAKBANK.xlsx")
y1 = xl_file1.parse("Sheet1")
x1 = xl_file2.parse("Sheet1")
x = x1['Close']
y = y1['Close']


def cointegration_test(y, x):
    # Step 1: regress one variable on the other (no intercept term is added here)
    ols_result = sm.OLS(y, x).fit()
    # Step 2: obtain the residuals (ols_result.resid)
    # Step 3: apply the Augmented Dickey-Fuller test to see whether
    # the residuals have a unit root
    return ts.adfuller(ols_result.resid)


print(cointegration_test(y, x))
```
The output:
```
(-1.8481210964862593, 0.35684591783869046, 0, 1954, {'10%': -2.5675580437891359, '1%': -3.4337010293693235, '5%': -2.863020285222162}, 21029.870846458849)
```
If I understand the test correctly, the returned values are:
| Value | Type | Meaning |
|---|---|---|
| adf | float | Test statistic |
| pvalue | float | MacKinnon’s approximate p-value based on MacKinnon (1994, 2010) |
| usedlag | int | Number of lags used |
| nobs | int | Number of observations used for the ADF regression and calculation of the critical values |
| critical values | dict | Critical values for the test statistic at the 1 %, 5 %, and 10 % levels, based on MacKinnon (2010) |
| icbest | float | The maximized information criterion if autolag is not None |
| resstore | ResultStore, optional | A dummy class with results attached as attributes; only returned when `store=True` |
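To connect the table to the actual output, the sketch below unpacks the tuple printed in the question (copied verbatim) into the names from the table and applies the standard decision rule: the unit-root null is rejected at a given level when the ADF statistic is more negative than that level's critical value, or equivalently when the p-value is below the chosen significance level. The 5% level used for the p-value check is an assumed convention, not something from the question.

```python
# The tuple printed in the question, copied verbatim
result = (-1.8481210964862593, 0.35684591783869046, 0, 1954,
          {'10%': -2.5675580437891359, '1%': -3.4337010293693235, '5%': -2.863020285222162},
          21029.870846458849)

adf_stat, pvalue, usedlag, nobs, crit_values, icbest = result

print(f"ADF statistic: {adf_stat:.3f}")
print(f"p-value:       {pvalue:.3f}")
print(f"lags used:     {usedlag}, observations: {nobs}")

# Reject the unit-root null at a level when the statistic is below (more negative than) its critical value
rejected_at = [level for level, crit in crit_values.items() if adf_stat < crit]
print("unit-root null rejected at:", rejected_at or "none of 1%, 5%, 10%")

# Equivalent check using the p-value at an assumed 5% significance level
alpha = 0.05
print("reject at 5% using p-value:", pvalue < alpha)
```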
I am unable to fully understand the results and was hoping someone could explain them in layman's terms. All the explanations I am finding are very technical.
My interpretation is that they are cointegrated, i.e. we failed to reject the null hypothesis (that a unit root exists). The confidence levels are the % numbers.
Am I completely wrong?