Efficient Cointegration Test in Python

Asked 6/7, 2012 at 13:16 Answered 22/2, 2018 at 12:49

I am wondering if there is a better way to test if two variables are cointegrated than the following method:

import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts

y = np.random.normal(0,1, 250)
x = np.random.normal(0,1, 250)

def cointegration_test(y, x):
    # Step 1: regress one variable on the other
    # (sm.OLS adds no intercept; use sm.add_constant(x) if one is wanted)
    ols_result = sm.OLS(y, x).fit()
    # Step 2: obtain the residuals (ols_result.resid)
    # Step 3: apply the Augmented Dickey-Fuller test to see whether
    #         the residuals have a unit root
    return ts.adfuller(ols_result.resid)

The above method works; however, it is not very efficient. When I run sm.OLS, a lot of things are calculated, not just the residuals, which increases the run time. I could write my own code that calculates just the residuals, but I don't think that would be very efficient either.

I am looking for a built-in test that tests for cointegration directly. I was thinking of Pandas, but I can't seem to find anything. Or maybe there is a clever way to test for cointegration without running a regression, or some other efficient method.

I have to run a lot of cointegration tests, and it would be nice to improve on my current method.

Ado answered 6/7, 2012 at 13:16 Comment(1)

from statsmodels.tsa.vector_ar.vecm import coint_johansen – Inhalation 21/2, 2021 at 7:15

You could try the following:

import statsmodels.tsa.stattools as ts 
result=ts.coint(x, y)

Edit:

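import statsmodels.tsa.stattools as ts
import numpy as np
import pandas as pd
# pandas.io.data was removed from pandas; the pandas-datareader package replaces it
import pandas_datareader.data as web

# Fetch one year of daily prices for Facebook and Apple
data1 = web.DataReader('FB', data_source='yahoo', start='4/4/2015', end='4/4/2016')
data2 = web.DataReader('AAPL', data_source='yahoo', start='4/4/2015', end='4/4/2016')

# Merge on the date index so that only dates present in both series remain
data1['key'] = data1.index
data2['key'] = data2.index
result = pd.merge(data1, data2, on='key')

# Closing prices of each stock (the _x and _y suffixes are added by the merge)
x1 = result['Close_x']
y1 = result['Close_y']

coin_result = ts.coint(x1, y1)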

The code does the following:
1) Import the necessary packages
2) Fetch one year of Facebook and Apple stock data
3) Merge the data on the date column
4) Select the closing prices
5) Run the cointegration test
6) The variable coin_result holds the statistics of the cointegration test

Dermatoplasty answered 18/8, 2016 at 8:36 Comment(4)

Please add an explanation to why this code could be tried (possibly using the example of the OP). – Beefwood 18/8, 2016 at 8:55

With this coint method I got different results than with ts.adfuller on the residuals, so I have strong doubts about the usability of ts.coint. – Natter 10/3, 2017 at 19:58

Can be done. Even after downloading data for matching dates, it is important to merge them as there can be "nan" and/or missing values for non overlapping dates. – Dermatoplasty 29/12, 2017 at 6:11

ts.coint(x1, y1) and ts.coint(y1, x1) get very different results. I think this tool is not reliable. – Huynh 12/11, 2022 at 4:49

A "better way to test", as you requested, is the Johansen test.

The Johansen test removes the need to test variable pairs for cointegration one at a time, because you can test all of them at once.

This can significantly speed up your program: looping over all pairs of N variables takes on the order of N^2 tests, whereas the Johansen test is run once on the whole system. The single test is not free (its cost still grows with the number of variables), but it removes the pairwise loop.

For more information, the original article on the test is: Søren Johansen, "Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models", Econometrica, Vol. 59, No. 6 (Nov. 1991), pp. 1551-1580. Published by The Econometric Society. DOI: 10.2307/2938278. Stable URL: http://www.jstor.org/stable/2938278

statsmodels has the vecm module, which includes the Johansen test for cointegration. To get it, you will have to install statsmodels from its git repository.

Garlic answered 22/2, 2018 at 12:49 Comment(0)

The residuals can easily be calculated with linear algebra. Assuming y is n x 1 and X is n x m, then residuals = y - X(X'X)^-1 X'y
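
The formula above can be sketched with NumPy alone. The helper name ols_residuals is hypothetical, and the sketch uses np.linalg.lstsq rather than forming (X'X)^-1 explicitly, which gives the same residuals but is numerically safer. The random data mirrors the question's example.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals of the least-squares fit of y on the columns of X (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D regressor as a single column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 250)
y = rng.normal(0, 1, 250)

# Include a column of ones if the regression should have an intercept.
X = np.column_stack([np.ones_like(x), x])
resid = ols_residuals(y, X)

# The textbook formula y - X (X'X)^-1 X'y, for comparison.
direct = y - X @ np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(resid, direct))
```

The resulting resid array can then be passed to ts.adfuller just like ols_result.resid in the question.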

But a more efficient way is to use Johansen test https://en.m.wikipedia.org/wiki/Johansen_test

I found a python implementation here: https://github.com/iisayoo/johansen

I have not tested it.

Eeg answered 20/2, 2018 at 3:57 Comment(0)
