Durbin–Watson statistic for one-dimensional time series data

I'm trying to determine whether a time series (as in, a single list of floats) is correlated with itself. I've already experimented with the acf function in statsmodels (http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html); now I'm looking at whether the Durbin–Watson statistic is useful here.

It seems like this kind of thing should work:

from statsmodels.regression.linear_model import OLS
import numpy as np

data = np.arange(100)  # this should be highly correlated
ols_res = OLS(data)
dw_res = np.sum(np.diff(ols_res.resid.values))

If you were to run this, you would get:

Traceback (most recent call last):
...
  File "/usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py", line 165, in initialize
    self.nobs = float(self.wexog.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'

It seems that D/W is usually used to compare two time series (e.g. http://connor-johnson.com/2014/02/18/linear-regression-with-python/) for correlation, so I think the problem is that I've not passed another time series to compare against. Perhaps this is supposed to be passed in the exog parameter to OLS?

exog : array-like

A nobs x k array where nobs is the number of observations and k is
the number of regressors.

(from http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html)

Side-note: I'm not sure what a "nobs x k" array means. Maybe an array whose shape is nobs by k?

So what should I be doing here? Am I expected to pass the data twice, to lag it manually myself, or something else?

Thanks!

Rickierickman answered 10/4, 2017 at 11:33 Comment(0)

OLS is a regression that needs y and x (or endog and exog). In your case x needs to be at least a constant, i.e. np.ones((len(endog), 1)).

Also, you need to fit the model, i.e. ols_res = OLS(y, x).fit().

nobs x k means two-dimensional, with nobs observations in rows and k variables in columns, i.e. exog.shape is (nobs, k).
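As a minimal sketch of the points above, assuming NumPy array input (the variable names y, x and res are illustrative, not from the question), this builds a constant-only exog of shape (nobs, 1), fits the model, and checks that the residuals are simply the demeaned data:

```python
import numpy as np
from statsmodels.regression.linear_model import OLS

y = np.arange(10, dtype=float)   # endog: the series itself
x = np.ones((len(y), 1))         # exog: one constant column, shape (nobs, k) = (10, 1)

res = OLS(y, x).fit()            # fit() returns the results object that carries .resid

intercept = res.params[0]        # with only a constant, this is the sample mean of y
demeaned_ok = np.allclose(res.resid, y - y.mean())
print(x.shape, intercept, demeaned_ok)
```

Note that np.ones takes the shape as a single tuple, so the 2-D column has to be written np.ones((n, 1)) rather than np.ones(n, 1).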

Durbin–Watson is a test statistic for serial correlation. It is included in the OLS summary output. statsmodels also includes other tests of the no-autocorrelation hypothesis.
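For reference, the statistic is conventionally defined on the regression residuals. The symbols below (e_t for the residual at time t, T for the number of observations, and \hat{\rho}_1 for the lag-1 sample autocorrelation of the residuals) are notation chosen for this note, not taken from the question:

```latex
d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}
  \approx 2\,(1 - \hat{\rho}_1)
```

Under this approximation, a value near 2 suggests no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.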

(I would recommend working through some example or tutorial notebooks.)

Acceptable answered 10/4, 2017 at 14:31 Comment(2)

Thanks. Can you confirm that to correlate the data with itself, I do not need to lag the data myself? I only pass the data as the dependent variable (endog) and a vector of constant ones as the independent variable (exog). Right? – Rickierickman 10/4, 2017 at 17:7

OLS will in this case just demean the data. Diagnostic tests on the residuals, like DW or similar, can then be used, e.g. statsmodels.org/stable/… statsmodels.org/stable/diagnostic.html#autocorrelation-tests – Acceptable 10/4, 2017 at 19:7

I've accepted user333700's answer, but I wanted to post a code snippet follow up.

This small program computes the Durbin–Watson statistic for a linear range (which should be highly correlated, thus giving a value close to 0) and then for random values (which should not be correlated, thus giving a value close to 2):

from statsmodels.regression.linear_model import OLS
import numpy as np
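from statsmodels.stats.stattools import durbin_watson


def dw(data):
    # Regress on a constant only, so the residuals are the demeaned data.
    ols_res = OLS(data, np.ones(len(data))).fit()
    # Durbin-Watson statistic of those residuals.
    return durbin_watson(ols_res.resid)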


print("dw of range=%f" % dw(np.arange(2000)))
print("dw of rand=%f" % dw(np.random.randn(2000)))

When run:

dw of range=0.000003
dw of rand=2.036162

So I think that looks good :)
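As a cross-check, here is a NumPy-only sketch of the same computation. It assumes the usual definition of the statistic (sum of squared successive differences of the residuals divided by their sum of squares) and demeans the data directly instead of fitting OLS on a constant. The helper name dw_manual, the AR(1) coefficient 0.8 and the seed are illustrative choices. It also shows what the first attempt in the question was missing: the differences need to be squared and then normalised.

```python
import numpy as np


def dw_manual(series):
    """Durbin-Watson statistic of a series after removing its mean."""
    resid = np.asarray(series, dtype=float)
    resid = resid - resid.mean()  # equivalent to OLS residuals on a constant
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


rng = np.random.default_rng(0)    # fixed seed, only for reproducibility
noise = rng.standard_normal(2000)

# AR(1) series with positive autocorrelation (coefficient 0.8 is arbitrary).
ar1 = np.empty_like(noise)
ar1[0] = noise[0]
for t in range(1, len(noise)):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

# Strictly alternating series: strong negative autocorrelation.
alternating = np.tile([1.0, -1.0], 1000)

print("range      ", dw_manual(np.arange(2000)))  # close to 0
print("noise      ", dw_manual(noise))            # close to 2
print("ar1        ", dw_manual(ar1))              # well below 2
print("alternating", dw_manual(alternating))      # close to 4
```

The four cases cover the whole range of the statistic, from near 0 (strong positive autocorrelation) through 2 (none) to near 4 (strong negative autocorrelation).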

Rickierickman answered 10/4, 2017 at 21:2 Comment(0)
