numpy and statsmodels give different values when calculating correlations. How to interpret this?


Asked 7/7, 2014 at 17:48 Answered 7/7, 2014 at 18:43

Solved python numpy statsmodels cross-correlation

I can't find a reason why calculating the correlation between two series A and B using numpy.correlate gives me different results from the ones I obtain using statsmodels.tsa.stattools.ccf.

Here's an example of the difference:

import numpy as np
from matplotlib import pyplot as plt
from statsmodels.tsa.stattools import ccf

#Calculate correlation using numpy.correlate
def corr(x,y):
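    # numpy is imported as np above, so the call must go through that alias
    result = np.correlate(x, y, mode='full')
    # integer division: a float index raises an error in Python 3
    return result[result.size//2:]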

#These are the data series I want to analyze
A = np.array([np.absolute(x) for x in np.arange(-1,1.1,0.1)])
B = np.array([x for x in np.arange(-1,1.1,0.1)])

#Using numpy I get this
plt.plot(corr(B,A))

[Figure: plot of corr(B,A), computed with numpy.correlate]

#Using statsmodels I get this
plt.plot(ccf(B,A,unbiased=False))

[Figure: plot of ccf(B,A,unbiased=False), computed with statsmodels]

The results seem qualitatively different. Where does this difference come from?

Wellesz answered 7/7, 2014 at 17:48 Comment(0)

statsmodels.tsa.stattools.ccf is based on np.correlate but does some additional things to give the correlation in the statistical sense rather than the signal-processing sense; see cross-correlation on Wikipedia. You can see exactly what happens in the source code, which is very simple.
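
To make the two senses concrete, here is a minimal sketch at lag 0 only. The arrays x and y are made-up illustration data, not the series from the question, and the variable names are my own. In the signal-processing sense the value is a plain dot product; in the statistical sense the means are removed and the result is scaled by the length and both standard deviations.

```python
import numpy as np

# Made-up illustration data (not the series from the question)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Signal-processing sense: a plain dot product at lag 0.
# With equal-length inputs, mode='valid' returns exactly one value.
raw_lag0 = np.correlate(x, y, mode='valid')[0]

# Statistical sense at lag 0: subtract the means, then divide by
# the length and by the product of the (population) standard deviations.
n = len(x)
stat_lag0 = np.sum((x - x.mean()) * (y - y.mean())) / (n * x.std() * y.std())

print(raw_lag0)                 # sum of x[i] * y[i], depends on scale and offset
print(stat_lag0)                # roughly 0.6, always within [-1, 1]
print(np.corrcoef(x, y)[0, 1])  # Pearson's r, should agree with stat_lag0
```

Rescaling or shifting x changes raw_lag0 but leaves stat_lag0 untouched, which is why plots of the two quantities can look so different.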

For easier reference I copied the relevant lines below:

def ccovf(x, y, unbiased=True, demean=True):
    n = len(x)
    if demean:
        xo = x - x.mean()
        yo = y - y.mean()
    else:
        xo = x
        yo = y
    if unbiased:
        xi = np.ones(n)
        d = np.correlate(xi, xi, 'full')
    else:
        d = n
    return (np.correlate(xo, yo, 'full') / d)[n - 1:]

def ccf(x, y, unbiased=True):
    cvf = ccovf(x, y, unbiased=unbiased, demean=True)
    return cvf / (np.std(x) * np.std(y))
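
As a rough check of the steps above, the following sketch rebuilds the unbiased=False case using only numpy and compares it with the raw np.correlate output. The name manual_ccf is hypothetical, and np.linspace(-1, 1, 21) is used as a stand-in for the np.arange call in the question to avoid floating-point surprises at the endpoint; this is not the statsmodels implementation itself.

```python
import numpy as np

def manual_ccf(x, y):
    """Hypothetical helper: biased cross-correlation for lags 0..n-1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Step 1: remove the means
    xo = x - x.mean()
    yo = y - y.mean()
    # Step 2: sliding dot product, non-negative lags only, divided by n
    cov = np.correlate(xo, yo, 'full')[n - 1:] / n
    # Step 3: normalize by the product of the standard deviations
    return cov / (np.std(x) * np.std(y))

# Stand-in for the series in the question
B = np.linspace(-1, 1, 21)
A = np.abs(B)

raw = np.correlate(B, A, mode='full')[len(B) - 1:]  # no demeaning, no scaling
normalized = manual_ccf(B, A)                       # statistical correlation

print(raw[:3])
print(normalized[:3])
```

The lag-0 entry of normalized should match np.corrcoef(B, A)[0, 1], and every entry stays within [-1, 1], whereas raw has no such bound.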

Weinberger answered 7/7, 2014 at 18:43 Comment(3)

So, the difference is that numpy doesn't normalize the covariance by the product of the standard deviations? – Wellesz 8/7, 2014 at 1:40

@EttoreMajorana, in addition, the statsmodels ccf subtracts the means of the signals before correlating them and divides the result by the length of the first signal, to arrive at the definition of correlation as used in statistics. – Weinberger 8/7, 2014 at 7:40

Ohhh I see, that's just what I was looking for, thanks for your help. – Wellesz 8/7, 2014 at 17:38
