numpy and statsmodels give different values when calculating correlations. How should I interpret this?

I can't figure out why calculating the correlation between two series A and B with numpy.correlate gives me different results from the ones I obtain with statsmodels.tsa.stattools.ccf.

Here's an example of the difference:

import numpy as np
from matplotlib import pyplot as plt
from statsmodels.tsa.stattools import ccf

# Calculate correlation using numpy.correlate
def corr(x, y):
    result = np.correlate(x, y, mode='full')
    # Keep only the non-negative lags; indexing needs integer division
    return result[result.size // 2:]

# These are the data series I want to analyze
A = np.absolute(np.arange(-1, 1.1, 0.1))
B = np.arange(-1, 1.1, 0.1)

# Using numpy I get this
plt.plot(corr(B,A))

[Image: plot of corr(B, A)]

# Using statsmodels I get this
plt.plot(ccf(B,A,unbiased=False))

[Image: plot of ccf(B, A, unbiased=False)]

The results seem qualitatively different. Where does this difference come from?

Wellesz asked 7/7, 2014 at 17:48

statsmodels.tsa.stattools.ccf is based on np.correlate, but it performs a few additional steps to give the correlation in the statistical sense rather than the signal processing sense; see cross-correlation on Wikipedia. You can see exactly what happens in the source code, which is very simple.

For easier reference, I copied the relevant lines below:

def ccovf(x, y, unbiased=True, demean=True):
    n = len(x)
    if demean:
        xo = x - x.mean()
        yo = y - y.mean()
    else:
        xo = x
        yo = y
    if unbiased:
        xi = np.ones(n)
        d = np.correlate(xi, xi, 'full')
    else:
        d = n
    return (np.correlate(xo, yo, 'full') / d)[n - 1:]

def ccf(x, y, unbiased=True):
    cvf = ccovf(x, y, unbiased=unbiased, demean=True)
    return cvf / (np.std(x) * np.std(y))
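
The steps in the copied source can be reproduced with numpy alone. The following is a minimal sketch, not the statsmodels implementation itself: `ccf_sketch` is a hypothetical helper name, it covers only the `unbiased=False` case, and it assumes the source shown above (other statsmodels versions may differ). It also checks that, under these assumptions, the lag-0 value matches the Pearson correlation coefficient from `np.corrcoef`.

```python
import numpy as np


def ccf_sketch(x, y):
    """Biased cross-correlation (the unbiased=False case), following the copied source."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Step 1: subtract the mean of each series
    xo = x - x.mean()
    yo = y - y.mean()
    # Step 2: raw cross-correlation, keep lags 0..n-1, divide by the series length
    cvf = np.correlate(xo, yo, mode="full")[n - 1:] / n
    # Step 3: normalize by the product of the (population) standard deviations
    return cvf / (np.std(x) * np.std(y))


# Data from the question
B = np.arange(-1, 1.1, 0.1)
A = np.absolute(B)

# Signal-processing sense: plain sliding dot product, no demeaning or scaling
raw = np.correlate(B, A, mode="full")[len(B) - 1:]
# Statistical sense: demeaned, scaled, and bounded by 1 in absolute value
stat = ccf_sketch(B, A)

# Sanity check on arbitrary data: lag 0 agrees with the Pearson coefficient
rng = np.random.default_rng(0)
u = rng.normal(size=50)
v = 0.5 * u + rng.normal(size=50)
lag0 = ccf_sketch(u, v)[0]
pearson = np.corrcoef(u, v)[0, 1]
print(np.isclose(lag0, pearson))
```

Plotting `raw` and `stat` side by side should show the kind of difference described in the question: the first is an unscaled sum of products, while the second is a dimensionless correlation.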
Weinberger answered 7/7, 2014 at 18:43

Comments (3):
So, the difference is that numpy doesn't normalize the covariance by the product of the standard deviations? - Wellesz
@EttoreMajorana, in addition, the statsmodels ccf subtracts the means of the signals before correlating them and divides the result by the length of the first signal, to arrive at the definition of correlation used in statistics. - Weinberger
Ohhh I see, that's just what I was looking for. Thanks for your help. - Wellesz
