Interpreting (and comparing) output from numpy.correlate

I have looked at this question but it hasn't really given me any answers.

Essentially, how can I determine whether a strong correlation exists using np.correlate? I expected the same kind of output as I get from MATLAB's xcorr with the coeff option, which I can interpret (1 is a strong correlation at lag l and 0 is no correlation at lag l), but np.correlate produces values greater than 1, even when the input vectors have been normalised to lie between 0 and 1.

Example input

import numpy as np
x = np.random.rand(10)
y = np.random.rand(10)

np.correlate(x, y, 'full')

This gives the following output:

array([ 0.15711279,  0.24562736,  0.48078652,  0.69477838,  1.07376669,
    1.28020871,  1.39717118,  1.78545567,  1.85084435,  1.89776181,
    1.92940874,  2.05102884,  1.35671247,  1.54329503,  0.8892999 ,
    0.67574802,  0.90464743,  0.20475408,  0.33001517])

How can I tell what is a strong correlation and what is a weak one if I don't know what the maximum possible correlation value is?

Another example:

In [10]: x = [0,1,2,1,0,0]

In [11]: y = [0,0,1,2,1,0]

In [12]: np.correlate(x, y, 'full')
Out[12]: array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])

Edit: This was a badly asked question, but the marked answer does answer what was asked. It is worth noting what I found while digging around in this area: you cannot compare outputs from cross-correlation. In other words, it would not be valid to use the outputs from cross-correlation to say that signal x is better correlated with signal y than signal z. Cross-correlation does not provide this kind of information.

Extraordinary asked 7/5, 2016 at 20:29

Comments (2):
From what I have read about xcorr, the output is not normalized to [0,1] either. It seems to behave identically to numpy.correlate. - Chlori
@ChristophTerasa Sorry, I meant xcorr with the coeff option. Question corrected. - Extraordinary

numpy.correlate is under-documented. I think that we can make sense of it, though. Let's start with your sample case:

>>> import numpy as np
>>> x = [0,1,2,1,0,0]
>>> y = [0,0,1,2,1,0]
>>> np.correlate(x, y, 'full')
array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])

Those numbers are the cross-correlations for each of the possible lags. To make that clearer, let's put the lag numbers above the correlations:

>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [ 0,  0,  1,  4,  6,  4,  1,  0,  0,  0,  0]])

Here, we can see that the cross-correlation reaches its peak at a lag of -1. If you look at x and y above, that makes sense: if one shifts y to the left by one place, it matches x exactly.
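As a sanity check on that reading of the lag axis, the sum behind each output value can be written out by hand. This is only a minimal sketch, assuming the definition given in the NumPy documentation, c[k] = sum over n of x[n + k] * y[n], with terms whose index falls outside x treated as zero. The name correlate_at_lag is a hypothetical helper introduced for illustration, not part of NumPy.

```python
import numpy as np

def correlate_at_lag(x, y, k):
    """Sum of x[n + k] * y[n] over every n where both indices are valid.

    Hypothetical helper for illustration; not a NumPy function.
    """
    total = 0
    for n in range(len(y)):
        if 0 <= n + k < len(x):
            total += x[n + k] * y[n]
    return total

x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 1, 2, 1, 0]

# In 'full' mode the lags run from -(len(y) - 1) to len(x) - 1, i.e. -5 .. 5 here.
lags = range(-(len(y) - 1), len(x))
by_hand = [correlate_at_lag(x, y, k) for k in lags]

# The hand-written sums agree with NumPy's 'full' output.
assert by_hand == np.correlate(x, y, 'full').tolist()
```

At k = -1 the sum pairs x[n - 1] with y[n], which is the same as sliding y one place to the left before multiplying element by element.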

To verify this, let's try again, this time shifting y further:

>>> y = [0, 0, 0, 0, 1, 2]
>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [ 0,  2,  5,  4,  1,  0,  0,  0,  0,  0,  0]])

Now, the correlation peaks at a lag of -3, meaning that the best match between x and y occurs when y is shifted to the left by 3 places.
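The lag row in these examples is hard-coded for two length-6 inputs. A reusable version might look like the sketch below, assuming 'full' mode, where the lags run from -(len(y) - 1) to len(x) - 1. The name peak_lag is a hypothetical helper introduced here, not a NumPy function.

```python
import numpy as np

def peak_lag(x, y):
    """Return (lag, value) at the maximum of the 'full' cross-correlation.

    Hypothetical helper for illustration; not a NumPy function.
    """
    c = np.correlate(x, y, 'full')
    lags = np.arange(-(len(y) - 1), len(x))  # one lag per output element
    i = np.argmax(c)                         # first index of the largest value
    return int(lags[i]), c[i]

x = [0, 1, 2, 1, 0, 0]
lag1, peak1 = peak_lag(x, [0, 0, 1, 2, 1, 0])
lag2, peak2 = peak_lag(x, [0, 0, 0, 0, 1, 2])
print(lag1, peak1)  # -1 6
print(lag2, peak2)  # -3 5
```

Note that np.argmax returns the first maximum, so a signal with several equally high peaks would need extra handling.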

Revolution answered 7/5, 2016 at 21:7

Comments (2):
But given the max correlation of 6 in the first example and 5 in the second, how do I know whether those are strong correlations? I know you can say that this is the lag at which they are most strongly correlated, and I know that a correlation of 6 is stronger than a correlation of 5, but is a correlation of 6 strong? Is a correlation of 5 strong? - Extraordinary
For that, you need a normalized cross-correlation. There is a proposed patch that would add that to numpy, but the patch hasn't been acted on. - Revolution
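
Without relying on that patch, the scaling can be applied by hand. The sketch below assumes the normalization that MATLAB documents for the coeff option of xcorr, namely dividing by sqrt(sum(x**2) * sum(y**2)) so that an autocorrelation equals 1 at zero lag. The name normalized_xcorr is a hypothetical helper, not a NumPy function, and it does not guard against all-zero inputs.

```python
import numpy as np

def normalized_xcorr(x, y):
    """'full' cross-correlation scaled into [-1, 1].

    Hypothetical helper for illustration; not a NumPy function.
    Assumes neither input is all zeros (the scale would be 0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Product of the two signals' Euclidean norms (Cauchy-Schwarz bound).
    scale = np.sqrt(np.sum(x**2) * np.sum(y**2))
    return np.correlate(x, y, 'full') / scale

x = [0, 1, 2, 1, 0, 0]
n1 = normalized_xcorr(x, [0, 0, 1, 2, 1, 0])
n2 = normalized_xcorr(x, [0, 0, 0, 0, 1, 2])
print(n1.max())  # raw peak 6 divided by sqrt(6 * 6)
print(n2.max())  # raw peak 5 divided by sqrt(6 * 5)
```

With this scaling the first example peaks at 1 (a shifted copy of x) and the second at about 0.91. The means are not subtracted, so these values are not Pearson correlation coefficients; subtracting each signal's mean first would be a separate design choice.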
