How to find the lag between two time series using cross-correlation

Say the two series are:

x = [4,4,4,4,6,8,10,8,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4]
y = [4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,6,8,10,8,6,4,4]

The two series are clearly offset by 12 time periods (the peak is at index 6 in x and at index 18 in y). However, using the following code, as suggested in "Python cross correlation":

import numpy as np
c = np.correlate(x, y, "full")
lag = np.argmax(c) - c.size/2

leads to an incorrect lag of -0.5.
What's wrong here?

Cystic answered 9/9, 2021 at 11:46 Comment(1)
What's the desired output? – Thibault

The easy way is to use scipy.signal.correlation_lags (available since SciPy 1.6.0), which returns the lag corresponding to each entry of the cross-correlation.

Also, remember to subtract the mean from the inputs.

import numpy as np
from scipy import signal

x = np.array([4,4,4,4,6,8,10,8,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4])
y = np.array([4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,6,8,10,8,6,4,4])
# Subtract the mean so the constant offset does not dominate the result
correlation = signal.correlate(x - np.mean(x), y - np.mean(y), mode="full")
# Lag (in samples) that corresponds to each entry of `correlation`
lags = signal.correlation_lags(len(x), len(y), mode="full")
lag = lags[np.argmax(abs(correlation))]

This gives lag = -12, which is the difference between the index of the first 6 in x (index 4) and in y (index 16). If you swap the inputs, it gives +12.
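As for what goes wrong in the question's snippet: in "full" mode the output has len(x) + len(y) - 1 entries and zero lag sits at index len(y) - 1, so subtracting c.size/2 does not give a valid lag axis (and the mean was not removed). A minimal NumPy-only sketch, assuming the same x and y and the same sign convention as above; the names `lags`, `c_swapped` and `lag_swapped` are only illustrative:

```python
import numpy as np

x = np.array([4,4,4,4,6,8,10,8,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4])
y = np.array([4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,6,8,10,8,6,4,4])

# Remove the constant offset before correlating
c = np.correlate(x - x.mean(), y - y.mean(), "full")

# In "full" mode, index 0 corresponds to lag -(len(y) - 1)
# and index len(y) - 1 corresponds to zero lag
lags = np.arange(-(len(y) - 1), len(x))
lag = lags[np.argmax(c)]
print(lag)  # -12

# Swapping the inputs flips the sign of the result
c_swapped = np.correlate(y - y.mean(), x - x.mean(), "full")
lags_swapped = np.arange(-(len(x) - 1), len(y))
lag_swapped = lags_swapped[np.argmax(c_swapped)]
print(lag_swapped)  # 12
```

The hand-built lag axis here is meant to match what correlation_lags returns for mode="full"; it is only useful if SciPy is not available.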

Edit

Why subtract the mean

If the signals have a non-zero mean, the terms at the center of the correlation become larger, because more samples overlap there and so more products are summed. Furthermore, for very large data, subtracting the mean makes the calculations more accurate.
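To make the overlap argument concrete, here is a small sketch, assuming two signals of length 23 as in this example (the names `overlap` and `const_corr` are illustrative). Correlating two all-ones arrays counts how many samples overlap at each lag, and the correlation of two constant signals is just that count scaled by the product of their values:

```python
import numpy as np
from scipy import signal

n = 23
ones = np.ones(n)

# Number of overlapping samples at each lag: 1, 2, ..., 23, ..., 2, 1
overlap = signal.correlate(ones, ones, mode="full", method="direct")
lags = signal.correlation_lags(n, n, mode="full")
print(lags[np.argmax(overlap)])  # 0: the most terms are summed at zero lag

# Two constant signals with values mx and my correlate to mx * my * overlap,
# a triangle that peaks at the center regardless of any real time shift
mx = my = 110 / 23  # mean of x and of y in this example
const_corr = signal.correlate(mx * ones, my * ones, mode="full", method="direct")
print(np.allclose(const_corr, mx * my * overlap))  # True
```

A signal with a non-zero mean carries this triangle as a background term, which is why the raw correlation tends to peak near zero lag.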

Here I illustrate what would happen if the mean was not subtracted for this example.

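import matplotlib.pyplot as plt

plt.plot(abs(correlation))                            # mean subtracted
plt.plot(abs(signal.correlate(x, y, mode="full")))    # mean kept
plt.plot(abs(signal.correlate(np.ones_like(x)*np.mean(x), np.ones_like(y)*np.mean(y))))  # constant signals
# Legend entries follow the order of the plot calls above
plt.legend(['subtracting mean', 'keeping the mean', 'constant signal'])
plt.show()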

[Plot: the three curves produced by the code above]

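Notice that the maximum of the blue curve (mean subtracted, at index 10, i.e. lag -12) does not coincide with the maximum of the orange curve (mean kept), which sits at the center of the array, i.e. lag 0.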
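The same comparison can be checked numerically without a plot. This is only a sketch using the arrays from this example; `centered`, `raw`, `i_centered` and `i_raw` are illustrative names:

```python
import numpy as np
from scipy import signal

x = np.array([4,4,4,4,6,8,10,8,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4])
y = np.array([4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,6,8,10,8,6,4,4])

lags = signal.correlation_lags(len(x), len(y), mode="full")
# Cross-correlation with and without removing the mean
centered = signal.correlate(x - x.mean(), y - y.mean(), mode="full", method="direct")
raw = signal.correlate(x, y, mode="full", method="direct")

# Position of the largest absolute value, as an index and as a lag
i_centered = np.argmax(np.abs(centered))
i_raw = np.argmax(np.abs(raw))
print(i_centered, lags[i_centered])  # 10 -12
print(i_raw, lags[i_raw])            # 22 0
```

With the mean kept, the peak falls at zero lag, which is the spurious result the question ran into.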

Chabazite answered 9/9, 2021 at 11:56 Comment(8)
Why do you need to subtract the mean when calculating the correlation? – Dungeon
If the two signals have the same length, the number of terms summed at each lag follows a triangle shape, which will probably place the maximum correlation at the center. – Chabazite
Added one plot to help there. – Chabazite
Thank you. You say "the terms at the center of the correlation will become larger". Why is this not reported in any official documentation? Do you have an official link that elaborates more on that and the use of the mean? – Dungeon
They give the definition in the documentation. I use this to calculate an unnormalized version of the Pearson correlation coefficient for all the possible shifts. – Chabazite
Sure, I just can't see from the definition how the correlation becomes stronger towards the center of the array, nor is there any mention of the mean subtraction. – Dungeon
So maybe you need to post a specific question for your specific doubts. I will be happy to give an answer with more details if I can. – Chabazite
I have done it here following this thread. – Dungeon

© 2022 - 2024 — McMap. All rights reserved.