Estimating small time shift between two time series

I have two time series, and I suspect there is a time shift between them that I would like to estimate.

This question has been asked before, in Find phase difference between two (inharmonic) waves and find time shift between two similar waveforms, but in my case the time shift is smaller than the resolution of the data. For example, the data is available at hourly resolution, while the time shift is only a few minutes (see image).

The cause of this is that the datalogger used to measure one of the series has a few minutes' offset in its clock.

Are there any algorithms out there that can estimate this shift, preferably without using interpolation?

[Image: solar irradiation forecast and solar irradiation measurement]

Cyna answered 11/12, 2012 at 18:31 Comment(12)
(+1) Nice question. Out of interest, why are you banning the use of interpolation?Helsell
I just thought that if you want to estimate the shift to high accuracy then you need to interpolate to a very high resolution, and since I have lots of data, I wanted to avoid that.Cyna
It seems to me that Fourier series might be helpful if your data is roughly periodic...Wage
Do you have any sort of synchronization events that occur in both time series?Amersfoort
If the data looks like anything in the graph it is very periodic, and an FFT might show you the shift. Though the FFT is itself an interpolation... Do you have sample data for us to test? This is interesting.Improvvisatore
@user948652 -- That data isn't actually very periodic. It's half-periodic (periodicity implies that the derivative at either end of the domain is the same as well), but it still might work out OK.Wage
@Wage I have seen FFT give reasonable results for far uglier data.Improvvisatore
@user948652 -- Sure it can. There's just no reason to believe that it should a-priori (as far as I'm aware). Anyway, feel free to have a look at the FFT solution I posted below (you might be more familiar with this stuff than I am). I'd happily convert it to a community wiki if others wanted to contribute and make it better.Wage
You can cross-correlate the two data series using shifts near the suspected lag value; maximizing the cross-correlation will give you the time shift. Since you are looking for a shift smaller than your resolution, a continuous (sub-sample) correlation method is required.December
@brentlance: No, the data is solar radiation and a forecast of solar radiation, so there are no events that simultaneously affect both series (if I understood your question).Cyna
@user948652: If you give me your email address, I'll send you the data.Cyna
The figure shown is a clear day, and there the series look highly correlated and smooth, but for most days of the year the data looks very different: the forecasts are much worse and the series is less smooth.Cyna

This is quite an interesting problem. Here's an attempt at a partial solution using Fourier transforms. This relies on the data being moderately periodic. I'm not sure if it will work with your data (where the derivatives at the endpoints don't seem to match).

import numpy as np

X = np.linspace(0, 2*np.pi, 30)  # some X values

def yvals(x):
    return np.sin(x) + np.sin(2*x) + np.sin(3*x)

Y1 = yvals(X)
Y2 = yvals(X - 0.1)  # shifted y values

# Fourier transform both series
FT1 = np.fft.fft(Y1)
FT2 = np.fft.fft(Y2)

# You can show analytically that a time shift multiplies the N'th
# coefficient by a factor of `exp(-1.j * N * T_d)`

# Can't take the 0'th element because that's a division by 0.  Analytically,
# the division by 0 is OK by L'Hopital's rule, but computers don't know calculus :)
print(np.log(FT2[1:] / FT1[1:]) / (-1.j * np.arange(1, len(X))))
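
For completeness, a sketch of where that multiplicative factor comes from (assuming the signal is well described by a Fourier series on [0, 2π]): writing y1(x) = sum_N c_N * exp(1.j * N * x), a shift gives y2(x) = y1(x - T_d) = sum_N c_N * exp(-1.j * N * T_d) * exp(1.j * N * x), so the N'th coefficient is multiplied by exp(-1.j * N * T_d). Hence np.log(FT2[N] / FT1[N]) / (-1.j * N) should approximately recover T_d at every frequency N, which is exactly what the print line evaluates.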

A quick inspection of the printed output shows that the frequencies with the most power (N=1, N=2) give reasonable estimates; N=3 does OK too if you look at the absolute value (np.absolute), although I'm at a loss to explain why that would be.

Maybe someone more familiar with the math can take it from here to give a better answer...

Wage answered 11/12, 2012 at 20:7 Comment(0)

One of the links you provided has the right idea (in fact, I am doing pretty much the same thing here).

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import correlate

a, b, N = 0, 10, 1000       # boundaries, datapoints
shift = -3                  # shift; note 3/10 of L = b - a

x = np.linspace(a, b, N)
x1 = x + shift
time = np.arange(1 - N, N)  # theoretical definition; time is centered at 0

y1 = sum([np.sin(2*np.pi*i*x/b) for i in range(1, 5)])
y2 = sum([np.sin(2*np.pi*i*x1/b) for i in range(1, 5)])

# Normalizing really only helps with large irregular data, try it
# y1 -= y1.mean()
# y2 -= y2.mean()
# y1 /= y1.std()
# y2 /= y2.std()

cross_correlation = correlate(y1, y2)
shift_calculated = time[cross_correlation.argmax()] * 1.0 * b / N
y3 = sum([np.sin(2*np.pi*i*(x1 - shift_calculated)/b) for i in range(1, 5)])
print("Preset shift: ", shift, "\nCalculated shift: ", shift_calculated)

plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, y3)
plt.legend(("Regular", "Shifted", "Recovered"))
plt.savefig("SO_timeshift.png")
plt.show()

This has the following output:

Preset shift:  -3
Calculated shift:  -2.99

[Plot: the Regular, Shifted, and Recovered series]

It might be necessary to check:

  1. Scipy Correlate
  2. Time Delay Analysis

Note that the argmax() of the correlation shows the position of the alignment; it has to be scaled by the length b - a = 10 - 0 = 10 and by N to get the actual value.

Checking the source of correlate, it is not entirely obvious how the imported function from sigtools behaves. For large datasets, circular correlation (via the Fast Fourier Transform) is much faster than the straightforward method. I suspect this is what is implemented in sigtools, but I cannot tell for sure. A search for the file in my python2.7 folder only returned the compiled C pyd file.
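
For reference, a minimal sketch of circular cross-correlation via the FFT (my own illustration, not the sigtools implementation; note that it wraps around the ends, so it is only directly comparable for periodic data):

import numpy as np

def circular_xcorr(y1, y2):
    """Circular cross-correlation of two equal-length series via the FFT."""
    # correlation theorem: correlating in time corresponds to multiplying
    # by the complex conjugate in the frequency domain
    return np.real(np.fft.ifft(np.fft.fft(y1) * np.conj(np.fft.fft(y2))))

# the (circular) lag in samples is where the correlation peaks:
# lag = circular_xcorr(y1, y2).argmax()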

Improvvisatore answered 11/12, 2012 at 23:21 Comment(2)
Have you experimented with this as your shift gets really small? For example, what if shift = (x[1]-x[0])/4.0. This is a more realistic test when compared with OP's request ("the time shift is smaller than the resolution of the data")Wage
It fails when the shift is smaller than the resolution of the data, as the resolution of the time array used to find the shift is the same as the data's. Didn't consider that. I wonder what the OP's data looks like when it is downsampled. Otherwise it has to be interpolated.Improvvisatore

This is a very interesting problem. Originally, I was going to suggest a cross-correlation based solution similar to user948652's. However, from your problem description, there are two issues with that solution:

  1. The resolution of the data is larger than the time shift, and
  2. On some days, the predicted value and measured values have a very low correlation to each other

As a result of these two issues, I think that directly applying the cross-correlation solution is likely to actually increase the error in your estimated time shift, particularly on days where the predicted and measured values have a very low correlation to each other.

In my comment above, I asked if you had any events which occur in both time series, and you said that you do not. However, based on your domain, I think that you actually have two:

  1. Sunrise
  2. Sunset

Even if the rest of the signal is poorly correlated, the sunrise and sunset should be somewhat correlated, since they will monotonically increase from / decrease to the night time baseline. So here's a potential solution, based on these two events, that should both minimize the interpolation needed, and not be dependent on the cross correlation of poorly-correlated signals.

1. Find approximate Sunrise/Sunset

This should be easy enough: simply take the first and last data points that are higher than the night-time flat line, and label those the approximate sunrise and sunset. Then I would focus on that data, as well as the points immediately on either side, i.e.:

import numpy as np

width = 1
sunrise_index = get_sunrise()
sunset_index = get_sunset()

# zero out the data, except around the sunrise/sunset events
bitmap = np.zeros(data.shape)
bitmap[sunrise_index - width : sunrise_index + width] = 1
bitmap[sunset_index - width : sunset_index + width] = 1
sunrise_sunset = data * bitmap

There are several ways to implement get_sunrise() and get_sunset(), depending on how much rigor you need in your analysis. I would use numpy.diff, threshold it at a specific value, and take the first and last points above that value (a sketch follows below). You could also read the night-time data in from a large number of files, calculate the mean and standard deviation, and look for the first and last data points that exceed, say, 0.5 * st_dev of the night-time data. You could also do some sort of cluster-based template matching, in particular if different classes of day (i.e., sunny vs. partly cloudy vs. very cloudy) have highly stereotypical sunrise/sunset events.
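
For instance, a minimal sketch of the numpy.diff approach (the threshold argument and the assumption that data is a 1-D array covering a single day are mine):

import numpy as np

def get_sunrise(data, threshold):
    """Index of the first point-to-point rise larger than threshold."""
    return int(np.flatnonzero(np.diff(data) > threshold)[0])

def get_sunset(data, threshold):
    """Index just after the last point-to-point drop larger than threshold."""
    return int(np.flatnonzero(np.diff(data) < -threshold)[-1] + 1)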

2. Resample Data

I don't think there is any way to solve this problem without some interpolation. I would resample the data to a higher sample rate than the shift. If the shift is on the scale of minutes, then upsample to 1 minute or 30 seconds.

import scipy.signal

num_samples = new_sample_rate * sunrise_sunset.shape[0]
sunrise_sunset = scipy.signal.resample(sunrise_sunset, num_samples)

Alternatively, we could use a cubic spline to interpolate the data (e.g. scipy.interpolate.interp1d with kind='cubic'), as sketched below.
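
A minimal sketch of that spline alternative, reusing sunrise_sunset and num_samples from above (the uniform index grid is my assumption):

import numpy as np
from scipy.interpolate import interp1d

t = np.arange(len(sunrise_sunset))                             # original sample indices
t_fine = np.linspace(0, len(sunrise_sunset) - 1, num_samples)  # upsampled grid
sunrise_sunset = interp1d(t, sunrise_sunset, kind='cubic')(t_fine)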

3. Gaussian Convolution

Since there is some interpolation, we don't know how precisely the actual sunrise and sunset were predicted. So we can convolve the signal with a Gaussian to represent this uncertainty.

# M: window length in samples; std: uncertainty width in samples
gaussian_window = scipy.signal.gaussian(M, std)
sunrise_sunset_g = scipy.signal.convolve(sunrise_sunset, gaussian_window)

4. Cross-Correlation

Use the cross-correlation method in user948652's answer to obtain the time shift.

There are a lot of unanswered questions in this method that would require examination of, and experimentation with, the data to nail down more specifically, such as the best method for identifying sunrise/sunset, how wide the Gaussian window should be, etc. But it's how I would begin to attack the problem. Good luck!

Amersfoort answered 13/12, 2012 at 16:34 Comment(1)
Add a filter to throw out cloudy mornings/evenings, e.g., by looking at the deviation one hour later/earlier. Anyways, this should be the accepted answer.Heartsease

Indeed, interesting problem, but no satisfying answer yet. Let's try to change that...

You say that you prefer not to use interpolation, but, as I understand from your comment, what you really mean is that you would like to avoid upsampling to a higher resolution. A basic solution makes use of a least squares fit with a linear interpolation function, but without upsampling to a higher resolution:

import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import leastsq

def yvals(x):
    return np.sin(x) + np.sin(2*x) + np.sin(3*x)

dx = .1
X = np.arange(0, 2*np.pi, dx)
Y = yvals(X)

unknown_shift = np.random.random() * dx
Y_shifted = yvals(X + unknown_shift)

def err_func(p):
    # compare the linearly interpolated original against the shifted series;
    # the endpoints are dropped so the shifted sample times stay in range
    return interp1d(X, Y)(X[1:-1] + p[0]) - Y_shifted[1:-1]

p0 = [0,]  # initial guess of no shift
found_shift = leastsq(err_func, p0)[0][0]

print("Unknown shift: ", unknown_shift)
print("Found   shift: ", found_shift)

A sample run gives a quite accurate solution:

Unknown shift:  0.0695701123582
Found   shift:  0.0696105501967

If one includes noise in the shifted Y:

Y_shifted += .1*np.random.normal(size=X.shape)

One gets somewhat less precise results:

Unknown shift:  0.0695701123582
Found   shift:  0.0746643381744

The accuracy in the presence of noise improves when more data is available, e.g. with:

X = np.arange(0,200*np.pi,dx)

A typical result is:

Unknown shift:  0.0695701123582
Found   shift:  0.0698527939193
Ursala answered 11/12, 2012 at 18:31 Comment(3)
The least-squares method assumes a normal distribution of deviations, which you used in the simulation. In the stated problem, however, the deviations tend to be one-sided due to cloud cover. In addition, the sum of squares is dominated by the large deviations on cloudy days, which is the opposite of what you want.Heartsease
@Heartsease You're right that we don't have a normal distribution. That would matter indeed when fitting a model, but it really doesn't matter a lot for estimating the time shift.Ursala
Indeed, the one-sidedness of the distribution of deviations would cancel if a cloudy sky is as probable in the morning as in the evening.Heartsease

I've successfully used (in an AWGN channel) a matched-filter approach, which gives the peak energy m[n] at index n; then fitting a 2nd-degree polynomial f(x) to m[n-1], m[n], m[n+1] and finding its maximum by setting f'(x) == 0 gives a sub-sample estimate of the shift.

The response is not necessarily absolutely linear, especially if the autocorrelation of the signal doesn't vanish at m[n-1] and m[n+1].
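
A minimal sketch of that parabolic peak refinement (the closed form is the vertex of the parabola through the three points; m is assumed to be the matched-filter output, e.g. a cross-correlation):

import numpy as np

def subsample_peak(m):
    """Refine the argmax of m to sub-sample precision with a parabolic fit."""
    n = int(np.argmax(m))
    if n == 0 or n == len(m) - 1:
        return float(n)  # peak on the boundary; no neighbors to fit
    # vertex of the parabola through (n-1, m[n-1]), (n, m[n]), (n+1, m[n+1])
    return n + 0.5 * (m[n - 1] - m[n + 1]) / (m[n - 1] - 2 * m[n] + m[n + 1])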

Bywaters answered 13/12, 2012 at 7:15 Comment(0)

Optimize for the best solution

For the constraints given, namely that the solution is phase-shifted by a small amount, less than the sampling resolution, a simple downhill simplex algorithm works well. I've modified the sample problem of @mgilson to show how to do this. Note that this solution is robust, in that it can handle noise.

Error function: There may be more optimal things to optimize over, but this works surprisingly well:

np.sqrt((X1-X2+delta_x)**2+(Y1-Y2)**2).sum()

That is, minimize the Euclidean distance between the two curves by adjusting only the x-axis (phase).

import numpy as np
from scipy.optimize import fmin

def yvals(x):
    return np.sin(x) + np.sin(2*x) + np.sin(3*x)

dx = .1
unknown_shift = .03 * np.random.random() * dx

X1 = np.arange(0, 2*np.pi, dx)  # some X values
X2 = X1 + unknown_shift

Y1 = yvals(X1)
Y2 = yvals(X2)                            # shifted Y
Y2 += .1*np.random.normal(size=X1.shape)  # now with noise

def err_func(p):
    return np.sqrt((X1 - X2 + p[0])**2 + (Y1 - Y2)**2).sum()

p0 = [0,]  # initial guess of no shift
found_shift = fmin(err_func, p0)[0]

print("Unknown shift: ", unknown_shift)
print("Found   shift: ", found_shift)
print("Percent error: ", abs((unknown_shift - found_shift) / unknown_shift))

A sample run gives:

Optimization terminated successfully.
         Current function value: 4.804268
         Iterations: 6
         Function evaluations: 12
Unknown shift:  0.00134765446268
Found   shift:  0.001375
Percent error:  -0.0202912082305
Lejeune answered 13/12, 2012 at 17:19 Comment(1)
Why not simply execute X2 - X1 ? No iterations needed and a perfect result! No, seriously, X2 is unknown, so you are actually cheating when you use it in your err_func! Though I must admit you inspired me for my answer...Ursala
