This is a very interesting problem. Originally, I was going to suggest a cross-correlation based solution similar to user948652's. However, from your problem description, there are two issues with that solution:
- The sampling interval of the data is larger than the time shift, and
- On some days, the predicted and measured values have a very low correlation to each other
As a result of these two issues, I think that directly applying the cross-correlation solution is likely to actually increase your time shift, particularly on days where the predicted and measured values have a very low correlation to each other.
In my comment above, I asked if you had any events which occur in both time series, and you said that you do not. However, based on your domain, I think that you actually have two:
- Sunrise
- Sunset
Even if the rest of the signal is poorly correlated, the sunrise and sunset should be somewhat correlated, since they will monotonically increase from / decrease to the night time baseline. So here's a potential solution, based on these two events, that should both minimize the interpolation needed, and not be dependent on the cross correlation of poorly-correlated signals.
1. Find approximate Sunrise/Sunset
This should be easy enough: take the first and last data points that are higher than the night time flat line, and label those the approximate sunrise and sunset. Then, I would focus on those two points, as well as the points immediately on either side, i.e.:
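import numpy as np
width = 1
sunrise_index = get_sunrise()  # placeholder, see below
sunset_index = get_sunset()    # placeholder, see below
# set the data to zero, except for the sunrise/sunset events and `width` points on either side.
bitmap = np.zeros(data.shape)
# the slice end is exclusive, so add 1 to include the point at index + width
bitmap[sunrise_index - width : sunrise_index + width + 1] = 1
bitmap[sunset_index - width : sunset_index + width + 1] = 1
sunrise_sunset = data * bitmap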
There are several ways to implement get_sunrise() and get_sunset() depending on how much rigor you need in your analysis. I would use numpy.diff, threshold it at a specific value, and take the first and last points above that value. You could also read the night time data in from a large number of files, calculate the mean & standard deviation, and look for the first and last data points that exceed, say, 0.5 * st_dev of the night time data. You could also do some sort of cluster-based template matching, in particular if different classes of day (e.g., sunny vs. partly cloudy vs. very cloudy) have highly stereotypical sunrise/sunset events.
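To make the first two options concrete, here is a minimal sketch. The function names, the `night_mask` argument, and the reading of "exceed 0.5 * st_dev" as "exceed the night time mean by 0.5 standard deviations" are my own assumptions, so treat the thresholds as starting points to tune on your data.

```python
import numpy as np

def sunrise_sunset_from_baseline(data, night_mask, k=0.5):
    """First and last samples that exceed the night time mean by k standard deviations.

    night_mask is a hypothetical boolean array marking samples known to be night.
    """
    night = data[night_mask]
    threshold = night.mean() + k * night.std()
    above = np.flatnonzero(data > threshold)
    if above.size == 0:
        raise ValueError("no samples above the night time threshold")
    return above[0], above[-1]

def sunrise_sunset_from_diff(data, min_step):
    """First and last daytime samples, found from large sample-to-sample changes."""
    steps = np.abs(np.diff(data))          # steps[i] = |data[i + 1] - data[i]|
    big = np.flatnonzero(steps > min_step)
    if big.size == 0:
        raise ValueError("no step larger than min_step")
    # The first big step ends on the first daytime sample;
    # the last big step starts on the last daytime sample.
    return big[0] + 1, big[-1]
```

Either function can stand in for get_sunrise() and get_sunset() above. On noisy night time data a multiplier as small as 0.5 may trigger too early, in which case a larger k is worth trying.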
2. Resample Data
I don't think that there is any way to solve this problem without some interpolation. I would resample the data to a higher sample rate than the shift requires. If the shift is on the scale of minutes, then upsample to 1 minute or 30 seconds.
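import scipy.signal
# new_sample_rate is the upsampling factor (new rate / original rate); resample needs an integer count
num_samples = int(new_sample_rate * sunrise_sunset.shape[0])
sunrise_sunset = scipy.signal.resample(sunrise_sunset, num_samples)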
Alternatively, we could use a cubic spline to interpolate the data (see here).
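As a rough sketch of the spline alternative, assuming you have the sample times in an array `t` (in minutes, strictly increasing) alongside the values. The function name and arguments are my own; scipy.interpolate.CubicSpline is the only library piece used.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def upsample_cubic(t, values, new_step):
    """Interpolate (t, values) onto a finer, evenly spaced time grid.

    new_step is the approximate spacing of the new grid, in the same units as t.
    """
    spline = CubicSpline(t, values)
    # Build the grid with linspace so it never runs past the last original sample.
    n_points = int(round((t[-1] - t[0]) / new_step)) + 1
    t_new = np.linspace(t[0], t[-1], n_points)
    return t_new, spline(t_new)

# Example: 15-minute data upsampled to 1-minute spacing
t = np.arange(0.0, 61.0, 15.0)
values = np.array([0.0, 1.0, 4.0, 2.0, 0.0])
t_new, v_new = upsample_cubic(t, values, 1.0)
```

Unlike scipy.signal.resample, which works in the frequency domain and assumes the signal is periodic, the spline passes exactly through the original samples, so it may behave better near the ends of a short window.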
3. Gaussian Convolution
Because the data has been interpolated, we don't know precisely when the actual sunrise and sunset occurred. We can convolve the signal with a Gaussian to represent this uncertainty.
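# scipy.signal.gaussian was removed in recent SciPy releases; the window lives in scipy.signal.windows
gaussian_window = scipy.signal.windows.gaussian(M, std)
sunrise_sunset_g = scipy.signal.convolve(sunrise_sunset, gaussian_window)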
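One way to pick M and std, sketched under my own assumptions: express the uncertainty as a standard deviation in samples, make the window roughly six standard deviations long, and normalize it so the smoothing does not change the overall signal level. The helper name is hypothetical, and the window function is taken from scipy.signal.windows, where current SciPy releases keep it.

```python
import numpy as np
from scipy.signal import convolve
from scipy.signal.windows import gaussian

def smooth_events(signal, std_samples):
    """Convolve signal with a unit-area Gaussian of the given std (in samples)."""
    m = int(6 * std_samples) | 1       # odd length, roughly +/- 3 standard deviations
    window = gaussian(m, std_samples)
    window /= window.sum()             # unit area keeps the signal level unchanged
    # mode="same" keeps the output aligned with, and as long as, the input
    return convolve(signal, window, mode="same")

# Example: a single event at sample 20 is spread out but stays centered
impulse = np.zeros(41)
impulse[20] = 1.0
smoothed = smooth_events(impulse, std_samples=2.0)
```

If the data was upsampled to 1 minute, a std of 2 samples corresponds to about 2 minutes of uncertainty on each event.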
4. Cross-Correlation
Use the cross-correlation method in user948652's answer to obtain the time shift.
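For completeness, a generic lag estimate could look like the sketch below. It is not necessarily identical to the code in that answer; the function name and the sign convention (a positive result means the measured series lags the predicted one) are my own choices.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_shift(measured, predicted, sample_period):
    """Estimate how far `measured` lags `predicted`, in units of sample_period."""
    a = measured - measured.mean()
    b = predicted - predicted.mean()
    xcorr = correlate(a, b, mode="full")
    lags = correlation_lags(a.size, b.size, mode="full")
    # The lag with the highest correlation is the best integer-sample alignment.
    return lags[np.argmax(xcorr)] * sample_period

# Example: a bump delayed by 5 samples at 0.5 minutes per sample
x = np.arange(200)
predicted = np.exp(-0.5 * ((x - 50) / 4.0) ** 2)
measured = np.roll(predicted, 5)
shift = estimate_shift(measured, predicted, 0.5)
```

The result is still quantized to the (upsampled) sample period, which is why the resampling in step 2 has to be finer than the shift you expect to find.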
This method leaves several questions open that you can only settle by examining and experimenting with the data, such as the best method for identifying sunrise/sunset and how wide the Gaussian window should be. But it's how I would begin to attack the problem.
Good luck!