I have two 2D arrays, `obs1` and `obs2`. They represent two independent measurement series; both have dim0 = 2 and slightly different dim1, say `obs1.shape = (2, 250000)` and `obs2.shape = (2, 250050)`. `obs1[0]` and `obs2[0]` signify time, and `obs1[1]` and `obs2[1]` signify some spatial coordinate. Both arrays are (more or less) sorted by time. The times and coordinates should be identical between the two measurement series, but in reality they aren't. Also, not every measurement in `obs1` has a corresponding value in `obs2` and vice versa. Another problem is that there might be a slight offset in the times.
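For concreteness, a tiny synthetic stand-in for this setup might look as follows (all numbers are illustrative; in the real data the coordinate is an independent measurement, not a known function of time):

```python
import numpy as np

rng = np.random.default_rng(0)

# obs1: 250000 samples, obs2: 250050 samples, each roughly sorted by time
t1 = np.sort(rng.uniform(0.0, 1000.0, 250000))
t2 = np.sort(rng.uniform(0.0, 1000.0, 250050)) + 0.001  # slight time offset

obs1 = np.vstack([t1, np.sin(t1)])  # row 0: time, row 1: spatial coordinate
obs2 = np.vstack([t2, np.sin(t2)])
```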
I'm looking for an efficient algorithm to associate the best-matching value from `obs2` with each measurement in `obs1`. Currently, I do it roughly like this:
import numpy as np

dt = some_maximum_time_difference
dx = 3  # half-width of the search window, in samples

matchresults = np.empty(obs1.shape[1], dtype=int)
i = 0
for j in range(obs1.shape[1]):
    # advance i until obs2's time is within dt of obs1's current time
    while obs1[0, j] - obs2[0, i] > dt:
        i += 1
    # best spatial match within +-dx samples around i
    window = obs2[1, i - dx:i + dx + 1]
    matchresults[j] = i - dx + np.argmin(np.abs(obs1[1, j] - window))
This yields good results. However, it is extremely slow because it runs in a Python loop.
I would be very thankful for ideas on how to improve this algorithm speed-wise, e.g. using a k-d tree or something similar.
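For what it's worth, the per-sample logic above can probably be vectorized with `np.searchsorted`, which locates the time window for every `obs1` sample at once. This is only a sketch, not my current code: `match_vectorized` is a made-up name, it assumes `obs2[0]` is sorted, and the window is clipped at the array edges rather than handled specially:

```python
import numpy as np

def match_vectorized(obs1, obs2, dt, dx):
    """For each column of obs1, return the index of the best-matching
    column of obs2 (closest spatial coordinate within a time window)."""
    # first obs2 sample whose time is within dt of each obs1 time
    start = np.searchsorted(obs2[0], obs1[0] - dt)
    # candidate window of 2*dx+1 indices around each start, clipped to range
    offsets = np.arange(-dx, dx + 1)
    cand = np.clip(start[:, None] + offsets[None, :], 0, obs2.shape[1] - 1)
    # spatial distance to every candidate; smallest one wins per row
    d = np.abs(obs1[1][:, None] - obs2[1][cand])
    best = np.argmin(d, axis=1)
    return cand[np.arange(obs1.shape[1]), best]
```

This trades the Python loop for one `(n, 2*dx+1)` temporary array, which should be cheap for small `dx`.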
So, to paraphrase: for each `j` you find `i = first_obs2_index_with_time_difference_lessthan_delta_t`, then `resulting_i = i + argmin(abs(obs1[1, i] - obs2[1, i-delta_x:i+delta_x+1]))`? Assuming that you are not performing another loop here, I don't believe KDTrees would speed things up too much. – Folklore