Efficient matching of two arrays (how to use KDTree)
Asked Answered
L

1

2

I have two 2d arrays, obs1 and obs2. They represent two independent measurement series, and both have dim0 = 2, and slightly different dim1, say obs1.shape = (2, 250000), and obs2.shape = (2, 250050). obs1[0] and obs2[0] signify time, and obs1[1] and obs2[1] signify some spatial coordinate. Both arrays are (more or less) sorted by time. The times and coordinates should be identical between the two measurement series, but in reality they aren't. Also, not each measurement from obs1 has a corresponding value in obs2 and vice-versa. Another problem is that there might be a slight offset in the times.

I'm looking for an efficient algorithm to associate the best matching value from obs2 to each measurement in obs1. Currently, I do it like this:

define dt = some_maximum_time_difference
define dx = 3
j = 0
i = 0
matchresults = np.empty(obs1.shape[1])
for j in obs1.shape[1]:
    while obs1[0, j] - obs2[0, j] < dt:
        i += 1
    matchresults[j] = i - dx + argmin(abs(obs1[1, i] - obs2[1, i-dx:i+dx+1]))

This yields good results. However, it is extremely slow, running in a loop.

I would be very thankful for ideas on how to improve this algorithm speed-wise, e.g. using KDtree or something similar.

Ludwog answered 20/3, 2013 at 13:52 Comment(5)
Can you please expand a bit on first_obs2_index_with_time_difference_lessthan_delta_t resulting_i = i + argmin(abs(obs1[1, i] - obs2[1, i-delta_x:i+delta_x+1])? Assuming that you are not performing another loop here, I don't believe KDTrees would speed things up too much.Folklore
@MikyDinescu thanks for your question; I updated the algorithm in the original post.Ludwog
can you post a bit of data, and your expected results?Mcleroy
I was wondering of anybody replied to you. Thanks for posting more details but unfortunately it's still unclear to me how your current algo. works.. I'm not a python developer so maybe somebody else will chime in. I would suggest though to present your algorithm in pseudo code instead. Or maybe ask on a more CS stack exchange..Folklore
@Ludwog did you check the answer below?Staccato
S
1

Using cKDTree for this case would look like:

from scipy.spatial import cKDTree

obs2 = array with shape (2, m)
obs1 = array with shape (2, n)

kdt = cKDTree(obs2.T)
dist, indices = kdt.query(obs1.T)

where indices will contain the column indices in obs2 corresponding to each observation in obs1. Note that I had to transpose obs1 and obs2.

Staccato answered 17/10, 2014 at 12:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.