I have a set of curves, each defined as a 2D array of shape (number of points, number of coordinates). I am calculating a distance matrix for them using the Hausdorff distance. My current code is below. Unfortunately it is too slow with 500-600 curves, each having 50-100 3D points. Is there a faster way to do this?
    import numpy as np
    import scipy.spatial.distance

    def distanceBetweenCurves(C1, C2):
        D = scipy.spatial.distance.cdist(C1, C2, 'euclidean')

        # non-symmetric Hausdorff distances
        H1 = np.max(np.min(D, axis=1))
        H2 = np.max(np.min(D, axis=0))

        return (H1 + H2) / 2.

    def distanceMatrixOfCurves(Curves):
        numC = len(Curves)

        D = np.zeros((numC, numC))
        for i in range(0, numC - 1):
            for j in range(i + 1, numC):
                D[i, j] = D[j, i] = distanceBetweenCurves(Curves[i], Curves[j])

        return D
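For reference, example data of roughly the size mentioned above can be generated like this (the curves are just random points, purely for illustration and timing):

    import time
    import numpy as np

    # Illustrative data only: 500 random curves, each with between 50 and 100
    # 3D points, matching the problem size described above.
    np.random.seed(0)
    Curves = [np.random.rand(np.random.randint(50, 101), 3)
              for _ in range(500)]

    start = time.time()
    D = distanceMatrixOfCurves(Curves)
    print('distance matrix of %d curves took %.1f s'
          % (len(Curves), time.time() - start))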
Comments:

Is scipy.spatial.distance.cdist the slow piece, or the double loop inside distanceMatrixOfCurves? If these curves are convex it might be possible to optimize the first possible slow piece. Do these curves intersect, or are they contained inside others? I feel like you could reuse earlier found distances to speed up new calculations. This is just babbling of course; I have a similar issue myself with min(min(..)) measures and had trouble generalizing the considerations I'm putting here. Did you try or think about anything beyond the code you posted? – Refuge

[...] cdist computes - if you can add some code that generates some small example data it will be easier to help you. – Groyne

[...] cdist (just use a form of wave propagation) that might cost more memory, and in the worst case it doesn't help. This doesn't mean it can't help in reducing execution time, but it is problematic. – Refuge

1. Do you really need the full, symmetric D, or can you live with just the upper- or lower-triangular matrix? This D[i, j] = D[j, i] = ... is definitely no good for data locality; 2. Have you tried using list comprehensions or map instead of the double loops? – Baca
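For what it's worth, here is a minimal sketch of the last comment's suggestion: compute each pair only once (no explicit double loop, no duplicated writes) and let SciPy expand the condensed result into the full symmetric matrix. It reuses distanceBetweenCurves from the question; the function name is just illustrative and the per-pair cost is unchanged:

    import itertools
    import numpy as np
    import scipy.spatial.distance

    def condensedDistanceMatrixOfCurves(Curves):
        # Evaluate each unordered pair (i, j) with i < j exactly once, in the
        # same order that scipy.spatial.distance.pdist uses for its output.
        condensed = np.array([distanceBetweenCurves(C1, C2)
                              for C1, C2 in itertools.combinations(Curves, 2)])
        # squareform turns the condensed vector into the full symmetric matrix
        # with zeros on the diagonal, i.e. the same result as the original code.
        return scipy.spatial.distance.squareform(condensed)

Whether this helps noticeably depends on where the time actually goes; as the first comment points out, it is worth checking whether the cdist call or the Python-level double loop is the slow piece before optimizing either.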