I have a 1D array of numbers and want to calculate all pairwise Euclidean distances. I have a method (thanks to SO) of doing this with broadcasting, but it's inefficient because it calculates each distance twice, and it doesn't scale well.
Here's an example that gives me what I want with an array of 1000 numbers.
import numpy as np
import random
# 1000 random integers between 1 and 999
r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)])
# broadcasting r against a column view gives the full (symmetric) 1000x1000 matrix of |r[i] - r[j]|
dists = np.abs(r - r[:, None])
What's the fastest implementation in scipy/numpy/scikit-learn that I can use to do this, given that it has to scale to situations where the 1D array has >10k values?
Note: the matrix is symmetric, so I'm guessing it's possible to get at least a 2x speedup by addressing that; I just don't know how.
scipy.spatial.distance.pdist. I dunno whether this is the fastest option, since it needs to have checks for multidimensional data, non-Euclidean norms, and other things, but it's built in. – Romie
numpy may be compiled with BLAS or MKL; the one you download straight from SourceForge likely is not. – Dover
scipy is always compiled with BLAS; it's not optional as it is with numpy. – Impetus
pdist with 'cityblock' for the metric should do the trick. – Conrado
scipy.spatial.distance only calls BLAS for a limited number of cases, and only after doing some extra operations, because BLAS has no distance functions. It's not faster for the OP's use case. – Baccarat
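For reference, a minimal sketch of the pdist approach suggested above, under a couple of assumptions not in the original post: the 1D values are reshaped into a single column (pdist expects a 2D observations-by-features array), and squareform is used to expand the result when the full matrix is actually needed. For 1D points the 'cityblock' metric (|a - b|) coincides with the Euclidean distance, and pdist computes each pair only once, which is exactly the ~2x saving from symmetry mentioned in the question.
import numpy as np
from scipy.spatial.distance import pdist, squareform
# example data standing in for the OP's array (assumption: integers in [1, 1000))
r = np.random.randint(1, 1000, size=1000)
# pdist wants shape (n_samples, n_features), so make the 1D values a single column
X = r[:, None].astype(float)
# condensed form: each pair computed once, length n*(n-1)/2
condensed = pdist(X, metric='cityblock')
# expand to the full symmetric n x n matrix only if it is really needed
dists = squareform(condensed)
For >10k values, staying in the condensed form also roughly halves the memory footprint compared to materialising the full n x n matrix.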