Pairwise Distances Between Two "islands"/"connected components" in Numpy Array
Asked Answered
M

2

9

Consider the following image, stored as a numpy array:

a = [[0,0,0,0,0,1,1,0,0,0],
     [0,0,0,0,1,1,1,1,0,0],
     [0,0,0,0,0,1,1,0,0,0],
     [0,0,0,0,0,0,0,0,0,0],
     [0,0,0,0,0,2,0,0,0,0],
     [0,0,0,0,0,2,2,0,0,0],
     [0,0,0,0,0,2,0,0,0,0],
     [0,0,0,0,3,3,3,0,0,0],
     [4,0,0,0,0,0,0,0,0,0],
     [4,4,0,0,0,0,0,0,0,0],
     [4,4,4,0,0,0,0,0,0,0]]

a = np.array(a)

Zeros represent background pixels, 1,2,3 and 4 represent pixels that belong to objects. You can see that objects always form contiguous islands or regions in the image. I would like to know the distance between every pair of objects. As distance measure I'd like to have the shortest, staightline distance, between those pixels of the object, that are closest to each other. Example: Distance(2,3) = 1, because they are touching. Distance(1,2) = 2, because there is exactly one background pixel separating the two regions, or in other words, the closest pixels of the objects are two pixels apart.

Can anybody tell me how one would approach this problem in Python? Or link me to some resources?

Manicure answered 9/7, 2020 at 8:43 Comment(6)
Does this answer your question? Fastest pairwise distance metric in pythonLm
No unfortunately not. The link you posted treats the 1D case, and a slightly different problem altogether.Manicure
I think the solution to your problem is fundamentally the same. I'd start trying adapting what's in there and see if you get stuck somewhere.Lm
@yatu I am curious, would that take longer or simply comparing all points of two islands (post below)?Gurgle
I think if the islands are already "labeled" the bellow appraoch should work just fine. The connectec components are more if they werent necessarily labeledAnora
@Manicure What is the straight-line between island 3 and 4? I am not sure if the distance definition is clear for the cases like that. Please elaborate on it. Thank you.Gurgle
G
8

This is what you would need:

from scipy.spatial.distance import cdist
def Distance(a, m, n):
  return cdist(np.argwhere(a==m),np.argwhere(a==n),'minkowski',p=1.).min()

or similarly per @MaxPowers comment (claim: cityblock is faster):

  return cdist(np.argwhere(a==m),np.argwhere(a==n),'cityblock').min()

Find the locations of islands and calculate pairwise distance of locations and get the minimum. I am not 100% sure of your desired distance, but I think you are looking for l1 norm. If not, you can change the cdist measure to your desired metric.

output:

Distance(a,2,3)
1.0
Distance(a,2,1)
2.0
Distance(a,3,1)
5.0
Distance(a,4,3)
5.0
Gurgle answered 9/7, 2020 at 8:55 Comment(6)
This answers the question. You could directly use metric='cityblock', which is way faster and does not need the p parameter.Irreparable
@Irreparable Thank you. Didn't know it has its own metric. Could you please elaborate on why that one is faster in performance? Shouldn't they be almost similar?Gurgle
The L1 norm is calculated as the sum of the differences of the vector elements. Minkowski is calculated as the generalized Lp norm using np.norm. Hence, the ladder takes the powers of each vector element and computes the root of their total difference.Irreparable
@Irreparable Interesting. Thank you. I was expecting it to be smarter for the case of p=1Gurgle
FYI, @Ehsan: as per pull request 12375 minkowski will be smarter for the cases p=1 and p=2. This will be availabel in release 1.16.0.Irreparable
@Irreparable Thank you for note. Does this mean, minkowski will be as fast as city block in the coming release version 1.16.0 (I assume it cannot be faster)?Gurgle
F
6

For many blobs or bigger blobs or if performance/memory efficiency is a criteria, you might want to work with contours of those islands. With that in mind, we will use OpenCV's findContours to get the contours, then perform pairwise distance computation and get the min one as the final output. The implementation would look something like this that gets all possible pairiwise distances -

from scipy.spatial.distance import cdist
import cv2

ids = np.arange(1, a.max()+1) #np.unique(a)[1:] if not in ranged sequence

idxs = []
for id_ in ids:
    im = (a == id_).astype(np.uint8)
    contours,_ = cv2.findContours(im, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    idx = contours[0][:, 0]
    idxs.append(idx)

# Get pairwise indices and then distances
r,c = np.triu_indices(len(ids), 1)
pdists = {(ids[i],ids[j]):cdist(idxs[i], idxs[j]).min() for (i, j) in zip(r, c)}

Output dict for given sample -

In [225]: pdists
Out[225]: 
{(1, 2): 2.0,
 (1, 3): 5.0,
 (1, 4): 7.810249675906654,
 (2, 3): 1.0,
 (2, 4): 5.0,
 (3, 4): 3.605551275463989}

By default,cdist uses euclidean distance as the metric. Depending on your definition of straight line between islands, you might want to try out other metrics, namely 'minkowski' and 'cityblock' for Minkowski and Manhattan distances respectively.

So, cdist(idxs[i], idxs[j]) would change to cdist(idxs[i], idxs[j], metric=...).

Forgive answered 9/7, 2020 at 9:35 Comment(1)
note: this assumes that each island (connected component) has a unique value. If multiple connected components have the same value, idx = [c[0] for contour in contours for c in contour] would be more accurate.Gomulka

© 2022 - 2024 — McMap. All rights reserved.