I have a list of points P = [p1, ..., pN], where pi = (latitude_i, longitude_i).
Using Python 3, I would like to find the smallest possible set of clusters (disjoint subsets of P that together cover P) such that every member of a cluster is within 20 km of every other member of that cluster.
Distance between two points is computed using the Vincenty method.
To make this a little more concrete, suppose I have a set of points such as:

    import numpy as np

    # Each row is one point: (latitude, longitude) in degrees
    points = np.array([[33.    , 41.    ],
                       [33.9693, 41.3923],
                       [33.6074, 41.277 ],
                       [34.4823, 41.919 ],
                       [34.3702, 41.1424],
                       [34.3931, 41.078 ],
                       [34.2377, 41.0576],
                       [34.2395, 41.0211],
                       [34.4443, 41.3499],
                       [34.3812, 40.9793]])

Then I am trying to define this function:

    # Note: geopy 2.0 removed vincenty; geopy.distance.geodesic is its replacement
    from geopy.distance import vincenty

    def clusters(points, distance):
        """Returns smallest list of clusters [C1, C2, ..., Cn] such that
        for x, y in Ci, vincenty(x, y).km <= distance"""
        return [points]  # Incorrect, but gives the form of the output

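
To pin down what an acceptable output looks like, here is a minimal sketch of a checker for the distance constraint. The names `haversine_km` and `is_valid_clustering` are hypothetical, and the haversine (spherical) formula is used only as a self-contained stand-in for Vincenty, so distances may differ slightly from the ellipsoidal values. It verifies the pairwise constraint and that the clusters partition the input; it does not verify that the number of clusters is minimal.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees.

    Assumption: a spherical stand-in for Vincenty, not the ellipsoidal method.
    """
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_valid_clustering(points, clusters, distance, dist=haversine_km):
    """Return True if `clusters` partitions `points` and every pair of
    points sharing a cluster is at most `distance` km apart."""
    flat = sorted(tuple(p) for c in clusters for p in c)
    if flat != sorted(tuple(p) for p in points):
        return False  # a point is missing, duplicated, or not in `points`
    return all(dist(x, y) <= distance
               for c in clusters
               for x, y in combinations(c, 2))
```

Note that putting every point in its own cluster always passes this check, so the difficulty lies entirely in minimizing the number of clusters. Any distance function with the same `(p, q) -> km` shape can be passed as `dist`.
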
NOTE: Many existing questions cluster on both geographic location and attributes; mine concerns location only. The distance is geodesic (lat/lon), not Euclidean. The questions below give partial answers, but none of them answers this question (and many are unanswered):
- https://datascience.stackexchange.com/questions/761/clustering-geo-location-coordinates-lat-long-pairs
- https://gis.stackexchange.com/questions/300171/clustering-geo-points-and-export-borders-in-kml
- https://gis.stackexchange.com/questions/194873/clustering-geographical-data-based-on-point-location-and-associated-point-values
- https://gis.stackexchange.com/questions/256477/clustering-latitude-longitude-data-based-on-distance
- and more, none of which answer this question.