I am trying to find all the nearest neighbors which are within 1 KM radius. Here is my script to construct tree and search the nearest points,
from pysal.cg.kdtree import KDTree
def construct_tree(s):
data_geopoints = [tuple(x) for x in s[['longitude','latitude']].to_records(index=False)]
tree = KDTree(data_geopoints, distance_metric='Arc', radius=pysal.cg.RADIUS_EARTH_KM)
return tree
def get_neighbors(s,tree):
indices = tree.query_ball_point(s, 1)
return indices
#Constructing the tree for search
tree = construct_tree(data)
#Finding the nearest neighbours within 1KM
data['neighborhood'] = data['lat_long'].apply(lambda row: get_neighbors(row,tree))
From what I read in pysal page, it says -
kd-tree built on top of kd-tree functionality in scipy. If using scipy 0.12 or greater uses the scipy.spatial.cKDTree, otherwise uses scipy.spatial.KDTree.
In my case it should be using cKDTree. This is working fine for a sample dataset, but since the tree.query_ball_point
returns the list of indices as a result. Each list will have 100s of elements. For my data points (2 Million records), this is growing bigger and bigger and stops due to memory issue after certain point. Any idea on how to solve this?
neighborhood
data not in a DataFrame?networkx.Graph
comes to mind. – Crier