Optimize scipy nearest neighbor search
Asked Answered
C

1

6

I am trying to find all the nearest neighbors which are within 1 KM radius. Here is my script to construct tree and search the nearest points,

from pysal.cg.kdtree import KDTree

def construct_tree(s):
    data_geopoints = [tuple(x) for x in s[['longitude','latitude']].to_records(index=False)]
    tree = KDTree(data_geopoints, distance_metric='Arc', radius=pysal.cg.RADIUS_EARTH_KM)
    return tree

def get_neighbors(s,tree):
    indices = tree.query_ball_point(s, 1)
    return indices

#Constructing the tree for search
tree = construct_tree(data)

#Finding the nearest neighbours within 1KM
data['neighborhood'] = data['lat_long'].apply(lambda row: get_neighbors(row,tree))

From what I read in pysal page, it says -

kd-tree built on top of kd-tree functionality in scipy. If using scipy 0.12 or greater uses the scipy.spatial.cKDTree, otherwise uses scipy.spatial.KDTree.

In my case it should be using cKDTree. This is working fine for a sample dataset, but since the tree.query_ball_point returns the list of indices as a result. Each list will have 100s of elements. For my data points (2 Million records), this is growing bigger and bigger and stops due to memory issue after certain point. Any idea on how to solve this?

Carny answered 31/7, 2017 at 4:12 Comment(5)
Have you considered storing the neighborhood data not in a DataFrame? networkx.Graph comes to mind.Crier
Sorry never heard about it. Can you write an example? I can try that may be.Carny
networkx.github.io is a library for working with graph data. In your case, I would store location ids as vertices and add edges between locations less than 1 km apart. The docs include a good tutorial.Crier
see query_ball_treePentangular
How is it different?Carny
C
0

Just in case if anyone looking for an answer for this, I have solved it by finding the nearest neighbours for a group (tree.query_ball_point can handle batches) and write in to database and then process next group, rather than keeping all in memory. Thanks.

Carny answered 13/8, 2017 at 22:39 Comment(1)
You state "tree.query_ball_point can handle batches". Can you post some example code?Expansible

© 2022 - 2024 — McMap. All rights reserved.