How does the KD-tree nearest neighbor search work?

Asked 11/12, 2010 at 19:14 Answered 28/7, 2018 at 2:16

python machine-learning nearest-neighbor kdtree

I am looking at the Wikipedia page for KD trees. As an example, I implemented, in python, the algorithm for building a kd tree listed.

The algorithm for doing KNN search with a KD tree, however, switches languages and isn't totally clear. The English explanation starts making sense, but parts of it (such as the area where they "unwind recursion" to check other leaf nodes) don't really make any sense to me.

How does this work, and how can one do a KNN search with a KD tree in python? This isn't meant to be a "send me the code!" type question, and I don't expect that. Just a brief explanation please :)

Koodoo answered 11/12, 2010 at 19:14 Comment(1)

Did you click on the animation to the right of the "nearest neighbor search" algorithm? Watching it might make the written description clearer. – Saturation 11/12, 2010 at 20:1

This book introduction, page 3:

Given a set of n points in a d-dimensional space, the kd-tree is constructed recursively as follows. First, one finds a median of the values of the ith coordinates of the points (initially, i = 1). That is, a value M is computed, so that at least 50% of the points have their ith coordinate greater-or-equal to M, while at least 50% of the points have their ith coordinate smaller than or equal to M. The value of x is stored, and the set P is partitioned into PL and PR , where PL contains only the points with their ith coordinate smaller than or equal to M, and |PR | = |PL |±1. The process is then repeated recursively on both PL and PR , with i replaced by i + 1 (or 1, if i = d). When the set of points at a node has size 1, the recursion stops.

The following paragraphs discuss its use in solving nearest neighbor.

Or, here is the original 1975 paper by Jon Bentley.

EDIT: I should add that SciPy has a kdtree implementation:

Lubbock answered 12/12, 2010 at 3:33 Comment(3)

That link seems to be broken but try here: orion.math.iastate.edu/reu/2001/nearest_neighbor/… – Rodi 14/1, 2011 at 19:11

A follow on paper gives a great more detail about the algorithm for nearest neighbor searching with kd-trees, citation on ACM is at Paper by Friedman and Bentley. – Men 13/8, 2011 at 14:55

This link is broken yet again. This seems to work: citeseerx.ist.psu.edu/viewdoc/… – Pandora 30/3, 2017 at 4:49

I've just spend some time puzzling out the Wikipedia description of the algorithm myself, and came up with the following Python implementation that may help: https://gist.github.com/863301

The first phase of closest_point is a simple depth first search to find the best matching leaf node.

Instead of simply returning the best node found back up the call stack, a second phase checks to see if there could be a closer node on the "away" side: (ASCII art diagram)

            n     current node
 b          |     best match so far
 |      p   |     point we're looking for
 |<    >|   |     error
        |< >|     distance to "away" side
        |<  | >|  error "sphere" extends to "away" side
            | x   possible better match on the "away" side

The current node n splits the space along a line, so we only need to look on the "away" side if the "error" between the point p and the best match b is greater than the distance from point p and the line though n. If it is, then we check to see if there are any points on the "away" side that are closer.

Because our best matching node is passed into this second test, it doesn't have to do a full traversal of the branch and will stop pretty quickly if it's on the wrong track (only heading down the "near" child nodes until it hits a leaf.)

To compute the distance between the point p and the line splitting the space through the node n, we can simply "project" the point down onto the axis by copying the appropriate coordinate as the axes are all orthogonal (horizontal or vertical).

Knifeedged answered 10/3, 2011 at 4:13 Comment(1)

I see location variable doesn't change across the closest_point method. can you explain how it's work – Darr 6/2, 2016 at 22:1

lets consider a example,for simplicity consider d=2 and the result of the Kd tree is show below

Your query point is Q and you want to find out k-nearest neighbours

The above tree is represents of kd-tree
we will search through the tree to fall into one of the regions.In kd-tree each region is represented by a single point.

then we will find out the distance between this point and query point

Then we will draw a circle with radius of that distance to ensure whether is there any point which are nearer to the query point.

Then axis which are fall in that circle area,we backtrack to those axis and find near point

Birthday answered 28/7, 2018 at 2:16 Comment(0)

Recommended topics

Hot tags