How do I traverse a KDTree to find k nearest neighbors?
Asked Answered
S

3

14

This question concerns the implementation of KNN searching of KDTrees. Traversal of a KDTree to find a single best match (nearest neighbor) is straightforward, akin to a modified binary search.

How is the traversal modified to exhaustively and efficiently find k-best matches (KNN)?

Edit for clarification: After finding the nearest node M to the input query I, how does the traversal algorithm continue to find the remaining K-1 closest matches to the query? Is there a traversal pattern which guarantees that nodes are visited in order of best to worst match to the query?

Saenz answered 9/1, 2016 at 2:11 Comment(1)
This is not a duplicate of link; I explicitly stated that I understood traversal to find a single nearest neighbor. The question asks how to modify the traversal to exhaustively find k nearest neighbors of a single query. I suspect that there is an efficient path back down the tree from the initial best match which may sequentially find more distant neighbors.Saenz
C
9

You can maintain a max heap of size k (k is the count of nearest neighbors which we wanted to find).

Start from the root node and insert the distance value in the max heap node. Keep on searching in k-d tree using dimensional splitting , criteria and keep updating Max Heap tree.

https://gopalcdas.wordpress.com/2017/05/24/construction-of-k-d-tree-and-using-it-for-nearest-neighbour-search/

~Ashish

Cocky answered 20/10, 2017 at 8:9 Comment(0)
I
2

Adding to @Ashish's answer, you can use a max-heap in the following manner:

1) Build a max-heap of the first k elements (arr[0] to arr[k-1]) of the given array. 

This step is O(k). Then

2) For each element, after the kth element (arr[k] to arr[n-1]), compare it with 
   root of the max-heap.
    a) If the element is smaller than the root then make it root 
       and call heapify for max-heap.
    b) Else ignore it.

The step 2 is O((n-k)*log(k)).

3) Finally, the max-heap has k smallest elements and root of the heap 
   is the kth smallest element.

Time Complexity: O(k + (n-k)*log(k)) without sorted output. If sorted output is needed then O(k + (n-k)*log(k) + k*log(k)).

Illuminance answered 18/8, 2020 at 14:12 Comment(0)
S
1

For the time being, I've settled on the sub-optimal solution of executing a series of progressively larger range searches until K nodes have been found.

Saenz answered 11/1, 2016 at 19:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.