nearest neighbor - k-d tree - wikipedia proof

Asked 26/10, 2009 at 20:58 Answered 9/5, 2016 at 2:33

On the wikipedia entry for k-d trees, an algorithm is presented for doing a nearest neighbor search on a k-d tree. What I don't understand is the explanation of step 3.2. How do you know there isn't a closer point just because the difference between the splitting coordinate of the search point and the current node is greater than the difference between the splitting coordinate of the search point and the current best?

Nearest neighbor search

(Figure: animation of NN searching with a KD Tree in 2D)

The nearest neighbor (NN) algorithm aims to find the point in the tree which is nearest to a given input point. This search can be done efficiently by using the tree properties to quickly eliminate large portions of the search space. Searching for a nearest neighbor in a kd-tree proceeds as follows:

1. Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes right or left depending on whether the point is greater or less than the current node in the split dimension).

2. Once the algorithm reaches a leaf node, it saves that node point as the "current best".

3. The algorithm unwinds the recursion of the tree, performing the following steps at each node:
   1. If the current node is closer than the current best, then it becomes the current best.
   2. The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and current node is less than the distance (overall coordinates) from the search point to the current best.
      1. If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
      2. If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.

4. When the algorithm finishes this process for the root node, then the search is complete.

Generally the algorithm uses squared distances for comparison to avoid computing square roots. Additionally, it can save computation by holding the squared current best distance in a variable for comparison.
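The steps quoted above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: it assumes a tree that stores one point in every node (the layout the quoted description implies), and the names `Node`, `squaredDistance` and this free function `nearest` are made up for the example. It compares squared distances throughout, as the last paragraph suggests.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <memory>

// Hypothetical minimal node: every node stores a point (assumed layout).
template <std::size_t N>
struct Node {
    std::array<double, N> point;
    std::unique_ptr<Node> left, right;  // left: smaller in the split dimension
};

template <std::size_t N>
double squaredDistance(const std::array<double, N> &a,
                       const std::array<double, N> &b) {
    double d = 0.0;
    for (std::size_t i = 0; i < N; ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// best/bestSq carry the "current best" node and its squared distance.
// Call with best = nullptr and bestSq = infinity.
template <std::size_t N>
void nearest(const Node<N> *node, const std::array<double, N> &query,
             const Node<N> *&best, double &bestSq, std::size_t depth = 0) {
    if (!node) return;
    const std::size_t dim = depth % N;
    const double diff = query[dim] - node->point[dim];
    const Node<N> *nearSide = diff < 0 ? node->left.get() : node->right.get();
    const Node<N> *farSide  = diff < 0 ? node->right.get() : node->left.get();

    // Descend first on the side where the query would be inserted.
    nearest(nearSide, query, best, bestSq, depth + 1);

    // While unwinding: is this node itself closer than the current best?
    const double dSq = squaredDistance(query, node->point);
    if (dSq < bestSq) { bestSq = dSq; best = node; }

    // Does the hypersphere of radius sqrt(bestSq) cross the splitting plane?
    // Only then can the far side hold a closer point.
    if (diff * diff <= bestSq)
        nearest(farSide, query, best, bestSq, depth + 1);
}
```

Starting `bestSq` at infinity plays the role of "save the first leaf reached as the current best": the first node examined always replaces it.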

Iceni answered 26/10, 2009 at 20:58 Comment(1)

Check this https://mcmap.net/q/744666/-k-d-tree-nearest-neighbor-search-algorithm-with-tractable-pseudo-code It provides an algorithm in very clear terms and with a nice explanation – Doriandoric 4/4, 2021 at 18:25

Look carefully at the 6th frame of the animation on that page.

As the algorithm is going back up the recursion, it is possible that there is a closer point on the other side of the hyperplane that it's on. We've checked one half, but there could be an even closer point on the other half.

Well, it turns out we can sometimes make a simplification. If it's impossible for there to be a point on the other half closer than our current best (closest) point, then we can skip that hyperplane half entirely. This simplification is the one shown on the 6th frame.

Figuring out whether this simplification is possible is done by comparing the distance from our search location to the hyperplane with the distance from our search location to the current best point. Because the hyperplane is aligned to the axes, the shortest line from it to any other point will be a line along one dimension, so we only need to look at the coordinate of the dimension that the hyperplane splits.

If it's farther from the search point to the hyperplane than from the search point to your current closest point, then there's no reason to search past that splitting coordinate.
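The reason this works: every point on the far side differs from the search point by at least |query coordinate - split coordinate| in the split dimension alone, so its full Euclidean distance can never be smaller than that gap. A tiny sketch of the test, with a made-up helper name (`mustSearchFarSide` is not part of any library):

```cpp
#include <cmath>

// Hypothetical helper: returns true when the far side of an axis-aligned
// splitting plane could still contain a point closer than the current best.
// queryCoord and splitCoord are coordinates in the split dimension only;
// bestDist is the full (all-dimension) distance to the current best point.
bool mustSearchFarSide(double queryCoord, double splitCoord, double bestDist) {
    return std::abs(queryCoord - splitCoord) <= bestDist;
}
```

For example, with the search point at x = 2 and the split at x = 7, the gap is 5. If the current best is 3 away, nothing beyond the plane can beat it and the branch is skipped; if the current best is 6 away, the other branch has to be searched.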

Even if my explanation doesn't help, the graphic will. Good luck on your project!

Neal answered 10/1, 2010 at 8:12 Comment(1)

This is the missing link that made me understand the algorithm. It seems like none of the other explanations take time to explain the simplification step (or they just mention it as a by the way thing). – Synagogue 1/10, 2013 at 14:16

Yes, the description of NN (Nearest Neighbour) search in a KD Tree on Wikipedia is a little hard to follow. It doesn't help that a lot of the top Google search results on NN KD Tree searches are just plain wrong!

Here's some C++ code to show you how to get it right:

template <class T, std::size_t N>
void KDTree<T,N>::nearest (
    const std::unique_ptr<const KDNode<T,N>> &node,
    const std::array<T, N> &point, // looking for closest node to this point
    std::shared_ptr<const KDPoint<T,N>> &closest, // closest node (so far), updated in place
    double &minDist,
    const uint depth) const
{
    if (node->isLeaf()) {
        const double dist = distance(point, node->leaf->point);
        if (dist < minDist) {
            minDist = dist;
            closest = node->leaf;
        }
    } else {
        const std::size_t dim = depth % N; // split dimension is an index, not a coordinate value
        if (point[dim] < node->splitVal) {
            // search left first
            nearest(node->left, point, closest, minDist, depth + 1);
            if (point[dim] + minDist >= node->splitVal)
                nearest(node->right, point, closest, minDist, depth + 1);
        } else {
            // search right first
            nearest(node->right, point, closest, minDist, depth + 1);
            if (point[dim] - minDist <= node->splitVal)
                nearest(node->left, point, closest, minDist, depth + 1);
        }
    }
}

API for NN searching on a KD Tree:

// Nearest neighbour
template <class T, std::size_t N>
const KDPoint<T,N> KDTree<T,N>::nearest (const std::array<T, N> &point) const {
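    std::shared_ptr<const KDPoint<T,N>> closest; // filled in by the recursive search
    double minDist = std::numeric_limits<double>::max();
    nearest(root, point, closest, minDist);
    return *closest; // assumes the tree holds at least one point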
}

Default distance function:

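#include <array>
#include <cmath>
#include <cstddef>

// Euclidean distance between two N-dimensional points
template <class T, std::size_t N>
double distance (const std::array<T, N> &p1, const std::array<T, N> &p2) {
    double d = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        const double diff = static_cast<double>(p1[i]) - static_cast<double>(p2[i]);
        d += diff * diff;
    }
    return std::sqrt(d);
}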

Edit: some people are asking for help with the data structures too (not just the NN algorithm), so here is what I have used. Depending on your purpose, you might wish to modify the data structures slightly. (Note: but you almost certainly do not want to modify the NN algorithm.)

KDPoint class:

template <class T, std::size_t N>
class KDPoint {
    public:
        KDPoint (std::array<T,N> &&t) : point(std::move(t)) { }
        virtual ~KDPoint () = default;
        std::array<T, N> point;
};

KDNode class:

template <class T, std::size_t N>
class KDNode
{
    public:
        KDNode () = delete;
        KDNode (const KDNode &) = delete;
        KDNode & operator = (const KDNode &) = delete;
        ~KDNode () = default;

        // branch node
        KDNode (const T                       split,
                std::unique_ptr<const KDNode> &lhs,
                std::unique_ptr<const KDNode> &rhs) : splitVal(split), left(std::move(lhs)), right(std::move(rhs)) { };
        // leaf node
        KDNode (std::shared_ptr<const KDPoint<T,N>> p) : splitVal(0), leaf(p) { };

        bool isLeaf (void) const { return static_cast<bool>(leaf); }

        // data members
        const T                                   splitVal;
        const std::unique_ptr<const KDNode<T,N>>  left, right;
        const std::shared_ptr<const KDPoint<T,N>> leaf;
};

KDTree class: (Note: you'll need to add a member function to build/fill your tree.)

template <class T, std::size_t N>
class KDTree {
    public:
        KDTree () = delete;
        KDTree (const KDTree &) = delete;
        KDTree (KDTree &&t) : root(std::move(const_cast<std::unique_ptr<const KDNode<T,N>>&>(t.root))) { };
        KDTree & operator = (const KDTree &) = delete;
        ~KDTree () = default;

        const KDPoint<T,N> nearest (const std::array<T, N> &point) const;

        // Nearest neighbour search - O(log n) on average for a balanced tree, O(n) in the worst case
        void nearest (const std::unique_ptr<const KDNode<T,N>> &node,
                      const std::array<T, N> &point,
                      std::shared_ptr<const KDPoint<T,N>> &closest,
                      double &minDist,
                      const uint depth = 0) const;

        // data members
        const std::unique_ptr<const KDNode<T,N>> root;
};
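Since the build step is left to the reader, here is one possible sketch of it. It is not the answerer's code: it assumes a median split found with `std::nth_element`, and it uses a simplified stand-in type (`BuildNode`, a made-up name) instead of the `KDNode`/`KDPoint` classes above so that the example stays self-contained. Like the classes above, only leaf nodes hold points and branch nodes hold just a split value.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical simplified node: branch nodes hold only a split value,
// leaf nodes hold a single point.
template <class T, std::size_t N>
struct BuildNode {
    T splitVal{};
    std::unique_ptr<BuildNode> left, right;
    std::unique_ptr<std::array<T, N>> leaf;  // set only on leaf nodes
};

// Recursively builds a subtree from pts[first, last); requires first < last.
// Reorders pts in place.
template <class T, std::size_t N>
std::unique_ptr<BuildNode<T, N>> build(std::vector<std::array<T, N>> &pts,
                                       std::size_t first, std::size_t last,
                                       std::size_t depth = 0) {
    auto node = std::make_unique<BuildNode<T, N>>();
    if (last - first == 1) {  // a single point left: make a leaf
        node->leaf = std::make_unique<std::array<T, N>>(pts[first]);
        return node;
    }
    const std::size_t dim = depth % N;                   // cycle through dimensions
    const std::size_t mid = first + (last - first) / 2;  // median position
    // Partial sort: pts[mid] ends up where a full sort on `dim` would put it,
    // with smaller-or-equal coordinates before it and larger-or-equal after.
    std::nth_element(pts.begin() + first, pts.begin() + mid, pts.begin() + last,
                     [dim](const std::array<T, N> &a, const std::array<T, N> &b) {
                         return a[dim] < b[dim];
                     });
    node->splitVal = pts[mid][dim];
    node->left  = build(pts, first, mid, depth + 1);  // coordinates <= splitVal
    node->right = build(pts, mid, last, depth + 1);   // coordinates >= splitVal
    return node;
}
```

A call such as `build(pts, 0, pts.size())` on a non-empty vector returns the root. Splitting at the median keeps the tree balanced, which is what the average-case search cost depends on.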

Dilatant answered 9/5, 2016 at 2:33 Comment(9)

My C++ is kind of rough, but I think you're missing some important code here. There's no definition of KDNode or KDPoint. – Equitant 15/10, 2016 at 19:36

distance(point, node->leaf->point); I guess this also fill the array point with all the points in that subregion? Could you plz elaborate on this? – Bedplate 25/10, 2016 at 13:46

@Project: the question was about the NN algorithm, but I've added info about the data structures to make it an overly-comprehensive answer. :) – Dilatant 25/10, 2016 at 22:28

@Axl: distance() is simply the separation between 2 points. I edited the answer to include my default implementation. Hopefully this simple but critical concept is clearer now? – Dilatant 25/10, 2016 at 22:32

@ScottSmedley Thanks, I find the extra code helpful, not being very familiar with tree structures. – Equitant 26/10, 2016 at 2:34

What is "splitVal"? I'm trying to implement this in Java. – Oblivious 11/10, 2021 at 11:52

Am loath to downvote this, since the answer is so comprehensive, but (crucially) step 3.1 from the Wikipedia solution is missing. This code just compares query points with leaf nodes. Rather than "(a) nearest(left) (b) perhaps(nearest(right))" ... we should have "(a) nearest(left) (b) check this node (c) perhaps(nearest(right))". Ditto for RL, rather than LR – Lineament 12/6, 2023 at 12:3

Step 3.1 is performed early in nearest() - there's a call to distance() and then a check to see if the current node is closer than the best found so far. The code does "just compare query points with leaf nodes". It doesn't make sense to compare with this (non-leaf) node as only leaf nodes store full coordinates. (Non leaf nodes only store the split value.) Because it's so hard to find good, clear explanations of how KD Trees worked I ran millions of test cases verifying the code above - to ensure it works properly. I'm very confident it is correct. – Dilatant 14/6, 2023 at 1:23

Thanks. I came here to say that with a en.wikipedia.org/wiki/K-d_tree#Points_only_in_leaves tree, your code is correct, and saw your comment. As you've guessed by now, mine is not such a tree! I found your answer very helpful – Lineament 15/6, 2023 at 16:21
