I have a set of points for which I want to construct KD Tree. After some time I want to add few more points to this KDTree periodically. Is there any way to do this in scipy implementation
The problem with k-d-trees is that they are not designed for updates.
While you can somewhat easily insert objects (if you use a pointer based representation, which needs substantially more memory than an array-based tree), and do deletions with tricks such as tombstone messages, doing such changes will degrate the performance of the tree.
I am not aware of a good method for incrementally rebalancing a k-d-tree. For 1-dimensional trees you have red-black-trees, B-trees, B*-trees, B+-trees and such things. These don't obviously work with k-d-trees because of the rotating axes and thus different sorting. So in the end, with a k-d-tree, it may be best to just collect changes, and from time to time do a full tree rebuild. Then at least this part of the tree will be quite good.
However, there exists a similar structure (that in my experiments often outperforms the k-d-tree!): the R*-tree. Instead of performing binary splits, it uses rectangular bounding boxes to collect objects, and a lot of thought was put into making the tree a dynamic data structure. This is also where the R*-tree performs much better than the R-tree: it has a much more clever split for kNN search, and it performs incremental rebalancing to improve its structure.
With https://github.com/stefankoegl/kdtree, it's possible to start with an empty KD-Tree and add nodes one after the other.
pip install kdtree
import kdtree
# Create an empty tree by specifying the number of
# dimensions its points will have
tree = kdtree.create(dimensions=3)
tree.add((1, 2, 3))
I have no idea if it's slower than the one provided by SciPy, but at least it worked fine for my needs (adding points to the Tree, unless there's a point which is too close already).
Here's a more complete example:
import random
import matplotlib.pyplot as plt
import kdtree
SIZE = 100
N = 10_000
MIN_DISTANCE = 5
MIN_DISTANCE_2 = MIN_DISTANCE * MIN_DISTANCE
x, y = random.uniform(0, SIZE), random.uniform(0, SIZE)
tree = kdtree.create(dimensions=2)
tree.add((x, y))
xs, ys = [], []
not_xs, not_ys = [], []
for _ in range(N):
x, y = random.uniform(0, SIZE), random.uniform(0, SIZE)
_neighbour, distance_2 = tree.search_nn((x, y))
if distance_2 > MIN_DISTANCE_2:
xs.append(x)
ys.append(y)
tree.add((x, y))
else:
not_xs.append(x)
not_ys.append(y)
plt.axes().set_aspect(1)
plt.scatter(not_xs, not_ys, color='red', s=0.1)
plt.scatter(xs, ys)
plt.show()
It tries to place 10000 points randomly in a 100 * 100 square, but not closer than 5. Placed points are blue, rejected points are red.
© 2022 - 2024 — McMap. All rights reserved.