Is kd-tree always balanced?

Asked 26/8, 2015 at 8:12 Answered 28/7, 2018 at 0:13

algorithm data-structures avl-tree red-black-tree kdtree

I have used kd-tree algoritham and make tree.

But i found that tree is not balanced so my question is if we used kd-tree algoritham then that tree is always balanced if not then how can we make it balance ?.

We can use another algoritham likes AVL or Red-Black for balancing kd tree ?

I have some sample data for that i used kd-tree algoritham but that tree is not balanced.

 (14,31), (15,32), (17,42), (16,44), (18,52), (16,62)

Theo answered 26/8, 2015 at 8:12 Comment(5)

maybe this can help: en.wikipedia.org/wiki/K-D-B-tree. And no, techniques you mentioned will not work for balacing KD tree. – Barrens 26/8, 2015 at 8:21

Thanks john ....can K-D-B-tree can used for storing geospatial data ? – Theo 26/8, 2015 at 8:23

Do you have one set of points that you know completely before constructing the kd-tree? Or do you have to update the kd-tree all the time with new points? – Disquisition 25/10, 2015 at 19:38

If your six sample points are named P1,P2,...,P6 then here is a balanced tree that can be generated algorithmically: (P4,(P1,P2,nil),(P5,P3,P6)). What problem do you see? – Disquisition 26/10, 2015 at 13:18

Points comes randomly.. there are not fixed point. Point can be any lat,long pair.It comes from request. – Theo 27/10, 2015 at 4:46

This is a fairly broad topic and the questions themselves are kind of general. Hopefully this will give you some useful insights and material to work with:

Kd tree is not always balanced.
AVL and Red-Black will not work with K-D Trees, you will have either construct some balanced variant such as K-D-B-tree or use other balancing techniques.
K-d Tree are commonly used to store GeoSpatial data because they let you search over more then one key, contrary to 'traditional' tree which lets you do single dimensional search. GeoSpatial data certainly cannot be represented in single dimension.

Note that there are also specialized databases working with GeoSpatial data so it might be worth checking if the overhead could be shifted to them instead of making your own solution: Although i don't have much experience with this, maybe it is worth checking the postgis.

postgis

Here are some useful links showing how to build balanced K-D tree variant and usage of K-D trees with Spatial data:

balancing K-D-Tree

K-D-B-tree

spatial data k-d-trees

Barrens answered 26/8, 2015 at 8:34 Comment(7)

Hey john as you said that k-d trree is used for storing Geospatial data right ?but if my first data is something like this (1, -1).. now whatever point comes after this point will insert on right-side of the tree so its not true. – Theo 26/8, 2015 at 12:51

Yeah that is correct in this case, which i guess doesn't make much sense. But if you are working with geospatial data, won't you use 3 dimensions ? – Barrens 26/8, 2015 at 13:11

I want store latitude and longitude – Theo 26/8, 2015 at 13:54

Root should be something like median of search space. – Barrens 26/8, 2015 at 14:36

If i used kd-tree then how can be root as median ? because first point comes that small and other points are large then in that case root is not mean. – Theo 27/8, 2015 at 6:59

And what does your first point (1,-1) represent to you? If you are making location based queries they should be somehow localized. – Barrens 27/8, 2015 at 7:4

Let us continue this discussion in chat. – Theo 27/8, 2015 at 7:4

It depends on how you build the tree.

If built as originally published, the tree will be balanced, i.e. only at the leaf level it will have at most a height difference of 1. If your data set has 2^n-1 elements, the tree will be perfectly balanced.

When constructed with the median, then half of the objects must be on either branch of the tree, thus it has minimal height and is balanced.

However, this tree cannot be changed then. I am not aware of an insert or remove algorithm that would preserve this property, but YMMV. I bet there are two dozens of kd-tree extensions that aim at rebalancing and making insertions/deletions more effective.

The k-d-tree is not designed for changes, and will quickly lose efficiency. It relies on the median, and thus any change to the tree would worst-case propagate through all of the tree. Therefore, you need to allow some tolerance in the tree quality to support changes. It appears to be a common approach to just keep track of insertions/deletions and rebuild the tree eventually. You cannot combine it with red-black-trees or AVL-trees, because data with more than 1 dimension is not ordered; these trees only work for ordered data. Upon rotation of the tree the splitting axis changes; and there may be elements in either half that suddenly would need to move to the other branch. This does not happen in AVL or red-black trees.

But as you can imagine, people have published several indexes that remain balanced. Such as k-d-b-trees, and R-trees. These also work better for large data that needs to be stored on disk.

Noranorah answered 27/8, 2015 at 20:1 Comment(7)

Thanks Anony..But i have no idea why should i have to use kd-b and R tree because my goal is storing just latitude and longitude in node. node is like c structures which contains four fields laitutude,longitude,right pointer and left pointer). – Theo 28/8, 2015 at 5:32

K-d-tree is for euclidean distance, but for lat/long you will want haversine distance... also in r-tree, a leaf page then simply is a list of latitude,longitude pairs, with much fewer pointers. – Noranorah 28/8, 2015 at 5:42

Also they allow updates much better than the k-d-tree. – Noranorah 28/8, 2015 at 5:42

yes i have used haversine distance...can you give how to used r-tree instead of kd-tree and find nearest point fron r -tree? – Theo 28/8, 2015 at 5:54

There was a publication for the R*-tree with haversine somewhere, try Google. For all I know, the k-d-tree does not support other distances than Euclidean. – Noranorah 28/8, 2015 at 6:7

But i have replaced havesine instead of euclidean distance and its work but its not balance tree – Theo 28/8, 2015 at 6:10

If the tree isn't balanced, then you didn't use the median. Also, make sure your results are correct. You do realize that it is not enough to replace the last distance computation? k-d-tree needs a distance-to-subtree function. – Noranorah 28/8, 2015 at 6:13

In order to make your kd-tree balanced use median value.
(14,31), (15,32), (17,42), (16,44), (18,52), (16,62)
In the root choose median of x-cordinates [14,15,16,16,17,18] which is 16,
So all the elements less than 16 goes to left part of the tree and
greater than or equal to goes to right side of tree.
as of now,
left part tree consists of [14,31],[15,32] ,now for y-axis find the median for [31,32] so that the tree is balanced

Perseid answered 28/7, 2018 at 0:13 Comment(0)

Recommended topics

Hot tags