Segment tree - query complexity

Asked 14/5, 2015 at 11:57 Answered 5/11, 2020 at 22:35

I am having problems with understanding segment tree complexity. It is clear that if you have update function which has to change only one node, its complexity will be log(n). But I have no idea why complexity of query(a,b), where (a,b) is interval that needs to be checked, is log(n). Can anyone provide me with intuitive / formal proof to understand this?

Aleras answered 14/5, 2015 at 11:57 Comment(0)

There are four cases when query the interval (x,y)

FIND(R,x,y) //R is the node
% Case 1
    if R.first = x and R.last = y   
        return {R}
% Case 2
    if y <= R.middle
        return FIND(R.leftChild, x, y) 
% Case 3
    if x >= R.middle + 1 
        return FIND(R.rightChild, x, y) 
% Case 4
    P = FIND(R.leftChild, x, R.middle)
    Q = FIND(R.rightChild, R.middle + 1, y)    
    return P union Q.

Intuitively, first three cases reduce the level of tree height by 1, since the tree has height log n, if only first three cases happen, the running time is O(log n).

For the last case, FIND() divide the problem into two subproblems. However, we assert that this can only happen at most once. After we called FIND(R.leftChild, x, R.middle), we are querying R.leftChild for the interval [x, R.middle]. R.middle is the same as R.leftChild.last. If x > R.leftChild.middle, then it is Case 1; if x <= R.leftChild, then we will call

FIND ( R.leftChild.leftChild, x, R.leftChild.middle );
FIND ( R.leftChild.rightChild, R.leftChild.middle + 1, , R.leftChild.last );

However, the second FIND() returns R.leftChild.rightChild.sum and therefore takes constant time, and the problem will not be separate into two subproblems (strictly speaking, the problem is separated, though one subproblem takes O(1) time to solve).

Since the same analysis holds on the rightChild of R, we conclude that after case4 happens the first time, the running time T(h) (h is the remaining level of the tree) would be

T(h) <= T(h-1) + c (c is a constant)
T(1) = c

which yields:

T(h) <= c * h = O(h) = O(log n) (since h is the height of the tree)

Hence we end the proof.

This is my first time to contribute, hence if there are any problems, please kindly point them out and I would edit my answer.

Tufthunter answered 28/6, 2015 at 5:42 Comment(1)

Another way to look at it is, the recursion would call at most 2 subroutine, and its guaranteed that both of the subroutines have at least one border matching with a node's border (eg, [2,5] being searched in [0,5], '5' border is same). In any scenario if one of the border matches, in the subroutines, for one of the child, both borders would match, which would return in O(1) time as in first case (like [3,5] in the given case). So, in all, the main routine would spun up at most 2 log(n) subroutines, and 2 log(n) (height) amount of O(1) subroutines. – Y 2/12, 2023 at 4:8

A range query using a segment tree basically involves recursing from the root node. You can think of the entire recursion process as a traversal on the segment tree: any time a recursion is needed on a child node, you are visiting that child node in your traversal. So analyzing the complexity of a range query is equivalent to finding the upper bound for the total number of nodes that are visited.

It turns out that at any arbitrary level, there are at most 4 nodes that can be visited. Since the segment tree has a height of log(n) and that at any level there are at most 4 nodes that can be visited, the upper bound is actually 4*log(n). The time complexity is therefore O(log(n)).

Now we can prove this with induction. The base case is at the first level where the root node lies. Since the root node has at most two child nodes, we can only visit at most those two child nodes, which is at most 4 nodes.

Now suppose it is true that at an arbitrary level (say level i) we visit at most 4 nodes. We want to show that we will visit at most 4 nodes at the next level (level i+1) as well. If we had visited only 1 or 2 nodes at level i, it's trivial to show that at level i+1 we will visit at most 4 nodes because each node can have at most 2 child nodes.

So let's focus on the assumption that 3 or 4 nodes were visited at level i, and try to show that at level i+1 we can also have at most 4 visited nodes. Now since the range query is asking for a contiguous range, we know that the 3 or 4 nodes visited at level i can be categorized into 3 partitions of nodes: a leftmost single node whose segment range is only partially covered by the query range, a rightmost single node whose segment range is only partially covered by the query range, and 1 or 2 middle nodes whose segment range is fully covered by the query range. Since the middle nodes have their segment range(s) fully covered by the query range, there would be no recursion at the next level; we just use their precomputed sums. We are left with possible recursions on the leftmost node and the rightmost node at the next level, which is obviously at most 4.

This completes the proof by induction. We have proven that at any level at most 4 nodes are visited. The time complexity for a range query is therefore O(log(n)).

Intelligentsia answered 16/4, 2019 at 8:53 Comment(2)

Two child nodes, which is at most 4 nodes. HOW? Why 4? – Propagandism 28/10, 2019 at 4:53

@KPMG I didn't understand this either - I assume that for the node you're at, you can visit only one child and its children. So parent > child > grandchildren (2 nodes). Each node can only have 2 children, so therefore 4 nodes visited. – Masakomasan 26/5, 2020 at 11:7

-1

An interval of length n can be represented by k nodes where k <= log(n)

We can prove it based on how the binary system works.

Arrogance answered 5/11, 2020 at 22:35 Comment(0)

Recommended topics

Hot tags