C++ : Running time of next() and prev() in a multiset iterator?
Asked Answered
M

2

15

What is the time complexity of applying the next() and prev() function on an multiset<int>::iterator type object where the corresponding multiset contains the N elements?

I understand that in STL, a multiset is implemented as a balanced binary search tree and hence I expect the time complexity to be O(log N) per operation (in the worst case) in case we just traverse the tree until we find the appropriate value, but I have a hunch that this should be O(1) on average.

But what if the tree is implemented as follows - when inserting element x in the balanced binary search tree, we can also retrieve the the largest number in the tree smaller than x and the smallest number in the tree larger than x in O(log N). Thus in theory, we can have each node in the tree maintain pointer to its next and prev elements so that next() and prev() then run in constant time per query.

Can anybody share some light on what's up?

Misvalue answered 8/9, 2017 at 14:56 Comment(6)
Hmm, good question; I figured O(1) (not even on average, just O(1)), but without standard delving cannot find evidence. And it might take work to keep it O(1) and not O(1) on average.Quechuan
Yes! In my O(1) approach it might end up making the insert and delete operations in the set upto twice as slow since you have to make two lookups to do book-keeping of the pointers before you can insert an element. But that doesn't mean that there isn't a more efficient approach :)Misvalue
I think the standard only directly guarantees constant amortised time complexity for iterator operations (iterator.requirements.general#10). I did not find more precise information for specific container (there are specific complexity for operations on containers, but not on containers' iterators).Integration
@BanachTarski: You don't need to make additional lookups in order to maintain a threaded tree (where each node has next/prev pointers). You can always insert a new node into a doubly-linked list either before or after a given node, and at the end of the BST search algorithm, you always know one of the neighbouring nodes. However, the threading significantly increases the size of a node, making it a space/time tradeoff, and I think implementations typically opt for space.Jellybean
@Yakk while it is possible to implement O(1) worst-case iterators, nobody ever does it that way because of the extra cache footprint.Flavouring
@Jellybean yeah we don't need extra look ups! It does boil down to just the memory/time tradeoff :) Thanks for the insight!Misvalue
V
15

The standard mandates that all operations on iterators run in amortized constant time: http://www.eel.is/c++draft/iterator.requirements#general-10. The basic idea is that each iterator category only defines operations which can be implemented in amortized time.

Iteration is a common thing to do, and if operator++ on an iterator (I guess that's what you mean by next?) was logN, then traversing a container in a loop would be NlogN. The standard makes this impossible; since operator++ is amortized constant, iterating over any data structure in the standard is always O(N).

However, I dug into the implementation of multiset on gcc5.4 to at least have one example. Both set and multiset are implemented in terms of the same underlying structure, _Rb_tree. Delving into that structure a bit, it's nodes not only have left and right node pointers, but also a parent node pointer, and an iterator is just a pointer to a node.

Given a node in a binary search tree that includes a pointer to its parent, it's easy to figure out what the next node in the tree is:

  1. If it has a right child, descend to the right child. Then descend left child as far as you can; that's the next node.
  2. If it does not have a right child, ascend to your parent, and determine whether the original node is the left or right child of the parent. If the node is the left child of the parent, then the parent is the next node. If the node is the right of the parent, the parent was already processed, so you need to apply the same logic recursively between the parent and its grandparent.

This SO question shows the source code with the core logic: What is the definition of _Rb_tree_increment in bits/stl_tree.h? (it's surprisingly hard to find for some reason).

This does not have constant time, in particular in both 1. and 2. we have loops that either descend or ascend and could take at most log(N) time. However, you can easily convince yourself that the amortized time is constant because as you traverse the tree with this algorithm, each node is touched at most 4 times:

  1. Once on the way down to its left child.
  2. Once when it comes back up from the left child and needs to consider itself.
  3. Once when it descends to its right child.
  4. Once when ascending from the right child.

In retrospect I would say this is the fairly-obvious choice. Iteration over the whole data structure is a common operation, so the performance is very important. Adding a third pointer to the node is not a trivial amount of space, but it's not the end of the world either; at most it will bloat the data structure from 3 to 4 words (2 pointers + data, which alignment makes 3 at the minimum, vs 3 pointers + data). If you work with ranges, as opposed to two iterators, an alternative would be to maintain a stack and then you don't need the parent pointer, but this only works if you iterate from the very beginning to the end; it wouldn't allow iteration from an iterator in the middle to the end (which is also an important operation for BST's).

Vanwinkle answered 8/9, 2017 at 15:32 Comment(2)
eel.is/c++draft/iterator.requirements#general-10 specify the complexity requirements for all iterator operations.Integration
@Integration That comment would be improved by extracting the important part; all iterator methods must be constant time (amortized).Quechuan
V
1

I think the next() and prev() will take anywhere between 1 and h where h is the height of the tree which is approx O(log N). If you use next() to walk from beginning to end, N nodes, the iterator should visit the entire tree and that is about 2N (2 because the iterator has to traverse the downwards then upwards through the pointers that link the nodes). Total traversal is not O(N * log N) as some steps are better than other steps. At the very worst, a next() might be from a leaf node to the head node which is h approximately O(log N). But that will only occur twice (once to arrive at begin(), the second time at the right most node of the left tree to the head node). So on average next() and prev() are 2 which is O(1).

Vanderpool answered 10/9, 2017 at 1:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.