Complexity of std::unordered_multiset insert

Why is the worst-case complexity of std::unordered_multiset insert linear? I understand why that is the case for std::unordered_set (you have to check that the inserted value is not already in the set), but for the multiset I don't get it. Am I missing something obvious?

Taber answered 7/4, 2014 at 20:0 Comment(0)

The worst case complexity for std::unordered_multiset::insert() is linear because:

  • Unordered associative containers that support non-unique keys are said to support equivalent keys, and elements with equivalent keys must be adjacent to one another in iteration order, forming equivalent-key groups.
  • Iterator functions require constant amortized time.

For example, consider inserting 5, 13, and 13 into an unordered_multiset that has 4 buckets, where unordered_multiset::key_eq()(5, 13) returns false and unordered_multiset::hash_function() returns different hash codes for 5 and 13. Despite having different hash codes, these elements may still land in the same bucket. If the hash function for an integer returns the integer itself, and the bucket index is the hash code modulo the number of buckets, then:

  • Element 5 is hashed to 5, and with 4 buckets, it is placed in bucket 1.
  • Element 13 is hashed to 13, and with 4 buckets, it is placed into bucket 1 as well (the sketch after this list shows how to inspect bucket placement).
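
A minimal sketch of how to check this (the exact bucket count chosen by rehash() and the behavior of std::hash<int> are implementation-defined, so the printed indices may differ from the ones above):

    #include <iostream>
    #include <unordered_set>

    int main() {
        std::unordered_multiset<int> ms;
        ms.rehash(4);  // request at least 4 buckets; the actual count is implementation-defined

        ms.insert(5);
        ms.insert(13);
        ms.insert(13);

        std::cout << "bucket_count: " << ms.bucket_count() << '\n';
        for (int key : {5, 13}) {
            std::cout << "key " << key
                      << " -> hash " << ms.hash_function()(key)
                      << " -> bucket " << ms.bucket(key) << '\n';
        }

        // Elements with equivalent keys (the two 13s) are adjacent in iteration order.
        for (int x : ms)
            std::cout << x << ' ';
        std::cout << '\n';
    }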

While unordered_set::insert() must scan for a duplicate before inserting, unordered_multiset::insert() must scan for any existing elements with an equivalent key so that the new element can be placed adjacent to them. In the worst case, every element lands in the same bucket: when the final 13 is inserted, the bucket already contains [5, 13], and insert() walks the chain to find the existing 13 before placing the new one next to it, leaving the bucket as [5, 13, 13]. Since that walk can visit every element in the container, the worst-case complexity is linear in size().
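
To see that worst case in practice, every key can be forced into a single bucket with a deliberately bad hash. This is only a sketch (ConstantHash is a made-up functor, not anything from the standard library), but on a typical collision-chaining implementation insert() has to walk the whole chain looking for an equivalent key, so the loop as a whole behaves quadratically:

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <unordered_set>

    // Pathological hash: every key collides, so one bucket ends up holding
    // all elements and each insert degrades to a linear scan of that bucket.
    struct ConstantHash {
        std::size_t operator()(int) const { return 0; }
    };

    int main() {
        std::unordered_multiset<int, ConstantHash> ms;

        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < 10000; ++i)
            ms.insert(i);  // worst case: O(size()) per insert
        auto stop = std::chrono::steady_clock::now();

        std::cout << "elements in bucket 0: " << ms.bucket_size(0) << '\n'
                  << "total time: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
                  << " ms\n";
    }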

It is worth noting that a rehash may occur during unordered_multiset::insert(), and unordered_multiset::rehash() is specified as having average-case complexity linear in size() and worst-case complexity quadratic. During a rehash, all elements in the original hash table are iterated over and inserted into a new hash table. As iteration is linear in size(), and, as identified above, each insertion has a worst case linear in size(), the resulting worst case is O(size() * size()).
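
As a small illustration of when such a rehash kicks in (the initial bucket count and the growth policy are implementation-defined, so the exact output varies), one can watch bucket_count() change while inserting:

    #include <cstddef>
    #include <iostream>
    #include <unordered_set>

    int main() {
        std::unordered_multiset<int> ms;
        std::size_t buckets = ms.bucket_count();
        std::cout << "initial bucket_count: " << buckets << '\n';

        // An insert rehashes when it would push the load factor above max_load_factor().
        for (int i = 0; i < 100; ++i) {
            ms.insert(i);
            if (ms.bucket_count() != buckets) {
                buckets = ms.bucket_count();
                std::cout << "inserting " << i << " triggered a rehash; bucket_count is now "
                          << buckets << " (load_factor " << ms.load_factor() << ")\n";
            }
        }
    }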

Celisse answered 8/4, 2014 at 3:19 Comment(2)
This seems to be right, but the quadratic complexity for rehashing is not obvious. Before the rehash your equal keys are already grouped, so you don't need to group them again during the rehash, and hence you don't need quadratic complexity. Moreover, unordered_set doesn't have groups of equal keys, so why its rehash needs quadratic complexity is not obvious either. My assumption is that this is done to support open addressing, but I'm not sure. Taber
@Taber In order to be standard compliant, an implementation's worst case cannot exceed the worst case specified by the standard. The implementation is allowed to have its actual worst case be below that upper bound. The standard specifies complexity requirements on the bucket interface, which become fairly easy to meet with collision chaining and much more difficult to meet with open addressing. Consider reading this answer for more details. Celisse
