multiset, map and hash map complexity

I would like to know the complexity in Big O notation of the STL multiset, map and hash map classes when:

  • inserting entries
  • accessing entries
  • retrieving entries
  • comparing entries
Clausius answered 21/10, 2008 at 17:2 Comment(1)
This is actually my post and I cannot understand why I appear inactive and thus cannot change it...Clausius

map, set, multimap, and multiset

These are typically implemented as red-black trees, a type of self-balancing binary search tree. They have the following asymptotic run times:

Insertion: O(log n)
Lookup: O(log n)
Deletion: O(log n)
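
For illustration, here is a minimal sketch (the keys and values are made up) exercising these operations on std::map; std::set, std::multiset, and std::multimap behave the same way asymptotically:

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, int> ages;      // red-black tree underneath

        ages.insert({"alice", 30});           // insertion: O(log n)
        ages["bob"] = 25;                     // insertion via operator[]: O(log n)

        auto it = ages.find("alice");         // lookup: O(log n)
        if (it != ages.end())
            std::cout << it->first << " is " << it->second << '\n';

        ages.erase("bob");                    // deletion: O(log n)
    }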

hash_map, hash_set, hash_multimap, and hash_multiset

These are implemented using hash tables. They have the following runtimes:

Insertion: O(1) expected, O(n) worst case
Lookup: O(1) expected, O(n) worst case
Deletion: O(1) expected, O(n) worst case

If you use a proper hash function, you'll almost never see the worst case behavior, but it is something to keep in mind — see Denial of Service via Algorithmic Complexity Attacks by Crosby and Wallach for an example of that.
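
The hash_* names are the pre-standard SGI spellings; the sketch below (keys are made up) uses the equivalent C++11 std::unordered_map to show the same operations:

    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<std::string, int> counts;  // hash table underneath

        counts.reserve(1000);            // optional: pre-allocate buckets to reduce rehashing

        ++counts["apple"];               // insertion/update: O(1) expected, O(n) worst case
        ++counts["banana"];

        auto it = counts.find("apple");  // lookup: O(1) expected, O(n) worst case
        if (it != counts.end())
            std::cout << it->first << ": " << it->second << '\n';

        counts.erase("banana");          // deletion: O(1) expected, O(n) worst case
    }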

Academician answered 21/10, 2008 at 17:8 Comment(8)
Does everything you say about hash_* also apply to the C++11 unordered containers and the Boost.Unordered containers?Godfather
The hash_* class templates are part of the Silicon Graphics STL. They were incorporated into the C++11 revision under unordered_* names (unordered_map, unordered_set, etc.). They have also been included in libstdc++, Visual C++, and the Boost C++ libraries.Bigotry
@CEOatApartico: Fixed the dead linkAcademician
I don't understand the "expected O and worst-case O". Big-O is by definition "worst case".Muriel
@PauliusLiekis You don't know what you are talking about. Big-O is, by definition, "upper bound", which has nothing to do with worst case, avg. case, best case.Danie
To explain it: Landau notation describes the growth of a function. If you distinguish between cases, as with the hash map, you are dealing with different complexity functions, and each of them has its own growth and its own bound.Danie
@Danie yes, I admit - I do not understand. It's actually bothering me :) But I do not understand your explanation either :/ let me rephrase the question: let's say we have std::hash_map<std::string, T> - I can construct such an object where all keys will live in the same bucket, thus finding entries will take O(N) or O(log N) depending on implementation. So how can one claim that finding entries is O(1)? I honestly want to understand.Muriel
I see the rationale as this. Given a hash map with appropriate hash function and size, you expect the bucket size to not grow with n and obtain the average case of O(1). Picking an appropriate hash function is the developer's responsibility. To guarantee appropriate size (load factor), rehashing may be triggered on insertion and we obtain worst case O(n)+O(1) = O(n). Landau symbols do not cover such a distinction between two different algorithms! So all you have left is to specify two different measures for average and worst case, and both may use O(), o(), Θ(), etc.Danie
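
To make the load-factor point above concrete, here is a small sketch (element values are arbitrary) that inspects the bucket state of a C++11 std::unordered_map while inserting:

    #include <iostream>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m;
        m.max_load_factor(1.0f);   // rehash when size / bucket_count would exceed 1.0

        for (int i = 0; i < 100; ++i)
            m[i] = i;              // O(1) expected; an insertion that triggers a rehash is O(n)

        std::cout << "size:         " << m.size()         << '\n'
                  << "bucket_count: " << m.bucket_count() << '\n'
                  << "load_factor:  " << m.load_factor()  << '\n';
    }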

For set, multiset, map, and multimap, the time complexity of insertion, deletion, and retrieval is O(log n), as these containers use a balanced binary search tree to structure the data.

For unordered_set and unordered_map, hashing is used to structure the data, so these operations run in O(1) on average (O(n) in the worst case). These containers are a good choice when you do not need the data kept in sorted order.
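
A short sketch (element values are arbitrary) contrasting the two: std::set iterates in sorted order, while std::unordered_set does not guarantee any order:

    #include <iostream>
    #include <set>
    #include <unordered_set>

    int main() {
        std::set<int> ordered{3, 1, 2};            // O(log n) operations, sorted iteration
        std::unordered_set<int> hashed{3, 1, 2};   // O(1) average operations, no ordering guarantee

        for (int x : ordered) std::cout << x << ' ';   // prints: 1 2 3
        std::cout << '\n';
        for (int x : hashed)  std::cout << x << ' ';   // order is implementation-dependent
        std::cout << '\n';
    }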

Feathering answered 20/8, 2021 at 17:47 Comment(0)
