Is there any advantage of using map over unordered_map in case of trivial keys?

15

539

A recent talk about unordered_map in C++ made me realize that I should use unordered_map for most cases where I used map before, because of the efficiency of lookup (amortized O(1) vs. O(log n)). Most times I use a map, I use either int or std::string as the key type; hence, I've got no problems with the definition of the hash function. The more I thought about it, the more I came to realize that I can't find any reason to use a std::map over a std::unordered_map in the case of keys with simple types -- I took a look at the interfaces, and didn't find any significant differences that would impact my code.

Hence the question: is there any real reason to use std::map over std::unordered_map in the case of simple types like int and std::string?

I'm asking from a strictly programming point of view -- I know that it's not fully considered standard, and that it may pose problems with porting.

Also, I expect that one of the correct answers might be "it's more efficient for smaller sets of data" because of a smaller overhead (is that true?) -- hence I'd like to restrict the question to cases where the number of keys is non-trivial (> 1024).
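
For concreteness, this is the kind of drop-in swap I have in mind -- a minimal sketch (the word-count example is made up), where only the alias changes:

#include <map>
#include <string>
#include <unordered_map>

int main() {
    // The body below compiles unchanged with either alias.
    // using Dict = std::map<std::string, int>;
    using Dict = std::unordered_map<std::string, int>;

    Dict word_count;
    word_count["hello"] += 1;              // insert or update
    word_count.erase("goodbye");           // erase by key (no-op if absent)
    auto it = word_count.find("hello");    // look-up
    return it != word_count.end() ? it->second : 0;
}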

Edit: duh, I forgot the obvious (thanks GMan!) -- yes, maps are ordered of course -- I know that, and am looking for other reasons.

Barouche answered 4/2, 2010 at 2:37 Comment(6)
I like asking this question in interviews: "When is quick-sort better than bubble-sort?" The answer to the question provides insight into the practical application of complexity theory and not just plain black and white statements such as O(1) is better than O(n) or O(k) is equivalent to O(logn) etc....Sturges
@Beh, I think you meant "when is bubble-sort better than quick-sort" :PBarouche
Would a smart pointer be a trivial key?Zachary
Here is one of the cases in which map is the advantageous one: #51964919Carrolcarroll
@Matthieu N. In your place, I would be embarrassed to use this kind of question, which will hardly ever be useful and which unnecessarily embarrasses a lot of candidates :/Aesthetic
FWIW, in response to @user6547518, it does seem like a rather stupid interview question. If I had any need to implement a sorting algorithm from scratch, I'd consider the particular needs/constraints in my situation, and revisit the descriptions of sorting algorithms before implementing it. 35 years ago, before the internet and standard implementations in the languages I use, I probably knew the difference between those two, but I no longer feel the need to retain information like that.Vincenzovincible
542

Don't forget that map keeps its elements ordered. If you can't give that up, obviously you can't use unordered_map.

Something else to keep in mind is that unordered_map generally uses more memory. map just has a few house-keeping pointers per node, plus memory for each object. By contrast, unordered_map has a big array (these can get quite big in some implementations), and then additional memory for each object. If you need to be memory-aware, map should prove better, because it lacks the large array.
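
If you want to put numbers on that for your own implementation, a counting allocator gives a rough picture of how many bytes each container requests; a sketch (CountingAlloc and the 100000-element fill are made up for illustration, and the exact numbers vary by standard library):

#include <cstddef>
#include <iostream>
#include <map>
#include <unordered_map>
#include <utility>

// Minimal C++11 allocator that tallies every byte requested through it.
static std::size_t g_allocated = 0;

template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        g_allocated += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U> bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U> bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

int main() {
    using Pair = std::pair<const int, int>;
    {
        g_allocated = 0;
        std::map<int, int, std::less<int>, CountingAlloc<Pair>> m;
        for (int i = 0; i < 100000; ++i) m[i] = i;
        std::cout << "map bytes:           " << g_allocated << '\n';  // tree nodes only
    }
    {
        g_allocated = 0;
        std::unordered_map<int, int, std::hash<int>, std::equal_to<int>, CountingAlloc<Pair>> um;
        for (int i = 0; i < 100000; ++i) um[i] = i;
        std::cout << "unordered_map bytes: " << g_allocated << '\n';  // nodes + bucket array
    }
}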

So, if you need pure lookup-retrieval, I'd say unordered_map is the way to go. But there are always trade-offs, and if you can't afford them, then you can't use it.

Just from personal experience, I found an enormous improvement in performance (measured, of course) when using unordered_map instead of map in a main entity look-up table.

On the other hand, I found it was much slower at repeatedly inserting and removing elements. It's great for a relatively static collection of elements, but if you're doing tons of insertions and deletions the hashing + bucketing seems to add up. (Note, this was over many iterations.)

Roehm answered 4/2, 2010 at 2:43 Comment(12)
+1: yes, forgot the obvious ordered property :), and the memory tip is something I wasn't aware of -- thanksBarouche
One more thing about the large(r) memory block property of unordered_map vs. map (or vector vs. list): the default process heap (talking Windows here) is serialized. Allocating (small) blocks in large quantities in a multithreaded application is very expensive.Rattray
RA: You can somewhat control that with your own allocator type combined with any container, if you think it matters for any particular program.Equivalency
If you know the size of the unordered_map and reserve that at the start - do you still pay a penalty of many insertions? Say, you're only inserting once when you built the lookup table - and then later only read from it.Zachary
@Zachary As far as I can tell, there should be no penalty in terms of performance. The reason performance takes a hit is due to the fact that if the array grows too large, it will do a rehash of all the elements. If you call reserve, it will potentially rehash the existing elements but if you call it at the start, then there should be no penalty, at least according to cplusplus.com/reference/unordered_map/unordered_map/reserveIslas
for (auto it = map.begin(); it != map.end(); ++it) cout << it->first << endl; will print out the keys in orderDeidradeidre
I'm quite sure that memory-wise it is the opposite. Assuming the default 1.0 load factor for an unordered container: you have one pointer per element for the bucket and one pointer per element for the next-element-in-bucket, therefore you end up with two pointers plus data per each element. For an ordered container, on the other hand, a typical RB-tree implementation will have: three pointers (left/right/parent) plus a color bit which due to alignment takes a fourth word. That is four pointers plus data per each element.Tomas
@ybungalobill It's possible to embed the color bit in one of the pointers. This is what is actually done in the linux kernel: they embed the color bit in the parent pointer. I would expect the STL implementation to do the same :)Fults
@Rerito: Linux kernel doesn't use std::map. And typical std::map implementations don't do that (I have never seen one that does). There is a reason for that: it trades memory for performance, and you cannot do that generically if you don't know the alignment guarantees and the (templated) pointer type (remember that std containers work with allocators?). Now, even if you squeezed that color bit somewhere else, or used a self balancing binary tree that doesn't need it (like scapegoat trees), you still remain with 2 pointers vs 3 pointers in favor of unordered_map.Tomas
@ybungalobill It was just a remark about the possibility of compressing the color bit. I'll take your word on how std::map is implementedFults
The part about higher memory consumption seems to be true only for very small (single digit) map sizes. For larger maps, unordered_map is more memory-efficient. Sources: jsteemann.github.io/blog/2016/06/14/…, chromium.googlesource.com/chromium/src/+/refs/heads/main/base/…Lumen
On the last paragraph, this should be true up to sizes N where the O(log(N)) is still faster than the O(1) of the unordered_map. In my experiments with set and unordered_set holding integers (and involving many insertions) that point was about N=250K after which the unordered_set starts pulling away fast (8 times faster for N=10M). For other types of keys that breakeven point may never be reached, so I guess it depends on the key type / hash function.Beckwith
162

If you want to compare the speed of your std::map and std::unordered_map implementations, you could use Google's sparsehash project which has a time_hash_map program to time them. For example, with gcc 4.4.2 on an x86_64 Linux system

$ ./time_hash_map
TR1 UNORDERED_MAP (4 byte objects, 10000000 iterations):
map_grow              126.1 ns  (27427396 hashes, 40000000 copies)  290.9 MB
map_predict/grow       67.4 ns  (10000000 hashes, 40000000 copies)  232.8 MB
map_replace            22.3 ns  (37427396 hashes, 40000000 copies)
map_fetch              16.3 ns  (37427396 hashes, 40000000 copies)
map_fetch_empty         9.8 ns  (10000000 hashes,        0 copies)
map_remove             49.1 ns  (37427396 hashes, 40000000 copies)
map_toggle             86.1 ns  (20000000 hashes, 40000000 copies)

STANDARD MAP (4 byte objects, 10000000 iterations):
map_grow              225.3 ns  (       0 hashes, 20000000 copies)  462.4 MB
map_predict/grow      225.1 ns  (       0 hashes, 20000000 copies)  462.6 MB
map_replace           151.2 ns  (       0 hashes, 20000000 copies)
map_fetch             156.0 ns  (       0 hashes, 20000000 copies)
map_fetch_empty         1.4 ns  (       0 hashes,        0 copies)
map_remove            141.0 ns  (       0 hashes, 20000000 copies)
map_toggle             67.3 ns  (       0 hashes, 20000000 copies)
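
If sparsehash is not available to you, a rough equivalent of the grow/fetch rows above is easy to reproduce with std::chrono; a sketch (the methodology differs from time_hash_map, so absolute numbers won't match):

#include <chrono>
#include <cstdio>
#include <map>
#include <unordered_map>

template <class Map>
void bench(const char* name, int n) {
    using clock = std::chrono::steady_clock;
    Map m;
    auto t0 = clock::now();
    for (int i = 0; i < n; ++i) m[i] = i;                   // grow
    auto t1 = clock::now();
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += m.find(i)->second;   // fetch
    auto t2 = clock::now();

    auto ns = [](clock::time_point a, clock::time_point b) {
        return std::chrono::duration<double, std::nano>(b - a).count();
    };
    std::printf("%-20s grow %7.1f ns/op   fetch %7.1f ns/op   (checksum %lld)\n",
                name, ns(t0, t1) / n, ns(t1, t2) / n, sum);
}

int main() {
    const int n = 10000000;
    bench<std::map<int, int>>("std::map", n);
    bench<std::unordered_map<int, int>>("std::unordered_map", n);
}
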
Celebration answered 22/10, 2010 at 18:38 Comment(5)
It looks like unordered map beats the map on most of the operations. Even on insertion...Houri
sparsehash doesn't exist anymore. it has been deleted or taken down.Mahmud
@Mahmud I've edited the question to refer to a waybackmachine link.Durian
Just to ensure that others notice the other numbers besides the time as well: Those tests were done with 4 byte objects/datastructures aka an int. If you store something that requires heavier hashing or is larger (making the copy operations heavier), the standard map might quickly have an advantage!Beet
That's just in relation to the key type, yes? So potentially really long strings as a key type could make unordered_map much slower?Bewray
100

I'd echo roughly the same point GMan made: depending on the type of use, std::map can be (and often is) faster than std::tr1::unordered_map (using the implementation included in VS 2008 SP1).

There are a few complicating factors to keep in mind. For example, in std::map, you're comparing keys, which means you only ever look at enough of the beginning of a key to distinguish between the right and left sub-branches of the tree. In my experience, nearly the only time you look at an entire key is if you're using something like int that you can compare in a single instruction. With a more typical key type like std::string, you often compare only a few characters or so.

A decent hash function, by contrast, always looks at the entire key. IOW, even if the table lookup is constant complexity, the hash itself has roughly linear complexity (though on the length of the key, not the number of items). With long strings as keys, an std::map might finish a search before an unordered_map would even start its search.
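
If the key length dominates, one common mitigation (also suggested in a comment below) is a custom hash that only looks at the part of the key that actually varies; a sketch, assuming C++17 and assuming the tails of your strings are where the variation is (TailHash is purely illustrative):

#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hash only the last 20 bytes of the key. Safe for correctness (equality still
// compares full keys), but only fast if the tails are discriminating.
struct TailHash {
    std::size_t operator()(const std::string& s) const {
        const std::size_t start = s.size() < 20 ? 0 : s.size() - 20;
        return std::hash<std::string_view>()(std::string_view(s).substr(start));
    }
};

int main() {
    std::unordered_map<std::string, int, TailHash> m;
    m["some/very/long/common/prefix/record-0001"] = 1;
    return static_cast<int>(m.count("some/very/long/common/prefix/record-0001"));
}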

Second, while there are several methods of resizing hash tables, most of them are pretty slow -- to the point that unless lookups are considerably more frequent than insertions and deletions, std::map will often be faster than std::unordered_map.

Of course, as I mentioned in the comment on your previous question, you can also use a table of trees. This has both advantages and disadvantages. On one hand, it limits the worst case to that of a tree. It also allows fast insertion and deletion, because (at least when I've done it) I've used a fixed-size table. Eliminating all table resizing allows you to keep your hash table a lot simpler and typically faster.

One other point: the requirements for hashing and tree-based maps are different. Hashing obviously requires a hash function and an equality comparison, whereas ordered maps require a less-than comparison. Of course the hybrid I mentioned requires both. Of course, for the common case of using a string as the key, this isn't really a problem, but some types of keys suit ordering better than hashing (or vice versa).
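
For a user-defined key type, supporting both containers means supplying both sets of operations; a sketch (Point and its hash combine are made up for illustration, not a recommended production hash):

#include <cstddef>
#include <functional>
#include <map>
#include <unordered_map>

struct Point { int x, y; };

// Ordered containers need a strict weak ordering.
bool operator<(const Point& a, const Point& b) {
    return a.x != b.x ? a.x < b.x : a.y < b.y;
}

// Unordered containers need equality plus a hash.
bool operator==(const Point& a, const Point& b) { return a.x == b.x && a.y == b.y; }

struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h = std::hash<int>()(p.x);
        return h ^ (std::hash<int>()(p.y) + 0x9e3779b9 + (h << 6) + (h >> 2));
    }
};

int main() {
    std::map<Point, int> ordered;                      // uses operator<
    std::unordered_map<Point, int, PointHash> hashed;  // uses PointHash + operator==
    ordered[{1, 2}] = 1;
    hashed[{1, 2}] = 1;
}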

Row answered 4/2, 2010 at 5:15 Comment(3)
Hash resizing can be damped down by dynamic hashing techniques, which consist of having a transition period during which, each time you insert an item, you also rehash k other items. Of course, it means that during the transition you have to search 2 different tables...Debt
"With long strings as keys, an std::map might finish a search before an unordered_map would even start its search." -- if the key is not present in the collection. If it is present then of course the full length needs to be compared to confirm the match. But likewise unordered_map needs to confirm a hash match with a full comparison, so it all depends what parts of the lookup process you're contrasting.Shulamith
you can usually replace the hash function based on knowledge of the data. for example if your long strings vary more in the last 20 bytes than in the first 100, just hash the last 20.Kept
72

I was intrigued by the answer from @Jerry Coffin, which suggested that the ordered map would exhibit performance increases on long strings. After some experimentation (which can be downloaded from pastebin), I've found that this seems to hold true only for collections of random strings: when the map is initialised with a sorted dictionary (which contains words with considerable amounts of prefix overlap), this rule breaks down, presumably because of the increased tree depth necessary to retrieve a value. The results are shown below; the 1st number column is insert time, the 2nd is fetch time.

g++ -g -O3 --std=c++0x   -c -o stdtests.o stdtests.cpp
g++ -o stdtests stdtests.o
gmurphy@interloper:HashTests$ ./stdtests
# 1st number column is insert time, 2nd is fetch time
 ** Integer Keys ** 
 unordered:      137      15
   ordered:      168      81
 ** Random String Keys ** 
 unordered:       55      50
   ordered:       33      31
 ** Real Words Keys ** 
 unordered:      278      76
   ordered:      516     298
Actual answered 20/9, 2012 at 11:56 Comment(1)
Thanks for the test. To make sure we're not measuring noise I changed it to do each operation many times (and inserted the counter instead of 1 into the map). I ran it over different numbers of keys (from 2 to 1000); up to ~100 keys in the map, std::map typically outperforms std::unordered_map, especially for integer keys, but at ~100 keys it seems to lose its edge and std::unordered_map starts to win. Inserting an already ordered sequence into a std::map is very bad; you'll get its worst-case scenario (O(N)).Accidental
46

Significant differences that have not really been adequately mentioned here:

  • map keeps iterators to all elements stable; in C++17 you can even move elements from one map to the other without invalidating the iterators to them (and, if properly implemented, without any potential allocation).
  • map timings for single operations are typically more consistent since they never need large allocations.
  • unordered_map using std::hash as implemented in the libstdc++ is vulnerable to DoS if fed with untrusted input (it uses MurmurHash2 with a constant seed - not that seeding would really help, see https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/).
  • Being ordered enables efficient range searches, e.g. iterate over all elements with key ≥ 42 (see the sketch below).
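
A minimal sketch of such a range query (keys and values made up):

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> m{{10, "a"}, {42, "b"}, {50, "c"}, {99, "d"}};

    // All elements with key >= 42, visited in key order: O(log n) to find the
    // starting point, then a linear walk. unordered_map has no equivalent.
    for (auto it = m.lower_bound(42); it != m.end(); ++it)
        std::cout << it->first << " -> " << it->second << '\n';
}
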
Dominations answered 29/12, 2016 at 16:57 Comment(0)
33

Summary

Assuming ordering is not important:

  • If you are going to build a large table once and do lots of queries, use std::unordered_map
  • If you are going to build a small table (maybe under 100 elements) and do lots of queries, use std::map. Its reads are O(log n), but for such small n the tree's lower constant factors usually beat the hashing overhead.
  • If you are going to change the table a lot, then std::map may be a good option.
  • If you are in doubt, just use std::unordered_map.

Historical Context

In most languages, the unordered map (a.k.a. hash-based dictionary) is the default map; in C++, however, you get the ordered map as the default. How did that happen? Some people erroneously assume that the C++ committee made this decision in its unique wisdom, but the truth is unfortunately uglier than that.

It is widely believed that C++ ended up with the ordered map as the default because there are not too many parameters in how it can be implemented. Hash-based implementations, on the other hand, have tons of things to argue about. So, to avoid gridlock in standardization, the committee just went along with the ordered map. By around 2005, many languages already had good hash-based implementations, and so it was easier for the committee to accept the new std::unordered_map. In a perfect world, std::map would have been unordered and we would have std::ordered_map as a separate type.

Performance

The two graphs below should speak for themselves (source):

[Two performance comparison graphs omitted]

Hengel answered 22/8, 2018 at 7:30 Comment(3)
Interesting data; how many platforms did you include in your tests?Mickimickie
Why should I use std::map for a small table when doing lots of queries, since std::unordered_map always performs better than std::map according to the 2 images you posted here?Shabbygenteel
The graph shows performance for 0.13M or more elements. If you have a small number of elements (maybe < 100), then O(log n) might become smaller than the unordered map's constant cost.Hengel
32

I would just point out that... there are many kinds of unordered_map.

Look up the Wikipedia Article on hash map. Depending on which implementation is used, the characteristics in terms of look-up, insertion and deletion might vary quite significantly.

And that's what worries me the most with the addition of unordered_map to the STL: they will have to choose a particular implementation as I doubt they'll go down the Policy road, and so we will be stuck with an implementation for the average use and nothing for the other cases...

For example, some hash maps have linear rehashing, where instead of rehashing the whole hash map at once, a portion is rehashed at each insertion, which helps amortize the cost.

Another example: some hash maps use a simple list of nodes for a bucket, others use a map, others don't use nodes but find the nearest slot and lastly some will use a list of nodes but reorder it so that the last accessed element is at the front (like a caching thing).

So at the moment I tend to prefer the std::map or perhaps a loki::AssocVector (for frozen data sets).

Don't get me wrong, I'd like to use the std::unordered_map and I may in the future, but it's difficult to "trust" the portability of such a container when you think of all the ways of implementing it and the various performances that result of this.
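
For reference, the sorted-vector idea behind loki::AssocVector (good for frozen data sets) can be sketched with just the standard library; this is a simplified illustration of the look-up side only, not Loki's actual interface:

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// A "frozen" associative look-up: build once, sort once, then binary-search.
// Cache-friendly with minimal memory overhead, but O(n) to modify afterwards.
class FrozenMap {
public:
    explicit FrozenMap(std::vector<std::pair<std::string, int>> data)
        : data_(std::move(data)) {
        std::sort(data_.begin(), data_.end());
    }

    const int* find(const std::string& key) const {
        auto it = std::lower_bound(
            data_.begin(), data_.end(), key,
            [](const std::pair<std::string, int>& p, const std::string& k) { return p.first < k; });
        return (it != data_.end() && it->first == key) ? &it->second : nullptr;
    }

private:
    std::vector<std::pair<std::string, int>> data_;
};

int main() {
    FrozenMap fm({{"beta", 2}, {"alpha", 1}, {"gamma", 3}});
    const int* v = fm.find("alpha");
    return v ? *v : -1;
}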

Debt answered 4/2, 2010 at 7:59 Comment(1)
+1: valid point -- life was easier when I was using my own implementation -- at least I knew where it sucked :>Barouche
32

I think the question is only partially answered, as no information was provided about the performance with "int" types as keys. I did my own analysis and found out that std::map can outperform (in speed) std::unordered_map in many practical situations when using integers as keys.

Integer Test

The test scenario consisted of populating maps with sequential and random keys, and with string values with lengths in the range [17:119] in multiples of 17. Tests were performed with element counts in the range [10:100000000] in powers of 10.

Labels:

Map64: std::map<uint64_t,std::string>
Map32: std::map<uint32_t,std::string>
uMap64: std::unordered_map<uint64_t,std::string>
uMap32: std::unordered_map<uint32_t,std::string>

Insertion

Labels:

Sequential Key Insert: maps were constructed with keys in the range [0-ElementCount]
Random Key Insert: maps were constructed with random keys in the full range of the type

[Graphs: sequential-key insert times, random-key insert times]

Conclusion on insertion:

  • Insertion of spread keys in std::map tends to outperform std::unordered_map when map size is under 10000 elements.
  • Insertion of dense keys in std::map shows no performance difference from std::unordered_map under 1000 elements.
  • In all other situations std::unordered_map tends to perform faster.

Look up

Labels:

Sequential Key - Seq. Search: Search is performed in the dense map (keys are sequential). All searched keys exist in the map.
Random Key - Rand. Search: Search is performed in the sparse map (keys are random). All searched keys exist in the map.

(label names can be misleading, sorry about that)

[Graphs: sequential-key look-up times, random-key look-up times]

Conclusion on look up:

  • Search on spread std::map tends to slightly outperform std::unordered_map when map size is under 1000000 elements.
  • Search on dense std::map outperforms std::unordered_map

Failed Look up

Labels:

Sequential Key - Rand. Search: Search is performed in the dense map. Most keys do not exist in the map.
Random Key - Seq. Search: Search is performed in the sparse map. Most keys do not exist in the map.

(label names can be misleading, sorry about that)

[Graphs: sequential-key map with random search, random-key map with sequential search]

Conclusion on failed look up:

  • A search miss has a big performance impact on std::map.

General conclusion

Even when speed is needed, std::map with integer keys can still be a better option in many situations. As a practical example, I have a dictionary where look-ups never fail, and although the keys have a sparse distribution it will perform, at worst, at the same speed as std::unordered_map, because my element count is under 1K. And the memory footprint is significantly lower.

String Test

For reference I present here the timings for string[string] maps. Key strings are formed from a random uint64_t value, Value strings are the same used in the other tests.

Labels:

MapString: std::map<std::string,std::string>
uMapString: std::unordered_map<std::string,std::string>

[Graph: string-keyed map timings]

Evaluation Platform

OS: Linux - OpenSuse Tumbleweed

Compiler: g++ (SUSE Linux) 11.2.1 20210816

CPU: Intel(R) Core(TM) i9-9900 CPU @ 3.10GHz

RAM: 64Gb

Reims answered 8/10, 2021 at 3:25 Comment(2)
Performance for low map sizes (under 10) would also be relevant.Lamentation
Why does this answer have so low votes? (I think it should have more than the question itself because it answers other very relevant questions too in the simplest and most powerful manner)Baht
24

Reasons have been given in other answers; here is another.

std::map (balanced binary tree) operations are amortized O(log n) and worst case O(log n). std::unordered_map (hash table) operations are amortized O(1) and worst case O(n).

How this plays out in practice is that the hash table "hiccups" every once in a while with an O(n) operation, which may or may not be something your application can tolerate. If it can't tolerate it, you'd prefer std::map over std::unordered_map.
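
A quick way to see such a hiccup is to time each individual insertion and report the worst one; a rough sketch (the spike size and where it occurs depend on the implementation, allocator, and load factor):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <map>
#include <unordered_map>

template <class Map>
double worst_insert_ns(int n) {
    using clock = std::chrono::steady_clock;
    Map m;
    double worst = 0;
    for (int i = 0; i < n; ++i) {
        auto t0 = clock::now();
        m[i] = i;   // for unordered_map this occasionally triggers a full rehash
        auto t1 = clock::now();
        worst = std::max(worst, std::chrono::duration<double, std::nano>(t1 - t0).count());
    }
    return worst;
}

int main() {
    const int n = 1000000;
    std::printf("worst single insert, std::map:           %.0f ns\n", worst_insert_ns<std::map<int, int>>(n));
    std::printf("worst single insert, std::unordered_map: %.0f ns\n", worst_insert_ns<std::unordered_map<int, int>>(n));
}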

Monocoque answered 5/10, 2016 at 3:2 Comment(0)
15

Hash tables have higher constant factors than common map implementations, and those factors become significant for small containers. Is your max size 10, 100, or maybe even 1,000 or more? The constant factors are the same as ever, but O(log n) is close to O(1) at those sizes. (Remember, logarithmic complexity is still really good.)

What makes a good hash function depends on your data's characteristics. So if I don't plan on writing a custom hash function (but can certainly change my mind later, and easily, since I typedef damn near everything), and even though the defaults are chosen to perform decently for many data sources, I find the ordered nature of map to be enough of a help initially that I still default to map rather than a hash table in that case.

Plus, that way you don't even have to think about writing a hash function for other (usually UDT) types, and can just write operator< (which you want anyway).

Equivalency answered 4/2, 2010 at 2:52 Comment(3)
@Roger, do you know the approximate amount of elements at which unordered_map bests map? I'll probably write a test for it though, anyway... (+1)Barouche
@Kornel: It doesn't take very many; my tests were with about 10,000 elements. If we want a really accurate graph, you could look at an implementation of map and one of unordered_map, with certain platform and certain cache size, and do a complex analysis. :PRoehm
Depends on implementation details, compile-time tuning parameters (easy to support if you're writing your own implementation), and even the specific machine used for the tests. Just like for the other containers, the committee only sets the broad requirements.Equivalency
10

I recently made a test which does 50000 merge & sort operations. That means if the string keys are the same, merge the byte strings; and the final output should be sorted. So this includes a look-up for every insertion.

For the map implementation, it takes 200 ms to finish the job. For the unordered_map + map, it takes 70 ms for unordered_map insertion and 80 ms for map insertion. So the hybrid implementation is 50 ms faster.

We should think twice before we use the map. If you only need the data to be sorted in the final result of your program, a hybrid solution may be better.
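
A sketch of that hybrid approach (the input data is made up, and "merge the byte string" is modelled as simple concatenation):

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    // Input: key/value pairs where values of duplicate keys must be merged.
    std::vector<std::pair<std::string, std::string>> input = {
        {"b", "2"}, {"a", "1"}, {"b", "22"}, {"c", "3"}, {"a", "11"},
    };

    // Phase 1: merge with unordered_map -- O(1) average look-up per insertion.
    std::unordered_map<std::string, std::string> merged;
    for (const auto& kv : input)
        merged[kv.first] += kv.second;   // concatenation stands in for "merge"

    // Phase 2: copy into a map once, only to get sorted output.
    std::map<std::string, std::string> sorted(merged.begin(), merged.end());

    for (const auto& kv : sorted)
        std::cout << kv.first << " = " << kv.second << '\n';
}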

Drachma answered 11/3, 2013 at 3:32 Comment(1)
IIUC, the test you mentioned is a batch-insertion test. unordered_map is used first to resolve the multiple merge entries, and the result is then sorted by map (so for the same key, map insert is called once). However, if we want the map to be ready at all times, and the data arrives once in a while (not in a batch), the situation would be different: you need every insertion to go into the map anyway, so there are no time savings.Stepsister
5

Small addition to all of the above:

Better to use map when you need to get elements by range, as the keys are sorted and you can just iterate over them from one boundary to the other.

Analeptic answered 28/8, 2018 at 20:20 Comment(0)
4

If you compile your project with Visual Studio 2010, forget about unordered_map for strings. If you use a more modern Studio, like 2017, then unordered_map is much faster than the ordered map.

Disadvantaged answered 17/3, 2021 at 15:32 Comment(0)
4

By using std::unordered_map you declare that nowhere in your code do you rely on the map being ordered. In some cases this additional context may help in understanding how the map is actually used in the program. Clarity may be more important, with performance coming as a side effect.

Of course, no compiler will stop you from using an unordered map when you need the ordered one, but this is so unlikely to work well that the reader can probably rely on the choice not being just a mistake.

Crescen answered 23/11, 2021 at 13:47 Comment(0)
-1

From: http://www.cplusplus.com/reference/map/map/

"Internally, the elements in a map are always sorted by its key following a specific strict weak ordering criterion indicated by its internal comparison object (of type Compare).

map containers are generally slower than unordered_map containers to access individual elements by their key, but they allow the direct iteration on subsets based on their order."

Roup answered 3/9, 2016 at 16:14 Comment(0)
