Why use an unordered container? (C++)
Asked Answered
Q

3

14

I have already tried searching for this but haven't found anything.

I am learning about STL containers, and understand the pros and cons of sequential and associative containers, however am not sure why anyone would prefer an unordered container over an associative one, as surely it would not affect element insertion, lookup and removal.

Is it purely a performance thing, i.e it would take more processing to insert / remove to an associative container as it has to go through sorting? I don't know too much about the system side of things but in my head I feel like an unordered container would require more 'upkeep' than one that is automatically organised.

If anyone could shed some light it would be really appreciated.

Quaquaversal answered 25/1, 2013 at 20:17 Comment(6)
Obviously, the one that is automatically organised takes more upkeep: it needs to be reorganised any time it changes. The other one doesn't.Thematic
Read books or try to code something by yourself. With experience understanding will come.Mandelbaum
You're confusing me with your terminology. You contrast "unordered container" from "associative container", but those two things are not mutually exclusive. std::map and std::unordered_map are both associative arrays. Whereas std::set and std::unordered_set are not.Borges
"Associative container" means "lookup by value", and is in contrast to "sequence container", which is "lookup by insertion position".Cooker
Well, I was mistaken. I guess "Associative Container" means something different in C++. It includes std::set and variants. But it also includes the unordered variants, so your terminology is still confusing.Borges
Associative Containers + Unordered Associaitve Containers are different types of containers according to Josuttis' STD Library book. But thanks for clarification on lookup by value and lookup by insertion position!Quaquaversal
C
10

Purely abstractly, consider the fact that an ordering of the elements is an extra "feature" that you have to pay for, so if you don't need it (like in lookup-only dictionary), then you shouldn't have to pay for it.

Technically this means that an unordered container can be implemented with expected lookup and in­ser­tion complexity O(1), rather than the O(log n) of ordered containers, by using hash tables.

On a tangentially related note, though, there is a massive practical advantage when using strings as keys: An ordered container has to perform full string comparison everywhere along the tree walk, while a hash container only performs a single hashing operation (which can even be "optimized" to only sam­ple a fixed number of characters from very long strings), and often turns out to be a lot faster in practice.

If ordering is not a requirement, then the best thing to do is to try out both container types (whose inter­face is almost identical) and compare the performance in your usage profile.

Cooker answered 25/1, 2013 at 20:27 Comment(8)
Your statement about the string comparison seems unintuitive to me. I'd expect std::less<std::string> to be an eager comparison that returns as soon as it finds the first non-matching character. On the contrary, std::hash<std::string> would probably compute the hash over the entire string. So, in cases where you have long strings as keys, but the strings all differ in the first few characters (I know that's a contrived example) a map might fare better than an unordered_map. Also, +1 for the suggestion to actually try it out.Dougall
In the common case, the cost of linked containers dramatically exceeds the cost of the comparisons/hash computations themselves. See Bjarne's talk on C++11 style around minute 45 for a shocking discussion about the magnitude of this effect.Esp
@Dougall the string comparison would be done probably more times than the hash though.Blanks
Sorry for my ignorance, but surely O(1) means that the time/space required is independent of the data set size, so does this mean that there is no iteration through the 'order' (albeit unordered) to find a 'match', because otherwise surely it would take longer as more data is inputted.Quaquaversal
@Dougall Think again. Hash is computed once, and cost of this is basically equal to comparing same string with equal string once. And string has to be compared in its entirety at least once, at the final matching element, if key is in the map at all. So, for nearly all practical applications, hash is faster.Simmon
thank you so much, this has helped me greatly! EDIT: one more question, again sorry for my ignorance but is a map essentially the same as a binary tree? For example is element lookup and insertion basically the same or is it different? i.e does a map still have logarithmic complexity due to binary search?Quaquaversal
@aspiring_programmer: Well, as I said, you have to profile the differences for your usecase. You can imagine that a map with lots of really long strings that only differ in a suffix may suffer more than one with extremely short, completely different strings. By comparison, the hash table performs one hash computation and then jumps straight into the bucket.Cooker
@aspiring_programmer No, hash really is O(1) no matter the hash size. Each item in hash at any given moment of time has been rehashed once on average, so insertion cost per item on average stays at O(1), it does not grow, millionth item has 99.9999% chanse of having insertion cost 1, and 0.0001% chance of requiring rehashing with cost of million, average is 2, ie. constant time. Hash is basically used as table index, and table access is of course O(1) as well.Simmon
V
3

We use the Unordered Container When the ordering of the Objects is not necessary and you care most about performance of objects lookup because the Unordered Container have a fastest search /insert at any place is O(1) rather than the Ordered Container(Associative Container take O(log n)) and Sequence containers take O(n)).

Viscountess answered 30/11, 2016 at 10:31 Comment(0)
N
2

not sure why anyone would prefer an unordered container over an associative one

Those features are not exclusive. A container may be associative, and if it is, separately may also be unordered.

If you are familiar with hash maps, that is the technology being leveraged by unordered containers. The standard library uses the term "unordered" instead of "hash" so as not to impose a specific technology when what is desired is just specific performance promises. (see comment)

Nelan answered 25/1, 2013 at 20:24 Comment(2)
The reason unordered was chosen instead of hash is because many implementations already had non-standard versions that used the word hash.Pacer
@JesseGood, It really is a shame, seeing as how the std:: part goes to waste.Arithmetician

© 2022 - 2024 — McMap. All rights reserved.