Why STL unordered_map and unordered_set cannot be sorted by STL algorithms?
Asked Answered
C

3

5

I'll start by illustrating a simple use case example:

  • Consider the problem of a social security ID database, where in C++ code is modelled as a std::unordered_map where its key is the social security ID of a person and its value is a std::string with the full-name of that person (e.g., std::unordered_map<int, std::string> DB;).

  • Consider also, that there's a request for printing this database sorted in ascending order based on the person's ID (i.e., std::unordered_map's key).

  • Naively, one would think to use std::sort in order to sort the std::unordered_map according to the requested criteria and then print it, like the example code below:


   std::sort(DB.begin(), DB.end());
   for(auto p : DB) std::cout << "ID(" << p.first
                              << ") - " 
                              << p.second 
                              << std::endl;

  • However, this is not the case, because use of std::sort with a range of either a std::unordered_map or a std::unordered_set will raise a compiler error.

Questions:

  1. Why STL's unordered containers cannot be sorted by std::sort?
  2. Is there a legitimate and efficient way to sort either a std::unordered_map or a std::unordered_set?
Checkerboard answered 13/6, 2014 at 19:9 Comment(3)
If it doesn't make sense to add two togeather, do not store things as integers. Adding two SSIDs togeather makes no sense, store them as strings or something.Thrasher
It only makes sense to use mutating algorithms like that on sequential containers, of which unordered_* are not.Thrasher
1) sort has no clue what container the iterators are from, the error is because sort requires RandomAccessIterators while (unordered_)map|set iterators are BiDirectionalIterators. 2) There's a reason those containers are named unordered _map|set. If you want ordering use a map|setEros
L
6

unordered containers store internally hashed data and thus it's not possible to order them after the hash has been generated.

In order to sort the data you can use an additional non-hashed container (e.g. map or set) and either use them along with the unordered version (so you can use the normal one to sort the data and the unordered one to have fast per-item access) or you can do something like

std::map<int, int> ordered(unordered.begin(), unordered.end());
for(auto it = ordered.begin(); it != ordered.end(); ++it)
     std::cout << it->second;

I recommend not to do the above often (unordered containers have slow sequential access)

https://mcmap.net/q/542163/-sorting-std-unordered_map-by-key

Limbert answered 13/6, 2014 at 19:13 Comment(0)
M
5

Sorting only makes sense for sequence containers, which are containers whose elements are determined by the order in which they were added to the container. The dynamic sequence containers in the standard library are vector, deque, list and forward_list.

Maps and sets, on the other hand, are associative containers, in which elements are identified by their value. Thus it makes no sense to ask for an "ordering", since the container elements aren't arranged in any kind of sequence. (It's true that an ordered map can be iterated in a comparison order on the key, but that order emerges from the container; it is not provided by the user.)

Minion answered 13/6, 2014 at 19:38 Comment(0)
D
2

1.Why STL's unordered containers cannot be sorted by std::sort?

Because unordered containers are already "sorted", albeit not directly by their keys, but by (typically) hash_function(key) % bucket_count() (also accessible as bucket(key)). This "sort" order isn't cosmetic - it's the whole basis on which hash tables are able to find elements quickly. If std::sort were allowed to re-order the elements by key instead, then the container would no longer be able to function as a hash table: elements couldn't be reliably found or erased, insertions might put duplicates in the container etc..

2.Is there a legitimate and efficient way to sort either a std::unordered_map or a std::unordered_set?

In the general case, only by first copying the elements to a sortable or sorted container such as std::vector or std::set (the former will usually be faster, but benchmark both if you really care):

std::unordered_set<T> my_set = ...;
std::vector<T> my_vec{my_set.begin(), my_set.end(), my_set.size()};
std::sort(my_vec.begin(), my_vec.end());

In your case with std::unordered_map<int, std::string> DB;, I'd suggest copying only the int keys to a vector for sorting, then during iteration look up each key in the unordered_map: that will avoid a lot of string copying.

(It is sometimes possible to orchestrate an unordered container with ordering by key (e.g. hash function returns key, container presized so max bucket index >= max key value) but anyone considering such abuse would be better off using a vector.)

Dendroid answered 22/6, 2015 at 3:56 Comment(1)
This should be the answer, as it contains more complete information.Kristalkristan

© 2022 - 2024 — McMap. All rights reserved.