Why doesn't unordered_set provide an array access operator

Asked 10/4, 2015 at 1:56 Answered 10/4, 2015 at 2:59

I'm curious as to why the STL container unordered_set, which has constant time complexity for random access on average, does not provide a method for accessing elements by some distance from the first element in the container. For example:

T& unordered_set::operator[](size_t index)
{
    return *(begin() + index);
}

Dowery answered 10/4, 2015 at 1:56 Comment(1)

I don't see how this fits into an unordered container. Sets don't have indices. – Sigh 10/4, 2015 at 1:58

Accessing an element "by some distance" implies that there is some meaningful way to measure that distance. The trouble with std::unordered_set is that is is, well, unordered. Hence, there is no meaningful way of explaining "some distance from the beginning" in a non-arbitrary way.

If you want to access by distance, copy the data into a vector:

std::vector tmp(unordered.begin(), unordered.end());

Deranged answered 10/4, 2015 at 2:0 Comment(0)

unordered_set is implemented as a hash table, in which there are randomly-accessible "buckets" each of which effectively contains a linked list of 0 or more elements. So, a snapshot of an unordered_set storing the numbers 1 through 7 might happen to look something like this (the exact positioning of elements depends on the hash function used, so this is just illustrative):

buckets    linked-list of elements
[0]        1 --> 5 --> nullptr
[1]        nullptr
[2]        4 --> nullptr
[3]        nullptr
[4]        nullptr
[5]        7 --> nullptr
[6]        6 --> 3 --> 2 --> nullptr
[7]        nullptr

As you can see, there's no easy way to advance n elements... you basically have to follow linked lists, skipping to the next bucket when you find a nullptr. That's why the begin() operation can't return a Random Access Iterator with O(1) times for + n movement (it only provides a Forward Iterator)....

So when you ask...

unordered_set, which has constant time complexity for random access on average

...I think you're confusing random access by key with random access by index. You can find any given key in O(1) amortised constant time, but finding the nth element is O(n).

(Note: the C++11 Standard does not leave implementations the freedom to choose closed hashing (aka open addressing) to implement unordered_set... that's evident from the max_load_factor after construction being required to be 1.0, and the rule that during insert/emplace iterator invalidation can only happen when the max_load_factor is exceeded.)

Tupiguarani answered 10/4, 2015 at 2:59 Comment(0)

unordered_set is unordered by definition, so accessing it by index isn't terribly useful. The index of any particular element may change any time you make an insertion.

Also, according to this reference, the iterator for an unordered_set is a forward iterator, not random access.

Hartmann answered 10/4, 2015 at 2:0 Comment(0)

In order to provide constant amortized time for item membership checking, which is the advantage of an unordered set, for the general case of some arbitrary item type it has to be implemented as a hash table.

It's not difficult to ensure, in constant time, that every key (item) in a hash table is also referred to by a node of a linked list. This provides a means to iterate over the items in the table, in unspecified order. However, going to the i-th item in a linked list is linear time.

There is a compromise solution where as each item is added to the hash table, it's also added to a sorted tree. This requires the items to be comparable, and it increases the add and remove complexity to logarithmic (keeping constant time for checking). But while this supports access of the i-th item in logarithmic time, which item is the i-th one will vary, and there is really no big demand for this functionality.

The clincher is that C++11 requires O(1) average time for inserts in an unordered container, which is incompatible with the ordered tree.

So, since direct indexing is impractical (linear time) and not in demand, it's not offered, but you can always use *std::next( s.begin(), i ) as a linear time alternative to a hypothetical s[i]. In principle you can optimize that for a set that doesn't change, by copying it to a std::vector. But in most cases it will be better to use iterators.

Blouson answered 10/4, 2015 at 2:32 Comment(0)

Recommended topics

Hot tags