How expensive is comparing two unordered sets for equality?
Given two std::sets, one can simply iterate through both sets simultaneously and compare the elements, resulting in linear complexity. This doesn't work for std::unordered_sets, because the elements may be stored in any order. So how expensive is a == b for std::unordered_set?

Scrapbook answered 12/4, 2012 at 6:33 Comment(3)
Do you have an efficient way to check set membership (for example, are they backed by hash tables)? – Fortunato
In the clear, straightforward, easy to comprehend and understand words of the C++ Standard: "Two unordered containers a and b compare equal if a.size() == b.size() and, for every equivalent-key group [Ea1,Ea2) obtained from a.equal_range(Ea1), there exists an equivalent-key group [Eb1,Eb2) obtained from b.equal_range(Ea1), such that distance(Ea1, Ea2) == distance(Eb1, Eb2) and is_permutation(Ea1, Ea2, Eb1) returns true. For unordered_set...the complexity of operator==...is proportional to N in the average case and to N^2 in the worst case, where N is a.size()." – Bailsman
Some related C++20 discussion that might be of interest: open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0809r0.pdf – Automaton

Complexity of operator== and operator!=:

Linear complexity in the average case, N² in the worst case, where N is the size of the container.

More details in the standard §23.2.5, point 11:

For unordered_set and unordered_map, the complexity of operator== (i.e., the number of calls to the == operator of the value_type, to the predicate returned by key_equal(), and to the hasher returned by hash_function()) is proportional to N in the average case and to N² in the worst case, where N is a.size().

Accidie answered 12/4, 2012 at 6:41 Comment(0)

The very worst case is O(n²).

But unordered sets do group their elements into buckets by hash. So an implementation can compare hashes first (if these differ, the elements cannot be equal) and then verify that elements with equal hashes (linear on average) really hold the same values behind them (O(n²) when many distinct values share the same hash).

In the best case this is O(n).

In practice the complexity tends to O(n) when the hash function is "good" (different objects almost always get different hashes) and to O(n²) when it is "bad" (everything hashes to the same value).

Fadden answered 12/4, 2012 at 6:43 Comment(2)
"hash function is good (different objects -> always different hash)" -> different hashes can be true even for a terrible hash algorithm (e.g. hash strings of up to 128 characters by returning an 8*128-bit hash value cloned from the string), but mod that into the number of buckets and the result is ugly. When there's no special insight into inputs that facilitates collision avoidance, a good hash function post modding generally has collisions in the ratio of used to unused buckets... which does still result in O(n) averages. – Louannlouanna
@TonyDelroy: Thank you for pointing this out! A "good hash" must not only return "different values" but also be "well distributed" with respect to the buckets (the hash space should be uniform, and coprime to the bucket count, to minimize the effect you mention). – Fadden