I have a collection of std::set objects. I want to find the intersection of all the sets in this collection as fast as possible. The number of sets in the collection is typically very small (~5-10), and the number of elements in each set is usually less than 1000, but can occasionally go up to around 10000. However, I need to do these intersections tens of thousands of times, so speed matters. I tried to benchmark a few methods as follows:
1. In-place intersection in a std::set object which initially copies the first set. Then, for each subsequent set, it iterates over all elements of itself and of the ith set of the collection, and removes items from itself as needed.
2. Using std::set_intersection into a temporary std::set, swapping its contents into a current set, then again finding the intersection of the current set with the next set and inserting into the temporary set, and so on.
3. Manually iterating over all the elements of all sets as in 1), but using a vector as the destination container instead of std::set.
4. Same as in 3), but using a std::list instead of a vector, suspecting that a list will provide faster deletions from the middle.
5. Using hash sets (std::unordered_set) and checking for all items in all sets.
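To make the vector-based method in the list above concrete, here is a minimal sketch of how it might look. The function name intersect_all and the int element type are assumptions for illustration only; the benchmarked code may differ. It copies the first set into a vector, then walks each further set in sorted order and compacts the surviving elements toward the front of the vector.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (name and int element type are assumptions).
// Intersects all sets, using a sorted vector as the working container.
std::vector<int> intersect_all(const std::vector<std::set<int>>& sets) {
    if (sets.empty()) return {};

    // std::set iterates in ascending order, so the copy is already sorted.
    std::vector<int> result(sets[0].begin(), sets[0].end());

    for (std::size_t i = 1; i < sets.size() && !result.empty(); ++i) {
        auto it = sets[i].begin();
        const auto end = sets[i].end();
        std::size_t kept = 0;  // number of elements that survive this pass

        for (int value : result) {
            // Advance through the i-th set until it catches up with value.
            while (it != end && *it < value) ++it;
            if (it == end) break;  // nothing further can match
            if (*it == value) result[kept++] = value;  // keep, compacting left
        }
        result.resize(kept);  // drop everything that was not matched
    }
    return result;
}
```

Compacting in place avoids per-element erase calls in the middle of the vector, and the early exit on an empty result skips the remaining sets. Starting from the smallest set instead of the first one may reduce the work further, though that is untested here.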
As it turned out, using a vector is marginally faster when the number of elements in each set is small, and a list is marginally faster for larger sets. In-place intersection using a set is substantially slower than both, followed by set_intersection and hash sets. Is there a faster algorithm, data structure, or trick to achieve this? I can post code snippets if required. Thanks!
[...] std::unordered_map and count the number of occurrences of each element. It's O(N) in the total number of elements. Then, you just pick the elements that have a total equal to the number of sets, O(M) in the number of distinct elements. No idea how well it would perform. – Arneson
[...] std::list due to hashing and other overheads. Thanks! – Orthoclase
[...] unordered_set). – Orthoclase