How to compute intersection of N sorted sets?
Asked Answered
S

2

7

The example below shows how to compute the intersection of two sets. Does the STL provide tools that allow to do this not only for 2 but for N sets?

#include <iostream>    
#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v1 = { 1,2,9,3,4,5 };
    std::vector<int> v2 = { 9,4,2,7,4,1 };
    std::vector<int> v(v1.size() + v2.size());
    std::vector<int>::iterator it;

    std::sort(v1.begin(), v1.end());
    std::sort(v2.begin(), v2.end());
    it = std::set_intersection(v1.begin(), v1.end(), v2.begin(), v2.end(), v.begin());

    v.resize(it - v.begin());
    std::cout << "Both sets have " << (v.size()) << " elements in common:\n";
    for (it = v.begin(); it != v.end(); ++it)
    {
        std::cout << *it << ' ';
    }
    std::cout << '\n';

    return 0;
}
Solidarity answered 14/8, 2020 at 7:53 Comment(1)
Intersection is associative, so to find intersection of v1, v2, v3, you can just call std::set_intersection twice: result = (v1 * v2) * v3Fouquet
O
8

Does the STL provide tools that allow to do this not only for 2 but for N sets?

No. But you can easily make one, by providing a recursive Variadic template like as follows.

The if constexpr part need support. However, there are many examples, how you could do it for prior to c++17. In addition, due to recursive call the argument must be passed opposite ordered, to get the behavior you were trying.

(See Online Demo)

#include <vector>
#include <algorithm>  // std::set_intersection
#include <iterator>   // std::back_inserter

template<typename Container, typename... Rest>
Container NSetIntersections(
    const Container& container1, const Container& container2, Rest&&... rest) noexcept
{
    if constexpr (sizeof...(Rest) == 0)
    {
        Container result;
        std::set_intersection(container1.begin(), container1.end(),
            container2.begin(), container2.end(), std::back_inserter(result));
        return result;
    }
    else
    {
        Container result;
        std::set_intersection(container1.begin(), container1.end(),
            container2.begin(), container2.end(), std::back_inserter(result));
        return NSetIntersections(result, std::forward<Rest>(rest)...);
    }
}

int main()
{
    // sorted vectors
    std::vector<int> v1 = { 1, 2, 3, 4, 5, 6 };
    std::vector<int> v2 = { 2, 3, 4, 7, 8, 9 };
    std::vector<int> v3 = { 3, 4, 7, 200 };
    std::vector<int> v4 = { 4, 100, 200, 300 };
    std::vector<int> v5 = { 4, 100, 200 };
    // call the function like
    const auto res1 = NSetIntersections(v2, v1);              // 2 3 4
    const auto res2 = NSetIntersections(v3, v2, v1);          // 3 4
    const auto res3 = NSetIntersections(v4, v3, v2, v1);      // 4
    const auto res4 = NSetIntersections(v5, v4, v3, v2, v1);  // 4
    return 0;
}

In order to pass the arguments to the NSetIntersections function in natural way, I would suggest following helper functions manner. As a plus, it will also handle the case of passing single arguments (in case, by mistake!) to the NSetIntersections, and compatible.

(See Online Demo)

#include <vector>
#include <algorithm>  // std::set_intersection
#include <iterator>   // std::back_inserter

namespace helper { // helper NSetIntersections functions
    template<typename Container>
    Container NSetIntersections(const Container& container1) noexcept {
        return container1;
    }

    template<typename Container>
    Container NSetIntersections(const Container& container1, const Container& container2) noexcept
    {
        Container result;
        std::set_intersection(container1.begin(), container1.end(),
            container2.begin(), container2.end(), std::back_inserter(result));
        return result;
    }

    template<typename Container, typename... Rest>
    Container NSetIntersections(
        const Container& container1, const Container& container2, Rest&&... rest) noexcept
    {
        return helper::NSetIntersections(
            helper::NSetIntersections(container1, container2), std::forward<Rest>(rest)...);
    }
}

template<typename... Containers>
auto NSetIntersections(Containers&&... rest) noexcept
  -> decltype(helper::NSetIntersections(std::forward<Containers>(rest)...))
{
    return helper::NSetIntersections(std::forward<Containers>(rest)...);
}

Now you could call the function with args like this:

// sorted vectors
std::vector<int> v1 = { 1, 2, 3, 4, 5, 6 };
std::vector<int> v2 = { 2, 3, 4, 7, 8, 9 };
std::vector<int> v3 = { 3, 4, 7, 200 };
std::vector<int> v4 = { 4, 100, 200, 300 };
std::vector<int> v5 = { 4, 100, 200 };
// call the function like
const auto res1 = NSetIntersections(v1);                 // 1 2 3 4 5 6 
const auto res2 = NSetIntersections(v1, v2);             // 2 3 4 
const auto res3 = NSetIntersections(v1, v2, v3);         // 3 4 
const auto res4 = NSetIntersections(v1, v2, v3, v4);     // 4
const auto res5 = NSetIntersections(v1, v2, v3, v4, v5); // 4

Side note: The bench mark done in quick-bench.com shows (almost) same performance (for 5 sorted containers), when we would have done N times std::set_intersection.

enter image description here

(See Online Quick-bench)

Overhappy answered 14/8, 2020 at 8:23 Comment(8)
Great approach! How can I port that to c++11?Solidarity
How does the performance of this compare to an n way intersection?Merkle
@Dani "n way intersection" Did you mean this? quick-bench.com/q/kT3NFFdkFQGUMEOSO1u5vAMzlew . If yes, look like the same!Overhappy
This has quite bad complexity. O(nk) where k is the number of arrays with n elements. You can do it in O(n log k).Fabozzi
@Fabozzi Can you provide relavant references? I think, the answer is correct in the sence, it does the same as when we do set_intersection for N number of arrays. Without calling set_intersection N times or do we have any better/ efficient ways?Carrasco
@Carrasco A common technique is to use a min-heap, see here.Fabozzi
@Fabozzi You have DVed for the wrong reason(IMHO). Please read the question. The question was not about the complexity, rather a solution for intersections of N sorted containers. And this answer provides it. If you have better thoughts, feel free to provide it as an answer!Overhappy
@Fabozzi BTW if you really had to comment about the complexity, you should have at least consider which one of the given answers was had less effiency.Overhappy
A
2

You could put all the vectors you want an intersection of in another vector and then create a function that will loop through them all and calculate the intersection of v1 and v2 and compare their intersection to v3 and then compare their intersection to v4 etc...

Here is a function that does it for you.

using V = std::vector<std::vector<int>>;

std::vector<int> intersections(V vectors)
{
   int largest = vectors[0].size();
   for (int i = 0; i < vectors.size(); i++)
   {
      std::sort(vectors[i].begin(), vectors[i].end());
      if (vectors[i].size() > largest)
         largest = vectors[i].size();
   }

   std::vector<int> res(largest);
   std::vector<int>::iterator it;
   for (int i = 0; i < vectors.size() - 1; i++)
   {
      it = std::set_intersection(vectors[i].begin(), vectors[i].end(),
         vectors[i + 1].begin(), vectors[i + 1].end(),
         res.begin()
      );
      res.resize(it - res.begin());
      vectors[i + 1].resize(res.size());
      std::copy(res.begin(), res.end(), vectors[i + 1].begin());
   }

   return res;
}

Note: I have only done some very basic testing but it should work.

And this is how you call it

std::vector<int> v1 = { 1,2,9,3,5 };
std::vector<int> v2 = { 9,4,2,7,4,1 };
std::vector<int> v3 = { 4,2,7 };
V vectors = { v1,v2, v3 };

auto res = intersections(vectors);
for (int i = 0; i < res.size(); i++)
   std::cout << res[i] << std::endl;
Ambitious answered 14/8, 2020 at 8:25 Comment(1)
Your idea is good. But it is much worst performant than having N times std::set_intersections for N+1 containers. See an online benchmark did at quick-bench.comOverhappy

© 2022 - 2024 — McMap. All rights reserved.