The intersection of all combinations of n sets
Asked Answered
S

3

9

I need help finding an efficient algorithm to solve this problem:

Given n unsorted sets of integers, find all possible combinations of n and their intersections. For example:

Input (n=3):

Set 1 = 1, 10, 6, 11, 14, 3
Set 2 = 3, 7, 11, 9, 5
Set 3 = 11, 6, 9, 1, 4

Output:

Set 1 & 2: 3, 11
Set 1 & 3: 1, 6, 11
Set 2 & 3: 9, 11
Set 1, 2, & 3: 11

I was thinking of first finding all possible combinations of sets and then using an algorithm to find the intersection of n sets found here: Intersection of n sets. I'm worried about the time efficiency of this approach, though.

If you can find something better than my naive approach, an answer in pseudo-code would be most helpful.

Sokotra answered 24/9, 2014 at 23:27 Comment(0)
T
26

Here is a solution inspired by MapReduce: Simplified Data Processing on Large Clusters, which can therefore be written in a distributed way if you want.

Map all of your elements in sets to pairs [element, set]. Group this list by element. You can just sort and pull out elements. Or you can create a hash of arrays whose keys are the elements and values are the list of sets that element is found in. Then for each element, emit a list of [combination of sets, element]. Group that by combination. And now for each non-empty combination, you can just read off the elements in it.

If you wish to distribute the computation with a real map-reduce, the first map would map to a key of element, and value of set. The first reduce would just store by element the list of sets it is in. The second map would emit for each element one key for each combination of sets it is in, with the element as the value. And the second reduce would store your final answer.

It may help to work your example in detail. You start with:

Set 1 = 1, 10, 6, 11, 14, 3
Set 2 = 3, 7, 11, 9, 5
Set 3 = 11, 6, 9, 1, 4

You map that to the list:

[1, Set 1], [10, Set 1], [6, Set 1], [11, Set 1], [14, Set 1], [3, Set 1],
[3, Set 2], [7, Set 2], [11, Set 2], [9, Set 2], [5, Set 2],
[11, Set 3], [6, Set 3], [9, Set 3], [1, Set 3], [4, Set 3],

Now group by element (I did it by hand by sorting) to get:

1: Set 1, Set 3
3: Set 1, Set 2
4: Set 3
5: Set 2
6: Set 1, Set 3
7: Set 2
9: Set 2, Set 3
10: Set 1
11: Set 1, Set 2, Set 3
14: Set 1

And now we do the second mapping (skipping the elements that are only in one set) to get:

[(Set 1, Set 3), 1],
[(Set 1, Set 2), 3],
[(Set 1, Set 3), 6],
[(Set 2, Set 3), 9],
[(Set 1, Set 2), 11],
[(Set 1, Set 3), 11],
[(Set 2, Set 3), 11],
[(Set 1, Set 2, Set 3), 11]

Group that by combination of sets and we get:

(Set 1, Set 2): 3, 11
(Set 1, Set 3): 1, 6, 11
(Set 2, Set 3): 9, 11
(Set 1, Set 2, Set 3): 11
Toluidine answered 25/9, 2014 at 0:59 Comment(0)
A
1

Using your example, just note that

1 n 2 n 3

is also

(1 n 2) n 3

So if you cache your (1 n 2) solution, you can reuse it, so that computing 1 n 2 n 3 will only require one additional intersection calculation. Generally, if there are, as in your example, four possible set combinations, then only four intersections will ultimately have to be done if you save and reuse prior results.

Anamorphosis answered 24/9, 2014 at 23:45 Comment(0)
H
0

Implementation in Java 10

Short: first collect a map of intersections for each element Map<Integer,Set<Integer>>, then for this map collect a map of intersections of intersections Map<Set<Integer>,Set<Integer>>, and then append the larger intersections sets to the smaller intersections sets if they intersect.

// List<Set<Integer>>
var setList = List.of(
        Set.of(1, 10, 6, 11, 14, 3),
        Set.of(3, 7, 11, 9, 5),
        Set.of(11, 6, 9, 1, 4));
// TreeMap<Integer,List<Map.Entry<TreeSet<Integer>,TreeSet<Integer>>>>
var map = IntStream
        // iterate over indices of
        // the list elements, i.e. sets
        .range(0, setList.size())
        // for each set iterate over its elements
        // and map pairs 'element=set'
        .mapToObj(i -> setList.get(i).stream()
                // key - element, value - index
                // of the set starting from '1'
                .map(e -> Map.entry(e, i + 1)))
        // Stream<Map.Entry<Integer,Integer>>
        .flatMap(Function.identity())
        // group indices of the sets by their elements,
        // i.e. accumulate the intersections
        .collect(Collectors.toMap(
                // key - element
                Map.Entry::getKey,
                // value - set of the indices of the sets
                e -> new TreeSet<>(List.of(e.getValue())),
                // accumulate the indices of the sets
                (e1, e2) -> {
                    e1.addAll(e2);
                    return e1;
                }))
        // Stream<Map.Entry<Integer,TreeSet<Integer>>>
        .entrySet().stream()
        // filter out unique elements
        // without intersections
        .filter(e -> e.getValue().size() > 1)
        // intermediate output
        //1=[1, 3]
        //3=[1, 2]
        //6=[1, 3]
        //9=[2, 3]
        //11=[1, 2, 3]
        .peek(System.out::println)
        // group the elements of the
        // sets by their intersections
        .collect(Collectors.toMap(
                // key - set of the indices of the sets
                Map.Entry::getValue,
                // value - set of the intersecting elements
                e -> new TreeSet<>(Set.of(e.getKey())),
                // accumulate the intersecting elements
                (e1, e2) -> {
                    e1.addAll(e2);
                    return e1;
                }))
        // Stream<Map.Entry<TreeSet<Integer>,TreeSet<Integer>>>
        .entrySet().stream()
        // intermediate output
        //[1, 2]=[3]
        //[1, 3]=[1, 6]
        //[2, 3]=[9]
        //[1, 2, 3]=[11]
        .peek(System.out::println)
        // group by the number of intersections
        .collect(Collectors.groupingBy(
                // size of the set of indices
                e -> e.getKey().size(),
                // sort by number of intersections in reverse order
                () -> new TreeMap<>(Comparator.<Integer>reverseOrder()),
                // list of map entries
                Collectors.toList()));
// intermediate output
map.forEach((k, v) -> System.out.println(k + "=" + v));
//3=[[1, 2, 3]=[11]]
//2=[[1, 2]=[3], [1, 3]=[1, 6], [2, 3]=[9]]
// process the lists of intersections, i.e. map entries
map.forEach((key, value) -> value
        // for each entry process the values of other
        // entries with less number of intersections
        .forEach(entry -> map.tailMap(key, false)
                // Stream<List<Map.Entry<TreeSet<Integer>,TreeSet<Integer>>>>
                .values().stream()
                // Stream<Map.Entry<TreeSet<Integer>,TreeSet<Integer>>>
                .flatMap(List::stream)
                // if the intersection set of the current entry contains
                // all intersections from the set of another entry
                .filter(other -> entry.getKey().containsAll(other.getKey()))
                // then add all intersecting elements of
                // the current entry to another entry
                .forEach(other -> other.getValue().addAll(entry.getValue()))));

// final output
map.forEach((k, v) -> v.forEach(entry -> System.out.println(
        "Sets: " + entry.getKey() + " contains values: " + entry.getValue())));

//Sets: [1, 2, 3] contains values: [11]
//Sets: [1, 2] contains values: [3, 11]
//Sets: [1, 3] contains values: [1, 6, 11]
//Sets: [2, 3] contains values: [9, 11]
Heterologous answered 5/2, 2021 at 17:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.