Best way to find the intersection of multiple sets?

Asked 29/3, 2010 at 22:44 Answered 13/11, 2020 at 15:17

400

I have a list of sets:

setlist = [s1,s2,s3...]

I want s1 ∩ s2 ∩ s3 ...

I can write a function to do it by performing a series of pairwise s1.intersection(s2), etc.

Is there a recommended, better, or built-in way?

Rao answered 29/3, 2010 at 22:44 Comment(0)

659

From Python version 2.6 on you can use multiple arguments to set.intersection(), like

u = set.intersection(s1, s2, s3)

If the sets are in a list, this translates to:

u = set.intersection(*setlist)

where *setlist unpacks the list into separate positional arguments.

Note that set.intersection is not a static method. Calling it through the class passes the first set as self and intersects it with the rest of the list, so the call raises a TypeError if the argument list is empty.
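To make the calling convention concrete, here is a minimal Python 3 sketch; the sample sets are made up for illustration. Calling the method through the class binds the first unpacked set as self, so the two calls below should give the same result, and an empty list leaves nothing to bind:

```python
setlist = [{1, 2, 3}, {2, 3, 4}, {2, 4, 6}]  # illustrative data, not from the question

# Calling through the class: the first unpacked set becomes `self`.
u = set.intersection(*setlist)

# Equivalent bound-method call on the first set.
v = setlist[0].intersection(*setlist[1:])

# With an empty list there is no first argument to act as `self`.
try:
    set.intersection(*[])
    empty_call_failed = False
except TypeError:
    empty_call_failed = True
```

Here u and v are both {2}, and the empty call ends up in the except branch.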

Bowdlerize answered 29/3, 2010 at 22:55 Comment(7)

So what to do when there are possibly zero arguments? In one line? – Lek 8/7, 2020 at 13:18

@RadioControlled For a one liner that works when setlist is empty, use u = set.intersection(*setlist) if setlist else set() – Haemophilia 13/7, 2020 at 6:24

Any comment on the complexity of the solution given above? – Hilarius 27/9, 2020 at 4:26

@CKM, exactly, do we need to order the sets in setlist beforehand by size or does the function do this for us? This would be contradicted by the statement "apply intersection of the first set with the rest of the list". – Lek 22/10, 2020 at 14:1

@RadioControlled The intersection of no sets is not mathematically defined, so this should fail. See Patrick Suppes' "Axiomatic Set Theory" for a reference. – Drexler 6/12, 2020 at 5:46

math.stackexchange.com/questions/483002/… – Lek 7/12, 2020 at 10:30

Please explain your use of the non-static set.intersection() in more detail — I don't understand how you are able to use it this way. – Olomouc 7/12, 2020 at 17:49

101

As of 2.6, set.intersection takes arbitrarily many iterables.

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 4, 6])
>>> s1 & s2 & s3
set([2])
>>> s1.intersection(s2, s3)
set([2])
>>> sets = [s1, s2, s3]
>>> set.intersection(*sets)
set([2])
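One consequence of "arbitrarily many iterables" is that only the object the method is called on has to be a set. A small Python 3 sketch with made-up values, contrasting the method form with the & operator:

```python
s1 = {1, 2, 3}

# The method form accepts any iterables after the first set.
via_method = s1.intersection([2, 3, 4], (2, 4, 6))

# The & operator requires both operands to be sets (or frozensets).
try:
    s1 & [2, 3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
```

So set.intersection(*sets) still needs the first element of sets to be a real set, while the remaining elements could be lists or tuples.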

Angulo answered 29/3, 2010 at 22:58 Comment(1)

No, it cannot take zero iterables. – Lek 8/7, 2020 at 13:17

Clearly set.intersection is what you want here, but in case you ever need a generalisation of "take the sum of all these", "take the product of all these", "take the xor of all these", what you are looking for is the reduce function:

from operator import and_
from functools import reduce
print(reduce(and_, [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

print(reduce((lambda x,y: x&y), [{1,2,3},{2,3,4},{3,4,5}])) # = {3}
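As a rough illustration of the generalisation mentioned above, the same reduce pattern can fold a list with other binary operators from the operator module. The variable names and numbers are only for illustration:

```python
from functools import reduce
from operator import add, mul, xor, and_

numbers = [1, 2, 3, 4]

total = reduce(add, numbers)    # 1 + 2 + 3 + 4
product = reduce(mul, numbers)  # 1 * 2 * 3 * 4
parity = reduce(xor, numbers)   # 1 ^ 2 ^ 3 ^ 4

# A third argument supplies a starting value, which makes an empty input valid.
empty_total = reduce(add, [], 0)

# The set case from the answer: fold with & across all sets.
common = reduce(and_, [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}])
```

Without a starting value, reduce raises a TypeError on an empty sequence, which mirrors the empty-list behaviour of set.intersection(*setlist).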

Necropsy answered 29/2, 2012 at 1:18 Comment(1)

Here, I would be quite certain the order of list matters for speed. Order by increasing size -- or decreasing expected intersection size of neighboring sets in the list, to be more accurate. – Lek 22/10, 2020 at 14:6
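A possible way to apply the ordering idea from this comment is to sort by size before folding. This is only a sketch: intersect_smallest_first is a hypothetical helper name, returning an empty set for empty input is an assumption, and whether the sort pays off depends on the data and has not been measured here.

```python
from functools import reduce
from operator import and_

def intersect_smallest_first(sets):
    """Intersect sets after sorting them by size (hypothetical helper)."""
    ordered = sorted(sets, key=len)
    if not ordered:
        return set()  # assumption: empty input yields an empty set
    # Start from a copy of the smallest set so the result never aliases an input.
    return reduce(and_, ordered[1:], set(ordered[0]))

result = intersect_smallest_first([{1, 2, 3, 4, 5}, {2, 3}, {2, 3, 8, 9}])
```

The intermediate result can never be larger than the smallest set, so starting there keeps every later step small.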

If you don't have Python 2.6 or higher, the alternative is to write an explicit for loop:

def set_list_intersection(set_list):
  if not set_list:
    return set()
  result = set(set_list[0])  # copy, so &= does not modify the caller's first set
  for s in set_list[1:]:
    result &= s
  return result

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print set_list_intersection(set_list)
# Output: set([1])

You can also use reduce:

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print reduce(lambda s1, s2: s1 & s2, set_list)
# Output: set([1])

However, many Python programmers dislike it, including Guido himself:

About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite of the PR value, I think these features should be cut from Python 3000.

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.

Francis answered 29/3, 2010 at 22:53 Comment(4)

Note that Guido says using reduce is "limited to associative operators", which is applicable in this case. reduce is very often hard to figure out, but for & isn't so bad. – Angulo 29/3, 2010 at 23:21

set_list and reduce(set.intersection, set_list) – Caughey 5/12, 2012 at 6:51

Check out python.org/doc/essays/list2str for useful optimizations involving reduce. It can in general be used quite nicely to build lists, sets, strings etc. Worth a look also is github.com/EntilZha/PyFunctional – Varioloid 16/11, 2016 at 6:3

Note you could optimize by breaking off your loop when result is empty. – Liver 25/8, 2017 at 13:52
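The early-exit idea could look roughly like the following Python 3 sketch. The function name is hypothetical, and returning an empty set for an empty list is an assumption carried over from the answer's loop:

```python
def intersect_with_early_exit(set_list):
    """Intersect a list of sets, stopping once the result is empty."""
    if not set_list:
        return set()  # assumption: empty input yields an empty set
    result = set(set_list[0])  # copy so the caller's first set is not mutated
    for s in set_list[1:]:
        result &= s
        if not result:
            break  # an empty result cannot gain elements from further intersections
    return result

data = [{1, 2}, {3, 4}, {1}]
empty = intersect_with_early_exit(data)
```

Once the running result is empty, the remaining sets cannot change it, so the loop can stop without affecting the answer.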

I believe the simplest thing to do is:

#assuming three sets
set1 = {1,2,3,4,5}
set2 = {2,3,8,9}
set3 = {2,10,11,12}

#intersection
set4 = set1 & set2 & set3

set4 will be the intersection of set1, set2, and set3, and will contain only the value 2.

print(set4)

set([2])

Makassar answered 13/11, 2020 at 15:17 Comment(1)

OP is asking to apply intersection to a list. The effort to spell out each element of the list with an & operator is futile at best. – Deroo 1/2, 2022 at 6:49

Here I'm offering a generic function for multiple set intersection trying to take advantage of the best method available:

def multiple_set_intersection(*sets):
    """Return multiple set intersection."""
    try:
        return set.intersection(*sets)
    except TypeError: # this is Python < 2.6 or no arguments
        pass

    try: a_set= sets[0]
    except IndexError: # no arguments
        return set() # return empty set

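    # Fold pairwise starting from a_set; the bound a_set.intersection would be
    # skipped entirely when only one other set remains.
    return reduce(set.intersection, sets[1:], a_set)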

Guido might dislike reduce, but I'm kind of fond of it :)

Dziggetai answered 31/3, 2010 at 22:50 Comment(4)

You should check the length of sets instead of trying to access sets[0] and catching the IndexError. – Liver 25/8, 2017 at 13:51

This isn't a plain check; a_set is used at the final return. – Dziggetai 26/8, 2017 at 17:53

Can’t you do return reduce(sets[0], sets[1:]) if sets else set()? – Liver 28/8, 2017 at 8:5

Ha yes, thank you. The code should change because relying on a try/except should be avoided if you can. It’s a code smell, is inefficient, and can hide other problems. – Liver 30/8, 2017 at 9:24

Jean-François Fabre's set.intersection(*list_of_sets) answer is definitely the most Pythonic and is rightly the accepted answer.

For those that want to use reduce, the following will also work:

reduce(set.intersection, list_of_sets)

Cahoot answered 1/5, 2020 at 11:50 Comment(0)
