Python list intersection efficiency: generator or filter()?

Asked 16/6, 2011 at 9:13 Answered 17/11, 2019 at 4:22

Solved python list intersection python-2.7 intersect

I would like to intersect two lists in Python (2.7). I need the result to be iterable:

list1 = [1,2,3,4]
list2 = [3,4,5,6]
result = (3,4) # any kind of iterable

Providing a full iteration will be performed first thing after the intersection, which of the following is more efficient?

Using a generator:

result = (x for x in list1 if x in list2)

Using filter():

result = filter(lambda x: x in list2, list1)

Other suggestions?

Thanks in advance,
Amnon

Mailand answered 16/6, 2011 at 9:13 Comment(0)

Neither of these. The best way is to use sets.

list1 = [1,2,3,4]
list2 = [3,4,5,6]
result = set(list1).intersection(list2)

Sets are iterable, so no need to convert the result into anything.

Aylward answered 16/6, 2011 at 9:16 Comment(5)

interesting, set(list1).intersection(list2) is faster than set(list1) & set(list2), i guess it's because the creation of two sets is more expensive than loading and calling .intersection() hmm .. – Involution 16/6, 2011 at 9:32

@Involution On my machine, set(list1) & set(list2) is faster than using .intersection(). But the difference is not very significant. – Scriptural 16/2, 2013 at 17:50

does this require the lists to be sorted? – Whichsoever 15/11, 2016 at 9:51

@Whichsoever The lists need not be sorted. Sets are implemented by hashing, so lookup occurs in amortized O(1) time regardless of location in the original list. – Savannahsavant 8/7, 2017 at 5:16

Fast forward 11 years, in my case [x for x in a if x in b] seems for be faster when being executed in multiple loops. Would this be because a conversion from a set to a list would be consuming when being done in a loop? – Pyrochemical 3/9, 2022 at 10:55

Your solution has a complexity of O(m*n), where m and n are the respective lengths of the two lists. You can improve the complexity to O(m+n) using a set for one of the lists:

s = set(list1)
result = [x for x in list2 if x in s]

In cases where speed matters more than readability (that is, almost never), you can also use

result = filter(set(a).__contains__, b)

which is about 20 percent faster than the other solutions on my machine.

Pibgorn answered 16/6, 2011 at 9:16 Comment(0)

I tried to compare the speed of 3 methods of list intersection:

import random

a = [random.randint(0, 1000) for _ in range(1000)]
b = [random.randint(0, 1000) for _ in range(1000)]

Solution 1: list comprehension

Time elapse: 8.95265507698059

import time
start = time.time()
for _ in range(1000):
    result = [x for x in a if x in b]
elapse = time.time() - start
print(elapse)

Solution 2: set

Time elapse: 0.09089064598083496

start = time.time()
for _ in range(1000):
    result = set.intersection(set(a), set(b))
elapse = time.time() - start
print(elapse)

Solution 3: numpy.intersect1d

Time elapse: 0.323300838470459

start = time.time()
for _ in range(1000):
    result = np.intersect1d(a, b)
elapse = time.time() - start
print(elapse)

Conclusion

I think use set.intersection is the fastest way.

Anatol answered 17/11, 2019 at 4:22 Comment(0)

for the case of lists, the most efficient way is to use:

result = set(list1).intersection(list2)

as mentioned, but for numpy arrays, intersection1d function is more efficient:

import numpy as np
result = np.intersection1d(list1, list2)

Especially, when you know that the lists don't have duplicate values, you can use it as:

result = np.intersection1d(list1, list2, assume_unique=True)

Dugger answered 14/7, 2017 at 18:21 Comment(0)

Solution 1: list comprehension

Solution 2: set

Solution 3: numpy.intersect1d

Conclusion

Recommended topics

Hot tags