Find intersection of two nested lists?
Asked Answered
L

21

477

I know how to get an intersection of two flat lists:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

or

def intersect(a, b):
    return list(set(a) & set(b))
 
print intersect(b1, b2)

But when I have to find intersection for nested lists then my problems starts:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

In the end I would like to receive:

c3 = [[13,32],[7,13,28],[1,6]]

Can you guys give me a hand with this?

Related

Lithosphere answered 13/3, 2009 at 13:42 Comment(2)
What would your intersection be for c1 intersect c2? Do you want to simply find if c1 is in c2? Or do you want to find all elements in c1 that appear anywhere in c2?Flown
Read this and play in the interpreter.Indo
F
179

If you want:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [[13, 32], [7, 13, 28], [1,6]]

Then here is your solution for Python 2:

c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]

In Python 3 filter returns an iterable instead of list, so you need to wrap filter calls with list():

c3 = [list(filter(lambda x: x in c1, sublist)) for sublist in c2]

Explanation:

The filter part takes each sublist's item and checks to see if it is in the source list c1. The list comprehension is executed for each sublist in c2.

Flown answered 13/3, 2009 at 14:11 Comment(4)
You can use filter(set(c1).__contains__, sublist) for efficiency. btw, the advantage of this solution is that filter() preserves strings and tuples types.Dina
i like this method, but i'm getting blank '' in my resulting listHernardo
I added Python 3 compat here, since I am using this as a dupe target for a dupe of a Python 3 questionAglitter
This reads better IMO with nested comprehensions: c3 = [[x for x in sublist if x in c1] for sublist in c2]Fealty
D
897

You don't need to define intersection. It's already a first-class part of set.

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> set(b1).intersection(b2)
set([4, 5])
Deploy answered 13/3, 2009 at 14:16 Comment(9)
Will this be slower than lambda because of conversion to set?Centro
@S.Lott, anything wrong with set(b1) & set(b2)? IMO its cleaner to use the operator.Glogau
Neat. I didn't know you could use the & on sets. Apparently you can also use ^Davis
Plus, using set will lead to code that's orders of magnitude faster. Here's a sample benchmark®: gist.github.com/andersonvom/4d7e551b4c0418de3160Kirkuk
If it helps : set(b1) & set(b2) = set(b2) & set(b1)Alexina
I'd prefer calling the method over the operator, so it is way more readable, especially if you get into the code. But that preference may be only personally.Mikelmikell
Only works if the result doesn't have to be ordered.Pythagorean
So... this answer does in no way answer the question, right? Because this does now work with nested lists.Speiss
I favor using the method as well. It's explicit, whereas the & operator requires to be familiar with how the set type works. I also use an REPL (IPython) as a copilot to lookup unfamiliar types' API. In this case for example, I would simply assign a set to a variable like s = set() and then query for a listing of available properties with s.<tab><tab>. I can see the intersection() method and ask the REPL for its documentation help(set.intersection) .Overmaster
F
179

If you want:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [[13, 32], [7, 13, 28], [1,6]]

Then here is your solution for Python 2:

c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]

In Python 3 filter returns an iterable instead of list, so you need to wrap filter calls with list():

c3 = [list(filter(lambda x: x in c1, sublist)) for sublist in c2]

Explanation:

The filter part takes each sublist's item and checks to see if it is in the source list c1. The list comprehension is executed for each sublist in c2.

Flown answered 13/3, 2009 at 14:11 Comment(4)
You can use filter(set(c1).__contains__, sublist) for efficiency. btw, the advantage of this solution is that filter() preserves strings and tuples types.Dina
i like this method, but i'm getting blank '' in my resulting listHernardo
I added Python 3 compat here, since I am using this as a dupe target for a dupe of a Python 3 questionAglitter
This reads better IMO with nested comprehensions: c3 = [[x for x in sublist if x in c1] for sublist in c2]Fealty
N
61

For people just looking to find the intersection of two lists, the Asker provided two methods:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

and

def intersect(a, b):
     return list(set(a) & set(b))

print intersect(b1, b2)

But there is a hybrid method that is more efficient, because you only have to do one conversion between list/set, as opposed to three:

b1 = [1,2,3,4,5]
b2 = [3,4,5,6]
s2 = set(b2)
b3 = [val for val in b1 if val in s2]

This will run in O(n), whereas his original method involving list comprehension will run in O(n^2)

Nikianikita answered 13/3, 2009 at 13:42 Comment(3)
As "if val in s2" runs in O(N), the proposed code snippet complexity is also O(n^2)Flourishing
The average case of "val in s2" is O(1) according to wiki.python.org/moin/TimeComplexity#set - thus over n operations the expected time is O(n) (whether the worst-case time is O(n) or O(n^2) depends on whether this average case represents an amortized time or not, but this isn't very important in practice).Raja
The runtime is O(N) not because it is amortized but because set membership is in average O(1) (for example when using hash table), it is big difference,for example because amortized time is guaranteed.Merrile
O
30

The functional approach:

input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]

result = reduce(set.intersection, map(set, input_list))

and it can be applied to the more general case of 1+ lists

Outsell answered 10/9, 2009 at 8:50 Comment(2)
to allow empty input list: set(*input_list[:1]).intersection(*input_list[1:]). Iterator version (it = iter(input_list)): reduce(set.intersection, it, set(next(it, []))). Both version doesn't require to convert all input lists to set. The latter is more memory efficient.Dina
Use from functools import reduce to use it in Python 3. Or better yet, use an explicit for loop.Atbara
D
27

Pure list comprehension version

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> c1set = frozenset(c1)

Flatten variant:

>>> [n for lst in c2 for n in lst if n in c1set]
[13, 32, 7, 13, 28, 1, 6]

Nested variant:

>>> [[n for n in lst if n in c1set] for lst in c2]
[[13, 32], [7, 13, 28], [1, 6]]
Dina answered 13/3, 2009 at 14:35 Comment(0)
M
22

The & operator takes the intersection of two sets.

{1, 2, 3} & {2, 3, 4}
Out[1]: {2, 3}
Maramarabel answered 16/12, 2016 at 11:9 Comment(3)
Fine, but this topic is for lists!Emee
The result of the intersection of two lists is a set so this answer is perfectly valid.Alf
List can contain duplicate value but sets does not.Higherup
S
14

A pythonic way of taking the intersection of 2 lists is:

[x for x in list1 if x in list2]
Sty answered 27/1, 2017 at 0:18 Comment(1)
This question is about nested lists. Your answer does not answer the question.Dateline
W
8

You should flatten using this code ( taken from http://kogs-www.informatik.uni-hamburg.de/~meine/python_tricks ), the code is untested, but I'm pretty sure it works:


def flatten(x):
    """flatten(sequence) -> list

    Returns a single, flat list which contains all elements retrieved
    from the sequence and all recursively contained sub-sequences
    (iterables).

    Examples:
    >>> [1, 2, [3,4], (5,6)]
    [1, 2, [3, 4], (5, 6)]
    >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
    [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""

    result = []
    for el in x:
        #if isinstance(el, (list, tuple)):
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

After you had flattened the list, you perform the intersection in the usual way:


c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

def intersect(a, b):
     return list(set(a) & set(b))

print intersect(flatten(c1), flatten(c2))

Weatherbeaten answered 13/3, 2009 at 13:47 Comment(1)
That's a nice bit of flattening code Geo, but it doesn't answer the question. The asker specifically expects the result in the form [[13,32],[7,13,28],[1,6]].Bamberger
L
8

Since intersect was defined, a basic list comprehension is enough:

>>> c3 = [intersect(c1, i) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]

Improvement thanks to S. Lott's remark and TM.'s associated remark:

>>> c3 = [list(set(c1).intersection(i)) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]
Latoyalatoye answered 21/2, 2012 at 11:38 Comment(0)
G
5

Given:

> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

I find the following code works well and maybe more concise if using set operation:

> c3 = [list(set(f)&set(c1)) for f in c2] 

It got:

> [[32, 13], [28, 13, 7], [1, 6]]

If order needed:

> c3 = [sorted(list(set(f)&set(c1))) for f in c2] 

we got:

> [[13, 32], [7, 13, 28], [1, 6]]

By the way, for a more python style, this one is fine too:

> c3 = [ [i for i in set(f) if i in c1] for f in c2]
Govea answered 12/4, 2014 at 8:21 Comment(0)
G
3

Do you consider [1,2] to intersect with [1, [2]]? That is, is it only the numbers you care about, or the list structure as well?

If only the numbers, investigate how to "flatten" the lists, then use the set() method.

Groundhog answered 13/3, 2009 at 13:45 Comment(1)
I'd like to leave the structure of the lists unchanged.Lithosphere
G
3

I don't know if I am late in answering your question. After reading your question I came up with a function intersect() that can work on both list and nested list. I used recursion to define this function, it is very intuitive. Hope it is what you are looking for:

def intersect(a, b):
    result=[]
    for i in b:
        if isinstance(i,list):
            result.append(intersect(a,i))
        else:
            if i in a:
                 result.append(i)
    return result

Example:

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> print intersect(c1,c2)
[[13, 32], [7, 13, 28], [1, 6]]

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> print intersect(b1,b2)
[4, 5]
Goop answered 30/6, 2012 at 23:23 Comment(0)
A
1

I was also looking for a way to do it, and eventually it ended up like this:

def compareLists(a,b):
    removed = [x for x in a if x not in b]
    added = [x for x in b if x not in a]
    overlap = [x for x in a if x in b]
    return [removed,added,overlap]
Afeard answered 29/11, 2016 at 14:40 Comment(1)
If not using set.intersection then these simple one liners are what I would also do.Tbilisi
I
1

To define intersection that correctly takes into account the cardinality of the elements use Counter:

from collections import Counter

>>> c1 = [1, 2, 2, 3, 4, 4, 4]
>>> c2 = [1, 2, 4, 4, 4, 4, 5]
>>> list((Counter(c1) & Counter(c2)).elements())
[1, 2, 4, 4, 4]
Ieyasu answered 18/5, 2017 at 6:39 Comment(0)
S
0
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

c3 = [list(set(c2[i]).intersection(set(c1))) for i in xrange(len(c2))]

c3
->[[32, 13], [28, 13, 7], [1, 6]]
Snowinsummer answered 21/11, 2014 at 8:59 Comment(0)
G
0

We can use set methods for this:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

   result = [] 
   for li in c2:
       res = set(li) & set(c1)
       result.append(list(res))

   print result
Gastrostomy answered 5/5, 2015 at 12:5 Comment(0)
U
0
# Problem:  Given c1 and c2:
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
# how do you get c3 to be [[13, 32], [7, 13, 28], [1, 6]] ?

Here's one way to set c3 that doesn't involve sets:

c3 = []
for sublist in c2:
    c3.append([val for val in c1 if val in sublist])

But if you prefer to use just one line, you can do this:

c3 = [[val for val in c1 if val in sublist]  for sublist in c2]

It's a list comprehension inside a list comprehension, which is a little unusual, but I think you shouldn't have too much trouble following it.

Unseasonable answered 24/8, 2017 at 20:19 Comment(0)
G
0
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [list(set(i) & set(c1)) for i in c2]
c3
[[32, 13], [28, 13, 7], [1, 6]]

For me this is very elegant and quick way to to it :)

Gimcrack answered 21/5, 2018 at 9:47 Comment(0)
H
0

flat list can be made through reduce easily.

All you need to use initializer - third argument in the reduce function.

reduce(
   lambda result, _list: result.append(
       list(set(_list)&set(c1)) 
     ) or result, 
   c2, 
   [])

Above code works for both python2 and python3, but you need to import reduce module as from functools import reduce. Refer below link for details.

Hauptmann answered 1/7, 2019 at 22:25 Comment(0)
P
0

Simple way to find difference and intersection between iterables

Use this method if repetition matters

from collections import Counter

def intersection(a, b):
    """
    Find the intersection of two iterables

    >>> intersection((1,2,3), (2,3,4))
    (2, 3)

    >>> intersection((1,2,3,3), (2,3,3,4))
    (2, 3, 3)

    >>> intersection((1,2,3,3), (2,3,4,4))
    (2, 3)

    >>> intersection((1,2,3,3), (2,3,4,4))
    (2, 3)
    """
    return tuple(n for n, count in (Counter(a) & Counter(b)).items() for _ in range(count))

def difference(a, b):
    """
    Find the symmetric difference of two iterables

    >>> difference((1,2,3), (2,3,4))
    (1, 4)

    >>> difference((1,2,3,3), (2,3,4))
    (1, 3, 4)

    >>> difference((1,2,3,3), (2,3,4,4))
    (1, 3, 4, 4)
    """
    diff = lambda x, y: tuple(n for n, count in (Counter(x) - Counter(y)).items() for _ in range(count))
    return diff(a, b) + diff(b, a)
Pitchstone answered 20/3, 2020 at 20:47 Comment(0)
B
-1
from random import *

a = sample(range(0, 1000), 100)
b = sample(range(0, 1000), 100)
print(a)
print(b)
print(set(a).intersection(set(b)))
Baddie answered 11/1, 2022 at 22:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.