Python: Generating a "set of tuples" from a "list of tuples" that doesn't take order into consideration

Asked 2/12, 2015 at 20:17 Answered 4/11, 2021 at 21:18

If I have the list of tuples as the following:

[('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]

I would like to remove duplicate tuples (duplicate in terms of both content and order of items inside) so that the output would be:

[('a', 'b'), ('c', 'd')]

[('b', 'a'), ('c', 'd')]

I tried converting it to set then to list but the output would maintain both ('b', 'a') and ('a', 'b') in the resulting set!

Ultan answered 2/12, 2015 at 20:17 Comment(3)

... put all the items in a set, then create a list from the set entries - this has probably been asked before – Niobic 2/12, 2015 at 20:19

well your problem is that ('a','b') != ('b','a') or more specifically, hash(('a','b')) != hash(('b','a')). Do an initial list comprehension that sorts each tuple then convert to a set – Cheese 2/12, 2015 at 20:23

also, you should probably reword your question since "duplicate in terms of both content and order of items inside" implies that you would want ('a','b') and ('b','a') to not be equal.... – Cheese 2/12, 2015 at 20:24

Try this :

a = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
b = list(set([ tuple(sorted(t)) for t in a ]))
[('a', 'b'), ('c', 'd')]

Let's break this down :

If you sort a tuple, it becomes a sorted list.

>>> t = ('b', 'a')
>>> sorted(t)
['a', 'b']

For each tuple t in a, sort it and convert it back to a tuple.

>>> b = [ tuple(sorted(t)) for t in a ]
>>> b
[('a', 'b'), ('c', 'd'), ('a', 'b'), ('a', 'b')]

Convert the resulting list b to a set : values are now unique. Convert it back to a list.

>>> list(set(b))
[('a', 'b'), ('c', 'd')]

Et voilà !

Note that you can skip the creation of the intermediate list b by using a generator instead of a list comprehension.

>>> list(set(tuple(sorted(t)) for t in a))
[('a', 'b'), ('c', 'd')]

Motivity answered 2/12, 2015 at 21:28 Comment(5)

Good, except that you don't need to actually create the intermediate list set([ tuple(sorted(t)) for t in a ]). set(tuple(sorted(t)) for t in a) instead. – Disprove 2/12, 2015 at 21:56

True, this is not an OP requirement. But he expects an output like [('a', 'b'), ('c', 'd')], so I figured it has to be a list. – Motivity 2/12, 2015 at 22:4

The final form has to be a list; the intermediate list comprehension is unnecessary. – Disprove 2/12, 2015 at 22:9

Ok, I see what you mean. You're right. I'll just keep my answer as is, since it's easier to break it down. :) – Motivity 2/12, 2015 at 22:11

This doesn't work in the case: [('b','a'), ('c','d'), ('b','a')]. The output should be [('b','a'), ('c','d')] but your code outputs [('a','b'), ('c','d')]. The order within tuple shouldn't matter while comparing with other tuples -- but output should contain the original tuples -- not modified tuples. – Ernaline 20/11, 2021 at 5:42

If you did not mind using a frozenset with a set:

l = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]

print(set(map(frozenset,l)))
{frozenset({'a', 'b'}), frozenset({'c', 'd'})}

You can convert back to tuple if preferable:

l = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]

print(list(map(tuple,set(map(frozenset ,l)))))
[('a', 'b'), ('d', 'c')]

Or using a set and reversing the order of the tuples:

l = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]

seen, pairs = set(), []
for a,b in l:
    if (a,b) not in seen and (b,a) not in seen:
        pairs.append((a,b))
    seen.add((a,b))

Buller answered 2/12, 2015 at 20:24 Comment(0)

Try this :

a = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
b = list(set([ tuple(sorted(t)) for t in a ]))
[('a', 'b'), ('c', 'd')]

Let's break this down :

If you sort a tuple, it becomes a sorted list.

>>> t = ('b', 'a')
>>> sorted(t)
['a', 'b']

For each tuple t in a, sort it and convert it back to a tuple.

>>> b = [ tuple(sorted(t)) for t in a ]
>>> b
[('a', 'b'), ('c', 'd'), ('a', 'b'), ('a', 'b')]

Convert the resulting list b to a set : values are now unique. Convert it back to a list.

>>> list(set(b))
[('a', 'b'), ('c', 'd')]

Et voilà !

Note that you can skip the creation of the intermediate list b by using a generator instead of a list comprehension.

>>> list(set(tuple(sorted(t)) for t in a))
[('a', 'b'), ('c', 'd')]

Motivity answered 2/12, 2015 at 21:28 Comment(5)

Good, except that you don't need to actually create the intermediate list set([ tuple(sorted(t)) for t in a ]). set(tuple(sorted(t)) for t in a) instead. – Disprove 2/12, 2015 at 21:56

True, this is not an OP requirement. But he expects an output like [('a', 'b'), ('c', 'd')], so I figured it has to be a list. – Motivity 2/12, 2015 at 22:4

The final form has to be a list; the intermediate list comprehension is unnecessary. – Disprove 2/12, 2015 at 22:9

Ok, I see what you mean. You're right. I'll just keep my answer as is, since it's easier to break it down. :) – Motivity 2/12, 2015 at 22:11

This can solve your problem if order is not important.

a=[('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]


a=map(tuple,[sorted(i) for i in a])

print list(set(a))

Output:

[('a', 'b'), ('c', 'd')]

Sweetie answered 2/12, 2015 at 20:24 Comment(3)

I like this solution because it doesn't make any assumption about the length or original internal order of the tuples. – Disprove 2/12, 2015 at 20:48

@PadraicCunningham, actually previous one was okay. lol. I just preferred your's one because it liked the shorthand, but i didn't notice it won't sort the list element unfortunately, rather it sorts the list itself. – Sweetie 2/12, 2015 at 20:48

yep meant a = map(tuple, map(sorted, l)), if it were python2 then imap would be better – Buller 2/12, 2015 at 20:55

Just wanted to add a potential second solution if anyone has a use case where "first come, first serve" might matter.

For example, say we take three lists and merge them into a list of tuples:

# Make some lists (must be same size) 
a = [1,1,1,2,8,6,1]
b = [2,4,6,1,4,21,69]
c = [2,8,21,2,1,1,8]

# Lists to list of tuples
arr = []
for i in range(len(a)):
    new_row = (a[i],b[i],c[i])
    arr.append(new_row)

Such that our original array looks like:

(1, 2, 2)
(1, 4, 8)
(1, 6, 21)
(2, 1, 2)
(8, 4, 1)
(6, 21, 1)
(1, 69, 8)

In our case, we want to remove items like (2,1,2) and (8,4,1) as they're equivalent to (1,2,2) and (1,4,8) respectively.
To do this, we can use a new empty list called filtered or something, and itertools.permutations() on each tuple in the original array.
First, we check if any permutation of each item is present in the filtered list.
If not, we add. If it is, we skip the duplicate.

filtered = []
for i in range(len(arr)):
    it = itertools.permutations(arr[i])
    perms = []
    for p in it:
        perms.append(p)  
    check = any(item in perms for item in filtered)
    if not check: 
        filtered.append(arr[i])

Now if we iterate over filtered and print, we see our truncated list of tuples:

(1, 2, 2)
(1, 4, 8)
(1, 6, 21)
(1, 69, 8)

Note that we're left with the first instance of each tuple of numbers, and not working via a set guarantees the same order of elements when iterating over the filtered list.

Only thing I'm not 100% on is the time/space complexity of doing it this way -- if anyone has feedback I'd love to hear about it.

Tollman answered 4/11, 2021 at 21:18 Comment(0)

-1

Built-in types to the rescue:

data = [('a', 'b'), ('c', 'd'), ('a', 'b'), ('b', 'a')]
set(map(frozenset, data))
{frozenset({'a', 'b'}), frozenset({'c', 'd'})}

Demilitarize answered 3/12, 2015 at 9:38 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags