If you only want to remove reversed pairs and don't want external libraries, you could use a simple generator function (loosely based on the itertools "unique_everseen" recipe):
def remove_reversed_duplicates(iterable):
    # Create a set for already seen elements
    seen = set()
    for item in iterable:
        # Lists are mutable so we need tuples for the set-operations.
        tup = tuple(item)
        if tup not in seen:
            # If the tuple is not in the set append it in REVERSED order.
            seen.add(tup[::-1])
            # If you also want to remove normal duplicates uncomment the next line
            # seen.add(tup)
            yield item
>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
>>> list(remove_reversed_duplicates(a))
[[0, 1], [0, 4], [1, 4]]
The generator function is a fairly fast way to solve this problem because set lookups take constant time on average. This approach also keeps the order of your initial list and removes only reversed duplicates, and in the timings below it is among the fastest of the alternatives.
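To make the difference between "reversed only" and "reversed and identical" concrete, here is a minimal sketch of the same generator with the commented-out line turned into a switch. The function name remove_pair_duplicates and the also_identical flag are hypothetical names introduced only for this illustration:

```python
def remove_pair_duplicates(iterable, also_identical=False):
    """Yield items, skipping any item whose reverse was yielded before.

    `also_identical` is a hypothetical flag added for this sketch; when it
    is True, exact repeats are skipped as well.
    """
    seen = set()
    for item in iterable:
        # Lists are unhashable, so convert to a tuple for the set lookup.
        tup = tuple(item)
        if tup not in seen:
            # Remember the reversed form so a later [b, a] is skipped.
            seen.add(tup[::-1])
            if also_identical:
                # Remember the item itself so a later [a, b] is skipped too.
                seen.add(tup)
            yield item


pairs = [[0, 1], [1, 0], [0, 1], [2, 3]]
print(list(remove_pair_duplicates(pairs)))
# [[0, 1], [0, 1], [2, 3]]
print(list(remove_pair_duplicates(pairs, also_identical=True)))
# [[0, 1], [2, 3]]
```

Note that in the default mode the second [0, 1] survives, because only the reversed form (1, 0) was stored in the set.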
If you don't mind using an external library and you want to remove all duplicates (reversed and identical), an alternative is iteration_utilities.unique_everseen:
>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen(a, key=set))
[[0, 1], [0, 4], [1, 4]]
This checks whether any item has the same contents, in arbitrary order (thus the key=set), as another. In this case it works as expected, but it also removes duplicate [a, b] occurrences instead of only [b, a] occurrences. You could also use key=sorted (like the other answers suggest). Used like this, unique_everseen has bad algorithmic complexity because the result of the key function is not hashable, so the fast lookup is replaced by a slow lookup. To speed this up you need to make the keys hashable, for example by converting them to sorted tuples (like some other answers suggest):
>>> from iteration_utilities import chained
>>> list(unique_everseen(a, key=chained(sorted, tuple)))
[[0, 1], [0, 4], [1, 4]]
chained is nothing more than a faster alternative to lambda x: tuple(sorted(x)).
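To illustrate why hashable keys matter, here is a rough pure-Python sketch of a unique_everseen-style generator. It is not the actual implementation in iteration_utilities; the name unique_everseen_sketch and the fallback to a plain list for unhashable keys are assumptions made for illustration. Hashable keys go into a set (fast lookup), while unhashable keys go into a list that has to be scanned linearly:

```python
def unique_everseen_sketch(iterable, key=None):
    """Illustrative stand-in, NOT the iteration_utilities implementation."""
    seen_hashable = set()    # average O(1) membership test
    seen_unhashable = []     # O(n) membership test, used as a fallback
    for item in iterable:
        k = item if key is None else key(item)
        try:
            is_new = k not in seen_hashable
            if is_new:
                # Raises TypeError for unhashable keys such as set or list.
                seen_hashable.add(k)
        except TypeError:
            # Unhashable key: fall back to scanning the list.
            is_new = k not in seen_unhashable
            if is_new:
                seen_unhashable.append(k)
        if is_new:
            yield item


a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
# Slow path: set objects are unhashable, so every lookup scans a list.
print(list(unique_everseen_sketch(a, key=set)))
# [[0, 1], [0, 4], [1, 4]]
# Fast path: sorted tuples are hashable, so lookups use the set.
print(list(unique_everseen_sketch(a, key=lambda x: tuple(sorted(x)))))
# [[0, 1], [0, 4], [1, 4]]
```

Both calls give the same result; only the cost per lookup differs, which becomes noticeable when the input contains many distinct items.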
EDIT: As mentioned by @jpmc26, one could use frozenset, which is hashable, instead of normal sets:
>>> list(unique_everseen(a, key=frozenset))
[[0, 1], [0, 4], [1, 4]]
To get an idea of the performance, I did some timeit comparisons for the different suggestions:
>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
>>> %timeit list(remove_reversed_duplicates(a))
100000 loops, best of 3: 16.1 µs per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(set(map(frozenset, a)))
100000 loops, best of 3: 7.23 µs per loop
>>> %timeit list(unique_everseen(a, key=set))
10000 loops, best of 3: 26.4 µs per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10000 loops, best of 3: 25.8 µs per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10000 loops, best of 3: 29.8 µs per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10000 loops, best of 3: 28.5 µs per loop
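The %timeit lines above require IPython. Outside IPython, a comparable measurement could be sketched with the standard-library timeit module. The helper best_time_us and the loop counts are assumptions chosen for this example, only a few stdlib-only candidates are included, and the numbers will differ from machine to machine:

```python
import timeit


def remove_reversed_duplicates(iterable):
    # Same generator as at the top of this answer.
    seen = set()
    for item in iterable:
        tup = tuple(item)
        if tup not in seen:
            seen.add(tup[::-1])
            yield item


a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]


def best_time_us(func, number=10_000, repeat=3):
    """Return the best per-call time in microseconds (hypothetical helper)."""
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number * 1e6


candidates = {
    "remove_reversed_duplicates": lambda: list(remove_reversed_duplicates(a)),
    "set(map(frozenset, a))": lambda: list(set(map(frozenset, a))),
    "sorted tuples": lambda: [list(t) for t in set(tuple(sorted(p)) for p in a)],
}

for name, func in candidates.items():
    print(f"{name}: {best_time_us(func):.2f} µs per loop")
```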
Long list with many duplicates:
>>> import random
>>> a = [[random.randint(0, 10), random.randint(0,10)] for _ in range(10000)]
>>> %timeit list(remove_reversed_duplicates(a))
100 loops, best of 3: 12.5 ms per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100 loops, best of 3: 10 ms per loop
>>> %timeit set(map(frozenset, a))
100 loops, best of 3: 10.4 ms per loop
>>> %timeit list(unique_everseen(a, key=set))
10 loops, best of 3: 47.7 ms per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10 loops, best of 3: 22.4 ms per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10 loops, best of 3: 24 ms per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10 loops, best of 3: 35 ms per loop
And with fewer duplicates:
>>> a = [[random.randint(0, 100), random.randint(0,100)] for _ in range(10000)]
>>> %timeit list(remove_reversed_duplicates(a))
100 loops, best of 3: 15.4 ms per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100 loops, best of 3: 13.1 ms per loop
>>> %timeit set(map(frozenset, a))
100 loops, best of 3: 11.8 ms per loop
>>> %timeit list(unique_everseen(a, key=set))
1 loop, best of 3: 1.96 s per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10 loops, best of 3: 24.2 ms per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10 loops, best of 3: 31.1 ms per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10 loops, best of 3: 36.7 ms per loop
So the variants with remove_reversed_duplicates, unique_everseen (key=frozenset) and set(map(frozenset, a)) seem to be by far the fastest solutions. Which of them is fastest depends on the length of the input and the number of duplicates.
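One thing to keep in mind when choosing set(map(frozenset, a)): the result is a set of frozensets, not a list of lists, so the original order is not preserved and a pair with two equal elements collapses to a single element. A small sketch of converting the result back to lists (the [2, 2] pair is added here only to show the collapse):

```python
a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1], [2, 2]]

unique_sets = set(map(frozenset, a))

# Sort each frozenset into a list, then sort the lists for a stable order.
as_lists = sorted(sorted(fs) for fs in unique_sets)
print(as_lists)
# [[0, 1], [0, 4], [1, 4], [2]]
```

If pairs like [2, 2] have to survive unchanged, the generator function or unique_everseen with a key is the safer choice, because both yield the original items.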