Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)
It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations. Thus the permutations of the list [1, 1, 2] are usually considered to be
[1, 1, 2], [1, 2, 1], [2, 1, 1]. Indeed, the following C++ code prints precisely those three:

// needs <algorithm> (for next_permutation) and <iostream>
int a[] = {1, 1, 2};
do {
    cout << a[0] << " " << a[1] << " " << a[2] << endl;
} while (next_permutation(a, a + 3));

On the other hand, Python's itertools.permutations seems to print something else:

import itertools
for a in itertools.permutations([1, 1, 2]):
    print(a)

This prints

(1, 1, 2)
(1, 2, 1)
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
(2, 1, 1)

As user Artsiom Rudzenka pointed out in an answer, the Python documentation says so:

Elements are treated as unique based on their position, not on their value.

My question: why was this design decision made?

It seems that following the usual convention would give results that are more useful (and indeed it is usually exactly what I want)... or is there some application of Python's behaviour that I'm missing?

[Or is it some implementation issue? The algorithm as in next_permutation — for instance explained on StackOverflow here (by me) and shown here to be O(1) amortised — seems efficient and implementable in Python, but is Python doing something even more efficient since it doesn't guarantee lexicographic order based on value? And if so, was the increase in efficiency considered worth it?]

Footwork answered 30/6, 2011 at 12:3 Comment(4)
According to the documentation Python does guarantee lexicographic order.Kiersten
The output example above doesn't seem to be sorted (1,2,1 comes before 1,1,2). Maybe because elements aren't unique?Dominant
@Macke: Yes, that's what I meant — the lexicographic order is based on position, not value. If you think of the two 1's as "1" and "1+" with the second greater, then (1,2,1+) coming before (1+,1,2) is fine. But of course, 1 is 1. :-) Also, if you ask it for permutations of [3,2,1] (say), then the results will actually be in reverse lexicographic order. And if you ask for [2, 1,3], they will be in neither. The point is that Python doesn't look at values, only positions.Footwork
I am also wondering. Especially because "Elements are treated as unique based on their position, not on their value" seems redundant - only one element can occupy a particular position at a time, so basically they are saying "We assume all elements are distinct" or "We don't check solutions for uniqueness".Kimberly

I can't speak for the designer of itertools.permutations (Raymond Hettinger), but it seems to me that there are a couple of points in favour of the design:

First, if you used a next_permutation-style approach, then you'd be restricted to passing in objects that support a linear ordering. Whereas itertools.permutations provides permutations of any kind of object. Imagine how annoying this would be:

>>> list(itertools.permutations([1+2j, 1-2j, 2+1j, 2-1j]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers

Second, by not testing for equality on objects, itertools.permutations avoids paying the cost of calling the __eq__ method in the usual case where it's not necessary.
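Both points are easy to check empirically. Here is a small sketch (my own illustration, not from the original answer) using a class that defines no ordering and counts how often its values are compared for equality:

```python
import itertools

class Opaque:
    """No ordering defined; counts equality tests on its values."""
    eq_calls = 0
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        Opaque.eq_calls += 1
        return self.v == other.v

items = [Opaque(1), Opaque(1), Opaque(2)]
perms = list(itertools.permutations(items))
print(len(perms))        # 3! = 6 tuples, duplicates included
print(Opaque.eq_calls)   # 0: permutations never compared the values
```

Since permutations works purely on positions, it never calls `__eq__` (or `<`) on the elements, so unorderable objects are accepted and no comparison cost is paid.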

Basically, itertools.permutations solves the common case reliably and cheaply. There's certainly an argument to be made that itertools ought to provide a function that avoids duplicate permutations, but such a function should be in addition to itertools.permutations, not instead of it. Why not write such a function and submit a patch?

Alleman answered 30/6, 2011 at 12:47 Comment(6)
Thanks, it's a good point that sometimes one wants permutations of elements that are not comparable — writing code for this case, and not looking at the values, does make itertools.permutations really fast. Whether this is actually "the usual case" and "the common case" depends on the user, of course. :-) BTW, how easy is the whole process of submitting a patch to the Python libraries and following it to the end?Footwork
Good answer and good point about efficiency. However, I'm not convinced that this is a good reason for itertools.permutations to leave duplicates in. It is perfectly reasonable for permutations to require the elements to be comparable. If one explicitly wants permutations of positions, one can explicitly write: ([it[index] for index in indexes] for indexes in itertools.permutations(range(len(it))))Galilean
I'm confused, why do you need linear ordering for unique_permutation? Don't you only need equality test?Sinasinai
@EhsanKia: Look at the implementation of next_permutation that the OP suggests Python ought to use. It uses the < operator on the objects being permuted to find the smallest permutation following the current one. (Obviously there are various ways to get around this, but they make the suggested approach less attractive.)Alleman
@NeilG Your point that it is trivial to obtain the permutation of indices with an implementation that follows the OP's desired functionality is a powerful one. It seems that the OP's design solves all the use cases that the current implementation addresses and a bunch of other common use cases as well. Whereas the current implementation does not solve the additional use cases in a straightforward manner.Sanative
@Scott I just added an answer. The standard library probably won't change even if we think their decision is unreasonable.Galilean

I'm accepting the answer of Gareth Rees as the most appealing explanation (short of an answer from the Python library designers), namely, that Python's itertools.permutations doesn't compare the values of the elements. Come to think of it, this is what the question asks about, but I see now how it could be seen as an advantage, depending on what one typically uses itertools.permutations for.

Just for completeness, I compared three methods of generating all distinct permutations. Method 1, which is very inefficient memory-wise and time-wise but requires the least new code, is to wrap Python's itertools.permutations, as in zeekay's answer. Method 2 is a generator-based version of C++'s next_permutation, from this blog post. Method 3 is something I wrote that is even closer to C++'s next_permutation algorithm; it modifies the list in-place (I haven't made it too general).

def next_permutationS(l):
    n = len(l)
    # Step 1: find the tail (the longest non-increasing suffix, from `last` to the end)
    last = n - 1
    while last > 0:
        if l[last-1] < l[last]:
            break
        last -= 1
    # Step 2: increase the element just before the tail, swapping it with
    # the smallest tail element that is bigger than it
    if last > 0:
        small = l[last-1]
        big = n - 1
        while l[big] <= small:
            big -= 1
        l[last-1], l[big] = l[big], small
    # Step 3: reverse the tail
    i, j = last, n - 1
    while i < j:
        l[i], l[j] = l[j], l[i]
        i += 1
        j -= 1
    return last > 0
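For reference, here is how such an in-place stepper is driven to enumerate all distinct permutations. The `next_permutation` below is my own condensed restatement of next_permutationS (same three steps), repeated so the snippet runs standalone:

```python
def next_permutation(l):
    """Advance l in place to its next lexicographic permutation.

    Returns False once l has wrapped back around to sorted order.
    """
    # Step 1: find the longest non-increasing tail l[last:]
    last = len(l) - 1
    while last > 0 and l[last-1] >= l[last]:
        last -= 1
    if last > 0:
        # Step 2: swap the pivot l[last-1] with the smallest
        # tail element greater than it
        big = len(l) - 1
        while l[big] <= l[last-1]:
            big -= 1
        l[last-1], l[big] = l[big], l[last-1]
    # Step 3: reverse the tail (restores sorted order on wrap-around)
    l[last:] = reversed(l[last:])
    return last > 0

def all_distinct_permutations(l):
    l = sorted(l)                 # must start from the smallest arrangement
    perms = [tuple(l)]
    while next_permutation(l):    # False after the last permutation
        perms.append(tuple(l))
    return perms

print(all_distinct_permutations([1, 1, 2]))
# → [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Note that this approach requires the elements to be comparable, which is exactly the restriction Gareth's answer points out.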

Here are some results. I have even more respect for Python's built-in function now: it's about three to four times as fast as the other methods when the elements are all (or almost all) distinct. Of course, when there are many repeated elements, using it is a terrible idea.

Some results ("us" means microseconds):

l                                       m_itertoolsp  m_nextperm_b  m_nextperm_s
[1, 1, 2]                               5.98 us       12.3 us       7.54 us
[1, 2, 3, 4, 5, 6]                      0.63 ms       2.69 ms       1.77 ms
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]         6.93 s        13.68 s       8.75 s

[1, 2, 3, 4, 6, 6, 6]                   3.12 ms       3.34 ms       2.19 ms
[1, 2, 2, 2, 2, 3, 3, 3, 3, 3]          2400 ms       5.87 ms       3.63 ms
[1, 1, 1, 1, 1, 1, 1, 1, 1, 2]          2320000 us    89.9 us       51.5 us
[1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]    429000 ms     361 ms        228 ms

The code is here if anyone wants to explore.

Footwork answered 4/7, 2011 at 13:10 Comment(3)
Do methods m_itertoolsp, m_nextperm_b and m_nextperm_s in the results table refer to Methods 1, 2 & 3 respectively?Tees
You can reverse the tail with just: l[last:n] = l[n-1:last-1:-1]Tees
@IsaacTurner I seem to have missed your comment when it was posted. Yes they refer to Methods 1, 2, and 3 in the answer. And I haven't tried the other way of reversing the tail… it makes for short code but I haven't thought about how well it will perform.Footwork

It's fairly easy to get the behavior you prefer by wrapping itertools.permutations, which might have influenced the decision. As described in the documentation, itertools is designed as a collection of building blocks/tools to use in building your own iterators.

from itertools import permutations

def unique(iterable):
    seen = set()
    for x in iterable:
        if x in seen:
            continue
        seen.add(x)
        yield x

for a in unique(permutations([1, 1, 2])):
    print(a)

(1, 1, 2)
(1, 2, 1)
(2, 1, 1)

However, as pointed out in the comments, this might not be quite as efficient as you'd like (here iterate is a helper that simply exhausts the iterator so it can be timed):

>>> %timeit iterate(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))
1 loops, best of 3: 4.27 s per loop

>>> %timeit iterate(unique(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2])))
1 loops, best of 3: 13.2 s per loop

Perhaps if there is enough interest, a new function (or an optional argument to itertools.permutations) could be added to itertools to generate permutations without duplicates more efficiently.

Charyl answered 30/6, 2011 at 12:34 Comment(5)
+1. This is what you have to do if you want unique permutations. Non-unique permutations can be useful (and fun) too, but are more expensive to compute.Dominant
This has Ω(n!) complexity to generate all permutations (in fact Ω(n·n!), since you need Ω(n) time to produce or compare each permutation), which is very, very bad relative to next_permutation when the list has duplicates (and so the number of actual distinct permutations is much smaller than n!). See e.g. this post.Footwork
Instead of [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2], try adding another couple of 1's, like [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2] — it takes at least a hundred times as long. :-)Footwork
Indeed! Out of curiosity, considering instead: [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], would permutations be more or less efficient than a new next_permutations implementation? The main advantage would be avoiding generating additional permutations for objects it's already seen, yes?Charyl
Another serious issue with this solution is memory: since you're keeping all the seen permutations in a set, you will need as much memory as the total size of all permutations… which kind of defeats the point of using itertools. (E.g. for [1,2,3,4,5,6,7,8,9,10], this needs to keep in memory all 10! ≈ 3.6 million permutations, which is hundreds of megabytes.)Footwork

I also find it surprising that itertools doesn't have a function for the more intuitive concept of unique permutations. Generating repeated permutations only to select the unique ones among them is out of the question for any serious application.

I have written my own recursive generator function which behaves similarly to itertools.permutations but does not return duplicates. Only permutations of the full original list are considered; permutations of sub-lists can be built with the standard itertools library.

def unique_permutations(t):
    lt = list(t)
    if len(lt) <= 1:
        yield lt
        return
    for d in set(lt):
        # fix d as the first element, then permute the rest
        lt.remove(d)
        for perm in unique_permutations(lt):
            yield [d] + perm
        lt.append(d)
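For illustration, here is a small variant of the generator above (my sketch, not part of the original answer; the function is restated so the snippet runs standalone). It sorts the distinct values, so the output comes in lexicographic order, at the cost of requiring comparable elements:

```python
def unique_permutations(t):
    lt = list(t)
    if len(lt) <= 1:
        yield lt
        return
    for d in sorted(set(lt)):       # each distinct value leads exactly once
        lt.remove(d)                # fix d in front, permute the rest
        for perm in unique_permutations(lt):
            yield [d] + perm
        lt.append(d)

print([tuple(p) for p in unique_permutations([1, 1, 2])])
# → [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```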
Ineligible answered 16/1, 2013 at 22:52 Comment(2)
Thanks. In my answer above, I have a link to code that has 3 approaches, and some timing comparison -- could you test how fast your unique_permutations is in comparison to m_itertoolsp, m_nextperm_b, and m_nextperm_s?Footwork
I tested the speed as you suggested, and -- not unexpectedly -- my code is 5 to 10 times slower than the two options you suggested. Recursion and list modification have their price. Still, it handily beats the itertools workaround by a factor of hundreds. I only suggested it as an alternative that someone may find a way to improve upon, if it happens to be better suited to a different purpose.Ineligible

Revisiting this old question, the easiest thing to do now is to use more_itertools.distinct_permutations.

Galilean answered 31/10, 2020 at 1:59 Comment(1)
Yes, the more_itertools version works very efficiently. They implemented the method described here: en.wikipedia.org/wiki/… I also tested the implementations above. They are also great.Efrenefron

Maybe I am wrong, but it seems the reason for this is given in 'Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation.' You have specified (1, 1, 2), and from your point of view the 1 at index 0 and the 1 at index 1 are the same, but they are not, since the Python permutations implementation uses indexes instead of values.

So if we take a look at the pure-Python equivalent of permutations given in the documentation, we will see that it uses indexes:

from itertools import product

def permutations(iterable, r=None):
    # pure-Python equivalent from the itertools documentation
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

For example, if you change your input to [1, 2, 3] you will get the expected permutations [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)], since the values are unique.
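The index-based view can be made visible directly (my illustration, using the real itertools.permutations on the indices): permute the indices of [1, 1, 2] and map them back through the pool, and the question's duplicated output reappears.

```python
import itertools

pool = (1, 1, 2)
# The two 1's live at indices 0 and 1.  Permuting the *indices*
# gives 3! = 6 orderings; mapping back through the pool shows
# why two of each pair collapse to the same value tuple:
for indices in itertools.permutations(range(3)):
    print(indices, '->', tuple(pool[i] for i in indices))
# (0, 1, 2) -> (1, 1, 2)
# (0, 2, 1) -> (1, 2, 1)
# (1, 0, 2) -> (1, 1, 2)
# (1, 2, 0) -> (1, 2, 1)
# (2, 0, 1) -> (2, 1, 1)
# (2, 1, 0) -> (2, 1, 1)
```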

Reservoir answered 30/6, 2011 at 12:16 Comment(3)
The question is, why was this implemented that way, when we usually expect something else?Kiersten
@Space_C0wb0y - oh, sorry - but then this question should be asked of the people who implemented Python. They give us a tutorial and an API reference so that we can decide whether or not to use the features. From the documentation's point of view, this method works correctly.Reservoir
Yes, Space_C0wb0y has it right: my question is precisely why it is this way. (One possible explanation is that it was simply not designed with lists containing duplicates in mind, and if a reference for this is found, that would be an answer. But there may be some other explanation.) And I don't think questions about design decisions behind a language are entirely out of the scope of this website: the set of people involved in the design of a language, or with access to the discussions, or with some insight into the issue, may have a nontrivial intersection with the users of this website.Footwork
