generate sequence with all permutations
Asked Answered
B

6

12

How can I generate the shortest sequence with contains all possible permutations?

Example: For length 2 the answer is 121, because this list contains 12 and 21, which are all possible permutations.

For length 3 the answer is 123121321, because this list contains all possible permutations: 123, 231, 312, 121 (invalid), 213, 132, 321.

Each number (within a given permutation) may only occur once.

Bondsman answered 12/2, 2010 at 16:17 Comment(2)
Interesting problem. I don't even know the length of the shortest such sequence. A lower bound is n! + n - 1, since there are n! distinct permutations and each must have a distinct starting point in the sequence. An upper bound is (n-1)!(2n-1), which you can achieve by arbitrarily choosing an element, then stringing together a "full cycle" of each permutation ending with that element, i.e. almost two copies of the permutation, as in 234512345. There are (n-1)! such permutations, and each cycle is of length (2n-1).Haler
did the answer you select is the right (and optimal) answerKeldon
H
6

This greedy algorithm produces fairly short minimal sequences.

UPDATE: Note that for n ≥ 6, this algorithm does not produce the shortest possible string!

  • Make a collection of all permutations.
  • Remove the first permutation from the collection.
  • Let a = the first permutation.
  • Find the sequence in the collection that has the greatest overlap with the end of a. If there is a tie, choose the sequence is first in lexicographic order. Remove the chosen sequence from the collection and add the non-overlapping part to the end of a. Repeat this step until the collection is empty.

The curious tie-breaking step is necessary for correctness; breaking the tie at random instead seems to result in longer strings.

I verified (by writing a much longer, slower program) that the answer this algorithm gives for length 4, 123412314231243121342132413214321, is indeed the shortest answer. However, for length 6 it produces an answer of length 873, which is longer than the shortest known solution.

The algorithm is O(n!2).

An implementation in Python:

import itertools

def costToAdd(a, b):
    for i in range(1, len(b)):
        if a.endswith(b[:-i]):
            return i
    return len(b)

def stringContainingAllPermutationsOf(s):
    perms = set(''.join(tpl) for tpl in itertools.permutations(s))
    perms.remove(s)
    a = s
    while perms:
        cost, next = min((costToAdd(a, x), x) for x in perms)
        perms.remove(next)
        a += next[-cost:]
    return a

The length of the strings generated by this function are 1, 3, 9, 33, 153, 873, 5913, ... which appears to be this integer sequence.

I have a hunch you can do better than O(n!2).

Haler answered 13/2, 2010 at 14:9 Comment(2)
The greedy algorithm looks nice! But could you also post your 'much longer, slower program'? I also made one myself but it's very slow and I would like to compare it to yours.Bondsman
Here's code for three algorithms: a long, slow A*ish search; the greedy algorithm; and my much faster second answer. pastie.org/827466 (But the long, slow code is probably extremely hard to follow, as I never expected anyone else to look at it!)Haler
E
5
  • Create all permutations.
  • Let each permutation represent a node in a graph.
  • Now, for any two states add an edge with a value 1 if they share n-1 digits (for the source from the end, and for the target from the end), two if they share n-2 digits and so on.
  • Now, you are left to find the shortest path containing n vertices.
Episcopalian answered 12/2, 2010 at 16:31 Comment(5)
Hmm isn't it about finding the shortest path containing all the vertices (not "n")? Anyway as you formulate it, it's a generalization of the traveling salesman problem, which is NP-complete. Does this idea lead to a practical algorithm?Cholecyst
I'd have gone for a BFS at step 4.Episcopalian
antti is right: you have to find the shortest path containing all the vertices. I don't see how BFS helps.Haler
@Jason Orendorff: The shortest path is taken care of by taking into account the prefix/suffix match lengths. IMHO, all that remains is to find a path involving n nodes.Episcopalian
But how can a path involving n nodes include all n! permutations?Haler
H
3

Here is a fast algorithm that produces a short string containing all permutations. I am pretty sure it produces the shortest possible answer, but I don't have a complete proof in hand.

Explanation. Below is a tree of All Permutations. The picture is incomplete; imagine that the tree goes on forever to the right.

1 --+-- 12 --+-- 123 ...
    |        |
    |        +-- 231 ...
    |        |
    |        +-- 312 ...
    |
    +-- 21 --+-- 213 ...
             |
             +-- 132 ...
             |
             +-- 321 ...

The nodes at level k of this tree are all the permutations of length k. Furthermore, the permutations are in a particular order with a lot of overlap between each permutation and its neighbors above and below.

To be precise, each node's first child is found by simply adding the next symbol to the end. For example, the first child of 213 would be 2134. The rest of the children are found by rotating to the first child to left one symbol at a time. Rotating 2134 would produce 1342, 3421, 4213.

Taking all the nodes at a given level and stringing them together, overlapping as much as possible, produces the strings 1, 121, 123121321, etc.

The length of the nth string in that sequence is the sum for x=1 to n of x!. (You can prove this by observing how much non-overlap there is between neighboring permutations. Siblings overlap in all but 1 symbol; first-cousins overlap in all but 2 symbols; and so on.)

Sketch of proof. I haven't completely proved that this is the best solution, but here's a sketch of how the proof would proceed. First show that any string containing n distinct permutations has length ≥ 2n - 1. Then show that adding any string containing n+1 distinct permutations has length 2n + 1. That is, adding one more permutation will cost you two digits. Proceed by calculating the minimum length of strings containing nPr and nPr + 1 distinct permutations, up to n!. In short, this sequence is optimal because you can't make it worse somewhere in the hope of making it better someplace else. It's already locally optimal everywhere. All the moves are forced.

Algorithm. Given all this background, the algorithm is very simple. Walk this tree to the desired depth and string together all the nodes at that depth.

Fortunately we do not actually have to build the tree in memory.

def build(node, s):
    """String together all descendants of the given node at the target depth."""
    d = len(node)  # depth of this node. depth of "213" is 3.
    n = len(s)     # target depth
    if d == n - 1:
        return node + s[n - 1] + node    # children of 213 join to make "2134213"
    else:
        c0 = node + s[d]                 # first child node
        children = [c0[i:] + c0[:i] for i in range(d + 1)]  # all child nodes
        strings = [build(c, s) for c in children]  # recurse to the desired depth
        for j in range(1, d + 1):
            strings[j] = strings[j][d:]  # cut off overlap with previous sibling
        return ''.join(strings)          # join what's left

def stringContainingAllPermutationsOf(s):
    return build(s[:1], s)

Performance. The above code is already much faster than my other solution, and it does a lot of cutting and pasting of large strings that you can optimize away. The algorithm can be made to run in time and memory proportional to the size of the output.

Haler answered 16/2, 2010 at 17:42 Comment(2)
Just a note that this doesn’t always produce the shortest possible answer. A couple of years ago I found a shorter string for 123456: see the paper here.Skedaddle
Yes, the length (sum k!) is not minimal for all n > 5. Also, you do not need to generate the tree, its more efficient to start with the solution for n-1, consider all substrings of length n-1 which are permutations (i.e. discard those with not all letters distinct), glue together 2 copies of each with an "n" in the middle, and then merge them together, deleting any possible "overlap" (i.e., a repeated substring).Archiphoneme
C
1

I show one example with DNA sequences. Sometimes, I need to generate all possible short DNA sequences (usually called oligos). I use a queue to replace recursion. I did not optimize, if you want to save memory, you can use resize of string to the desired length. To further save memory (and maybe efficiency), you can use pointer and length of the the construction intermediate. My code is C++. It can be converted any any other language. For those DNA (High school) is a foreign word, it has only 4 letters: A, C, G, T. For other set of alphabets, you will need a loop to save some typing; otherwise, you can still type, say English 26, lines of code manually with copy-pasting.

 vector<string> buildAllOligo(short len) {
     //int num = pow(4, len);
     //vector<char*> res;
     vector<string> res;
     queue<string> intermediate;
     intermediate.push(string());
     while (!intermediate.empty()) {
        string addA(std::move(intermediate.front()));
        string addC(addA);
        string addG(addA);
        string addT(addA);
        addA.push_back('A');
        addC.push_back('C');
        addG.push_back('G');
        addT.push_back('T');
        intermediate.pop();
        if (static_cast<short>(addA.size()) < len) {
           intermediate.push(std::move(addA));
           intermediate.push(std::move(addC));
           intermediate.push(std::move(addG));
           intermediate.push(std::move(addT));
        }
        else {
           res.push_back(std::move(addA));
           res.push_back(std::move(addC));
           res.push_back(std::move(addG));
           res.push_back(std::move(addT));
        }
     }
     assert(res.size() == pow(4,len));
     return res;
  }
Connective answered 8/12, 2023 at 17:28 Comment(0)
M
0

For n 3 length chain is 8 12312132 Seems to me we are working with cycled system - it's ring, saying in other words. But we are are working with ring as if it is chain. Chain is realy 123121321 = 9 But the ring is 12312132 = 8 We take last 1 for 321 from the beginning of the sequence 12312132[1].

Mcmahan answered 4/12, 2018 at 11:1 Comment(1)
You are wrong, we are not working with a cyclic system, which is clear from the example "121" for n=2 (it would be "12" for the cyclic case).Your idea is nice and interesting but clearly a different problem - as would be "subsequence" instead of "substring".Archiphoneme
A
0

These are called (minimal length) superpermutations (cf. Wikipedia). Interest on this has re-sparked when an anonymous user has posted a new lower bound on 4chan. (See Wikipedia and many other web pages for history.)

AFAIK, as of today we just know:

  • Their length is A180632(n) ≤ A007489(n) = Sum_{k=1..n} k! but this bound is only sharp for n ≤ 5, i.e., we have equality for n ≤ 5 but strictly less for n > 5.
  • There's a very simple recursive algorithm, given below, producing a superpermutation of length A007489(n), which is always palindromic (but as said above this is not the minimal length for n > 5).
  • For n ≥ 7 we have the better upper bound n! + (n−1)! + (n−2)! + (n−3)! + n − 3.
  • For n ≤ 5 all minimal SP's are known; and for all n > 5 we don't know which is the minimal SP.
  • For n = 1, 2, 3, 4 the minimal SP's are unique (up to changing the symbols), given by (1, 121, 123121321, 123412314231243121342132413214321) of length A007489(1..4) = (1, 3, 9, 33).
  • For n = 5 there are 8 inequivalent ones of minimal length 153 = A007489(5); the palindromic one produced by the algorithm below is the 3rd in lexicographic order.
  • For n = 6 Houston produced thousands of the smallest known length 872 = A007489(6) - 1, but AFAIK we still don't know whether this is minimal.
  • For n = 7 Egan produced one of length 5906 (one less than the better upper bound given above) but again we don't know whether that's minimal.

I've written a very short PARI/GP program (you can paste to run it on the PARI/GP web site) which implements the standard algorithm producing a palindromic superpermutation of length A007489(n):

extend(S,n=vecmax(s))={ my(t); concat([
  if(#Set(s)<n, [], /* discard if not a permutation */
    s=concat([s, n+1, s]); /* Now merge with preceding segment: */ 
    forstep(i=min(#s, #t)-1, 0, -1,
      if(s[1..1+i]==t[#t-i..#t], s=s[2+i..-1]; break));
    t=s /* store as previous for next */
  )/*endif*/
  | s <- [ S[i+1..i+n] | i <- [0..#S-n] ]])
}
SSP=vector(6, n, s=if(n>1, extend(s), [1])); // gives the first 6, the 6th being non-minimal

I think that easily translates to any other language. (For non-PARI speaking persons: "| x <-" means "for x in".)

Archiphoneme answered 29/7, 2020 at 17:18 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.