Permute all unique enumerations of a vector in R
J

7

20

I'm trying to find a function that will generate all the unique permutations of a vector, while not counting juxtapositions within subsets of the same element type. For example:

dat <- c(1,0,3,4,1,0,0,3,0,4)

has

factorial(10)
> 3628800

possible permutations, but only 10!/(2!*2!*4!*2!)

factorial(10)/(factorial(2)*factorial(2)*factorial(2)*factorial(4))
> 18900

unique permutations when ignoring juxtapositions within subsets of the same element type.

I can get this by using unique() and the permn() function from the package combinat

unique( permn(dat) )

but this is computationally very expensive, since it involves enumerating all n! permutations, which can be orders of magnitude more than I need (here 3628800 versus 18900). Is there a way to do this without first generating all n! permutations?

Jernigan answered 15/4, 2011 at 0:19 Comment(3)
Can you elaborate on what "juxtapositions within subsets of same element type" means? Maybe it's obvious, but I'm not seeing it at this point. – Paquette
@Chase: there are duplicate values in the vector. You can see it with a smaller vector like c(0,0,2). Half the permutations of permn(c(0,0,2)) are duplicates. – Graptolite
I don't have a solution, but I think maybe a different way of thinking about it would help. If you break your original vector into k "value groups," each of size n_k, then what you really want to do is to assign to each group a set of n_k positions (where the position # would be anything between 1 and 10 in your example). So one "permutation" of your sample vector would be as follows: The zeroes get positions 1, 2, 3, 4; the ones get positions 5, 6; the threes get positions 7, 8; the fours get positions 9, 10. I hope someone else can see where I'm going and take it from here. – Sherborn
S
12

EDIT: Here's a faster answer; again based on the ideas of Louisa Grey and Bryce Wagner, but with faster R code thanks to better use of matrix indexing. It's quite a bit faster than my original:

> d <- c(1,0,3,4,1,0,0,3,0,4)
> system.time(up1 <- uniqueperm(d))
   user  system elapsed 
  0.183   0.000   0.186 
> system.time(up2 <- uniqueperm2(d))
   user  system elapsed 
  0.037   0.000   0.038 

And the code:

uniqueperm2 <- function(d) {
  dat <- factor(d)
  N <- length(dat)
  n <- tabulate(dat)
  ng <- length(n)
  if(ng==1) return(d)
  a <- N-c(0,cumsum(n))[-(ng+1)]
  foo <- lapply(1:ng, function(i) matrix(combn(a[i],n[i]),nrow=n[i]))
  out <- matrix(NA, nrow=N, ncol=prod(sapply(foo, ncol)))
  xxx <- c(0,cumsum(sapply(foo, nrow)))
  xxx <- cbind(xxx[-length(xxx)]+1, xxx[-1])
  miss <- matrix(1:N,ncol=1)
  for(i in seq_len(length(foo)-1)) {
    l1 <- foo[[i]]
    nn <- ncol(miss)
    miss <- matrix(rep(miss, ncol(l1)), nrow=nrow(miss))
    k <- (rep(0:(ncol(miss)-1), each=nrow(l1)))*nrow(miss) + 
               l1[,rep(1:ncol(l1), each=nn)]
    out[xxx[i,1]:xxx[i,2],] <- matrix(miss[k], ncol=ncol(miss))
    miss <- matrix(miss[-k], ncol=ncol(miss))
  }
  k <- length(foo)
  out[xxx[k,1]:xxx[k,2],] <- miss
  out <- out[rank(as.numeric(dat), ties.method="first"),]
  foo <- cbind(as.vector(out), as.vector(col(out)))
  out[foo] <- d
  t(out)
}

It doesn't return the same order, but after sorting, the results are identical.

up1a <- up1[do.call(order, as.data.frame(up1)),]
up2a <- up2[do.call(order, as.data.frame(up2)),]
identical(up1a, up2a)

For my first attempt, see the edit history.

Sheelah answered 16/4, 2011 at 1:26 Comment(8)
Good function, thanks! A tiny thing: The (not very sensible) edge case where d has length 1 fails in the for(i in 2:ng) loop since foo then has only 1 component. – Leonoraleonore
@Aaron: is there an easy way to fix the bug that caracal mentions above? – Jernigan
Also just realized this is the same thing Bryce suggests. It's quite possible the combining could be done faster, either by being more careful in R or by rewriting in C; if anyone feels like trying to speed it up, feel free. For one thing, I'm sure I'm creating more matrices along the way than is necessary. – Sheelah
I wonder - would there be a way to use multicore or foreach to use multiple cores and speed this up? It looks like out is being continually overwritten in the for loop, so perhaps this isn't possible. – Bob
Not with the algorithm I use here, no, for exactly the reason you note. I'm sure it could be rewritten in a smarter way where at least part of it could use multiple cores, but I suspect that combining them together might depend on all the results and would be the part that slows it down. My intuition is that you could get more speedup more easily by thinking harder about the algorithm or rewriting in C. – Sheelah
I have a faster solution here. See 7) and the function plainperm. – Culbertson
plainperm is simply multicool::allPerm(multicool::initMC(x)). Multicool implements the Cool-lex loop-free multiset algorithm developed by Aaron Williams. A nifty animation of the algorithm is available on his academic page. – Autarchy
I wrote a package called iterpc a while ago to solve this. The iterpc answer can be found below. It is much faster than multicool and uniqueperm2. – Culbertson
R
4

The following function (which implements the classic formula for repeated permutations just like you did manually in your question) seems quite fast to me:

upermn <- function(x) {
    n <- length(x)
    duplicates <- as.numeric(table(x))
    factorial(n) / prod(factorial(duplicates))
}

It does compute n!, but only as a number; unlike the permn function, it does not generate all n! permutations first.

See it in action:

> dat <- c(1,0,3,4,1,0,0,3,0,4)
> upermn(dat)
[1] 18900
> system.time(upermn(dat))
   user  system elapsed 
  0.000   0.000   0.001 
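The same count can also be obtained without forming n! at all, by multiplying binomial coefficients: choose the positions for the first value, then the positions for the second value among the remaining slots, and so on. The sketch below is only an illustration of that identity; count_by_choose is a hypothetical name and is not part of the answer above.

```r
# Hypothetical helper: count multiset permutations as a product of
# binomial coefficients instead of a ratio of factorials.
count_by_choose <- function(x) {
  counts <- as.vector(table(x))                  # multiplicity of each value
  # slots still free before each value group is placed
  remaining <- length(x) - c(0, cumsum(counts)[-length(counts)])
  prod(choose(remaining, counts))
}

count_by_choose(c(1,0,3,4,1,0,0,3,0,4))
# [1] 18900   (choose(10,4) * choose(6,2) * choose(4,2) * choose(2,2))
```

Because the intermediate factors stay much smaller than n!, this form is less prone to overflow for long vectors.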

UPDATE: I have just realized that the question was about generating all unique permutations not just specifying the number of them - sorry for that!

You could improve the unique(permn(...)) approach by computing the unique permutations of the vector with one element removed and then putting each unique element in front of them. My explanation may fall short, so let the source speak:

uperm <- function(x) {
    u <- unique(x)                    # unique values of the vector
    result <- NULL                    # start empty; rbind() ignores NULL
    for (i in seq_along(u)) {
        v <- x[-which(x==u[i])[1]]    # drop the first occurrence of this value
        # put u[i] in front of every unique permutation of the remaining elements
        result <- rbind(result, cbind(u[i], do.call(rbind, unique(permn(v)))))
    }
    return(result)
}

This way you gain some speed. I was too lazy to run the code on the vector you provided (it took too long), so here is a comparison on a smaller vector:

> dat <- c(1,0,3,4,1,0,0)
> system.time(unique(permn(dat)))
   user  system elapsed 
  0.264   0.000   0.268 
> system.time(uperm(dat))
   user  system elapsed 
  0.147   0.000   0.150 

I think you could gain a lot more by rewriting this function to be recursive!


UPDATE (again): I have tried to make up a recursive function with my limited knowledge:

uperm <- function(x) {
    u <- sort(unique(x))
    l <- length(u)
    if (l == length(x)) {
        return(do.call(rbind,permn(x)))
    }
    if (l == 1) return(x)
    result <- matrix(NA, upermn(x), length(x))
    index <- 1
    for (i in 1:l) {
        v <- x[-which(x==u[i])[1]]
        newindex <- upermn(v)
        if (table(x)[i] == 1) {
            result[index:(index+newindex-1),] <- cbind(u[i], do.call(rbind, unique(permn(v))))
            } else {
                result[index:(index+newindex-1),] <- cbind(u[i], uperm(v))
            }
        index <- index+newindex
    }
    return(result)
}

This gives a large speedup:

> system.time(unique(permn(c(1,0,3,4,1,0,0,3,0))))
   user  system elapsed 
 22.808   0.103  23.241 

> system.time(uperm(c(1,0,3,4,1,0,0,3,0)))
   user  system elapsed 
  4.613   0.003   4.645 
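For illustration, the "fix the first element, recurse on the rest" idea can also be written in base R alone, without calling permn at any level. This is a minimal sketch under that assumption; uperm_rec is a hypothetical name, and it has not been benchmarked against the functions above.

```r
# Hypothetical base-R sketch: each distinct value takes a turn as the
# first element, followed by every unique permutation of what is left.
uperm_rec <- function(x) {
  if (length(x) <= 1) return(matrix(x, nrow = 1))   # base case: one row
  do.call(rbind, lapply(sort(unique(x)), function(u) {
    rest <- x[-match(u, x)]                          # drop one copy of u
    cbind(u, uperm_rec(rest), deparse.level = 0)     # prefix u to each row
  }))
}

res <- uperm_rec(c(1,0,3,4,1,0,0))
dim(res)             # rows = 7!/(2!*3!), columns = 7
anyDuplicated(res)   # 0 means every row is distinct
```

Since duplicates are never generated, no call to unique() is needed; the rows also come out in lexicographic order because the prefixes are tried in sorted order.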

Please report back if this would work for you!

Reames answered 15/4, 2011 at 15:53 Comment(3)
I just get an error with that last one - Error: evaluation nested too deeply: infinite recursion / options(expressions=)? The first recursive function worked pretty well though - a big improvement in time. Thanks a lot for taking the time. If you can sort out the error that would be fantastic. – Jernigan
@Steve: the last uperm function runs fine on my machine with the data you provided. It takes 17 seconds to calculate uperm(c(1,0,3,4,1,0,0,3,0,4)) here. Have you checked the latest version of the function? I edited my answer 25 minutes ago. – Reames
@Steve: you can find the upermn function in my post above. Just run it before running the uperm function. It is used to compute the number of rows of the result matrix up front, so the function does not have to grow the result with rbind (which is good for performance). – Reames
B
3

One option that hasn't been mentioned here is the allPerm function from the multicool package. It can be used pretty easily to get all the unique permutations:

library(multicool)
perms <- allPerm(initMC(dat))
dim(perms)
# [1] 18900    10
head(perms)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    4    4    3    3    1    1    0    0    0     0
# [2,]    0    4    4    3    3    1    1    0    0     0
# [3,]    4    0    4    3    3    1    1    0    0     0
# [4,]    4    4    0    3    3    1    1    0    0     0
# [5,]    3    4    4    0    3    1    1    0    0     0
# [6,]    4    3    4    0    3    1    1    0    0     0

In benchmarking I found it to be faster on dat than the solutions from the OP and daroczig but slower than the solution from Aaron.

Boulogne answered 17/9, 2015 at 23:48 Comment(2)
On my PC, microbenchmark(uniqueperm2(dat),allPerm(initMC(dat))) tells me allPerm is about 7x faster. – Perichondrium
@Perichondrium On my Mac, allPerm(initMC()) is 100 times slower than uniqueperm2, but I'm not sure why. Could you also test the speed of the iterpc solution below? – Culbertson
I
2

I don't actually know R, but here's how I'd approach the problem:

Find how many of each element type, i.e.

4 X 0
2 X 1
2 X 3
2 X 4

Sort by frequency (which the above already is).

Start with the most frequent value, which takes up 4 of the 10 spots. Determine the unique combinations of 4 values within the 10 available spots. (0,1,2,3),(0,1,2,4),(0,1,2,5),(0,1,2,6) ... (0,1,2,9),(0,1,3,4),(0,1,3,5) ... (6,7,8,9)

Go to the second most frequent value, which takes up 2 of the 6 remaining spots, and determine its unique combinations of 2 of 6: (0,1),(0,2),(0,3),(0,4),(0,5),(1,2),(1,3) ... (3,5),(4,5)

Then 2 of 4: (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)

And the remaining values, 2 of 2: (0,1)

Then you need to combine them into each possible combination. Here's some pseudocode (I'm convinced there's a more efficient algorithm for this, but this shouldn't be too bad):

lookup = (0,1,3,4)
For each of the above sets of combinations, example: input = ((0,2,4,6),(0,2),(2,3),(0,1))
newPermutation = (-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
for i = 0 to 3
  index = 0    // number of free spots seen so far in this pass
  k = 0        // next entry of input(i) to place
  for j = 0 to 9
    if newPermutation(j) = -1
      if k < length(input(i)) and index = input(i)(k)
        newPermutation(j) = lookup(i)
        k = k + 1
      index = index + 1
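
Since the answer above is language-agnostic, here is a rough R sketch of the same position-assignment idea, for illustration only. It places the values in sorted order rather than by frequency, builds each permutation recursively instead of combining precomputed lists, and assumes the input contains no NA values; positions_perms is a hypothetical name.

```r
# Hypothetical sketch: give each distinct value a set of free slots,
# one value group at a time, using combn() to enumerate the choices.
positions_perms <- function(x) {
  vals   <- sort(unique(x))
  counts <- as.vector(table(x))        # same sorted order as vals
  place <- function(current, g) {
    free <- which(is.na(current))      # slots not yet assigned
    if (g == length(vals)) {           # last group fills whatever is left
      current[free] <- vals[g]
      return(list(current))
    }
    # every way to pick counts[g] of the free slots (indices into 'free')
    choices <- combn(length(free), counts[g], simplify = FALSE)
    do.call(c, lapply(choices, function(idx) {
      nxt <- current
      nxt[free[idx]] <- vals[g]
      place(nxt, g + 1)
    }))
  }
  do.call(rbind, place(rep(NA, length(x)), 1))
}

p <- positions_perms(c(1,0,3,4,1,0,0,3,0,4))
dim(p)
# [1] 18900    10
```

The number of rows is the product of the combination counts at each step (210 * 15 * 6 * 1), which matches the 18900 from the question.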
Inconsistent answered 15/4, 2011 at 18:7 Comment(0)
C
2

Another option is the iterpc package; I believe it is the fastest of the existing methods. More importantly, the result is in dictionary order (which may be preferable).

dat <- c(1, 0, 3, 4, 1, 0, 0, 3, 0, 4)
library(iterpc)
getall(iterpc(table(dat), order=TRUE))

The benchmark indicates that iterpc is significantly faster than all other methods described here:

library(multicool)
library(microbenchmark)
microbenchmark(uniqueperm2(dat), 
               allPerm(initMC(dat)), 
               getall(iterpc(table(dat), order=TRUE))
              )

Unit: milliseconds
                                     expr         min         lq        mean      median
                         uniqueperm2(dat)   23.011864   25.33241   40.141907   27.143952
                     allPerm(initMC(dat)) 1713.549069 1771.83972 1814.434743 1810.331342
 getall(iterpc(table(dat), order = TRUE))    4.332674    5.18348    7.656063    5.989448
          uq        max neval
   64.147399   74.66312   100
 1855.869670 1937.48088   100
    6.705741   49.98038   100
Culbertson answered 25/3, 2016 at 18:44 Comment(1)
iterpc has been deprecated; see the arrangements package instead. – Culbertson
I
1

As this question is old and continues to attract many views, this post is solely meant to inform R users of the current state of the language with regards to performing the popular task outlined by the OP. As @RandyLai alludes to, there are packages developed with this task in mind. They are: arrangements and RcppAlgos*.

Efficiency

They are very efficient and quite easy to use for generating permutations of a multiset.

dat <- c(1, 0, 3, 4, 1, 0, 0, 3, 0, 4)
dim(RcppAlgos::permuteGeneral(sort(unique(dat)), freqs = table(dat)))
[1] 18900    10

microbenchmark(algos = RcppAlgos::permuteGeneral(sort(unique(dat)), freqs = table(dat)),
               arngmnt = arrangements::permutations(sort(unique(dat)), freq = table(dat)),
               curaccptd = uniqueperm2(dat), unit = "relative")
Unit: relative
     expr       min        lq       mean    median        uq       max neval
    algos  1.000000  1.000000  1.0000000  1.000000  1.000000 1.0000000   100
  arngmnt  1.501262  1.093072  0.8783185  1.089927  1.133112 0.3238829   100
curaccptd 19.847457 12.573657 10.2272080 11.705090 11.872955 3.9007364   100

With RcppAlgos we can utilize parallel processing for even better efficiency on larger examples.

hugeDat <- rep(dat, 2)[-(1:5)]
RcppAlgos::permuteCount(sort(unique(hugeDat)), freqs = table(hugeDat))
[1] 3603600

microbenchmark(algospar = RcppAlgos::permuteGeneral(sort(unique(hugeDat)),
                                                    freqs = table(hugeDat), nThreads = 4),
               arngmnt = arrangements::permutations(sort(unique(hugeDat)), freq = table(hugeDat)),
               curaccptd = uniqueperm2(hugeDat), unit = "relative", times = 10)
Unit: relative
     expr      min        lq      mean    median       uq      max neval
 algospar  1.00000  1.000000  1.000000  1.000000  1.00000  1.00000    10
  arngmnt  3.23193  3.109092  2.427836  2.598058  2.15965  1.79889    10
curaccptd 49.46989 45.910901 34.533521 39.399481 28.87192 22.95247    10

Lexicographical Order

A nice benefit of these packages is that the output is in lexicographical order:

head(RcppAlgos::permuteGeneral(sort(unique(dat)), freqs = table(dat)))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    1    1    3    3    4     4
[2,]    0    0    0    0    1    1    3    4    3     4
[3,]    0    0    0    0    1    1    3    4    4     3
[4,]    0    0    0    0    1    1    4    3    3     4
[5,]    0    0    0    0    1    1    4    3    4     3
[6,]    0    0    0    0    1    1    4    4    3     3

tail(RcppAlgos::permuteGeneral(sort(unique(dat)), freqs = table(dat)))
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[18895,]    4    4    3    3    0    1    1    0    0     0
[18896,]    4    4    3    3    1    0    0    0    0     1
[18897,]    4    4    3    3    1    0    0    0    1     0
[18898,]    4    4    3    3    1    0    0    1    0     0
[18899,]    4    4    3    3    1    0    1    0    0     0
[18900,]    4    4    3    3    1    1    0    0    0     0

identical(RcppAlgos::permuteGeneral(sort(unique(dat)), freqs = table(dat)),
      arrangements::permutations(sort(unique(dat)), freq = table(dat)))
[1] TRUE

Iterators

Additionally, both packages offer iterators that allow for memory-efficient generation of permutations, one by one:

algosIter <- RcppAlgos::permuteIter(sort(unique(dat)), freqs = table(dat))

algosIter$nextIter()
[1] 0 0 0 0 1 1 3 3 4 4

algosIter$nextNIter(5)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    1    1    3    4    3     4
[2,]    0    0    0    0    1    1    3    4    4     3
[3,]    0    0    0    0    1    1    4    3    3     4
[4,]    0    0    0    0    1    1    4    3    4     3
[5,]    0    0    0    0    1    1    4    4    3     3

## last permutation
algosIter$back()
[1] 4 4 3 3 1 1 0 0 0 0

## use reverse iterator methods
algosIter$prevNIter(5)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    4    4    3    3    1    0    1    0    0     0
[2,]    4    4    3    3    1    0    0    1    0     0
[3,]    4    4    3    3    1    0    0    0    1     0
[4,]    4    4    3    3    1    0    0    0    0     1
[5,]    4    4    3    3    0    1    1    0    0     0

* I am the author of RcppAlgos

Inhale answered 15/4, 2011 at 0:19 Comment(0)
B
0

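Another option is to use the Rcpp package. The difference from the other solutions is that it returns a list rather than a matrix.

#include <Rcpp.h>
#include <algorithm>   // std::sort, std::next_permutation
#include <vector>

// [[Rcpp::export]]
std::vector<std::vector<int> > UniqueP(std::vector<int> v) {
    std::vector<std::vector<int> > out;
    std::sort(v.begin(), v.end());   // start from the smallest permutation
    do {
        out.push_back(v);
    } while (std::next_permutation(v.begin(), v.end()));  // skips duplicates
    return out;
}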
 Unit: milliseconds
         expr       min      lq     mean    median       uq      max neval cld
 uniqueperm2(dat) 10.753426 13.5283 15.61438 13.751179 16.16061 34.03334   100   b
 UniqueP(dat)      9.090222  9.6371 10.30185  9.838324 10.20819 24.50451   100   a 
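
For readers who want to see what std::next_permutation is doing, the following is a plain R sketch of the same stepping rule (find the rightmost ascent, swap with the next larger element to its right, reverse the tail). It is meant as an illustration, not a replacement for the C++ version; next_perm and all_perms_lex are hypothetical names, and this R loop will be much slower than compiled code.

```r
# Hypothetical R rendering of the "next lexicographic permutation" step.
next_perm <- function(v) {
  n <- length(v)
  i <- n - 1
  while (i >= 1 && v[i] >= v[i + 1]) i <- i - 1   # rightmost ascent v[i] < v[i+1]
  if (i < 1) return(NULL)                         # v was the last permutation
  j <- n
  while (v[j] <= v[i]) j <- j - 1                 # rightmost element larger than v[i]
  v[c(i, j)] <- v[c(j, i)]                        # swap them
  v[(i + 1):n] <- rev(v[(i + 1):n])               # reverse the tail
  v
}

all_perms_lex <- function(x) {
  v <- sort(x)                                    # smallest permutation first
  out <- list()
  while (!is.null(v)) {
    out[[length(out) + 1]] <- v
    v <- next_perm(v)
  }
  do.call(rbind, out)                             # one permutation per row
}

all_perms_lex(c(0, 0, 2))
```

Because equal elements never count as an ascent, repeated values are stepped over automatically, which is why no de-duplication pass is needed.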
Brew answered 13/2, 2020 at 2:43 Comment(0)
