How to generate permutations or combinations of object in R?
Asked Answered
V

3

35

How to generate sequences of r objects from n objects? I'm looking for a way to do either permutations or combinations, with/without replacement, with distinct and non-distinct items (aka multisets).

This is related to twelvefold way. The "distinct" solutions could be included in twelvefold way, while the "non-distinct" are not included.

Vange answered 21/3, 2014 at 20:43 Comment(5)
There are, arguably, twelve questions of this type.Junette
Yeah, it's a really useful way to organize and think about all of those different combinatorial objects. FYI, most of Google's first page hits for "Twelvefold Way" include more readable tables/clearer explanations than does the Wikipedia page I linked.Junette
Thanks for the information. I think what I am missing are the surjective cases. Right..? [update]: it seems to be wrongVange
You're right, that's wrong ;) The characteristics on which the 12-fold classification is based are +/- different than what you've picked. For me, by far the best way to think about it is as looking at n balls being placed into m urns. There are three possible restrictions on how they can be placed (no restriction, must be injective, or must be surjective), and 4 possible combinations of labeled/unlabeled balls and urns. Here and here are 2 sources that use that lens to view the problem.Junette
Finally, I figure out the difference between the 8 questions here and twelvefold. Four of the questions here are in twelvefold (those "distinct" questions) while those "non-distinct" questions are not in twelvefold.Vange
V
35

EDIT: I have updated the answer to use a more efficient package arrangements

Getting start of using arrangement

arrangements contains some efficient generators and iterators for permutations and combinations. It has been demonstrated that arrangements outperforms most of the existing packages of similar kind. Some benchmarks could be found here.

Here are the answers to the above questions

# 1) combinations: without replacement: distinct items

combinations(5, 2)

      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5


# 2) combinations: with replacement: distinct items

combinations(5, 2, replace=TRUE)

      [,1] [,2]
 [1,]    1    1
 [2,]    1    2
 [3,]    1    3
 [4,]    1    4
 [5,]    1    5
 [6,]    2    2
 [7,]    2    3
 [8,]    2    4
 [9,]    2    5
[10,]    3    3
[11,]    3    4
[12,]    3    5
[13,]    4    4
[14,]    4    5
[15,]    5    5



# 3) combinations: without replacement: non distinct items

combinations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)

     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "b" 
[3,] "a"  "c" 
[4,] "b"  "c" 



# 4) combinations: with replacement: non distinct items

combinations(x = c("a", "b", "c"), k = 2, replace = TRUE)  # as `freq` does not matter

     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "b" 
[3,] "a"  "c" 
[4,] "b"  "b" 
[5,] "b"  "c" 
[6,] "c"  "c" 

# 5) permutations: without replacement: distinct items

permutations(5, 2)

      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    1
 [6,]    2    3
 [7,]    2    4
 [8,]    2    5
 [9,]    3    1
[10,]    3    2
[11,]    3    4
[12,]    3    5
[13,]    4    1
[14,]    4    2
[15,]    4    3
[16,]    4    5
[17,]    5    1
[18,]    5    2
[19,]    5    3
[20,]    5    4



# 6) permutations: with replacement: distinct items

permutations(5, 2, replace = TRUE)

      [,1] [,2]
 [1,]    1    1
 [2,]    1    2
 [3,]    1    3
 [4,]    1    4
 [5,]    1    5
 [6,]    2    1
 [7,]    2    2
 [8,]    2    3
 [9,]    2    4
[10,]    2    5
[11,]    3    1
[12,]    3    2
[13,]    3    3
[14,]    3    4
[15,]    3    5
[16,]    4    1
[17,]    4    2
[18,]    4    3
[19,]    4    4
[20,]    4    5
[21,]    5    1
[22,]    5    2
[23,]    5    3
[24,]    5    4
[25,]    5    5


# 7) permutations: without replacement: non distinct items

permutations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)

     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "b" 
[3,] "a"  "c" 
[4,] "b"  "a" 
[5,] "b"  "c" 
[6,] "c"  "a" 
[7,] "c"  "b" 



# 8) permutations: with replacement: non distinct items

permutations(x = c("a", "b", "c"), k = 2, replace = TRUE)  # as `freq` doesn't matter

      [,1] [,2]
 [1,] "a"  "a" 
 [2,] "a"  "b" 
 [3,] "a"  "c" 
 [4,] "b"  "a" 
 [5,] "b"  "b" 
 [6,] "b"  "c" 
 [7,] "c"  "a" 
 [8,] "c"  "b" 
 [9,] "c"  "c" 

Compare to other packages

There are few advantages of using arrangements over the existing packages.

  1. Integral framework: you don't have to use different packages for different methods.

  2. It is very efficient. See https://randy3k.github.io/arrangements/articles/benchmark.html for some benchmarks.

  3. It is memory efficient, it is able to generate all 13! permutation of 1 to 13, existing packages will fail to do so because of the limitation of matrix size. The getnext() method of the iterators allow users to get the arrangements one by one.

  4. The generated arrangements are in dictionary order which may be desired for some users.

Vange answered 21/3, 2014 at 20:43 Comment(6)
I think this answer might be improved by showing some output that tells the story of each "question."Tapp
This answer is an advertisement for your package. If you're going to do that, please demonstrate the various capabilities and why they are superior to previous methods. As it is, in my opinion, this question and answer does not supplant all other questions about combinations/permutations (and it looks like this is your intent).Breadbasket
Hi Matthew, sorry to make you feel like it is an advertisement (indeed it is :)..) If you to go to see the editing history of my answer, you will see that the old answers are using other packages. In particularly, there is no package in doing k-permeation of multi set, see the home-brew function here. Since I was unsatisfied with the existing packages, so I decided to write my own package.Vange
But I agree with you, I should compare my package with the existing packages.Vange
Might I suggest that you change your function names. The functions combinations/permutations from gtools are so widely used, your package could potentially break dependencies/legacy code/etc. When developing packages I like to use the adage articulated by @DirkEddelbuettel: "Don't do harm".Vorfeld
I guess it is in general not an issue because the recommended way for a package to call a function in another package is <package>::<function>. Unless scripts are concerned, having similar names should not do much harm.....IMO.Vange
V
43

A Walk Through a Slice of Combinatorics in R*

Below, we examine packages equipped with the capabilities of generating combinations & permutations. If I have left out any package, please forgive me and please leave a comment or better yet, edit this post.

Outline of analysis:

  1. Introduction
  2. Setup
  3. Combinations
  4. Permutations
  5. Multisets
  6. Summary
  7. Memory

Before we begin, we note that combinations/permutations with replacement of distinct vs. non-distint items chosen m at a time are equivalent. This is so, because when we have replacement, it is not specific. Thus, no matter how many times a particular element originally occurs, the output will have an instance(s) of that element repeated 1 to m times.

1. Introduction

Packages:

  1. gtools
  2. combinat
  3. multicool
  4. partitions
  5. RcppAlgos
  6. arrangements
  7. utils

I did not include permute or permutations as they are not really meant to attack these types of problems. I also did not include the updated gRbase as certain cases crashed my computer.

|————————————— OVERVIEW —————————————-|

|_________________| gtools | combinat | multicool | partitions |
|       comb rep  |  Yes   |          |           |            |
|    comb NO rep  |  Yes   |   Yes    |           |            |
|       perm rep  |  Yes   |          |           |            |
|    perm NO rep  |  Yes   |   Yes    |    Yes    |    Yes     |
|  perm multiset  |        |          |    Yes    |    Yes     |
|  comb multiset  |        |          |           |            |
| accepts factors |        |   Yes    |           |            |
|    m at a time  |  Yes   |  Yes/No  |           |            |
| general vector  |  Yes   |   Yes    |    Yes    |            |
|     iterable    |        |          |    Yes    |            |
| parallelizable  |        |          |           |            |
| multi-threaded  |        |          |           |            |
|   big integer   |        |          |           |            |

|_________________| arrangements | RcppAlgos | utils |
|       comb rep  |     Yes      |    Yes    |       |
|    comb NO rep  |     Yes      |    Yes    |  Yes  |
|       perm rep  |     Yes      |    Yes    |       |
|    perm NO rep  |     Yes      |    Yes    |       |
|  perm multiset  |     Yes      |    Yes    |       |
|  comb multiset  |     Yes      |    Yes    |       |
| accepts factors |     Yes      |    Yes    |  Yes  |
|    m at a time  |     Yes      |    Yes    |  Yes  |
| general vector  |     Yes      |    Yes    |  Yes  |
|     iterable    |     Yes      |    Yes    |       |
| parallelizable  |     Yes      |    Yes    |       |
|   big integer   |     Yes      |    Yes    |       |
| multi-threaded  |              |    Yes    |       |

The tasks, m at a time and general vector, refer to the capability of generating results “m at a time” and rearranging a “general vector” as opposed to 1:n. In practice, we are generally concerned with finding rearrangements of a general vector, therefore all examinations below will reflect this when possible.

2. Setup

All benchmarks were ran on 3 different set-ups.

  1. 2022 Macbook Air Apple M2 24 GB
  2. 2020 Macbook Pro i7 16 GB
  3. 2022 Windows Surface i5 16 GB
library(microbenchmark)
## print up to 4 digits to keep microbenchmark output tidy
options(digits = 4)
options(width = 90)

numThreads <- min(as.integer(RcppAlgos::stdThreadMax() / 2), 6)
numThreads
#> [1] 4

pkgs <- c("gtools", "combinat", "multicool", "partitions",
          "RcppAlgos", "arrangements", "utils", "microbenchmark")
sapply(pkgs, packageVersion, simplify = FALSE)
#> $gtools
#> [1] '3.9.3'
#> 
#> $combinat
#> [1] '0.0.8'
#> 
#> $multicool
#> [1] '0.1.12'
#> 
#> $partitions
#> [1] '1.10.7'
#> 
#> $RcppAlgos
#> [1] '2.6.0'
#> 
#> $arrangements
#> [1] '1.1.9'
#> 
#> $utils
#> [1] '4.2.1'
#> 
#> $microbenchmark
#> [1] '1.4.7'

The listed results were obtained from setup #1 (i.e. Macbook Air M2). The results on the Macbook Pro were similar, however with the Windows setup, multi-threading was less effective. In some cases on the Windows setup, the serial execution was faster. We will call all functions with the pattern package::function so no library calls are needed.

3. Combinations

First, we examine combinations without replacement chosen m at a time.

  1. RcppAlgos
  2. combinat
  3. gtools
  4. arrangements
  5. utils

How to:

set.seed(13)
tVec1 <- sort(sample(100, 20))
m <- 10
t1 <- RcppAlgos::comboGeneral(tVec1, m)  ## returns matrix with m columns
t3 <- combinat::combn(tVec1, m)  ## returns matrix with m rows
t4 <- gtools::combinations(20, m, tVec1)  ## returns matrix with m columns
identical(t(t3), t4) ## must transpose to compare
#> [1] TRUE
t5 <- arrangements::combinations(tVec1, m)
identical(t1, t5)
#> [1] TRUE
t6 <- utils::combn(tVec1, m)  ## returns matrix with m rows
identical(t(t6), t4) ## must transpose to compare
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::comboGeneral(tVec1, m,
                                             nThreads = numThreads),
    cbRcppAlgosSer = RcppAlgos::comboGeneral(tVec1, m),
    cbGtools = gtools::combinations(20, m, tVec1),
    cbCombinat = combinat::combn(tVec1, m),
    cbUtils = utils::combn(tVec1, m),
    cbArrangements = arrangements::combinations(tVec1, m),
    unit = "relative"
)
#> Warning in microbenchmark(cbRcppAlgosPar = RcppAlgos::comboGeneral(tVec1, : less accurate
#> nanosecond times to avoid potential integer overflows
#> Unit: relative
#>            expr     min      lq    mean  median      uq     max neval
#>  cbRcppAlgosPar   1.000   1.000   1.000   1.000   1.000  1.0000   100
#>  cbRcppAlgosSer   2.712   2.599   1.280   2.497   2.477  0.2666   100
#>        cbGtools 739.325 686.803 271.623 679.894 661.560 11.7178   100
#>      cbCombinat 173.836 166.265  76.735 176.052 191.945  4.9075   100
#>         cbUtils 179.895 170.345  71.639 169.661 178.385  3.8900   100
#>  cbArrangements   2.717   2.995   1.975   2.835   2.935  0.8195   100

Now, we examine combinations with replacement chosen m at a time.

  1. RcppAlgos
  2. gtools
  3. arrangements

How to:

set.seed(97)
tVec2 <- sort(rnorm(12))
m  <- 9
t1 <- RcppAlgos::comboGeneral(tVec2, m, repetition = TRUE)
t3 <- gtools::combinations(12, m, tVec2, repeats.allowed = TRUE)
identical(t1, t3)
#> [1] TRUE
t4 <- arrangements::combinations(tVec2, m, replace = TRUE)
identical(t1, t4)
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::comboGeneral(tVec2, m, TRUE,
                                             nThreads = numThreads),
    cbRcppAlgosSer = RcppAlgos::comboGeneral(tVec2, m, TRUE),
    cbGtools = gtools::combinations(12, m, tVec2, repeats.allowed = TRUE),
    cbArrangements = arrangements::combinations(tVec2, m, replace = TRUE),
    unit = "relative"
)
#> Unit: relative
#>            expr     min      lq     mean  median       uq     max neval
#>  cbRcppAlgosPar   1.000   1.000   1.0000   1.000   1.0000  1.0000   100
#>  cbRcppAlgosSer   1.987   2.382   0.9173   2.347   1.2776  0.9733   100
#>        cbGtools 670.126 517.561 103.8726 523.135 177.0940 12.0440   100
#>  cbArrangements   2.300   2.582   0.8294   2.542   0.9212  1.0089   100

4. Permutations

First, we examine permutations without replacement chosen m at a time.

  1. RcppAlgos
  2. gtools
  3. arrangements

How to:

tVec3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
t1 <- RcppAlgos::permuteGeneral(tVec3, 6)
t3 <- gtools::permutations(10, 6, tVec3)
identical(t1, t3)
#> [1] TRUE
t4 <- arrangements::permutations(tVec3, 6)
identical(t1, t4)
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::permuteGeneral(tVec3, 6,
                                               nThreads = numThreads),
    cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec3, 6),
    cbGtools = gtools::permutations(10, 6, tVec3),
    cbArrangements = arrangements::permutations(tVec3, 6),
    unit = "relative"
)
#> Unit: relative
#>            expr     min      lq    mean  median      uq     max neval
#>  cbRcppAlgosPar   1.000   1.000   1.000   1.000   1.000  1.0000   100
#>  cbRcppAlgosSer   1.204   1.553   1.522   1.523   1.509  0.5722   100
#>        cbGtools 357.078 308.978 288.396 301.611 292.862 64.8564   100
#>  cbArrangements   2.356   2.361   2.183   2.292   2.224  0.4605   100

Next, we examine permutations without replacement with a general vector (returning all permutations).

  1. RcppAlgos
  2. gtools
  3. combinat
  4. multicool
  5. arrangements

How to:

tVec3Prime <- tVec3[1:9]
## For RcppAlgos, arrangements, & gtools (see above)

t4 <- combinat::permn(tVec3Prime) ## returns a list of vectors
## convert to a matrix
t4 <- do.call(rbind, t4)
t5 <- multicool::allPerm(multicool::initMC(tVec3Prime)) ## returns a matrix with n columns
all.equal(t4[do.call(order,as.data.frame(t4)),],
          t5[do.call(order,as.data.frame(t5)),])
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::permuteGeneral(tVec3Prime,
                                               nThreads = numThreads),
    cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec3Prime),
    cbGtools = gtools::permutations(9, 9, tVec3Prime),
    cbCombinat = combinat::permn(tVec3Prime),
    cbMulticool = multicool::allPerm(multicool::initMC(tVec3Prime)),
    cbArrangements = arrangements::permutations(tVec3Prime),
    times = 25,
    unit = "relative"
)
#> Unit: relative
#>            expr      min       lq     mean   median       uq    max neval
#>  cbRcppAlgosPar    1.000    1.000    1.000    1.000    1.000    1.0    25
#>  cbRcppAlgosSer    1.555    2.187    2.616    2.190    2.274   10.3    25
#>        cbGtools 1913.125 1850.589 1893.918 1877.707 1915.601 2124.5    25
#>      cbCombinat  508.360  512.182  562.042  532.123  595.722  715.3    25
#>     cbMulticool  103.061  103.694  128.480  118.169  127.118  208.3    25
#>  cbArrangements    3.216    3.583   13.566    3.544    3.561  139.2    25

Now, we examine permutations without replacement for 1:n (returning all permutations).

  1. RcppAlgos
  2. gtools
  3. combinat
  4. multicool
  5. partitions
  6. arrangements

How to:

t1 <- partitions::perms(9)  ## returns an object of type 'partition' with n rows
identical(t(as.matrix(t1)), RcppAlgos::permuteGeneral(9))
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::permuteGeneral(9, nThreads = numThreads),
    cbRcppAlgosSer = RcppAlgos::permuteGeneral(9),
    cbGtools = gtools::permutations(9, 9),
    cbCombinat = combinat::permn(9),
    cbMulticool = multicool::allPerm(multicool::initMC(1:9)),
    cbPartitions = partitions::perms(9),
    cbArrangements = arrangements::permutations(9),
    times = 25,
    unit = "relative"
)
#> Unit: relative
#>            expr      min       lq     mean   median       uq     max neval
#>  cbRcppAlgosPar    1.000    1.000    1.000    1.000    1.000   1.000    25
#>  cbRcppAlgosSer    1.583    2.218    2.587    2.261    2.591   1.814    25
#>        cbGtools 2010.303 1855.443 1266.853 1898.458 1903.055 217.422    25
#>      cbCombinat  511.196  515.197  392.252  533.737  652.125  86.305    25
#>     cbMulticool  108.152  103.188   80.469  108.227  120.804  23.504    25
#>    cbPartitions    6.139    6.018    7.167    5.993    6.403   9.446    25
#>  cbArrangements    4.089    3.797    3.135    3.791    3.760   1.858    25

Lastly, we examine permutations with replacement.

  1. RcppAlgos
  2. gtools
  3. arrangements

How to:

t1 <- RcppAlgos::permuteGeneral(tVec3, 5, repetition = TRUE)
t3 <- gtools::permutations(10, 5, tVec3, repeats.allowed = TRUE)
t4 <- arrangements::permutations(x = tVec3, k = 5, replace = TRUE)

identical(t1, t3)
#> [1] TRUE
identical(t1, t4)
#> [1] TRUE

This next benchmark is a little surprising given the results up until now.

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::permuteGeneral(
        tVec3, 5, TRUE, nThreads = numThreads
    ),
    cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec3, 5, TRUE),
    cbGtools = gtools::permutations(10, 5, tVec3, repeats.allowed = TRUE),
    cbArrangements = arrangements::permutations(tVec3, 5, replace = TRUE),
    unit = "relative"
)
#> Unit: relative
#>            expr   min    lq  mean median    uq   max neval
#>  cbRcppAlgosPar 1.000 1.000 1.000  1.000 1.000 1.000   100
#>  cbRcppAlgosSer 1.307 1.669 1.465  1.561 1.513 1.015   100
#>        cbGtools 6.364 6.188 5.448  5.762 5.434 1.625   100
#>  cbArrangements 2.584 2.442 1.824  2.265 2.135 0.117   100

That is not a typo… gtools::permutations is almost as fast as the other compiled functions. I encourage the reader to go check out the source code of gtools::permutations as it is one of the most elegant displays of programming out there (R or otherwise).

5. Multisets

First, we examine combinations of multisets.

  1. RcppAlgos
  2. arrangements

To find combinations/permutations of multisets, with RcppAlgos use the freqs arguments to specify how many times each element of the source vector, v, is repeated.

set.seed(496)
myFreqs <- sample(1:5, 12, replace = TRUE)
## This is how many times each element will be repeated
tVec4 <- as.integer(c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233))
t1 <- RcppAlgos::comboGeneral(tVec4, 12, freqs = myFreqs)
t3 <- arrangements::combinations(tVec4, 12, freq = myFreqs)
identical(t1, t3)
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::comboGeneral(
        tVec4, 12, freqs = myFreqs, nThreads = numThreads
    ),
    cbRcppAlgosSer = RcppAlgos::comboGeneral(tVec4, 12, freqs = myFreqs),
    cbArrangements = arrangements::combinations(tVec4, 12, freq = myFreqs),
    unit = "relative"
)
#> Unit: relative
#>            expr   min    lq  mean median    uq    max neval
#>  cbRcppAlgosPar 1.000 1.000 1.000  1.000 1.000 1.0000   100
#>  cbRcppAlgosSer 3.197 3.012 2.003  2.831 2.681 0.1658   100
#>  cbArrangements 9.391 7.830 4.901  7.252 6.731 0.3140   100

For permutations of multisets chosen m at a time, we have:

  1. RcppAlgos
  2. arrangements

How to:

set.seed(123)
tVec5 <- sort(runif(5))
t1 <- RcppAlgos::permuteGeneral(tVec5, 8, freqs = rep(4, 5))
t3 <- arrangements::permutations(tVec5, 8, freq = rep(4, 5))
identical(t1, t3)
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::permuteGeneral(
        tVec5, 8, freqs = rep(4, 5), nThreads = numThreads
    ),
    cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec5, 8, freqs = rep(4, 5)),
    cbArrangements = arrangements::permutations(tVec5, 8, freq = rep(4, 5)),
    unit = "relative"
)
#> Unit: relative
#>            expr   min    lq  mean median    uq   max neval
#>  cbRcppAlgosPar 1.000 1.000 1.000  1.000 1.000 1.000   100
#>  cbRcppAlgosSer 3.336 3.326 2.990  3.330 3.517 2.106   100
#>  cbArrangements 3.751 3.746 3.346  3.757 3.840 2.305   100

For permutations of multisets returning all permutations, we have:

  1. RcppAlgos
  2. multicool
  3. partitions
  4. arrangements

How to:

tVec6 <- (1:5)^3
## For multicool, you must have the elements explicitly repeated
tVec6Prime <- rep(tVec6, times = rep(2, 5))

## for comparison
t1 <- RcppAlgos::permuteGeneral(tVec6, freqs = rep(2, 5))
t2 <- partitions::multiset(tVec6Prime)
t3 <- multicool::allPerm(multicool::initMC(tVec6Prime))
t4 <- arrangements::permutations(tVec6, freq = rep(2, 5))

## the package partitions, returns class of integer
## whereas RcppAlgos preserves class of tVec6 (i.e. numeric)
all.equal(t1, t(as.matrix(t2)))
#> [1] TRUE
identical(t1[do.call(order,as.data.frame(t1)),],
          t3[do.call(order,as.data.frame(t3)),])
#> [1] TRUE
identical(t1, t4)
#> [1] TRUE

Benchmark:

microbenchmark(
    cbRcppAlgosPar = RcppAlgos::permuteGeneral(
        tVec6, freqs = rep(2, 5), nThreads = numThreads
    ),
    cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec6, freqs = rep(2, 5)),
    cbMulticool = multicool::allPerm(multicool::initMC(tVec6Prime)),
    cbPartitions = partitions::multiset(tVec6Prime),
    cbArrangements = arrangements::permutations(tVec6, freq = rep(2, 5)),
    unit = "relative"
)
#> Unit: relative
#>            expr    min     lq   mean median     uq    max neval
#>  cbRcppAlgosPar  1.000  1.000  1.000  1.000  1.000 1.0000   100
#>  cbRcppAlgosSer  2.485  2.141  2.289  2.584  2.511 1.0250   100
#>     cbMulticool 80.195 66.237 45.540 64.971 66.057 3.5655   100
#>    cbPartitions  5.731  4.786  3.925  4.719  4.953 0.4558   100
#>  cbArrangements  2.999  2.907  3.270  2.966  2.906 3.1214   100

6. Summary

Both gtools and combinat are well established packages for rearranging elements of a vector. With gtools there are a few more options (see the overview above) and with combinat, you can rearrange factors. With multicool, one is able to rearrange multisets. Although partitions is limited for the purposes of this question, it is a powerhouse packed with highly efficient functions for dealing with integer partitions.

arrangements

  1. The output is in lexicographical order.
  2. Allows the user to specify the format via the layout argument (“row : row-major”, “colmnn : column-major”, and “list : list”).
  3. Offers convenient methods such as collect & getnext when working with iterators.
  4. Allows for the generation of more than 2^31 - 1 combinations/permutations via getnext. N.B. RcppAlgos (via nextItem) and multicool (via nextPerm) are also capable of doing this.
  5. GMP support allows for exploration of combinations/permutations of vectors with many results.

Observe:

icomb <- arrangements::icombinations(1000, 7)
icomb$getnext()
#> [1] 1 2 3 4 5 6 7

icomb$getnext(d = 5)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    1    2    3    4    5    6    8
#> [2,]    1    2    3    4    5    6    9
#> [3,]    1    2    3    4    5    6   10
#> [4,]    1    2    3    4    5    6   11
#> [5,]    1    2    3    4    5    6   12

This feature is really nice when you only want a few combinations/permutations. With traditional methods, you would have to generate all combinations/permutations and then subset. This would render the previous example impossible as there are more than 10^17 results (i.e. ncombinations(1000, 7, bigz = TRUE) = 194280608456793000).

This feature along with the improvements to the generators in arrangements, allow it to be very efficient with respect to memory.

RcppAlgos

  1. The output is in lexicographical order.
  2. There are convenient constraint features that we will not discuss here as they are off-topic for this question. I will only note that the types of problems that can be solved by utilizing these features were the motivation for creating this package (partitions, subset-sum, etc.).
  3. GMP support allows for exploration of combinations/permutations of vectors with many results.
  4. Produce results in parallel using the Parallel or nThreads arguments.
  5. Similar to combn, there is a FUN argument for applying a function to each result (See also FUN.VALUE).
  6. Provides flexible and merory efficient iterators that allow for bidirectional iteration as well as random access.
    • nextItem|nextNIter|nextRemaining: Retrieve the next lexicographical result(s)
    • prevItem|prevNIter|prevRemaining: Retrieve the previous lexicographical result(s)
    • front|back|[[: Random access methods
    • Allows for easy generation of more than 2^31 - 1 results from any starting place.

Observe:

iter <- RcppAlgos::comboIter(1000, 7)

# first combinations
iter@nextIter()
#> [1] 1 2 3 4 5 6 7

# next 5 combinations
iter@nextNIter(5)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    1    2    3    4    5    6    8
#> [2,]    1    2    3    4    5    6    9
#> [3,]    1    2    3    4    5    6   10
#> [4,]    1    2    3    4    5    6   11
#> [5,]    1    2    3    4    5    6   12

# from the current state, the previous combination
iter@prevIter()
#> [1]  1  2  3  4  5  6 11

# the last combination
iter@back()
#> [1]  994  995  996  997  998  999 1000

# the 5th combination
iter[[5]]
#> [1]  1  2  3  4  5  6 11

# you can even pass a vector of indices
iter[[c(1, 3, 5)]]
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    1    2    3    4    5    6    7
#> [2,]    1    2    3    4    5    6    9
#> [3,]    1    2    3    4    5    6   11

# start iterating from any index
iter[[gmp::pow.bigz(2, 31)]]
#> [1]   1   2   3  17 138 928 954

# get useful info about the current state
iter@summary()
#> $description
#> [1] "Combinations of 1000 choose 7"
#> 
#> $currentIndex
#> Big Integer ('bigz') :
#> [1] 2147483648
#> 
#> $totalResults
#> Big Integer ('bigz') :
#> [1] 194280608456793000
#> 
#> $totalRemaining
#> Big Integer ('bigz') :
#> [1] 194280606309309352

## get next ieteration
iter@nextIter()
#> [1]   1   2   3  17 138 928 955

In case you were wondering how each package scales, I will leave you with this final example that measures how fast RcppAlgos and the arrangements packages can generate over 100 million results. Note, gtools::combinations is left out here as it will throw the error: evaluation nested too deeply.... We also leave out combn as it takes quite some time to execute. Curiously, the differences in memory usage between utils::combn and combinat::combn is quite bizarre given that they are only marginally different (see ?utils::combn under the “Authors” section).

Observe:

set.seed(2187)
tVec7 <- sort(sample(10^7, 10^3))

## 166,167,000 Combinations
system.time(RcppAlgos::comboGeneral(tVec7, 3))
#>    user  system elapsed 
#>   0.386   0.105   0.490
system.time(arrangements::combinations(x = tVec7, k = 3))
#>    user  system elapsed 
#>   0.439   0.105   0.545

## 124,251,000 Permuations
system.time(RcppAlgos::permuteGeneral(tVec7[1:500], 3))
#>    user  system elapsed 
#>   0.141   0.076   0.218
system.time(arrangements::permutations(x = tVec7[1:500], k = 3))
#>    user  system elapsed 
#>   0.386   0.077   0.463

7. Memory

When executing comboGeneral as well as arrangements::combinations, the memory will jump almost 2 Gbs before calling gc. This seems about right as #rows * #nols * bytesPerCell / 2^30 bytes = choose(1000,3) * 3 * 4 / 2^30 bytes = (166167000 * 3 * 4)/2^30 = 1.857 Gbs). However, when executing combn, the memory behavior was eratic (e.g. sometimes it would use all 16 Gb of memory and other times it would only spike a couple of Gbs). When I tested this on the Windows set-up, it would often crash.

We can confirm this using Rprof along with summaryRporf. Observe:

Rprof("RcppAlgos.out", memory.profiling = TRUE)
t1 <- RcppAlgos::comboGeneral(tVec7, 3)
Rprof(NULL)
head(summaryRprof("RcppAlgos.out", memory = "both")$by.total, n = 1)
#>                           total.time total.pct mem.total self.time self.pct
#> "RcppAlgos::comboGeneral"       0.42       100      1902      0.42      100

Rprof("arrangements.out", memory.profiling = TRUE)
t3 <- arrangements::combinations(tVec7, 3)
Rprof(NULL)
head(summaryRprof("arrangements.out", memory = "both")$by.total, n = 1)
#>                              total.time total.pct mem.total self.time self.pct
#> "arrangements::combinations"        0.5       100      1902       0.5      100

With RcppAlgos & arrangements, mem.total registers just over 1900 Mb.

And here is the memory profile on a smaller vector.

tVec7Prime <- tVec7[1:300]

Rprof("combinat.out", memory.profiling = TRUE)
t3 <- combinat::combn(tVec7Prime, 3)
Rprof(NULL)
head(summaryRprof("combinat.out", memory = "both")$by.total, n = 1)
#>                   total.time total.pct mem.total self.time self.pct
#> "combinat::combn"        2.1       100      1055      1.98    94.29

Rprof("utils.out", memory.profiling = TRUE)
t4 <- utils::combn(tVec7Prime, 3)
Rprof(NULL)
head(summaryRprof("utils.out", memory = "both")$by.total, n = 1)
#>                total.time total.pct mem.total self.time self.pct
#> "utils::combn"        1.6       100      2059       1.6      100

Rprof("gtools.out", memory.profiling = TRUE)
t5 <- gtools::combinations(300, 3, tVec7Prime)
Rprof(NULL)
head(summaryRprof("gtools.out", memory = "both")$by.total, n = 1)
#>         total.time total.pct mem.total self.time self.pct
#> "rbind"       1.62       100      6659      1.46    90.12

Interestingly, utils::combn and combinat::combn use different amounts of memory and take different amounts of time to execute. This does not hold up with smaller vectors:

microbenchmark(combinat::combn(2:13, 6), utils::combn(2:13, 6))
#> Unit: microseconds
#>                      expr   min    lq  mean median    uq   max neval
#>  combinat::combn(2:13, 6) 313.4 326.7 329.4  328.1 330.4 370.6   100
#>     utils::combn(2:13, 6) 378.3 393.1 397.0  395.2 399.2 451.2   100

And with gtools the total memory used is a little over 3x as much as utils. It should be noted that for these 3 packages, I obtained different results every-time I ran them (e.g. for combinat::combn sometimes I would get 9000 Mb and then I would get 13000 Mb).

Still, none can match RcppAlgos OR arrangements. Both only use 51 Mb when ran on the example above.

benchmark script: https://github.com/jwood000/RcppAlgos/blob/main/scripts/SO_Comb_Perm_in_R.R

*: An homage to A Walk through Combinatorics by Miklós Bóna

Vorfeld answered 21/3, 2014 at 20:43 Comment(8)
Excellent review! I guess I understand why in some cases, iterpc is not performing as efficiently as RcppAlgos because of the nature of generator. iterpc needs to initialize a generator object before performing the actual algorithm. I am actually refactoring iterpc as a new package and paradoxically, I am trying to get rid of RCpp and to use R C api solely. Again, excellent package RcppAlgos!Vange
@RandyLai, thanks for the kind words. I'm glad this review can help in some way. I've heard the C api in R can be tricky to say the least. I wish you the best in your goals.Vorfeld
@JosephWood I have a problem with permutation. I wonder if the permuteGeneral() function can be applied to a list in list to calculate all possible permutations.i.e expand.grid(1:10,1:100,1:5) gives different length of vector of permutations. And it is also applicable with list. Consider I have a list mylist = list(list(c(1,2,3,3,4),c(10,20,30,30,40,40,40,55)),list(c(2,4,6,6),1:10,1:50)) and if a use sapply(mylist,expand.grid) it gives expected result. I wonder if this can be done with permuteGeneral() function since expand.grid() function takes a lot time with higher dimensions.Committeeman
@maydin, expand.grid and permuteGeneral do two different things. The former gives the Cartesian product and the latter is pure permutations. I’ve flirted with implementing a Cartesian product analog to permuteGeneral, but I’ve hit many road blocks. It is on my list though!!Vorfeld
@JosephWood Thanks for the answer.I actually asked a question in here without talking about the list implementation. I used just a data.frame instead for the sake of simplicity. And the RcppAlgos package has been suggested to me. Your package is extremely fast! And I thought that there may be a solution. I wish all rocks on your road become broken :).Committeeman
I am gobsmacked! What a thorough exploration of the topic! Thanks!Agathy
ok but where are base R solutions?Appel
@jangorecki, the only base R solution I know of is combn which is included. Are there other base R solutions? I will happily add it. The only other base R solution i can think of that some may think should be include is expand.grid. The problem with this function is that it doesn’t give permutations or combinations, but rather the Cartesian product, which is fundamentally different.Vorfeld
V
35

EDIT: I have updated the answer to use a more efficient package arrangements

Getting start of using arrangement

arrangements contains some efficient generators and iterators for permutations and combinations. It has been demonstrated that arrangements outperforms most of the existing packages of similar kind. Some benchmarks could be found here.

Here are the answers to the above questions

# 1) combinations: without replacement: distinct items

combinations(5, 2)

      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5


# 2) combinations: with replacement: distinct items

combinations(5, 2, replace=TRUE)

      [,1] [,2]
 [1,]    1    1
 [2,]    1    2
 [3,]    1    3
 [4,]    1    4
 [5,]    1    5
 [6,]    2    2
 [7,]    2    3
 [8,]    2    4
 [9,]    2    5
[10,]    3    3
[11,]    3    4
[12,]    3    5
[13,]    4    4
[14,]    4    5
[15,]    5    5



# 3) combinations: without replacement: non distinct items

combinations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)

     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "b" 
[3,] "a"  "c" 
[4,] "b"  "c" 



# 4) combinations: with replacement: non distinct items

combinations(x = c("a", "b", "c"), k = 2, replace = TRUE)  # as `freq` does not matter

     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "b" 
[3,] "a"  "c" 
[4,] "b"  "b" 
[5,] "b"  "c" 
[6,] "c"  "c" 

# 5) permutations: without replacement: distinct items

permutations(5, 2)

      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    1
 [6,]    2    3
 [7,]    2    4
 [8,]    2    5
 [9,]    3    1
[10,]    3    2
[11,]    3    4
[12,]    3    5
[13,]    4    1
[14,]    4    2
[15,]    4    3
[16,]    4    5
[17,]    5    1
[18,]    5    2
[19,]    5    3
[20,]    5    4



# 6) permutations: with replacement: distinct items

permutations(5, 2, replace = TRUE)

      [,1] [,2]
 [1,]    1    1
 [2,]    1    2
 [3,]    1    3
 [4,]    1    4
 [5,]    1    5
 [6,]    2    1
 [7,]    2    2
 [8,]    2    3
 [9,]    2    4
[10,]    2    5
[11,]    3    1
[12,]    3    2
[13,]    3    3
[14,]    3    4
[15,]    3    5
[16,]    4    1
[17,]    4    2
[18,]    4    3
[19,]    4    4
[20,]    4    5
[21,]    5    1
[22,]    5    2
[23,]    5    3
[24,]    5    4
[25,]    5    5


# 7) permutations: without replacement: non distinct items

permutations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)

     [,1] [,2]
[1,] "a"  "a" 
[2,] "a"  "b" 
[3,] "a"  "c" 
[4,] "b"  "a" 
[5,] "b"  "c" 
[6,] "c"  "a" 
[7,] "c"  "b" 



# 8) permutations: with replacement: non distinct items

permutations(x = c("a", "b", "c"), k = 2, replace = TRUE)  # as `freq` doesn't matter

      [,1] [,2]
 [1,] "a"  "a" 
 [2,] "a"  "b" 
 [3,] "a"  "c" 
 [4,] "b"  "a" 
 [5,] "b"  "b" 
 [6,] "b"  "c" 
 [7,] "c"  "a" 
 [8,] "c"  "b" 
 [9,] "c"  "c" 

Compare to other packages

There are few advantages of using arrangements over the existing packages.

  1. Integral framework: you don't have to use different packages for different methods.

  2. It is very efficient. See https://randy3k.github.io/arrangements/articles/benchmark.html for some benchmarks.

  3. It is memory efficient, it is able to generate all 13! permutation of 1 to 13, existing packages will fail to do so because of the limitation of matrix size. The getnext() method of the iterators allow users to get the arrangements one by one.

  4. The generated arrangements are in dictionary order which may be desired for some users.

Vange answered 21/3, 2014 at 20:43 Comment(6)
I think this answer might be improved by showing some output that tells the story of each "question."Tapp
This answer is an advertisement for your package. If you're going to do that, please demonstrate the various capabilities and why they are superior to previous methods. As it is, in my opinion, this question and answer does not supplant all other questions about combinations/permutations (and it looks like this is your intent).Breadbasket
Hi Matthew, sorry to make you feel like it is an advertisement (indeed it is :)..) If you to go to see the editing history of my answer, you will see that the old answers are using other packages. In particularly, there is no package in doing k-permeation of multi set, see the home-brew function here. Since I was unsatisfied with the existing packages, so I decided to write my own package.Vange
But I agree with you, I should compare my package with the existing packages.Vange
Might I suggest that you change your function names. The functions combinations/permutations from gtools are so widely used, your package could potentially break dependencies/legacy code/etc. When developing packages I like to use the adage articulated by @DirkEddelbuettel: "Don't do harm".Vorfeld
I guess it is in general not an issue because the recommended way for a package to call a function in another package is <package>::<function>. Unless scripts are concerned, having similar names should not do much harm.....IMO.Vange
F
0

The package crossdes seems like a worthy addition to the list. Check this tutorial.

I will copy the text from the tutorial in case the information in that link gets lost.

For a BIBD there are v treatments repeated r times in b blocks of k observations. There is a fifth parameter lambda that records the number of blocks where every pair of treatment occurs in the design.

We first load the crossdes package in our sessions:

require(crossdes)

The function find.BIB is used to generate a block design with specific number of treatments, blocks (rows of the design) and elements per block (columns of the design).

Consider an example with five treatments in four blocks of three elements. We can create a block design via:

> find. BIB(5, 4, 3)
       [,1] [,2] [,3]
 [1,]    1    3    4
 [2,]    2    4    5
 [3,]    2    3    5
 [4,]    1    2    5

This design is not a BIBD because the treatments are not all repeated the same number of times in the design and we can check this with the isGYD function. For this example:

> isGYD(find. BIB(5, 4, 3))

  [1] The design is neither balanced w.r.t. rows nor w.r.t. columns.

This confirms what we can see from the design.


The tutorial keeps going with seven treatments, seven blocks, and three elements but I won't include those.

Frication answered 5/10, 2022 at 8:43 Comment(1)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From ReviewScriber

© 2022 - 2024 — McMap. All rights reserved.