Unordered combinations of all lengths
Asked Answered
I

3

25

I am looking a function that return me all the unordered combination of a vector. eg

x <- c('red','blue','black')
uncomb(x)
[1]'red'
[2]'blue'
[3]'black'
[4]'red','blue'
[5]'blue','black'
[6]'red','black'
[7]'red','blue','black'

I guess that there is a function in some library that do this, but in can't find it. I am trying with permutations of gtool but it is not the function i am looking for.

Isochronize answered 14/1, 2015 at 22:25 Comment(1)
I won't post my answer because it is really near to the Richard Scriven one. However, if you want to exploit the gtool package, you can use combinations and not permutations: sapply(seq_along(x), combinations, v = x, n = length(x))Romeyn
S
27

You could apply a sequence the length of x over the m argument of the combn() function.

x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
# 
# [[2]]
# [1] "blue"
# 
# [[3]]
# [1] "black"
# 
# [[4]]
# [1] "red"  "blue"
# 
# [[5]]
# [1] "red"   "black"
# 
# [[6]]
# [1] "blue"  "black"
# 
# [[7]]
# [1] "red"   "blue"  "black"

If you prefer a matrix result, then you can apply stringi::stri_list2matrix() to the list above.

stringi::stri_list2matrix(
    do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE)),
    byrow = TRUE
)
#      [,1]    [,2]    [,3]   
# [1,] "red"   NA      NA     
# [2,] "blue"  NA      NA     
# [3,] "black" NA      NA     
# [4,] "red"   "blue"  NA     
# [5,] "red"   "black" NA     
# [6,] "blue"  "black" NA     
# [7,] "red"   "blue"  "black"
Shack answered 14/1, 2015 at 22:29 Comment(5)
Yep - unlist(lapply(seq_along(x), combn, x=x, simplify=FALSE),recursive=FALSE) for another potential output variation. Unequal length data objects are ideal for a listUnderplay
I agree, but I was prompted in a comment to get closer to the desired output. Even though lapply(seq_along(x), combn, x = x) reads exactly as it should, which is column-wiseShack
The list (in my variation) is almost exactly what the OP presents in the question as a desired output. Using a matrix seems like it would be vastly more difficult to pass to other functions because of all the NA's.Underplay
I totally agree @Underplay - I've made the edit in the first part. For some reason, I prefer do.call(c, ...) over unlist(..., recursive = FALSE) thoughShack
Much of a muchness - "Tomato, tomato, let's call the whole thing off..."Underplay
I
9

I was re-directed here from List All Combinations With combn as this was one of the dupe targets. This is an old question and the answer provided by @RichScriven is very nice, but I wanted to give the community a few more options that are arguably more natural and more efficient (the last two).

We first note that the output is very similar to the Power Set. Calling powerSet from the rje package, we see that indeed our output matches every element from the power set except the first element which is equivalent to the Empty Set:

x <- c("red", "blue", "black")
rje::powerSet(x)
[[1]]
character(0)   ## empty set equivalent

[[2]]
[1] "red"

[[3]]
[1] "blue"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "black"

[[6]]
[1] "red"   "black"

[[7]]
[1] "blue"  "black"

[[8]]
[1] "red"   "blue"  "black"

If you don't want the first element, you can easily add a [-1] to the end of your function call like so : rje::powerSet(x)[-1].

The next two solutions are from the newer packages arrangements and RcppAlgos (I am the author), that will offer the user great gains in efficiency. Both of these packages are capable of generating combinations of Multisets.

Why is this important?

It can be shown that there is a one-to-one mapping from the power set of A to all combinations of the multiset c(rep(emptyElement, length(A)), A) choose length(A), where emptyElement is a representation of the empty set (like zero or a blank). With this in mind, observe:

## There is also a function called combinations in the
## rje package, so we fully declare the function with
## scope operator
library(arrangements)
arrangements::combinations(x = c("",x), k = 3, freq = c(2, rep(1, 3)))
     [,1]  [,2]   [,3]   
[1,] ""    ""     "red"  
[2,] ""    ""     "blue" 
[3,] ""    ""     "black"
[4,] ""    "red"  "blue" 
[5,] ""    "red"  "black"
[6,] ""    "blue" "black"
[7,] "red" "blue" "black"

library(RcppAlgos)
comboGeneral(c("",x), 3, freqs = c(2, rep(1, 3)))
     [,1]  [,2]   [,3]   
[1,] ""    ""     "red"  
[2,] ""    ""     "blue" 
[3,] ""    ""     "black"
[4,] ""    "red"  "blue" 
[5,] ""    "red"  "black"
[6,] ""    "blue" "black"
[7,] "red" "blue" "black"

If you don't like dealing with blank elements and/or matrices, you can also return a list making use of lapply.

lapply(seq_along(x), comboGeneral, v = x)
[[1]]
     [,1]   
[1,] "red"  
[2,] "blue" 
[3,] "black"

[[2]]
     [,1]   [,2]   
[1,] "red"  "blue" 
[2,] "red"  "black"
[3,] "blue" "black"

[[3]]
     [,1]  [,2]   [,3]   
[1,] "red" "blue" "black"


lapply(seq_along(x), function(y) arrangements::combinations(x, y))
[[1]]
     [,1]   
[1,] "red"  
[2,] "blue" 
[3,] "black"

[[2]]
     [,1]   [,2]   
[1,] "red"  "blue" 
[2,] "red"  "black"
[3,] "blue" "black"

[[3]]
     [,1]  [,2]   [,3]   
[1,] "red" "blue" "black"

Now we show that the last two methods are much more efficient (N.B. I removed do.call(c, and simplify = FALSE from the answer provided by @RichSciven in order to compare generation of similar outputs. I also included rje::powerSet for good measure):

set.seed(8128)
bigX <- sort(sample(10^6, 20)) ## With this as an input, we will get 2^20 - 1 results.. i.e. 1,048,575
library(microbenchmark)
microbenchmark(powSetRje = powerSet(bigX),
               powSetRich = lapply(seq_along(bigX), combn, x = bigX),
               powSetArrange = lapply(seq_along(bigX), function(y) arrangements::combinations(x = bigX, k = y)),
               powSetAlgos = lapply(seq_along(bigX), comboGeneral, v = bigX),
               unit = "relative")

Unit: relative
          expr        min        lq      mean   median        uq      max neval
     powSetRje 64.4252454 44.063199 16.678438 18.63110 12.082214 7.317559   100
    powSetRich 61.6766640 43.027789 16.009151 17.88944 11.406994 7.222899   100
 powSetArrange  0.9508052  1.060309  1.080341  1.02257  1.262713 1.126384   100
   powSetAlgos  1.0000000  1.000000  1.000000  1.00000  1.000000 1.000000   100

Even further, arrangements comes equipped with an argument called layout which lets the user choose a particular format for their output. One of those is layout = "l" for list. It is similar to setting simplify = FALSE in combn and allows us to obtain output like that of powerSet. Observe:

do.call(c, lapply(seq_along(x), function(y) {
                    arrangements::combinations(x, y, layout = "l")
                  }))
[[1]]
[1] "red"

[[2]]
[1] "blue"

[[3]]
[1] "black"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "red"   "black"

[[6]]
[1] "blue"  "black"

[[7]]
[1] "red"   "blue"  "black"

And the benchmarks:

microbenchmark(powSetRje = powerSet(bigX)[-1],
               powSetRich = do.call(c, lapply(seq_along(bigX), combn, x = bigX, simplify = FALSE)),
               powSetArrange = do.call(c, lapply(seq_along(bigX), function(y) arrangements::combinations(bigX, y, layout = "l"))),
               times = 15, unit = "relative")
Unit: relative
          expr      min       lq     mean   median       uq      max neval
     powSetRje 5.539967 4.785415 4.277319 4.387410 3.739593 3.543570    15
    powSetRich 4.994366 4.306784 3.863612 3.932252 3.334708 3.327467    15
 powSetArrange 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    15    15
Idiopathy answered 31/3, 2018 at 18:57 Comment(1)
How would I use this function to get all combinations for a range of lengths? E.g. if my input vector were x <- c("red", "blue", "black", "green") and I want to generate all combinations of lengths 2 and 3? (in the matrix form, not the list form)Irritate
S
1

A solution with matrix result without using any external packages:

store <- lapply(
  seq_along(x), 
  function(i) {
    out <- combn(x, i) 
    N <- NCOL(out)
    length(out) <- length(x) * N
    matrix(out, ncol = N, byrow = TRUE)
})
t(do.call(cbind, store))

     [,1]    [,2]    [,3]   
[1,] "red"   NA      NA     
[2,] "blue"  NA      NA     
[3,] "black" NA      NA     
[4,] "red"   "black" NA     
[5,] "blue"  "blue"  NA     
[6,] "red"   "black" NA     
[7,] "red"   "blue"  "black"
Samara answered 3/4, 2020 at 14:25 Comment(1)
You could change 3L to length(x) for a more general solutionIdiopathy

© 2022 - 2024 — McMap. All rights reserved.