Asked 20/4, 2023 at 13:20 Answered 20/4, 2023 at 21:43

Feels like this should be straightforward but I have been through stack overflow and combn help, and cannot see solution.

The below players need to be in teams of 3 versus 2. I need to find all possible combinations of teams. For example two possible teams are "Ross", "Bobby" and "Casper" in one team, and "Max" and "Jake" in the other team. How can I code this?

players <- c("Ross", "Bobby", "Max", "Casper", "Jake")

Filmy answered 20/4, 2023 at 13:20 Comment(3)

FYI, many developers strongly discourage the use of left-to-right assignment in scripts. It makes the assignment action non-obvious, which drastically reduces clarity about the data flow (which variables exist, and what is their current value?) and thus makes code harder to read, debug and modify. – Relax 20/4, 2023 at 13:41

I use pipes in my code and put assignment on right so everything flows from top to bottom – Filmy 20/4, 2023 at 15:5

Please have a look at the second-to-last paragraph of this comment explaining why this is a bad idea for code maintainability. – Relax 20/4, 2023 at 15:28

I think the key step is to randomly choose 3 (or 2) out of 5 players to generate the first team, and the remaining are naturally assigned to the second team.

Probably you can try it like this

combn(players,
  3, 
  function(x) list(team1 = x, team2 = players[!players %in% x]),
  simplify = FALSE
)

or if you don't mind the time efficiency, you can use setdiff instead

combn(players,
    3,
    function(x) list(team1 = x, team2 = setdiff(players, x)),
    simplify = FALSE
  )

and either of which gives

[[1]]
[[1]]$team1
[1] "Ross"  "Bobby" "Max"

[[1]]$team2
[1] "Casper" "Jake"


[[2]]
[[2]]$team1
[1] "Ross"   "Bobby"  "Casper"

[[2]]$team2
[1] "Max"  "Jake"


[[3]]
[[3]]$team1
[1] "Ross"  "Bobby" "Jake"

[[3]]$team2
[1] "Max"    "Casper"


[[4]]
[[4]]$team1
[1] "Ross"   "Max"    "Casper"

[[4]]$team2
[1] "Bobby" "Jake"


[[5]]
[[5]]$team1
[1] "Ross" "Max"  "Jake"

[[5]]$team2
[1] "Bobby"  "Casper"


[[6]]
[[6]]$team1
[1] "Ross"   "Casper" "Jake"

[[6]]$team2
[1] "Bobby" "Max"


[[7]]
[[7]]$team1
[1] "Bobby"  "Max"    "Casper"

[[7]]$team2
[1] "Ross" "Jake"


[[8]]
[[8]]$team1
[1] "Bobby" "Max"   "Jake"

[[8]]$team2
[1] "Ross"   "Casper"


[[9]]
[[9]]$team1
[1] "Bobby"  "Casper" "Jake"

[[9]]$team2
[1] "Ross" "Max"


[[10]]
[[10]]$team1
[1] "Max"    "Casper" "Jake"

[[10]]$team2
[1] "Ross"  "Bobby"

Benchmarking

players <- c("Ross", "Bobby", "Max", "Casper", "Jake")

microbenchmark(
  f1 = combn(players,
    3,
    function(x) list(team1 = x, team2 = players[!players %in% x]),
    simplify = FALSE
  ),
  f2 = combn(players,
    3,
    function(x) list(team1 = x, team2 = setdiff(players, x)),
    simplify = FALSE
  )
)

and we will see

Unit: microseconds
 expr    min      lq      mean   median       uq      max neval
   f1 26.200 28.0510  51.55586  29.4515  32.5015 1935.301   100
   f2 97.301 99.8505 119.44610 103.6010 111.1510 1162.501   100

Roubaix answered 20/4, 2023 at 13:25 Comment(7)

players[!players %in% x]) can be simplified to setdiff(players, x). – Relax 20/4, 2023 at 13:43

@KonradRudolph yes, I was thinking of setdiff but it would be slow for larger sizes of players or x, so I use %in% instead – Roubaix 20/4, 2023 at 13:45

Hmm, I don't think it should be slower. Am I wrong? – Relax 20/4, 2023 at 13:46

@KonradRudolph I added a benchmark – Roubaix 20/4, 2023 at 13:49

Okay, I did not expect that. Thanks for adding the benchmark. I have to think about why this is happening. – Relax 20/4, 2023 at 13:53

@KonradRudolph, if you look at setdiff, it is doing a lot more than %in%. In the console, we see setdiff ends up doing u[!duplicated(unclass(u)) & (match(u, v, 0L) == 0L)] whereas %in% simply calls match(x, table, nomatch = 0L) > 0L. – Outbuilding 20/4, 2023 at 22:8

@JosephWood Yes, I saw that as well. It's definitely not an optimal implementation (match and duplicated will end up doing largely redundant work, and a better implementation would do this work only once). – Relax 21/4, 2023 at 6:34

There is a function in RcppAlgos (I am the author) called comboGroups built precisely for this task. As of version 2.8.0, we can now handle groups of varying sizes.

library(RcppAlgos)
packageVersion("RcppAlgos")
#> [1] '2.8.0'

We specify the size of each team using the grpSizes parameter:

players <- c("Ross", "Bobby", "Max", "Casper", "Jake")
comboGroups(players, grpSizes = c(2, 3))
#>       Grp1     Grp1     Grp2    Grp2     Grp2    
#>  [1,] "Ross"   "Bobby"  "Max"   "Casper" "Jake"  
#>  [2,] "Ross"   "Max"    "Bobby" "Casper" "Jake"  
#>  [3,] "Ross"   "Casper" "Bobby" "Max"    "Jake"  
#>  [4,] "Ross"   "Jake"   "Bobby" "Max"    "Casper"
#>  [5,] "Bobby"  "Max"    "Ross"  "Casper" "Jake"  
#>  [6,] "Bobby"  "Casper" "Ross"  "Max"    "Jake"  
#>  [7,] "Bobby"  "Jake"   "Ross"  "Max"    "Casper"
#>  [8,] "Max"    "Casper" "Ross"  "Bobby"  "Jake"  
#>  [9,] "Max"    "Jake"   "Ross"  "Bobby"  "Casper"
#> [10,] "Casper" "Jake"   "Ross"  "Bobby"  "Max"

It is very efficient and quite flexible. Let’s test on a larger set of players.

library(microbenchmark)
more_players <- c(players, "Kai", "Eliana", "Jayden", "Luca",
                  "Rowan", "Nova", "Amara", "Finn", "Zion", "Mia")

microbenchmark(
    f1 = combn(more_players,
               7,
               function(x) list(team1 = x, team2 = more_players[!more_players %in% x]),
               simplify = FALSE
    ),
    f2 = combn(more_players,
               7,
               function(x) list(team1 = x, team2 = setdiff(more_players, x)),
               simplify = FALSE
    ),
    f3_rcpp = comboGeneral(
        v = more_players,
        m = 7,
        repetition = FALSE,
        FUN = function(x) list(team1 = x, team2 = more_players[!more_players %in% x])
    ),
    f4 = comboGroups(more_players, grpSizes = c(7, 8)),
    unit = "relative"
)
#> Unit: relative
#>     expr      min       lq     mean   median       uq       max neval  cld
#>       f1 16.22184 19.03265 23.68141 20.78979 26.59770  92.08256   100 a   
#>       f2 41.34947 45.55672 57.30847 54.70320 59.88822 123.09864   100  b  
#>  f3_rcpp 11.83424 14.48575 17.65936 16.24856 21.43574  36.27697   100   c 
#>       f4  1.00000  1.00000  1.00000  1.00000  1.00000   1.00000   100    d

More than 2 Groups

What about more exotic groupings? With most other approaches, efficiency and code maintenance will be an issue.

Say for example, given more_players (15 total players) what if we wanted to find all possible groupings with 2 teams of size 3, one team of size 4, and one team of size 5?

With comboGroups, it is no problem:

system.time(t <- comboGroups(more_players, grpSizes = c(3, 3, 4, 5)))
#>    user  system elapsed 
#>   0.508   0.040   0.548 

head(t, n = 2)
#>      Grp1   Grp1    Grp1  Grp2     Grp2   Grp2  Grp3     Grp3     Grp3   Grp3   
#> [1,] "Ross" "Bobby" "Max" "Casper" "Jake" "Kai" "Eliana" "Jayden" "Luca" "Rowan"
#> [2,] "Ross" "Bobby" "Max" "Casper" "Jake" "Kai" "Eliana" "Jayden" "Luca" "Nova" 
#>      Grp4    Grp4    Grp4   Grp4   Grp4 
#> [1,] "Nova"  "Amara" "Finn" "Zion" "Mia"
#> [2,] "Rowan" "Amara" "Finn" "Zion" "Mia"

tail(t, n = 2)
#>            Grp1    Grp1   Grp1  Grp2   Grp2    Grp2   Grp3   Grp3     Grp3    
#> [6306299,] "Rowan" "Zion" "Mia" "Nova" "Amara" "Finn" "Jake" "Eliana" "Jayden"
#> [6306300,] "Rowan" "Zion" "Mia" "Nova" "Amara" "Finn" "Kai"  "Eliana" "Jayden"
#>            Grp3   Grp4   Grp4    Grp4  Grp4     Grp4  
#> [6306299,] "Luca" "Ross" "Bobby" "Max" "Casper" "Kai" 
#> [6306300,] "Luca" "Ross" "Bobby" "Max" "Casper" "Jake"

If you just need a sample of possible teams, try comboGroupsSample:

comboGroupsSample(
    more_players, grpSizes = c(3, 3, 4, 5), n = 2,
    seed = 42, namedSample = TRUE
)
#>         Grp1    Grp1     Grp1   Grp2    Grp2    Grp2   Grp3   Grp3     Grp3    
#> 3207141 "Bobby" "Jake"   "Mia"  "Luca"  "Amara" "Zion" "Max"  "Casper" "Eliana"
#> 4248729 "Max"   "Casper" "Nova" "Rowan" "Amara" "Finn" "Ross" "Bobby"  "Kai"   
#>         Grp3    Grp4   Grp4     Grp4     Grp4   Grp4  
#> 3207141 "Rowan" "Ross" "Kai"    "Jayden" "Nova" "Finn"
#> 4248729 "Luca"  "Jake" "Eliana" "Jayden" "Zion" "Mia"

Outbuilding answered 20/4, 2023 at 21:43 Comment(1)

RcppAlgos keeps being better. Nice work! – Rolo 21/4, 2023 at 18:3

Just to build off @ThomasIsCoding great answer. Further speed could be gained by using the great RcppAlgos library and the comboGeneral function:

players <- c("Ross", "Bobby", "Max", "Casper", "Jake")

microbenchmark(
  f1 = combn(players,
             3,
             function(x) list(team1 = x, team2 = players[!players %in% x]),
             simplify = FALSE
  ),
  f2 = combn(players,
             3,
             function(x) list(team1 = x, team2 = setdiff(players, x)),
             simplify = FALSE
  ),
  f3_rcpp = RcppAlgos::comboGeneral(
    v = players,
    m = 3,
    repetition = FALSE,
    FUN = function(x) list(team1 = x, team2 = players[!players %in% x])
  )
)

BENCHMARK

Unit: microseconds
    expr   min     lq    mean median     uq    max neval
      f1  34.3  37.95  63.693  39.00  41.45 2016.7   100
      f2 143.5 152.40 184.351 155.30 158.95 1961.7   100
 f3_rcpp  29.3  32.40  61.820  33.95  42.40 2205.0   100

Rolo answered 20/4, 2023 at 14:5 Comment(0)

Benchmarking

More than 2 Groups

Recommended topics

Hot tags