Within the base packages, how can I generate the unique unordered pairs between two copies of a vector?
Asked Answered
H

4

13

Given n=2, I want the set of values (1, 1), (1, 2), and (2, 2). For n=3, I want (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), and (3, 3). And so on for n=4, 5, etc.

I'd like to do this entirely within the base libraries. Recently, I've taken to using

gen <- function(n)
{
    x <- seq_len(n)
    cbind(combn(x, 2), rbind(x, x))
}

which gives some workable but hacky output. We get the below for n=4.

  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    1    1    2    2    3    1    2    3     4
x    2    3    4    3    4    4    1    2    3     4

Is there a better way? Between expand.grid, outer, combn, and R's many other ways of generating vectors, I was hoping to be able to do this with just one combination-producing function rather than having to bind together the output of combn with something else. I could write the obvious for loop, but that seems like a waste of R's powers.

Starting with expand.grid and then subsetting is an option that many answers so far have taken, but I find the idea of generating twice the set that I need to be a poor use of memory. This probably also rules out outer.

Hamo answered 18/10, 2021 at 18:53 Comment(3)
wb n <- 1:4; rbind(sequence(n), rep(n, n))Matchboard
Nice @rawr. Post an answer?Calipee
@Matchboard You should definitely post an answer. It's the fastest and cleanest solution IMO.Indochina
L
9

Update

Here is another version, inspired by @G. Grothendieck

gen <- function(n) t(which(upper.tri(diag(n), diag = TRUE), arr.ind = TRUE))

or

gen <- function(n) {
  unname(do.call(
    cbind,
    sapply(
      seq(n),
      function(k) rbind(k, k:n)
    )
  ))
}

You can try expand.grid + subset like below

gen <- function(n) {
  unname(t(
    subset(
      expand.grid(rep(list(seq(n)), 2)),
      Var1 <= Var2
    )
  ))
}

and you will see

> gen(2)
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    1    2    2

> gen(3)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    1    2    3
[2,]    1    2    2    3    3    3

> gen(4)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    2    1    2    3    1    2    3     4
[2,]    1    2    2    3    3    3    4    4    4     4
Librettist answered 18/10, 2021 at 19:5 Comment(2)
+1 for the clever use of rep, but it's a shame that we have to generate the entire grid before subsetting it. It would be nicer to get what we want and no more.Hamo
@J.Mini See my update.Librettist
M
13

Here are some ways to do this.

1) upper.tri

n <- 4
d <- diag(n)
u <- upper.tri(d, diag = TRUE)
rbind(row(d)[u], col(d)[u])
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    1    2    1    2    3    1    2    3     4
## [2,]    1    2    2    3    3    3    4    4    4     4

The last line of code could alternately be written as:

t(sapply(c(row, col), function(f) f(d)[u]))

2) combn

n <- 4
combn(n+1, 2, function(x) if (x[2] == n+1) x[1] else x)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    1    1    1    2    2    2    3    3     4
## [2,]    2    3    4    1    3    4    2    4    3     4

A variation of this is:

co <- combn(n+1, 2)
co[2, ] <- ifelse(co[2, ] == n+1, co[1, ], co[2, ])
co

3) list comprehension

library(listcompr)
t(gen.matrix(c(i, j), i = 1:n, j = i:n))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    1    2    1    2    3    1    2    3     4
## [2,]    1    2    2    3    3    3    4    4    4     4

Performance

library(microbenchmark)
library(listcompr)

n <- 25
microbenchmark(
  upper.tri = {
    d <- diag(n)
    u <- upper.tri(d, diag = TRUE)
    rbind(row(d)[u], col(d)[u]) },
  upper.tri2 = {
    d <- diag(n)
    u <- upper.tri(d, diag = TRUE)
    t(sapply(c(row, col), function(f) f(d)[u])) },
  combn = combn(n+1, 2, function(x) if (x[2] == n+1) x[1] else x),
  combn2 = { 
     co <- combn(n+1, 2)
     co[2, ] <- ifelse(co[2, ] == n+1, co[1, ], co[2, ])
     co
  },
  listcompr = t(gen.matrix(c(i, j), i = 1:n, j = i:n)))

giving:

Unit: microseconds
       expr     min        lq       mean    median        uq      max neval cld
  upper.tri    41.8     52.00     65.761     61.30     77.15    132.6   100  a 
 upper.tri2   110.8    128.95    187.372    154.85    178.60   3024.6   100  a 
      combn  1342.8   1392.25   1514.038   1432.90   1473.65   7034.7   100  a 
     combn2   687.5    725.50    780.686    765.85    812.65   1129.4   100  a 
  listcompr 97889.0 100321.75 106442.425 101347.95 105826.55 307089.4   100   b
Multinuclear answered 18/10, 2021 at 19:1 Comment(2)
Upvoted for the upper.tri method, which is really smart.Librettist
@Librettist Smart, but it's a shame that it generates the entire grid before subsetting it. You'd rather just get the numbers that you want and nothing more.Hamo
L
9

Update

Here is another version, inspired by @G. Grothendieck

gen <- function(n) t(which(upper.tri(diag(n), diag = TRUE), arr.ind = TRUE))

or

gen <- function(n) {
  unname(do.call(
    cbind,
    sapply(
      seq(n),
      function(k) rbind(k, k:n)
    )
  ))
}

You can try expand.grid + subset like below

gen <- function(n) {
  unname(t(
    subset(
      expand.grid(rep(list(seq(n)), 2)),
      Var1 <= Var2
    )
  ))
}

and you will see

> gen(2)
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    1    2    2

> gen(3)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    1    2    3
[2,]    1    2    2    3    3    3

> gen(4)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    2    1    2    3    1    2    3     4
[2,]    1    2    2    3    3    3    4    4    4     4
Librettist answered 18/10, 2021 at 19:5 Comment(2)
+1 for the clever use of rep, but it's a shame that we have to generate the entire grid before subsetting it. It would be nicer to get what we want and no more.Hamo
@J.Mini See my update.Librettist
F
7

Here's a slightly modified version of @G. Grothendieck's upper.tri, and a comparison of both to @rawr's method in the comments

upper.tri3 <- function(n){
  mrow <- row(diag(n))
  mcol <- t(mrow)
  i <- mrow <= mcol
  rbind(mrow[i], mcol[i])
}

library(bench)
n <- 1e4
mark(
  upper.tri = {
    d <- diag(n)
    u <- upper.tri(d, diag = TRUE)
    rbind(row(d)[u], col(d)[u]) },
  upper.tri3 = upper.tri3(n),
  rawr = {
    s <- 1:n
    rbind(sequence(s), rep(s, s))
  }
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 upper.tri     3.96s    3.96s     0.252    4.47GB    0.757
#> 2 upper.tri3    2.46s    2.46s     0.406    3.73GB    1.62 
#> 3 rawr       372.25ms 429.55ms     2.33   763.06MB    1.16

Created on 2021-10-18 by the reprex package (v2.0.1)

Fahy answered 18/10, 2021 at 23:21 Comment(0)
B
1

You can use expand.grid. I see it as the most intuitive and easy to read solution.

simple_solution <- function(x) {
    df <- expand.grid(1:x, 1:x)
    return(df[df$Var1 <= df$Var2, ])
}
> simple_solution(4)
    Var1 Var2
1     1    1
5     1    2
6     2    2
9     1    3
10    2    3
11    3    3
13    1    4
14    2    4
15    3    4
16    4    4

Bertie answered 28/10, 2021 at 19:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.