Create a sequence of sequences of numbers

I would like to make the following sequence in R, by using rep or any other function.

c(1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5)

Basically, c(1:5, 2:5, 3:5, 4:5, 5:5).

Sheridan answered 4/1, 2022 at 13:14 Comment(0)

Use sequence.

sequence(5:1, from = 1:5)
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

The first argument, nvec, is the length of each sequence (5:1); the second, from, is the starting point for each sequence (1:5).

Note: this works only for R >= 4.0.0. From R News 4.0.0:

sequence() [...] gains arguments [e.g. from] to generate more complex sequences.
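
For a general n, the call above can be wrapped in a small helper. This is only a minimal sketch: the name seq_of_seqs is made up for illustration, and it assumes R >= 4.0.0 and n >= 1.

```r
# Hypothetical helper (the name is not from the answer); needs R >= 4.0.0.
seq_of_seqs <- function(n) {
  stopifnot(length(n) == 1L, n >= 1)
  # Sequence i starts at i and has n - i + 1 elements.
  sequence(rev(seq_len(n)), from = seq_len(n))
}

seq_of_seqs(5)
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
```

Using seq_len(n) rather than 1:n is a defensive choice here; the helper still rejects n < 1 explicitly.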

Lipo answered 4/1, 2022 at 13:20 Comment(1)
@Henrik A very similar question answered some time ago using sequence: https://mcmap.net/q/659147/-using-seq-and-rep-to-create-a-sequence-of-5-integers-that-go-up-by-1-on-each-repetition - Ironbound
unlist(lapply(1:5, function(i) i:5))
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

Some speed tests on all the answers provided. Note that the OP mentioned 10K somewhere, if I recall correctly.

s1 <- function(n) { 
  unlist(lapply(1:n, function(i) i:n))
}

s2 <- function(n) {
  unlist(lapply(seq_len(n), function(i) seq(from = i, to = n, by = 1)))
}

s3 <- function(n) {
  vect <- 0:n
  unlist(replicate(n, vect <<- vect[-1]))
}

s4 <- function(n) {
  m <- matrix(1:n, ncol = n, nrow = n, byrow = TRUE)
  m[lower.tri(m)] <- 0
  c(t(m)[t(m != 0)])
}

s5 <- function(n) {
  m <- matrix(seq.int(n), ncol = n, nrow = n)
  m[lower.tri(m, diag = TRUE)]
}

s6 <- function(n) {
  out <- c()
  for (i in 1:n) { 
    out <- c(out, (1:n)[i:n])
  }
  out
}

library(rbenchmark)

n = 5

n = 5L

benchmark(
  "s1" = { s1(n) },
  "s2" = { s2(n) },
  "s3" = { s3(n) },
  "s4" = { s4(n) },
  "s5" = { s5(n) },
  "s6" = { s6(n) },
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
)

Do not be fooled by the "fast" solutions at this size. They simply make few function calls, every call carries some overhead, and those small differences are multiplied over the 1000 replications.

  test replications elapsed relative
1   s1         1000    0.05      2.5
2   s2         1000    0.44     22.0
3   s3         1000    0.14      7.0
4   s4         1000    0.08      4.0
5   s5         1000    0.02      1.0
6   s6         1000    0.02      1.0

n = 1000

n = 1000L

benchmark(
  "s1" = { s1(n) },
  "s2" = { s2(n) },
  "s3" = { s3(n) },
  "s4" = { s4(n) },
  "s5" = { s5(n) },
  "s6" = { s6(n) },
  replications = 10,
  columns = c("test", "replications", "elapsed", "relative")
)

As its poster already flagged as something "not to do", the for loop becomes pretty slow compared to every other method at n = 1000L.

  test replications elapsed relative
1   s1           10    0.17    1.000
2   s2           10    0.83    4.882
3   s3           10    0.19    1.118
4   s4           10    1.50    8.824
5   s5           10    0.29    1.706
6   s6           10   28.64  168.471

n = 10000

n = 10000L

benchmark(
  "s1" = { s1(n) },
  "s2" = { s2(n) },
  "s3" = { s3(n) },
  "s4" = { s4(n) },
  "s5" = { s5(n) },
  # "s6" = { s6(n) },
  replications = 10,
  columns = c("test", "replications", "elapsed", "relative")
)

At large n, the matrix methods become very slow compared to the others. Using seq inside lapply might be neater, but it comes with a trade-off: calling that function n times increases processing time a lot. seq_len(n), on the other hand, is nicer than 1:n and is run only once. It is interesting to see that the replicate method is the fastest.

  test replications elapsed relative
1   s1           10    5.44    1.915
2   s2           10    9.98    3.514
3   s3           10    2.84    1.000
4   s4           10   72.37   25.482
5   s5           10   35.78   12.599
Chian answered 4/1, 2022 at 14:52 Comment(6)
Careful with this. It will misbehave if you change the first argument without remembering to change the second. For example, unlist(lapply(1:10, function(i) i:5)) isn't right. Changing the second argument to function(i) seq(from = i, to = 5, by = 1) is a lot more verbose, but it's safer. The ultimate version is probably something like output <- function(x) unlist(lapply(seq_len(x), function(i) seq(from = i, to = x, by = 1))). - Sambo
Hi @Merijn van Tilborg! Perhaps you could include the sequence answer in the timings as well? Cheers - Pooi
I would have if I could, but I don't have an R version that supports the from argument. I expect it to be about as fast as s1 or s2, because the old sequence function is basically a wrapper: function (nvec) unlist(lapply(nvec, seq_len)) - Chian
Indeed, but it seems like that is no longer the case, so the timing may actually differ. - Pooi
A quick system.time with sequence and n = 10000 suggests that it is about 8-9 times faster than the replicate method. - Pooi
This could also be shortened to unlist(lapply(1:5, ':', 5)). - Sitwell
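
The comparison requested in the comments could be set up roughly as follows. This is a sketch only: s_seq is a made-up name, it needs R >= 4.0.0, and it uses base system.time instead of rbenchmark. No timings are shown because they depend on the machine and the R version.

```r
# s1 is copied from the answer; s_seq is a hypothetical name for the sequence() version.
s1 <- function(n) unlist(lapply(1:n, function(i) i:n))
s_seq <- function(n) sequence(n:1, from = 1:n)  # requires R >= 4.0.0

n <- 1000L

# Check that both approaches agree before timing them.
stopifnot(isTRUE(all.equal(s1(n), s_seq(n))))

# Repeat each call a few times so the elapsed time is measurable.
system.time(for (k in 1:10) s1(n))
system.time(for (k in 1:10) s_seq(n))
```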

Your mention of rep reminded me of replicate, so here's a very stateful solution. I'm presenting this because it's short and unusual, not because it's good. This is very unidiomatic R.

vect <- 0:5
unlist(replicate(5, vect <<- vect[-1]))
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

You can do it with a combination of rep and lapply, but it's basically the same as Merijn van Tilborg's answer.

Of course, the truly fearless unidiomatic R user does this and refuses to elaborate further.

mat <- matrix(1:5, ncol = 5, nrow = 5, byrow = TRUE)
mat[lower.tri(mat)] <- 0
c(t(mat)[t(mat != 0)])
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
Sambo answered 4/1, 2022 at 22:54 Comment(4)
Your matrix alternative can be slightly simplified: m = matrix(seq.int(n), ncol = n, nrow = n); m[lower.tri(m, diag = TRUE)] (less unidiomatic though) - Pooi
@Pooi Good job. I knew that something was off when I had to call t twice while using byrow=TRUE. - Sambo
I fully understand. I have got lost in the maze of upper/lower.tri/byrow/"to t or not to t" so many times myself. Your unidiomatic contribution is much appreciated. - Pooi
The indexing could be golfed with row(m)>=col(m) - Pooi
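
Put together, the simplified matrix version with the row/col indexing from the comments might look like this. It is a sketch; the function name tri_seq is made up for illustration.

```r
# Hypothetical wrapper around the matrix approach from the comments.
tri_seq <- function(n) {
  # Filled column by column, so every column holds 1..n.
  m <- matrix(seq_len(n), nrow = n, ncol = n)
  # Keep the lower triangle including the diagonal, read column by column.
  m[row(m) >= col(m)]
}

tri_seq(5)
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
```

Note that this builds an n-by-n matrix, which is consistent with the matrix methods slowing down at large n in the timings.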

You could use a loop like so:

out <- c()
for (i in 1:5) {
  out <- c(out, (1:5)[i:5])
}
out
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

but that's not a good idea!


Why not use a loop?

Using a loop is:

  • slower,
  • less memory efficient, and
  • harder to read and understand.

By contrast, using a vectorised function like sequence is the opposite (faster, more efficient, and easy to read).


Further info

From ?sequence:

The default method for sequence generates the sequence seq(from[i], by = by[i], length.out = nvec[i]) for each element i in the parallel (and recycled) vectors from, by and nvec. It then returns the result of concatenating those sequences.

and about the from argument:

from: each element specifies the first element of a sequence.
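
To make the quoted description concrete, the sketch below builds the same result by hand with seq and compares it to sequence. The input vectors are made-up illustration values, and the from and by arguments need R >= 4.0.0.

```r
# Made-up inputs: three sequences with different starts, steps and lengths.
nvec <- c(3L, 2L, 4L)
from <- c(1L, 10L, 5L)
by   <- c(2L, 5L, -1L)

# One seq() call per element, concatenated, as described in ?sequence.
manual <- unlist(Map(function(f, b, n) seq(f, by = b, length.out = n),
                     from, by, nvec))

stopifnot(all(manual == sequence(nvec, from = from, by = by)))
manual
# values: 1 3 5 10 15 5 4 3 2
```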

Also, since the vector used in the loop is not preallocated, every c(out, ...) call copies everything collected so far, so the loop requires more memory and is also slower.
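
If a loop is wanted anyway, preallocating the result avoids that repeated copying. The following is a minimal sketch with a made-up function name (loop_prealloc); it is not part of the original answer, and sequence remains the simpler option.

```r
# Hypothetical function name; base R only.
loop_prealloc <- function(n) {
  n <- as.integer(n)
  # The final length is 1 + 2 + ... + n.
  out <- integer(n * (n + 1L) / 2)
  pos <- 1L
  for (i in seq_len(n)) {
    len <- n - i + 1L                  # length of the run i:n
    out[pos:(pos + len - 1L)] <- i:n   # write the run into its slot
    pos <- pos + len
  }
  out
}

loop_prealloc(5)
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
```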

Tighe answered 5/1, 2022 at 5:4 Comment(0)
