Closest subsequent index for a specified value
Asked Answered
U

8

13

Consider a vector:

int = c(1, 1, 0, 5, 2, 0, 0, 2)

I'd like to get the closest subsequent index (not the difference) for a specified value. The first parameter of the function should be the vector, while the second should be the value one wants to see the closest subsequent elements.

For instance,

f(int, 0)
# [1] 2 1 0 2 1 0 0 NA

Here, the first element of the vector (1) is two positions away from the first subsequent 0, (3 - 1 = 2), so it should return 2. Then the second element is 1 position away from a 0 (2 - 1 = 1). When there is no subsequent values that match the specified value, return NA (here it's the case for the last element, because no subsequent value is 0).

Other examples:

f(int, 1)
# [1] 0 0 NA NA NA NA NA NA

f(int, 2) 
# [1] 4 3 2 1 0 2 1 0

f(int, 3) 
# [1] NA NA NA NA NA NA NA NA

This should also work for character vectors:

char = c("A", "B", "C", "A", "A")

f(char, "A") 
# [1] 0 2 1 0 0
Ulda answered 7/2, 2022 at 11:4 Comment(2)
In your last char example, could you explain the output, why are we getting 0,2,1,0,0 ?Feudalism
Yes. f should return the closest following value that equals "A". So first the value of char, "A", this is 0, for the second, it is 2, because B is 2-position away from the following "A" (4-2 = 2). Does it make more sense now?Stroman
H
14

Find the location of each value (numeric or character)

int = c(1, 1, 0, 5, 2, 0, 0, 2)
value = 0
idx = which(int == value)
## [1] 3 6 7

Expand the index to indicate the nearest value of interest, using an NA after the last value in int.

nearest = rep(NA, length(int))
nearest[1:max(idx)] = rep(idx, diff(c(0, idx))),
## [1]  3  3  3  6  6  6  7 NA

Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value

abs(seq_along(int) - nearest)
## [1]  2  1  0  2  1  0  0 NA

Written as a function

f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx)) # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}

We have

> f(int, 0)
[1]  2  1  0  2  1  0  0 NA
> f(int, 1)
[1]  0  0 NA NA NA NA NA NA
> f(int, 2)
[1] 4 3 2 1 0 2 1 0
> f(char, "A")
[1] 0 2 1 0 0
> f(char, "B")
[1]  1  0 NA NA NA
> f(char, "C")
[1]  2  1  0 NA NA

The solution doesn't involve recursion or R-level loops, so should e fast even for long vectors.

Holtz answered 7/2, 2022 at 12:11 Comment(0)
F
11

Look for the match from nth position to the end of the vector, then get the 1st match:

f <- function(v, x){
  sapply(seq_along(v), function(i){
    which(v[ i:length(v) ] == x)[ 1 ] - 1
  })
}

f(int, 0)
# [1]  2  1  0  2  1  0  0 NA
f(int, 1)
# [1]  0  0 NA NA NA NA NA NA
f(int, 2)
# [1] 4 3 2 1 0 2 1 0
f(int, 3) 
# [1] NA NA NA NA NA NA NA NA

f(char, "A") 
# [1] 0 2 1 0 0
Feudalism answered 7/2, 2022 at 11:35 Comment(2)
beautiful! I thought the only elegant answer would have been a recursive function but yours is elegant as well.Merla
@sbarbit to be honest, recursive solution didn't even come to my mind.Feudalism
U
9

Using sequence:

f <- function(v, x){
  d = diff(c(0, which(v == x)))
  vec <- sequence(d, d-1, by = -1)
  length(vec) <- length(int)
  vec
}

Output

int = c(1, 1, 0, 5, 2, 0, 0, 2)
char = c("A", "B", "C", "A", "A")

f(int, 0)
# [1]  2  1  0  2  1  0  0 NA

f(int, 1)
# [1]  0  0 NA NA NA NA NA NA

f(int, 2)
# [1] 4 3 2 1 0 2 1 0

f(char, "A")
# [1] 0 2 1 0 0

Benchmark (n = 1000):

set.seed(123)
int = sample(0:100, size = 1000, replace = T)

library(microbenchmark)
bm <- microbenchmark(
  fSequence(int, 0),
  fzx8754(int, 0),
  fRecursive(int, 0), 
  fMartinMorgan(int, 0), 
  fMap2dbl(int, 0),
  fReduce(int, 0),
  fAve(int, 0),
  fjblood94(int, 0),
  times = 10L,
  setup = gc(FALSE)
)
autoplot(bm)

Martin Morgan's solution seems to be the quickest, followed by this answer's sequence solution, sbarbit's recursive solution, and jblood94's for loop solution. enter image description here

Functions used:

fSequence <- function(v, x){
  vec <- sequence(diff(c(0, which(v == x))), diff(c(0, which(v == x))) - 1, by = -1)
  length(vec) <- length(v)
  vec
}

fzx8754 <- function(v, x){
  sapply(seq_along(v), function(i){
    which(v[ i:length(v) ] == x)[ 1 ] - 1
  })
}

fRecursive <- function(lookup,val ) {
  ind <- which(lookup == val)[1] -1
  if (length(lookup) > 1) {
    c(ind, f(lookup[-1], val))
  } else {
    ind
  }
}

fMartinMorgan <- function(x, value) {
  idx = which(x == value)
  nearest = rep(NA, length(x))
  nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
  abs(seq_along(x) - nearest)
}

fMap2dbl <- function(int, num)
{
  n <- length(int)
  
  map2_dbl(num, 1:n, ~ ifelse(length(which(.x == int[.y:n])) == 0, NA, 
                              min(which(.x == int[.y:n])) - 1))
}

fReduce <- function(vec, value) {
  replace(
    Reduce(
      function(x, y)
        x  + (y * x) ,
      vec != value,
      right = TRUE,
      accumulate = TRUE
    ),
    max(tail(which(vec == value), 1), 0) < seq_along(vec),
    NA
  )
}

fAve <- function(init, k) {
  ave(
    seq_along(init),
    c(0, head(cumsum(init == k), -1)),
    FUN = function(x) if (any(x == k)) rev(seq_along(x) - 1) else NA
  )
}

fjblood94 <- function(v, val) {
  out <- integer(length(v))
  if (v[length(v)] != val) out[length(v)] <- NA_integer_
  
  for (i in (length(v) - 1L):1) {
    if (v[i] == val) {
      out[i] <- 0L
    } else {
      out[i] <- out[i + 1L] + 1L
    }
  }
  
  return(out)
}

Ulda answered 7/2, 2022 at 14:0 Comment(3)
sequence is really efficient, upvoted! Also, love the benchmarking!Echopraxia
Interesting, all my benchmarking shows fSequence as the clear winner, which is what I would have expected.Sleigh
Interesting indeed. What about when you replicate my benchmarking analysis?Stroman
M
6

Here f is defined as a recursive function that calls itself over shorter tails of the lookup vector:

f <- function(lookup,val ) {
  ind <- which(lookup == val)[1] -1
  if (length(lookup) > 1) {
    c(ind, f(lookup[-1], val))
  } else {
    ind
  }
}
Merla answered 7/2, 2022 at 11:31 Comment(2)
Nice recursive solution.Feudalism
well.. being on stack overflow kind of inspired me :-)Merla
A
5

Here is an approach using Reduce() and then some fiddling to get the NA values.

f <- function(vec, value) {
replace(
  Reduce(
    function(x, y)
      x  + (y * x) ,
    vec != value,
    right = TRUE,
    accumulate = TRUE
  ),
  max(tail(which(vec == value), 1), 0) < seq_along(vec),
  NA
)
}

f(int, 0)          
[1]  2  1  0  2  1  0  0 NA

f(int, 1)          
[1]  0  0 NA NA NA NA NA NA

f(int, 2) 
[1] 4 3 2 1 0 2 1 0

f(int, 3) 
[1] NA NA NA NA NA NA NA NA

char = c("A", "B", "C", "A", "A")

f(char, "A") 
[1] 0 2 1 0 0
Atheling answered 7/2, 2022 at 12:14 Comment(0)
F
5

Another possible solution, based on purrr::map2_dbl:

library(purrr)

int = c(1, 1, 0, 5, 2, 0, 0, 2)

f <- function(int, num)
{
  n <- length(int)
  
  map2_dbl(num, 1:n, ~ ifelse(length(which(.x == int[.y:n])) == 0, NA, 
      min(which(.x == int[.y:n])) - 1))
}

f(int, 0)
#> [1]  2  1  0  2  1  0  0 NA

f(int, 1)          
#> [1]  0  0 NA NA NA NA NA NA

f(int, 2) 
#> [1] 4 3 2 1 0 2 1 0

f(int, 3) 
#> [1] NA NA NA NA NA NA NA NA

char = c("A", "B", "C", "A", "A")

f(char, "A") 
#> [1] 0 2 1 0 0
Flyspeck answered 7/2, 2022 at 13:0 Comment(0)
S
5

A single-pass for loop is simple and efficient:

f1 <- function(v, val) {
  out <- integer(length(v))
  if (v[length(v)] != val) out[length(v)] <- NA_integer_
  
  for (i in (length(v) - 1L):1) {
    if (v[i] == val) {
      out[i] <- 0L
    } else {
      out[i] <- out[i + 1L] + 1L
    }
  }
  
  return(out)
}

int <- c(1, 1, 0, 5, 2, 0, 0, 2)
chr <- c("A", "B", "C", "A", "A")
f1(int, 0)
#> [1]  2  1  0  2  1  0  0 NA
f1(chr, "A")
#> [1] 0 2 1 0 0

Benchmarking against other solutions:

f2 <- function(v, x){
  sapply(seq_along(v), function(i){
    which(v[ i:length(v) ] == x)[ 1 ] - 1
  })
}

f3 <- function(lookup,val ) {
  ind <- which(lookup == val)[1] -1
  if (length(lookup) > 1) {
    c(ind, f3(lookup[-1], val))
  } else {
    ind
  }
}

f4 <- function(x, value) {
  idx = which(x == value)
  nearest = rep(NA, length(x))
  nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
  abs(seq_along(x) - nearest)
}

f5 <- function(vec, value) {
  replace(
    Reduce(
      function(x, y)
        x  + (y * x) ,
      vec != value,
      right = TRUE,
      accumulate = TRUE
    ),
    max(tail(which(vec == value), 1), 0) < seq_along(vec),
    NA
  )
}

microbenchmark::microbenchmark(f1 = {f1(int, 0); f1(chr, "A")},
                               f2 = {f2(int, 0); f2(chr, "A")},
                               f3 = {f3(int, 0); f3(chr, "A")},
                               f4 = {f4(int, 0); f4(chr, "A")},
                               f5 = {f5(int, 0); f5(chr, "A")},
                               check = "equal")
#> Unit: microseconds
#>  expr  min    lq   mean median    uq   max neval
#>    f1  6.0  7.50  8.990   8.40  9.60  18.3   100
#>    f2 54.2 61.45 71.752  65.55 79.40 131.8   100
#>    f3 25.5 28.60 33.393  30.75 35.90 105.2   100
#>    f4 22.3 26.30 30.599  28.00 32.65  82.4   100
#>    f5 59.7 64.55 73.474  69.10 75.70 157.0   100
Sleigh answered 7/2, 2022 at 14:13 Comment(1)
Nice to see the benchmark! If I try int = sample(0:5, 1000, replace = TRUE) I get Error: C stack usage 7975244 is too close to the limit from the recursive (f3()) version, so that doesn't scale well. Removing it, f1() and f4() are one or two orders of magnitude faster than the others, and for me f4() is twice as fast as f1() (111 vs 223 microseconds); for a vector of length 10000, f4() is a little more than 3x faster than f1().Holtz
E
5

A base R option using ave + cumsum

f <- function(init, k) {
  ave(
    seq_along(init),
    c(0, head(cumsum(init == k), -1)),
    FUN = function(x) if (any(x == k)) rev(seq_along(x) - 1) else NA
  )
}

and you will see

> f(init, 0)
[1]  2  1  0  2  1  0  0 NA

> f(init, 1)
[1]  0  0 NA NA NA NA NA NA

> f(init, 2)
[1] 4 3 2 1 0 2 1 0

> f(init, 3)
[1] NA NA NA NA NA NA NA NA
Echopraxia answered 7/2, 2022 at 14:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.