Find position of first value greater than X in a vector

Asked 1/4, 2015 at 10:23 Answered 9/2, 2022 at 16:7

I have a vector and want to find the position of the first value that is greater than 100.

Boice answered 1/4, 2015 at 10:23 Comment(0)

# Randomly generate a suitable vector
set.seed(0)
v <- sample(50:150, size = 50, replace = TRUE)

min(which(v > 100))
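
One caveat with this idiom: if no element exceeds the threshold, which() returns an empty vector, and min() of an empty vector returns Inf with a warning. A minimal sketch of a guard, where the vector `w` and the names `hits` and `idx` are made up for illustration:

```r
# Hypothetical vector in which nothing exceeds 100
w <- c(10, 20, 30)

hits <- which(w > 100)   # integer(0): there are no matches

# min(hits) would return Inf and emit a warning here,
# so check the length before taking the minimum
idx <- if (length(hits) > 0) min(hits) else NA_integer_
idx
```

Whether NA is the right "no match" value depends on how the result is used downstream; that choice is an assumption of this sketch.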

Externalize answered 1/4, 2015 at 10:34 Comment(0)

Most answers based on which and max are slow (especially for long vectors) as they iterate through the entire vector:

1. x > 100 evaluates every value in the vector to see if it matches the condition.
2. which and max/min then search all the indexes returned at step 1 and find the maximum/minimum.

Position will only evaluate the condition until it encounters the first TRUE value and immediately return the corresponding index, without continuing through the rest of the vector.

# Randomly generate a suitable vector
v <- sample(50:150, size = 50, replace = TRUE)

Position(function(x) x > 100, v)
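
To see the early exit for yourself, one option is to count how often the predicate is called. This is only a sketch: `counting_pred`, `calls`, and `w` are illustrative names, and it shows the number of evaluations, not the running time.

```r
# Counter incremented each time the predicate is evaluated
calls <- 0L
counting_pred <- function(x) {
  calls <<- calls + 1L
  x > 100
}

# Small hand-written vector: the first value above 100 is at position 3
w <- c(50, 60, 120, 70, 130)

idx <- Position(counting_pred, w)
idx    # 3
calls  # 3: the remaining two elements were never tested
```

Fewer evaluations does not necessarily mean faster code, since each evaluation is an R-level function call.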

Cursed answered 23/3, 2016 at 23:13 Comment(4)

^ for functional programming – Open 8/6, 2016 at 19:0

Side note: ?Position says: "The current implementation is not optimized for performance." So I guess it also evaluates the whole vector. – Mccowyn 29/9, 2017 at 16:20

@Mccowyn - it uses a for loop. If you just run "Position" (the bare name) it will print out the implementation. – Hircine 27/10, 2017 at 18:19

@Hircine and evaluates the function every time, both slow operations in R. Unless you specifically expect the match to occur at the beginning of the vector, I'd guess this is likely going to be slower than the vectorized version. – Threewheeler 4/5, 2019 at 7:34

Check out which.max:

x <- seq(1, 150, 3)
which.max(x > 100)
# [1] 35
x[35]
# [1] 103
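
Be aware of what happens when nothing matches. In this small sketch (the vector `y` and the result names are made up), which.max() reports position 1 because every element of the logical vector is FALSE, whereas match() reports NA:

```r
y <- c(10, 20, 30)                     # no value exceeds 100

from_which_max <- which.max(y > 100)   # 1: looks like a match at position 1
from_match     <- match(TRUE, y > 100) # NA: signals that there was no match

from_which_max
from_match
```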

Mccowyn answered 1/4, 2015 at 10:37 Comment(1)

?which.max: 'However, match(FALSE, x) or match(TRUE, x) are typically preferred, as they do indicate mismatches.' => match(TRUE, x>100) – Periodic 16/7, 2018 at 23:14

Just to mention, Hadley Wickham has implemented a function, detect_index, to do exactly this task in his purrr package for functional programming.

I recently used detect_index myself and would recommend it to anyone else with the same problem.

Documentation for detect_index can be found here: https://rdrr.io/cran/purrr/man/detect.html
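
A short usage sketch, assuming purrr is installed (the guard skips the calls otherwise). The variable names are illustrative. According to the purrr documentation, detect_index returns 0 when no element satisfies the predicate:

```r
x <- seq(1, 150, 3)

found_at  <- NA_integer_
missing_at <- NA_integer_

if (requireNamespace("purrr", quietly = TRUE)) {
  # Index of the first element greater than 100
  found_at <- purrr::detect_index(x, function(value) value > 100)
  # No element is greater than 1000, so this yields 0
  missing_at <- purrr::detect_index(x, function(value) value > 1000)
}

found_at
missing_at
```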

Crudden answered 28/9, 2017 at 22:12 Comment(2)

Can you make an example? – Orthopter 14/6, 2021 at 15:57

For example, purrr::detect_index(seq(1, 150, 3), function(x) x > 100). Hadley's packages are optimized for readability, but certainly not for speed. – Flaming 23/7, 2022 at 20:35

As I need to perform a similar calculation many times within a loop, I was interested in which of the many answers provided in this thread would be most efficient.

TLDR: Whether the first value appears early or late in a vector, which.max(v > 100) is the fastest solution to this problem.

Note, however, that if no entry in v exceeds 100, it will return 1; thus there may be cause for a wrapper that detects this case:

SafeWhichMax <- function (v) {
  first <- which.max(v > 100)
  if (first == 1L && v[1] <= 100) NA else first
}
SafeWhichMax(100) # NA
SafeWhichMax(101) # 1

If a vector is very long and is not guaranteed to contain any TRUE results, match(TRUE, v > 100) may be quicker than which.max() with checks.
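
If the match() route is preferred, it can be wrapped in a small helper with the threshold as a parameter. This is a sketch rather than a benchmarked recommendation; `first_above` is a hypothetical name. Comparisons against NA yield NA, which match(TRUE, ...) simply skips, and an empty input gives NA.

```r
# Hypothetical helper: index of the first element of v that is strictly
# greater than threshold, or NA if there is no such element.
first_above <- function(v, threshold) {
  match(TRUE, v > threshold)
}

first_above(c(50, 120, 130), 100)  # 2
first_above(c(50, 60), 100)        # NA
first_above(c(NA, 150), 100)       # 2
first_above(numeric(0), 100)       # NA
```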

# Short vector:
v <- 0:105

library(microbenchmark)

microbenchmark(
  which.max(v > 100),
  SafeWhichMax(v),
  match(TRUE, v > 100),
  min(which(v > 100)),
  which(v > 100)[1],
  Position(function(x) x, v > 100),
  Position(function(x) x > 100, v),
  purrr::detect_index(v, function (x) x > 100)
)

Unit: microseconds
                                  mean      median
which.max(v > 100)                24.112    23.80
SafeWhichMax(v)                   24.889    24.25
match(TRUE, v > 100)              34.752    33.20
min(which(v > 100))               25.506    25.20
which(v > 100)[1]                 25.320    24.90
Position(function(x) x, v > 100)  3231.783  3043.50
Position(function(x) x > 100, v)  3487.805  3314.75
purrr::detect_index               16436.579 16064.90

# Long vector, with late first occurrence of v > 100
v <- -10000:105

Unit: microseconds
                                  mean      median
which.max(v > 100)                24.958    24.30
SafeWhichMax(v)                   25.456    24.90
match(TRUE, v > 100)              37.680    37.85
min(which(v > 100))               26.439    26.00
which(v > 100)[1]                 25.724    25.55
Position(function(x) x, v > 100)  3224.240  3036.50
Position(function(x) x > 100, v)  3389.538  3287.05
purrr::detect_index               17344.706 15283.35

Karakalpak answered 9/2, 2022 at 16:7 Comment(0)

There are many solutions; another is:

x <- 90:110
which(x > 100)[1]

Immigration answered 1/4, 2015 at 10:39 Comment(0)


Assuming values is your vector.

firstGreaterThan <- NULL
for (i in seq_along(values)) {
  if (values[i] > 100) {
    # Record the index of the first match and stop looking
    firstGreaterThan <- i
    break
  }
}

Angry answered 1/4, 2015 at 10:34 Comment(3)

I don't think that would give you the first value unless you added a break – Immigration 1/4, 2015 at 10:37

and we don't need a loop here – Civil 1/4, 2015 at 10:57

Yeah, right. The point is that it is such a simple question that I just wrote a faster answer – Angry 1/4, 2015 at 16:53
