EDIT: Major clean-up in all aisles.
You might look at cut. By default, cut makes left-open, right-closed intervals, and that can be changed using the appropriate argument (right). To use your example:
x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
cutVec <- c(vec, max(x)) # for cut, range of vec should cover all of x
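As a quick check of the interval convention, the sketch below bins the same values with the default right = TRUE and with right = FALSE. The names breaks, right_closed and left_closed are only illustrative; breaks mirrors cutVec above.

```r
x <- c(3, 6, 7, 7, 29, 37, 52)
breaks <- c(2, 5, 6, 35, 52)  # same values as cutVec in the answer

# Default (right = TRUE): intervals are (a, b], so 6 falls in (5, 6]
right_closed <- cut(x, breaks)

# right = FALSE: intervals are [a, b), so 6 falls in [6, 35).
# include.lowest = TRUE then closes the last interval, so 52 is kept.
left_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

# Compare the bin index assigned to each value
data.frame(x,
           right_closed = as.integer(right_closed),
           left_closed  = as.integer(left_closed))
```

Only the value sitting exactly on a break (6) changes bin between the two settings.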
Now create four functions that should do the same thing: two from the OP, one from Josh O'Brien, and then cut. Two arguments to cut have been changed from their default settings: include.lowest = TRUE will create an interval closed on both sides for the smallest (leftmost) interval. labels = FALSE will cause cut to return simply the integer values for the bins instead of creating a factor, which it otherwise does.
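To see what those two arguments change, here is a small sketch on made-up values (breaks and vals are illustrative names, not from the question):

```r
breaks <- c(2, 5, 6, 35, 52)
vals <- c(2, 3, 52)  # 2 sits exactly on the smallest break

# Without include.lowest, 2 is outside (2, 5] and becomes NA
codes_default <- cut(vals, breaks, labels = FALSE)

# With include.lowest = TRUE the first interval is [2, 5], so 2 gets bin 1
codes_lowest <- cut(vals, breaks, include.lowest = TRUE, labels = FALSE)

# With the default labels, cut returns a factor rather than integer codes
as_factor <- cut(vals, breaks, include.lowest = TRUE)

codes_default
codes_lowest
class(as_factor)
```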
findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  fi - (x == vec[fi])
}
findInterval.rightClosed2 <- function(x, vec, ...) {
  length(vec) - findInterval(-x, -rev(vec), ...)
}
cutFun <- function(x, vec) {
  cut(x, vec, include.lowest = TRUE, labels = FALSE)
}
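# The body of fiFun is a contribution by Josh O'Brien that got fed to the ether.
# Each break is nudged up by a relative machine epsilon, so a value equal to a
# break falls into the interval below it.
fiFun <- function(x, vec) {
  findInterval(x, vec * (1 + .Machine$double.eps))
}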
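The epsilon trick in fiFun can be checked on a single break. This is only a sketch of the idea (brk and shifted are illustrative names); it also shows that multiplying a negative break moves it downward rather than upward, so the trick as written is best read as assuming positive breaks.

```r
brk <- 6
shifted <- brk * (1 + .Machine$double.eps)

# The shifted break is strictly larger than the original one
is_larger <- shifted > brk

# Against the original break, 6 counts as "at or above" it (interval 1);
# against the shifted break, 6 is below it (interval 0)
at_original <- findInterval(6, brk)
at_shifted  <- findInterval(6, shifted)

# For a negative break the product moves away from zero, i.e. downward,
# so a value equal to the break still lands in the upper interval
neg_shifted <- -6 * (1 + .Machine$double.eps)
at_negative <- findInterval(-6, neg_shifted)

c(at_original = at_original, at_shifted = at_shifted, at_negative = at_negative)
```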
Do all functions return the same result? Yup. (Notice the use of cutVec for cutFun.)
mapply(identical, list(findInterval.rightClosed(x, vec)),
       list(findInterval.rightClosed2(x, vec), cutFun(x, cutVec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE
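For reference, the sketch below prints the bin indices themselves, comparing plain findInterval with the mirrored call used in findInterval.rightClosed2 (left_closed and right_closed are illustrative names):

```r
x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)

# Plain findInterval: intervals are [a, b), so 6 lands in the bin starting at 6
left_closed <- findInterval(x, vec)

# Mirrored call, as in findInterval.rightClosed2: intervals become (a, b]
right_closed <- length(vec) - findInterval(-x, -rev(vec))

# One column per value of x; only the value on a break (6) differs
rbind(x, left_closed, right_closed)
```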
Now a more demanding vector to bin:
x <- rpois(2e6, 10)
vec <- c(-Inf, quantile(x, seq(.2, 1, .2)))
Test whether identical (note the use of unname):
mapply(identical, list(unname(findInterval.rightClosed(x, vec))),
       list(findInterval.rightClosed2(x, vec), cutFun(x, vec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE
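The unname call is needed because quantile returns a named vector, and those names travel through vec[fi] into the result of findInterval.rightClosed. A small sketch, assuming an arbitrary seed and a short vector purely for illustration:

```r
set.seed(1)          # arbitrary seed, only to make the sketch repeatable
x <- rpois(20, 10)   # much shorter than the 2e6 values used above
vec <- c(-Inf, quantile(x, seq(.2, 1, .2)))

names(vec)           # "" "20%" "40%" "60%" "80%" "100%"

# Same steps as the body of findInterval.rightClosed
fi <- findInterval(x, vec)
res <- fi - (x == vec[fi])

# The names picked up from vec make identical() fail unless they are dropped
has_names     <- !is.null(names(res))
names_dropped <- is.null(names(unname(res)))
c(has_names = has_names, names_dropped = names_dropped)
```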
And benchmark:
library(microbenchmark)
microbenchmark(findInterval.rightClosed(x, vec), findInterval.rightClosed2(x, vec),
               cutFun(x, vec), fiFun(x, vec), times = 50)
# Unit: milliseconds
#                                expr       min        lq    median        uq       max
# 1                    cutFun(x, vec)  35.46261  35.63435  35.81233  36.68036  53.52078
# 2                     fiFun(x, vec)  51.30158  51.69391  52.24277  53.69253  67.09433
# 3  findInterval.rightClosed(x, vec) 124.57110 133.99315 142.06567 155.68592 176.43291
# 4 findInterval.rightClosed2(x, vec)  79.81685  82.01025  86.20182  95.65368 108.51624
From this run, cut seems to be the fastest.
[...] findInterval(x, c(-Inf, head(vec, -1)))? – Indefeasible
[...] findInterval(x-1,vec) do what you are looking for? – Hemiterpene
[...] findInterval.rightClosed2. – Abstemious
[...] POSIXt objects, but ideally anything with comparison operators), as findInterval does (findInterval works with anything that supports an as.double method). The first function above succeeds, but the second fails, because there's no way to negate POSIXt objects. – Sleeve