Clip values between a minimum and maximum allowed value in R
H

4

23

In Mathematica there is the command Clip[x, {min, max}], which gives x for min<=x<=max, min for x<min, and max for x>max; see

http://reference.wolfram.com/mathematica/ref/Clip.html (mirror)

What would be the fastest way to achieve this in R? Ideally it should be a listable (vectorized) function that works on a single value, a vector, a matrix, or a data frame.

Hollishollister answered 13/12, 2012 at 21:41 Comment(0)
I
25

Rcpp has clamp for this:

library(Rcpp)

# clamp() is called here as clamp(lower, vector, upper)
cppFunction('NumericVector rcpp_clip( NumericVector x, double a, double b){
    return clamp( a, x, b ) ;
}')

Here is a quick benchmark showing how it performs against the other methods discussed:

pmin_pmax_clip <- function(x, a, b) pmax(a, pmin(x, b) )
ifelse_clip <- function(x, a, b) {
  ifelse(x <= a,  a, ifelse(x >= b, b, x))
}
operations_clip <- function(x, a, b) {
  a + (x-a > 0)*(x-a) - (x-b > 0)*(x-b)
}
x <- rnorm( 10000 )
library(microbenchmark)

microbenchmark( 
  pmin_pmax_clip( x, -2, 2 ), 
  rcpp_clip( x, -2, 2 ), 
  ifelse_clip( x, -2, 2 ), 
  operations_clip( x, -2, 2 )
)
# Unit: microseconds
#                        expr      min        lq   median        uq       max
# 1     ifelse_clip(x, -2, 2) 2809.211 3812.7350 3911.461 4481.0790 43244.543
# 2 operations_clip(x, -2, 2)  228.282  248.2500  266.605 1120.8855 40703.937
# 3  pmin_pmax_clip(x, -2, 2)  260.630  284.0985  308.426  336.9280  1353.721
# 4       rcpp_clip(x, -2, 2)   65.413   70.7120   84.568   92.2875  1097.039    
Income answered 13/12, 2012 at 23:26 Comment(6)
Those times are pretty rockin'. - Mercurochrome
Just pasting the lines for the clamp code in a console session is obviously not what you intended us Rcpp virgins to be doing. - Crux
Almost. See my use of cppFunction in my edit. (But you need the current devel version of Rcpp, because clamp has been fixed since the last release.) - Income
Very cool. I'm shocked and baffled at how bad the operations_clip() times are .... sometimes. Any ideas why the max values are so much larger than the min values for all of these functions? - Generalization
I'm pretty sure this is about memory allocation. operations_clip performs a lot of allocations, so my guess is that they sometimes take longer. - Income
@RomainFrancois Just to come back to this: what would be the recommended way to do this when the lower and upper limits a and b are not doubles but NumericVectors? Would using RcppArmadillo, as in x.elem(arma::find(x < lower)) = lower.elem(arma::find(x < lower)); x.elem(arma::find(x > upper)) = upper.elem(arma::find(x > upper)); be an OK approach, or are there better ways? Or would the Rcpp sugar functions pmin and pmax, as in pmax(lower, pmin(x, upper)), also work? And in the Rcpp code above I think clamp(a, x, b) should be clamp(x, a, b), right? - Hollishollister
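On the question in the last comment about vector-valued limits: base R's pmin and pmax are already element-wise, so the pure-R case needs no extra machinery. The following is only a sketch in base R, not the Rcpp approach asked about; the name clip_vec is hypothetical, and it assumes lower <= upper for every element.

```r
# Hypothetical helper: clip each x[i] to the interval [lower[i], upper[i]].
# Assumes lower <= upper element-wise; shorter arguments are recycled.
clip_vec <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

x     <- c(-5, 0, 5, 10)
lower <- c(-1, 1, 0,  0)
upper <- c( 1, 2, 3, 20)

clipped <- clip_vec(x, lower, upper)
print(clipped)
```

Scalar bounds still work with the same function, because a length-1 lower or upper is recycled across x.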
C
23

Here's a method with nested pmin and pmax setting the bounds:

 fenced.var <- pmax( LB, pmin( var, UB))

It will be difficult to find a method that is faster. Wrapped in a function that defaults to a range of 3 and 7:

fence <- function(vec, UB=7, LB=3) pmax( LB, pmin( vec, UB))

> fence(1:10)
 [1] 3 3 3 4 5 6 7 7 7 7
Crux answered 13/12, 2012 at 22:45 Comment(7)
Very elegant - that's great! - Hollishollister
I use this one a lot. I have a large dataset with several variables that are not plausibly real below 0 and that should be sensibly constrained at the high end as well. The real trick is remembering to set the max with pmin and set the min with pmax. - Crux
Your 'It will be difficult to find a method that is faster' obviously motivated me to have a look. - Income
Yeah. It still wins the compactness prize ... so far. - Crux
I think the function's arguments UB and LB should be reversed. I suspect fence <- function(vec, LB=3, UB=7) pmax( LB, pmin( vec, UB)) is really what you're after. - Prithee
@jf328: You might want to present a counter-example, since my experiments showed it to be working in what I thought was a perfectly reasonable manner: pmin( pmax( matrix(1:10,2), 4), 6). Probably best to do this with a new question. - Crux
@42- Interesting. You have to put the scalars (4 and 6) as the second argument. If you put them as the first argument, as in your answer, then it returns a vector, not a matrix. And the pmax/pmin documentation does say that attributes such as dim are copied from the first argument. - Weeny
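The point about dimensions in the last two comments can be checked directly: pmin and pmax take attributes such as dim from their first argument, so passing the data first keeps a matrix a matrix. The sketch below uses the hypothetical name clip_any and, as an added assumption, handles data frames by clipping column by column (all columns are assumed to be numeric).

```r
# Hypothetical helper covering single values, vectors, matrices and data frames.
clip_any <- function(x, lower, upper) {
  if (is.data.frame(x)) {
    # x[] <- ... replaces every column but keeps the data frame structure
    x[] <- lapply(x, function(col) pmin(pmax(col, lower), upper))
    return(x)
  }
  # Data goes first so that dim and names are taken from x
  pmin(pmax(x, lower), upper)
}

m  <- matrix(1:10, nrow = 2)
df <- data.frame(a = c(1, 5, 9), b = c(2, 6, 10))

clip_any(5, 4, 6)          # single value
clip_any(1:10, 4, 6)       # vector
dim(clip_any(m, 4, 6))     # matrix keeps its 2 x 5 shape
clip_any(df, 4, 6)         # data frame, clipped per column
dim(pmax(4, pmin(m, 6)))   # scalar first: dim is dropped
```

The only difference from the fence() function above is the argument order inside pmin and pmax; the clipped values are the same.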
S
11

Here's one function that will work for both vectors and matrices.

myClip <- function(x, a, b) {
    ifelse(x <= a,  a, ifelse(x >= b, b, x))
}

myClip(x = 0:10, a = 3,b = 7)
#  [1] 3 3 3 3 4 5 6 7 7 7 7

myClip(x = matrix(1:12/10, ncol=4), a=.2, b=0.7)
#      [,1] [,2] [,3] [,4]
# [1,]  0.2  0.4  0.7  0.7
# [2,]  0.2  0.5  0.7  0.7
# [3,]  0.3  0.6  0.7  0.7

And here's another:

myClip2 <- function(x, a, b) {
    a + (x-a > 0)*(x-a) - (x-b > 0)*(x-b)
}

myClip2(-10:10, 0, 4)
# [1] 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 4 4 4 4 4 4
Saltire answered 13/12, 2012 at 21:51 Comment(2)
Great!! Thanks so much!! My function for this was waaayy slower, but this works quite fast! - Hollishollister
This should be in R's base library! - Fusible
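One caveat with the arithmetic version: it multiplies a logical mask by a difference, so infinite inputs produce NaN (0 * Inf and Inf - Inf are both NaN), whereas the ifelse version clips them to the bounds. A minimal check, redefining both functions from this answer so that it runs on its own:

```r
# Both definitions are copied from the answer above
myClip  <- function(x, a, b) ifelse(x <= a, a, ifelse(x >= b, b, x))
myClip2 <- function(x, a, b) a + (x - a > 0) * (x - a) - (x - b > 0) * (x - b)

# On finite input the two versions agree
finite <- c(-10, 0, 2.5, 4, 10)
all.equal(myClip(finite, 0, 4), myClip2(finite, 0, 4))

# On infinite input they differ
extremes <- c(-Inf, Inf)
myClip(extremes, 0, 4)    # clipped to the bounds 0 and 4
myClip2(extremes, 0, 4)   # NaN for both elements
```

If the data can contain -Inf or Inf, the ifelse or pmin/pmax forms are therefore the safer choice.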
A
4

I believe that would be clamp() from the raster package.

library(raster)
# Usage: clamp(x, lower=-Inf, upper=Inf, ...)
Ari answered 10/1, 2018 at 3:19 Comment(1)
Wondering why no one has done timings on these alternatives? - Crux
