Pythonic way to replace list values with upper and lower bound (clamping, clipping, thresholding)?

I want to replace outliers in a list. To do so, I define an upper and a lower bound. Every value above upper_bound or below lower_bound is then replaced with the corresponding bound value. My approach was to do this in two steps using a numpy array.

Now I wonder if it's possible to do this in one step, as I guess it could improve performance and readability.

Is there a shorter way to do this?

import numpy as np

lowerBound, upperBound = 3, 7

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[arr > upperBound] = upperBound
arr[arr < lowerBound] = lowerBound

# [3 3 3 3 4 5 6 7 7 7]
print(arr)

See How can I clamp (clip, restrict) a number to some range? for clamping individual values, including non-Numpy approaches.

Deposition answered 26/12, 2016 at 10:17 Comment(4)
While it is nice that there's a compiled clip method, there's nothing un-pythonic about your code. It is a perfectly good use of numpy, and just as readable (to an experienced user). Keep that concept in your toolbox; it works in cases that don't quite fit the clip model. - Dippold
This operation is generally called clamping, clipping, or thresholding. - Kannada
You should use the clip method, but there is a reason besides speed: your code is elegant, but arr > upperBound creates an intermediate array, which could be an issue if the array gets large. - Unrequited
@Dippold thanks for your comment. By the term "pythonic" I meant short and fast. I am aware my solution is not unpythonic, but the clip() method is enough for my particular use case. The steps 1) doing it on your own, 2) understanding the concept, and 3) using a library are a good way to go :) - Deposition

You can use numpy.clip:

In [1]: import numpy as np

In [2]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]: lowerBound, upperBound = 3, 7

In [4]: np.clip(arr, lowerBound, upperBound, out=arr)
Out[4]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])

In [5]: arr
Out[5]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])
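
As a side note, here is a minimal sketch of a few closely related forms, written as a plain script rather than an IPython session. The names lower_bound, upper_bound, clipped, capped and floored are illustrative only and do not come from the question. It assumes a standard NumPy install: without out=, np.clip returns a new array and leaves the input untouched, the ndarray method arr.clip gives the same result, and np.minimum / np.maximum cover the case where only one bound is needed.

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
lower_bound, upper_bound = 3, 7  # illustrative names

# Without out=, np.clip returns a new array; arr itself is left unchanged.
clipped = np.clip(arr, lower_bound, upper_bound)

# The ndarray method form produces the same values.
clipped_method = arr.clip(lower_bound, upper_bound)

# One-sided clipping: cap only from above, or only from below.
capped = np.minimum(arr, upper_bound)
floored = np.maximum(arr, lower_bound)

print(clipped)   # [3 3 3 3 4 5 6 7 7 7]
print(capped)    # [0 1 2 3 4 5 6 7 7 7]
print(floored)   # [3 3 3 3 4 5 6 7 8 9]
print(arr)       # [0 1 2 3 4 5 6 7 8 9]
```

Passing out=arr, as in the session above, is what makes the operation in place; dropping it is the safer choice when the original values are still needed.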
Artificial answered 26/12, 2016 at 10:20 Comment(5)
Hi @arthur, thanks, that's exactly what I was looking for! I somehow missed the keyword clip and didn't find the method myself... - Deposition
I wonder how clip is written. It could be doing the same thing, just wrapped in a function call. - Dippold
@Dippold did you find out? - Field
AFAICT it looks like it's in C. - Field
Like many other functions, np.clip is Python, but it defers to arr.clip, the method. For regular arrays that method is compiled, so it will be faster (about 2x). - Dippold

For an alternative that doesn't rely on numpy, you could always do

arr = [max(lower_bound, min(x, upper_bound)) for x in arr]

If you just wanted to set an upper bound, you could of course write arr = [min(x, upper_bound) for x in arr]. Or similarly if you just wanted a lower bound, you'd use max instead.

Here, I've just applied both operations, written together.
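
To make the one-sided and two-sided variants concrete, here is a small sketch in plain Python. The sample list and the result names (capped, floored, clamped) are assumptions chosen for illustration, reusing the bounds from the question.

```python
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
lower_bound, upper_bound = 3, 7

# Upper bound only: values above upper_bound are pulled down to it.
capped = [min(x, upper_bound) for x in arr]

# Lower bound only: values below lower_bound are pulled up to it.
floored = [max(x, lower_bound) for x in arr]

# Both bounds: the inner min caps from above, the outer max lifts from below.
clamped = [max(lower_bound, min(x, upper_bound)) for x in arr]

print(capped)   # [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]
print(floored)  # [3, 3, 3, 3, 4, 5, 6, 7, 8, 9]
print(clamped)  # [3, 3, 3, 3, 4, 5, 6, 7, 7, 7]
```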

Edit: Here's a slightly more in-depth explanation:

Given an element x of the array (and assuming that your upper_bound is at least as big as your lower_bound!), you'll have one of three cases:

  1. x < lower_bound
  2. x > upper_bound
  3. lower_bound <= x <= upper_bound.

In case 1, the max/min expression first evaluates to max(lower_bound, x), which then resolves to lower_bound.

In case 2, the expression first becomes max(lower_bound, upper_bound), which then becomes upper_bound.

In case 3, we get max(lower_bound, x) which resolves to just x.

In all three cases, the output is what we want.
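
The three cases can be checked directly with a small helper. This is only a sketch: the function name clamp and the explicit ValueError for inverted bounds are my own additions, not part of the one-liner above, which silently assumes lower_bound <= upper_bound.

```python
def clamp(x, lower_bound, upper_bound):
    """Restrict x to the closed interval [lower_bound, upper_bound]."""
    # Hypothetical guard: the max/min trick is only meaningful
    # when the bounds are in order.
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    return max(lower_bound, min(x, upper_bound))


lower_bound, upper_bound = 3, 7

print(clamp(1, lower_bound, upper_bound))  # case 1: below the range -> 3
print(clamp(9, lower_bound, upper_bound))  # case 2: above the range -> 7
print(clamp(5, lower_bound, upper_bound))  # case 3: inside the range -> 5

arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print([clamp(x, lower_bound, upper_bound) for x in arr])
# [3, 3, 3, 3, 4, 5, 6, 7, 7, 7]
```

Wrapping the expression in a named function is one way to reduce the cognitive load of a bare max/min combination, at the cost of an extra function call per element.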

Oran answered 26/12, 2016 at 15:24 Comment(5)
Just my complaint (no vote): I tend to have to think really hard when I see max/min combinations, and I find them not that readable. - Field
@Field Sure, I don't disagree with that. On the other hand, the other answer to this point uses numpy.clip, which would not be immediately readable to me if I came across it somewhere. I'd probably want to double-check the numpy documentation, or else just guess what it did and hope that the author got it right. - Oran
What's weird is the nesting. It's a very symmetric operation that consists of "clip once, clip twice." This is "clip once, then clip that again." - Field
@Field Well...hmm. I guess to me, "clip once, clip twice" sounds very similar to "clip once, then clip that again", so I'm not sure I completely understand your objection. But I do agree that using max/min together imposes some cognitive load, or else requires some explanation. So I tried to give a (brief) explanation as well as the code. (However, I've said a lot more in the comments than I did in my answer, which suggests that perhaps my answer was a little too brief!) - Oran
Nice one-liner! It's always good to have an alternative solution, though I must say that it lacks readability. - Deposition