How to normalize a NumPy array to within a certain range?

After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:

# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)

Is there a less verbose way to do this, ideally with a convenience function? matplotlib.colors.Normalize() doesn't seem to be related.

Amalekite answered 14/11, 2009 at 17:52 Comment(0)
213
# Normalize audio channels to between -1.0 and +1.0
audio /= np.max(np.abs(audio),axis=0)
# Normalize image to between 0 and 255
image *= (255.0/image.max())

Using /= and *= allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so

image *= 255.0/image.max()    # Uses 1 division and image.size multiplications

is marginally faster than

image /= image.max()/255.0    # Uses 1+image.size divisions

Since we are using basic numpy methods here, I think this is about as efficient a solution as numpy can offer.
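A quick way to check the timing claim yourself (a minimal timeit sketch using the out-of-place versions so repeated runs don't compound; exact numbers will vary by machine):

import numpy as np
import timeit

image = np.random.rand(1000, 1000) * 255.0

t_mul = timeit.timeit(lambda: image * (255.0 / image.max()), number=100)
t_div = timeit.timeit(lambda: image / (image.max() / 255.0), number=100)
print(t_mul, t_div)   # multiplication is typically slightly faster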


In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the audio and image arrays need to have a floating-point dtype before the in-place operations are performed. If they are not already of floating-point dtype, you'll need to convert them using astype. For example,

image = image.astype('float64')
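Putting both steps together for the audio case (a minimal sketch, assuming audio is an integer-typed 2-D array loaded elsewhere):

import numpy as np

audio = audio.astype('float64')           # in-place ops below require a float dtype
audio /= np.max(np.abs(audio), axis=0)    # each column now lies within [-1.0, +1.0]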
Fabri answered 14/11, 2009 at 18:22 Comment(13)
Why is multiplication less expensive than division?Amalekite
I don't know exactly why. However, I am confident of the claim, having checked it with timeit. With multiplication, you can work with one digit at a time. With division, especially with large divisors, you have to work with many digits, and "guess" how many times the divisor goes into the dividend. You end up doing many multiplication problems to solve one division problem. The computer algorithm for doing division may not be the same as human long division, but nevertheless I believe it's more complicated than multiplication.Fabri
Probably worth mentioning the possible divide-by-zero for blank images.Ruskin
@Amalekite multiplication is less expensive than division because of the way it's implemented at the assembly level. Division algorithms can't be parallelized as well as multiplication algorithms. en.wikipedia.org/wiki/Binary_multiplierMarniemaro
@mjones.udri Yeah but if you're dividing an entire array by a scalar, shouldn't it save time by multiplying by the scalar's inverse?Amalekite
@Amalekite if you tell it to behave that way, then yes.Marniemaro
@mjones.udri Are there numerical problems with doing it automatically, though?Amalekite
@Amalekite nope! Think about the definition of division: multiplication by inverse. 10 / 5 = 10 * (1 / 5)Marniemaro
Minimizing the number of divisions in favor of multiplications is a well-known optimization technique.Marniemaro
@mjones.udri Yes that's true for mathematical concepts but I'm asking if it's true for fixed-length floating-point numbers. Does it cause any numerical error in odd cases like denormals, etc?Amalekite
@Amalekite with floating point numbers that's possible, but I'm not sure. I think it depends on how much precision (if any) is lost during manipulation.Marniemaro
You eliminate an intermediate temporary array, but you may get TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('int64') with casting rule 'same_kind' if image has an integer dtype.Protactinium
what am I missing, won't this normalize between -255 and 255, after multiplying the [-1, 1] normalized array by 255?Indifferent
130

If the array contains both positive and negative data, I'd go with:

import numpy as np

a = np.random.rand(3,2)

# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)

# Normalised [0,255] as integer: don't forget the closing parenthesis before .astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)        

# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1

If the array contains nan, one solution could be to just exclude them from the range computation:

def nan_ptp(a):
    return np.ptp(a[np.isfinite(a)])

b = (a - np.nanmin(a))/nan_ptp(a)

However, depending on the context you might want to treat nan differently. E.g. interpolate the value, replace it with e.g. 0, or raise an error.
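For instance, a minimal sketch of the replace-with-0 option (np.where keeps finite values and substitutes 0.0 elsewhere):

a_clean = np.where(np.isfinite(a), a, 0.0)   # non-finite entries become 0.0
b = (a_clean - a_clean.min())/np.ptp(a_clean)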

Finally, even though it's not the OP's question, standardization is worth mentioning:

e = (a - np.mean(a)) / np.std(a)
Unbiased answered 8/7, 2017 at 7:8 Comment(4)
The last one is also available as scipy.stats.zscore.Clang
d might flip the sign of samples. If you want to keep the sign you can use: f = a / np.max(np.abs(a))... unless the whole array is all zeroes (avoid a divide-by-zero).Eulaheulalee
Please make sure ptp value is not 0 to not receive nan.Parsonage
numpy.ptp() returns 0 if that is the range, but nan if there is a single nan in the array. However, if the range is 0, normalization is not defined, and we end up dividing by 0.Unbiased
46

You can also rescale using sklearn.preprocessing.scale. The advantages are that you can normalize the standard deviation in addition to mean-centering the data, and that you can do this along either axis: by features or by records.

from sklearn.preprocessing import scale
X = scale(X, axis=0, with_mean=True, with_std=True, copy=True)

The keyword arguments axis, with_mean, with_std are self-explanatory, and are shown in their default state. The argument copy performs the operation in-place if it is set to False.
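A quick sanity check of the default behaviour (a minimal sketch; the toy matrix is illustrative):

import numpy as np
from sklearn.preprocessing import scale

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X_scaled = scale(X, axis=0)
print(X_scaled.mean(axis=0))   # ~[0. 0.]: each column is mean-centered
print(X_scaled.std(axis=0))    # [1. 1.]: each column has unit standard deviation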

Modestomodesty answered 17/1, 2014 at 20:53 Comment(5)
X = scale( [1,2,3,4], axis=0, with_mean=True, with_std=True, copy=True ) gives me an errorCastello
X = scale( np.array([1,2,3,4]), axis=0, with_mean=True, with_std=True, copy=True ) gives me an array of [0,0,0,0]Castello
sklearn.preprocessing.scale() has the drawback that you do not know what is going on. What is the factor? What compression of the interval?Childbearing
These scikit preprocessing methods (scale, minmax_scale, maxabs_scale) are meant to be used along one axis only (so you either scale the samples (rows) or the features (columns) individually). This makes sense in a machine learning setup, but sometimes you want to calculate the range over the whole array, or use arrays with more than two dimensions.Circassian
Does not work for arrays with dimension > 2.Cellaret
22

You are trying to min-max scale the values of audio between -1 and +1 and image between 0 and 255.

Using sklearn.preprocessing.minmax_scale should easily solve your problem.

e.g.:

from sklearn.preprocessing import minmax_scale

audio_scaled = minmax_scale(audio, feature_range=(-1, 1))

and

shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)

note: Not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.
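For contrast, a minimal sketch of that other operation, scaling a vector to unit norm (assuming a nonzero 1-D array v):

import numpy as np

v_unit = v / np.linalg.norm(v)   # v_unit now has Euclidean length 1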

Vaughnvaught answered 5/4, 2019 at 1:8 Comment(0)
20

This answer to a similar question solved the problem for me with

np.interp(a, (a.min(), a.max()), (-1, +1))
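For reuse, this can be wrapped in a small helper (a hypothetical rescale function; the name and defaults are illustrative):

import numpy as np

def rescale(a, lo=-1.0, hi=1.0):
    # Linearly map a's full value range onto [lo, hi];
    # assumes a is not constant, so a.min() < a.max()
    return np.interp(a, (a.min(), a.max()), (lo, hi))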
Epilate answered 24/4, 2021 at 16:12 Comment(1)
pretty cool little trick! I'll add to my numpy toolbox...Thermography
12

You can use the in-place ("i", as in idiv, imul, ...) version, and it doesn't look half bad:

image /= (image.max()/255.0)

For the other case you can write a function to normalize an n-dimensional array by columns:

def normalize_columns(arr):
    # Scale each column in place so its largest-magnitude entry becomes 1
    rows, cols = arr.shape
    for col in range(cols):
        arr[:, col] /= abs(arr[:, col]).max()
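Note that this works in place, so arr must already have a float dtype; a minimal usage sketch (the data is illustrative):

import numpy as np

arr = np.random.randn(4, 2)   # float64 by default
normalize_columns(arr)        # each column now lies within [-1, 1]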
Beaverette answered 14/11, 2009 at 18:4 Comment(2)
Can you clarify this? The parentheses make it behave differently than without?Amalekite
Parentheses don't change anything; the point was to use /= instead of = .. / .. Beaverette
6

A simple solution is using the scalers offered by the sklearn.preprocessing library.

from sklearn import preprocessing as sk

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)

The error X_rec - X will be zero. You can adjust the feature_range for your needs, or even use a standard scaler, sk.StandardScaler().
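A quick round-trip check (assuming X is a 2-D float array, since MinMaxScaler expects samples-by-features input):

import numpy as np
print(np.allclose(X, X_rec))   # True: inverse_transform recovers the original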

Ivy answered 21/2, 2018 at 14:59 Comment(2)
does not work for 1D arrayCorbeil
Sure, if you consult the documentation of the function (scikit-learn.org/stable/modules/generated/…): X is array-like of shape (n_samples, n_features), the data used to compute the per-feature minimum and maximum used for later scaling along the features axis. You can just do X = X[..., np.newaxis] (multiple samples, one feature) and it will work for a 1-D array.Ivy
3

I tried following this, and got the error

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

The numpy array I was trying to normalize was an integer array. It seems they deprecated type casting in versions > 1.10, and you have to use numpy.true_divide() to resolve that.

import numpy as np

arr = np.array(img)
arr = np.true_divide(arr, 255.0)   # forces a float result for an integer-typed array

img was a PIL.Image object.
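Equivalently, converting the dtype up front (as in the accepted answer) also avoids the casting error; a minimal sketch:

arr = np.array(img).astype('float64')
arr /= 255.0   # in-place division now works because arr is float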

Interstice answered 14/5, 2018 at 6:50 Comment(0)
