How to ignore values when using numpy.sum and numpy.mean in matrices

Is there a way to ignore specific values when applying sum and mean in NumPy?

For instance, I'd like to skip the value -999 when calculating the result.

In [14]: c = np.matrix([[4., 2.],[4., 1.]])

In [15]: d = np.matrix([[3., 2.],[4., -999.]])

In [16]: np.sum([c, d], axis=0)
Out[16]:
array([[   7.,    4.],
       [   8., -998.]])

In [17]: np.mean([c, d], axis=0)
Out[17]:
array([[   3.5,    2. ],
       [   4. , -499. ]])
Coachandfour asked 4/7, 2017 at 19:55
Have you tried using a mask with null values where values = -xxx? – Lingwood
What value do you want in the [1,1] corner in place of the large negative? 1, masked, nan, something else? – Huai

Use a masked array:

>>> c = np.ma.array([[4., 2.], [4., 1.]])
>>> d = np.ma.masked_values([[3., 2.], [4., -999]], -999)

>>> np.ma.array([c, d]).sum(axis=0)
masked_array(data =
 [[7.0 4.0]
 [8.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.ma.array([c, d]).mean(axis=0)
masked_array(data =
 [[3.5 2.0]
 [4.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)
Cramp answered 4/7, 2017 at 22:17
To get the resulting array, you can use the .data attribute. – Charlettecharley
Or better, .filled(0) or .filled(np.nan), in case a whole row is -999. – Cramp
Many of the np.ma functions/methods use filled(something_innocuous) when calculating new values; others use compressed() to remove the masked values. – Huai
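
The comments above recommend .filled() over .data. As a minimal sketch (with hypothetical data in which the [1, 1] entry is -999 in *both* matrices, unlike the question), a position where every value is masked stays masked in the result, and .filled() lets you choose what it becomes, whereas .data exposes whatever placeholder sits under the mask.

```python
import numpy as np

# Hypothetical data: the [1, 1] entry is -999 in both matrices,
# so every value reduced into that position is masked.
c = np.ma.masked_values([[4., 2.], [4., -999.]], -999.)
d = np.ma.masked_values([[3., 2.], [4., -999.]], -999.)

stacked = np.ma.array([c, d])  # the masks of c and d are carried over
m = stacked.mean(axis=0)       # [1, 1] has nothing left to average, so it is masked

print(m.filled(np.nan))  # masked entries become nan
print(m.filled(0))       # masked entries become 0
```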

One option is to replace the specific value with np.nan and then use numpy.nansum and numpy.nanmean, as @s.k suggested in a comment:

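import numpy as np

# Same values as in the question, as plain arrays
c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])

def nan_if(arr, value):
    """Return a copy of arr with every occurrence of value replaced by nan."""
    return np.where(arr == value, np.nan, arr)

np.nansum([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 7.,  4.],
#       [ 8.,  1.]])

np.nanmean([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 3.5,  2. ],
#       [ 4. ,  1. ]])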
Aixlachapelle answered 4/7, 2017 at 20:02
I was surprised to find that numpy.ma seemed to be of no use here. For instance, np.sum((ma.masked_values(c, -999), ma.masked_values(d, -999)), axis=1) doesn't give the desired output... – Charlettecharley
@BradSolomon: use np.ma.sum. – Cramp
@Cramp Good find; I saw it but didn't think it computed element-wise. – Charlettecharley
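
Following up on the np.ma.sum suggestion in these comments, here is a sketch of the element-wise reduction. The np.ma functions accept the tuple of masked arrays directly and keep their masks, whereas plain np.sum converts the tuple to an ordinary array and loses them. Note also that axis=0 reduces across the two matrices; axis=1, as in the comment, reduces within each one.

```python
import numpy as np
import numpy.ma as ma

# Same values as in the question, as plain arrays
c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])

# Mask the sentinel in each array, then reduce element-wise with the
# np.ma functions; axis=0 runs across the stacked arrays.
arrays = (ma.masked_values(c, -999), ma.masked_values(d, -999))
total = ma.sum(arrays, axis=0)      # masked -999 is skipped in the sum
average = ma.mean(arrays, axis=0)   # and excluded from the count

print(total.filled(np.nan))
print(average.filled(np.nan))
```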

In recent versions of NumPy, np.sum and np.mean accept a where parameter that specifies which elements to include. It was added to np.sum in v1.17.0 and to np.mean in v1.20.0.

For your example, you can set where=(np.array([c, d]) > 0) to include only positive elements:

>>> e = np.array([c, d])
>>> np.sum(e, axis=0, where=(e>0))
array([[7., 4.],
       [8., 1.]])
Allseed answered 11/9 at 10:11
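
The mask e > 0 also drops any legitimate zero or negative values. A sketch that excludes only the -999 sentinel uses e != -999 instead, and the same mask works for np.mean (assuming NumPy 1.20 or later, per the version note above):

```python
import numpy as np

# Same values as in the question, as plain arrays
c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])
e = np.array([c, d])  # shape (2, 2, 2): the matrices stacked along axis 0

# Include every element except the -999 sentinel.
keep = e != -999

total = np.sum(e, axis=0, where=keep)     # where= on np.sum: NumPy >= 1.17
average = np.mean(e, axis=0, where=keep)  # where= on np.mean: NumPy >= 1.20

print(total)
print(average)
```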
