How to ignore values when using numpy.sum and numpy.mean in matrices

Asked 4/7, 2017 at 19:55 Answered 11/9, 2024 at 10:11

Is there a way to avoid using specific values when applying sum and mean in numpy?

I'd like to avoid, for instance, the -999 value when calculating the result.

In [14]: c = np.matrix([[4., 2.],[4., 1.]])

In [15]: d = np.matrix([[3., 2.],[4., -999.]])

In [16]: np.sum([c, d], axis=0)
Out[16]:
array([[   7.,    4.],
       [   8., -998.]])

In [17]: np.mean([c, d], axis=0)
Out[17]:
array([[   3.5,    2. ],
       [   4. , -499. ]])

Coachandfour asked 4/7, 2017 at 19:55

Have you tried using a mask with null values where values = -xxx? – Lingwood 4/7, 2017 at 19:57

What values do you want in the [1,1] corner, in place of the large negative? 1, masked, nan, something else? – Huai 4/7, 2017 at 20:13

Use a masked array:

>>> c = np.ma.array([[4., 2.], [4., 1.]])
>>> d = np.ma.masked_values([[3., 2.], [4., -999]], -999)

>>> np.ma.array([c, d]).sum(axis=0)
masked_array(data =
 [[7.0 4.0]
 [8.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.ma.array([c, d]).mean(axis=0)
masked_array(data =
 [[3.5 2.0]
 [4.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

Cramp answered 4/7, 2017 at 22:17

To get the resulting array, you can use the .data attribute – Charlettecharley 4/7, 2017 at 22:37

Or better, .filled(0) or .filled(np.nan), in case a whole row is -999 – Cramp 4/7, 2017 at 22:42

Many of the np.ma functions/methods use .filled(something_innocuous) when calculating new values; either that, or .compressed() to remove the masked values. – Huai 4/7, 2017 at 22:47
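A self-contained sketch of the masked-array approach, covering the .filled() and .compressed() points from the comments. It stacks the inputs first and masks once; the SENTINEL name and the extra matrix f are hypothetical, added only to show what happens when every value at a position is -999.

```python
import numpy as np

SENTINEL = -999.0  # hypothetical name for the value to ignore

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])
# Hypothetical extra matrix whose [1, 1] entry is also the sentinel.
f = np.array([[1., 1.], [1., -999.]])

# Stack first, then mask every element equal to SENTINEL in one call.
stacked = np.ma.masked_values(np.stack([c, d]), SENTINEL)

# Reductions on a masked array skip masked elements; .filled() turns the
# result back into a plain ndarray.
total = stacked.sum(axis=0).filled(np.nan)
mean = stacked.mean(axis=0).filled(np.nan)

# When every value at a position is masked, the reduced result is masked
# there too, so .filled(np.nan) yields NaN instead of a misleading number.
all_masked = np.ma.masked_values(np.stack([d, f]), SENTINEL)
mean_all_masked = all_masked.mean(axis=0).filled(np.nan)

# .compressed() drops the masked entries and flattens to 1-D.
valid_values = stacked.compressed()

print(total)
print(mean)
print(mean_all_masked)
print(valid_values)
```

Using .filled(np.nan) rather than .data makes a fully masked position visible as NaN instead of silently showing whatever placeholder sits in the underlying data.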

One option is to replace the specific value with np.nan and then use numpy.nansum and numpy.nanmean as commented by @s.k:

import numpy as np
def nan_if(arr, value):
    # Replace every element equal to `value` with NaN so nansum/nanmean skip it
    return np.where(arr == value, np.nan, arr)

np.nansum([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 7.,  4.],
#       [ 8.,  1.]])

np.nanmean([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 3.5,  2. ],
#       [ 4. ,  1. ]])

Aixlachapelle answered 4/7, 2017 at 20:2

I was interested to find that numpy.ma didn't seem to help here. For instance, np.sum((ma.masked_values(c, -999), ma.masked_values(d, -999)), axis=1) doesn't give the desired output... – Charlettecharley 4/7, 2017 at 20:18

@BradSolomon: use np.ma.sum – Cramp 4/7, 2017 at 22:17

@Cramp good find; I saw it but didn't think it computed element-wise – Charlettecharley 4/7, 2017 at 22:39
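A minimal sketch of the difference discussed in these comments, assuming the same c and d as in the question. Plain np.sum converts the tuple to an ordinary ndarray and loses the masks, while np.ma.stack keeps each operand's mask so that np.ma.sum can honour it. The reduction axis here is 0 (element-wise across the two matrices), not the axis=1 used in the comment.

```python
import numpy as np

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])

mc = np.ma.masked_values(c, -999.)
md = np.ma.masked_values(d, -999.)

# np.sum converts the tuple to a plain ndarray first, so the masks are lost
# and -999 is added like any other number.
plain = np.sum((mc, md), axis=0)

# np.ma.stack keeps each operand's mask, and np.ma.sum skips masked entries.
masked = np.ma.sum(np.ma.stack((mc, md)), axis=0)

print(plain)                  # [1, 1] entry is 1 + (-999)
print(masked.filled(np.nan))  # [1, 1] entry is 1
```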

Since NumPy 1.17.0, np.sum accepts a where parameter that specifies which elements to include in the reduction; np.mean gained the same parameter in 1.20.0.

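For your example, setting where=(np.array([c, d]) > 0) includes only the positive elements. Note that this also excludes zeros and any legitimate negative values; where=(np.array([c, d]) != -999) would exclude just the sentinel: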

>>> e = np.array([c, d])
>>> np.sum(e, axis=0, where=(e>0))
array([[7., 4.],
       [8., 1.]])

Allseed answered 11/9, 2024 at 10:11
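A sketch of the where approach for both sum and mean, assuming NumPy 1.20 or later (needed for where in np.mean). It compares against the sentinel instead of using > 0; the array g is hypothetical data with a legitimate 0 and -1, included only to show how the two masks differ.

```python
import numpy as np

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])
e = np.stack([c, d])  # shape (2, 2, 2), like np.array([c, d])

# Exclude only the sentinel rather than every non-positive value.
keep = e != -999

total = np.sum(e, axis=0, where=keep)  # `where` in np.sum: NumPy >= 1.17
mean = np.mean(e, axis=0, where=keep)  # `where` in np.mean: NumPy >= 1.20

# Hypothetical data: `g > 0` drops the valid 0 and -1, `!= -999` keeps them.
g = np.array([[[0., -1.]], [[2., 3.]]])  # shape (2, 1, 2)
mean_positive_only = np.mean(g, axis=0, where=g > 0)      # [[2., 3.]]
mean_sentinel_only = np.mean(g, axis=0, where=g != -999)  # [[1., 1.]]

print(total)
print(mean)
```

If every element at some position is excluded, np.mean has nothing to average there and returns NaN (with a RuntimeWarning), much like .filled(np.nan) in the masked-array answer.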
