How to properly mask a numpy 2D array?

Say I have a two dimensional array of coordinates that looks something like

x = np.array([[1,2],[2,3],[3,4]])

Earlier in my work, I generated a mask that ends up looking something like

mask = [False,False,True]

When I try to use this mask on the 2D coordinate vector, I get an error

newX = np.ma.compressed(np.ma.masked_array(x,mask))

>>>numpy.ma.core.MaskError: Mask and data not compatible: data size is 6, mask size is 3.

which makes sense, I suppose. So I tried to simply use the following mask instead:

mask2 = np.column_stack((mask,mask))
newX = np.ma.compressed(np.ma.masked_array(x,mask2))

And what I get is close:

>>>array([1,2,2,3])

to what I would expect (and want):

>>>array([[1,2],[2,3]])

There must be an easier way to do this?

Adios answered 5/7, 2016 at 1:18 Comment(0)
S
28

Is this what you are looking for?

import numpy as np
x[~np.array(mask)]
# array([[1, 2],
#        [2, 3]])

Or, using a NumPy masked array:

newX = np.ma.array(x, mask = np.column_stack((mask, mask)))
newX

# masked_array(data =
#  [[1 2]
#  [2 3]
#  [-- --]],
#              mask =
#  [[False False]
#  [False False]
#  [ True  True]],
#        fill_value = 999999)
Soissons answered 5/7, 2016 at 1:30 Comment(0)
G
15

With np.where you can do all sorts of things:

x_maskd = np.where(mask, x, 0)

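np.where takes three arguments: a condition, x, and y. All three must be broadcastable to a common shape. Where the condition is True, the value from x is returned; otherwise the value from y is returned. Note that a 3-element row mask does not broadcast against a 3x2 array as is; it needs a trailing axis first, e.g. np.asarray(mask)[:, None].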
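As a minimal sketch of the above applied to the question's data (the names keep and x_zeroed are made up for illustration): the row mask is given a trailing axis so it broadcasts, and it is inverted because in the question True marks the row to hide, whereas np.where keeps the x value where the condition is True.

```python
import numpy as np

x = np.array([[1, 2], [2, 3], [3, 4]])
mask = np.array([False, False, True])   # True marks the row to hide

# Shape (3,) does not broadcast against (3, 2); shape (3, 1) does.
keep = ~mask[:, None]

# Keep values where `keep` is True and write 0 elsewhere.
x_zeroed = np.where(keep, x, 0)
print(x_zeroed)
```

Unlike boolean indexing, this keeps the original 3x2 shape and overwrites the hidden row with the fill value instead of dropping it.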

Grow answered 20/1, 2020 at 14:11 Comment(1)
Not many understand that np.where is a line-saver! - Flannery
B
9

Your x is 3x2:

In [379]: x
Out[379]: 
array([[1, 2],
       [2, 3],
       [3, 4]])

Make a 3 element boolean mask:

In [380]: rowmask=np.array([False,False,True])

That can be used to select the rows where it is True, or where it is False. In both cases the result is 2d:

In [381]: x[rowmask,:]
Out[381]: array([[3, 4]])

In [382]: x[~rowmask,:]
Out[382]: 
array([[1, 2],
       [2, 3]])

This is without using the MaskedArray subclass. To make such an array, we need a mask that matches x in shape. There is no provision for masking just one dimension.

In [393]: xmask=np.stack((rowmask,rowmask),-1)  # column stack

In [394]: xmask
Out[394]: 
array([[False, False],
       [False, False],
       [ True,  True]], dtype=bool)

In [395]: np.ma.MaskedArray(x,xmask)
Out[395]: 
masked_array(data =
 [[1 2]
 [2 3]
 [-- --]],
             mask =
 [[False False]
 [False False]
 [ True  True]],
       fill_value = 999999)

Applying compressed to that produces a raveled array: array([1, 2, 2, 3])

Since masking is element by element, it could mask one element in row 1, two in row 2, and so on. So in general compressing (removing the masked elements) will not yield a 2d array. The flattened form is the only general choice.

np.ma makes most sense when there's a scattering of masked values. It isn't of much value if you want to select, or deselect, whole rows or columns.
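A small sketch of why the flattened form is the only general choice, using the question's x with a scattered mask (the names scattered, per_row and flat are only illustrative): once individual elements are masked, the rows no longer have the same number of surviving values.

```python
import numpy as np

x = np.array([[1, 2], [2, 3], [3, 4]])
scattered = np.ma.masked_equal(x, 2)   # masks one element in each of the first two rows

# Count the unmasked elements left in each row.
per_row = scattered.count(axis=1)
print(per_row)

# compressed() returns the surviving values as a 1-D array.
flat = scattered.compressed()
print(flat)
```

Because the per-row counts differ, there is no rectangular 2d array that could hold the result.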

===============

Here are more typical masked arrays:

In [403]: np.ma.masked_inside(x,2,3)
Out[403]: 
masked_array(data =
 [[1 --]
 [-- --]
 [-- 4]],
             mask =
 [[False  True]
 [ True  True]
 [ True False]],
       fill_value = 999999)

In [404]: np.ma.masked_equal(x,2)
Out[404]: 
masked_array(data =
 [[1 --]
 [-- 3]
 [3 4]],
             mask =
 [[False  True]
 [ True False]
 [False False]],
       fill_value = 2)

In [406]: np.ma.masked_outside(x,2,3)
Out[406]: 
masked_array(data =
 [[-- 2]
 [2 3]
 [3 --]],
             mask =
 [[ True False]
 [False False]
 [False  True]],
       fill_value = 999999)
Belicia answered 5/7, 2016 at 2:37 Comment(0)
E
3

Since none of these solutions worked for me, I thought I would write down the one that did; maybe it will be useful for somebody else. I use Python 3.x and I worked on two 3D arrays. One, which I call data_3D, contains float values of recordings in a brain scan, and the other, template_3D, contains integers which represent regions of the brain. I wanted to choose those values from data_3D corresponding to an integer region_code as per template_3D:

my_mask = np.in1d(template_3D, region_code).reshape(template_3D.shape)
data_3D_masked = data_3D[my_mask]

which gives me a 1D array of only the relevant recordings.
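For what it's worth, np.isin does the same membership test while preserving the input's shape, so the reshape step is not needed; np.in1d is also deprecated in recent NumPy releases. A sketch with small made-up stand-ins for the arrays described above (the values are hypothetical, not real scan data):

```python
import numpy as np

# Hypothetical stand-ins for the template and the recordings.
template_3D = np.array([[[1, 2], [2, 3]],
                        [[3, 3], [1, 2]]])
data_3D = np.arange(8, dtype=float).reshape(2, 2, 2)
region_code = 3

# np.isin returns a boolean array with the same shape as template_3D.
my_mask = np.isin(template_3D, region_code)

# Boolean indexing with a full-shape mask yields a 1-D array.
data_3D_masked = data_3D[my_mask]
print(data_3D_masked)
```

region_code can also be a list of several codes if more than one region should be selected.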

Ellsworthellwood answered 8/6, 2017 at 10:31 Comment(0)
S
3

If you have

A =  [[  8.   0. 165.  22. 164.  47. 184. 185.]
      [  0.   6. -74. -27.  63.  49. -46. -48.]
      [165. -74.   0.   0.   0.   0.   0.   0.]
      [ 22. -27.   0.   0.   0.   0.   0.   0.]
      [164.  63.   0.   0.   0.   0.   0.   0.]
      [ 47.  49.   0.   0.   0.   0.   0.   0.]
      [184. -46.   0.   0.   0.   0.   0.   0.]
      [185. -48.   0.   0.   0.   0.   0.   0.]]

and your mask is

mask = np.array([True, True, True, False, True, False, True, False])

then your masked A becomes

A[mask, :][:, mask] = [[  8.   0. 165. 164. 184.]
                       [  0.   6. -74.  63. -46.]
                       [165. -74.   0.   0.   0.]
                       [164.  63.   0.   0.   0.]
                       [184. -46.   0.   0.   0.]]
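The chained indexing above can also be written in one step with np.ix_, which builds an open mesh from the two masks. A sketch on a small made-up matrix (A and mask here are illustrative, not the 8x8 example):

```python
import numpy as np

A = np.arange(16).reshape(4, 4)            # small hypothetical matrix
mask = np.array([True, False, True, True])

# Two-step form: select rows, then select columns of the result.
chained = A[mask, :][:, mask]

# One-step form: np.ix_ pairs every kept row with every kept column.
direct = A[np.ix_(mask, mask)]
print(direct)
```

Note that A[mask, mask] is not equivalent: it pairs the row and column indices one by one and returns only the diagonal entries of the selected block.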
Socman answered 22/9, 2021 at 11:46 Comment(0)
M
1

In your last example, the problem is not the mask. It is your use of compressed. From the docstring of compressed:

Return all the non-masked data as a 1-D array.

So compressed flattens the nonmasked values into a 1-d array. (It has to, because there is no guarantee that the compressed data will have an n-dimensional structure.)

Take a look at the masked array before you compress it:

In [8]: np.ma.masked_array(x, mask2)

Out[8]: 
masked_array(data =
 [[1 2]
 [2 3]
 [-- --]],
             mask =
 [[False False]
 [False False]
 [ True  True]],
       fill_value = 999999)
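If the masked array is already built and whole rows are masked, one option that keeps the result 2d is np.ma.compress_rows, which drops every row containing a masked value. A minimal sketch on the question's data (the names masked and rows_kept are only illustrative):

```python
import numpy as np

x = np.array([[1, 2], [2, 3], [3, 4]])
mask = [False, False, True]
mask2 = np.column_stack((mask, mask))

masked = np.ma.masked_array(x, mask2)

# Drop any row that contains at least one masked value; returns a plain ndarray.
rows_kept = np.ma.compress_rows(masked)
print(rows_kept)
```

This only applies to 2d arrays, and a row with even a single masked element is removed entirely.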
Madlin answered 5/7, 2016 at 1:46 Comment(2)
You're right, it's correct before I compress it. I will read the documentation for a way to remove masked elements while preserving array dimensionality. Thanks - Adios
If I understand what you are trying to do, @Psidom's first suggestion looks reasonable. In particular, you probably don't need a masked array. Just index a regular array with a boolean array. - Madlin
F
0

masked_X = np.where(mask, X, 0) is, in my timings, a faster and simpler way to mask data than building a masked array:

X = np.array([[2,-1,4],
              [3,-3,1],
              [9,-7,2]])

mask = np.identity(3)

Timings:

%timeit np.where(mask,X,0)

969 ns ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit np.ma.array(X, mask=mask)

6.47 µs ± 85.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

I'll let you draw your own conclusions!

Flannery answered 17/11, 2022 at 15:38 Comment(0)
