NumPy indexing: broadcasting with Boolean arrays

Related to this question, I came across an indexing behaviour via Boolean arrays and broadcasting I do not understand. We know it's possible to index a NumPy array in 2 dimensions using integer indices and broadcasting. This is specified in the docs:

import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

c1 = np.where(b1)[0]  # i.e. [1, 2]
c2 = np.where(b2)[0]  # i.e. [0, 2]

a[c1[:, np.newaxis], c2]  # or a[c1[:, None], c2]

array([[ 4,  6],
       [ 8, 10]])

However, the same does not work for Boolean arrays.

a[b1[:, None], b2]

IndexError: too many indices for array

The alternative numpy.ix_ works for both integer and Boolean arrays. This seems to be because ix_ performs specific manipulation for Boolean arrays to ensure consistent treatment.

assert np.array_equal(a[np.ix_(b1, b2)], a[np.ix_(c1, c2)])

a[np.ix_(b1, b2)]

array([[ 4,  6],
       [ 8, 10]])
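To make the "specific manipulation" concrete, here is a small sketch of what np.ix_ appears to hand back for Boolean inputs: each mask becomes an integer array, reshaped so the two broadcast as an open mesh. The names rows, cols and manual are only illustrative, and a is rebuilt with np.arange on the assumption that it matches the array above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # same values as the array `a` above
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# np.ix_ converts each 1-D Boolean mask to integer positions and reshapes
# them so they broadcast against each other.
rows, cols = np.ix_(b1, b2)
print(rows.shape, cols.shape)      # (2, 1) (1, 2)
print(rows.ravel(), cols.ravel())  # [1 2] [0 2]

# Doing the same conversion by hand gives the same sub-matrix.
manual = a[np.flatnonzero(b1)[:, None], np.flatnonzero(b2)]
assert np.array_equal(a[rows, cols], manual)
```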

So my question is: why does broadcasting work with integers, but not with Boolean arrays? Is this behaviour documented? Or am I misunderstanding a more fundamental issue?

Seel answered 4/7, 2018 at 16:26 Comment(7)
AFAIK those Boolean arrays are internally converted to their integer equivalents, np.flatnonzero(ar), before being used for indexing. As such, np.flatnonzero(b1[:, np.newaxis]) won't be equivalent to np.flatnonzero(b1)[:, None]. - Thurber
@Divakar, do you mean b1[:, None] versus np.flatnonzero(b1)[:, None] will be treated differently? The 2 arrays in your comment are identical except for shape, (2, 1) vs (2,). - Seel
I think it's more that np.nonzero(b1[:, None]) and np.nonzero(b2) are not broadcastable against each other, element by element of their respective tuples, whereas the integer arrays fed directly into the index are broadcastable, as with np.ix_(c1, c2). - Thurber
Ok, that makes sense. I just found this bit in the docs to confirm: "if an index includes a Boolean array, the result will be identical to inserting obj.nonzero()". Since this step isn't required for integers, it only affects Boolean arrays. Thanks for your help (as always). Feel free to drop an answer, or I'll add one myself when I have time. - Seel
Thanks for the accept, but @Thurber is the one who answered your actual question, I really just wanted to add additional perspective. I'd hate to take his credit, and would be more comfortable if you didn't accept my answer. - Epigram
@AndrasDeak Think it doesn't matter who answers it as long as it answers the question to the satisfaction of OP. Would encourage OP to accept, if so. - Thurber
@Thurber I strive to be fair at all times. In this case thanks, I'll edit with your solution to make it a complete answer (and next time wait until you post an answer yourself ;) - Epigram

As @Divakar noted in comments, Boolean advanced indices behave as if they were first fed through np.nonzero and then broadcast together; see the relevant documentation for extensive explanations. To quote the docs,

In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
[...]
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
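The equivalence quoted above can be checked on a small example. This is only a sketch: the 3d array x and the particular ind_1, mask and ind_2 are made up for illustration and don't come from the question.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)    # hypothetical 3d array
ind_1 = 1                             # plain integer index
mask = np.array([True, False, True])  # Boolean index along axis 1
ind_2 = np.array([0, 3])              # integer array index along axis 2

direct = x[ind_1, mask, ind_2]
# Same index with the Boolean array replaced by its nonzero() tuple
expanded = x[(ind_1,) + mask.nonzero() + (ind_2,)]

assert np.array_equal(direct, expanded)
print(direct)  # [12 23]
```

The mask contributes exactly one integer array here because it is 1d; an n-dimensional mask contributes n of them, which is what matters for your case.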

In your case broadcasting would not necessarily be a problem, since both arrays have only two nonzero elements. The problem is the number of dimensions in the result:

>>> len(b1[:,None].nonzero())
2
>>> len(b2.nonzero())
1

Consequently the indexing expression a[b1[:,None], b2] would be equivalent to a[b1[:,None].nonzero() + b2.nonzero()], which would put a length-3 tuple inside a, corresponding to a 3d array index. Hence the error you see about "too many indices".
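A quick way to see this for yourself, as a sketch reusing the arrays from the question (errors is just a helper list I introduce here): build the expanded tuple by hand, count its entries, and try both forms of the index.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # same values as `a` in the question
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Two arrays from the 2d mask plus one array from the 1d mask
idx = b1[:, None].nonzero() + b2.nonzero()
print(len(idx))  # 3

errors = []
for label, index in [("Boolean", (b1[:, None], b2)), ("expanded", idx)]:
    try:
        a[index]
    except IndexError as exc:
        # Both forms ask for three index dimensions on a 2d array
        errors.append(label)
        print(label, "->", exc)
```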

The surprises mentioned in the docs are very close to your example: what if you hadn't injected that singleton dimension? Starting from a length-3 and a length-4 Boolean array you would've ended up with a length-2 advanced index along each axis, i.e. a 1d result of shape (2,). This is never what you'd want, which leads us to another piece of trivia on the subject.
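Here is that surprise spelled out, as a sketch using the arrays from the question. The extra mask b_one is hypothetical and only there to show that a mask with a single True silently broadcasts against the other one.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # same values as `a` in the question
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Pairs up positions (1, 0) and (2, 2) instead of selecting a 2x2 block
paired = a[b1, b2]
print(paired, paired.shape)  # [ 4 10] (2,)

# Hypothetical mask with one True: row 1 is broadcast against columns 0 and 2
b_one = np.array([False, True, False])
broadcasted = a[b_one, b2]
print(broadcasted, broadcasted.shape)  # [4 6] (2,)
```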

There's been a lot of discussion in planning to revamp advanced indexing, see the work-in-progress draft NEP 21. The gist of the issue is that fancy indexing in numpy, while clearly documented, has some very quirky features which aren't practically useful for anything, but which can bite you if you make a mistake by producing surprising results rather than errors.

A relevant quote from the NEP:

Mixed cases involving multiple array indices are also surprising, and only less problematic because the current behavior is so useless that it is rarely encountered in practice. When a boolean array index is mixed with another boolean or integer array, boolean array is converted to integer array indices (equivalent to np.nonzero()) and then broadcast. For example, indexing a 2D array of size (2, 2) like x[[True, False], [True, False]] produces a 1D vector with shape (1,), not a 2D sub-matrix with shape (1, 1).
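The example from the NEP is easy to reproduce. In this sketch x and m are illustrative names; I use a Boolean array rather than the list literals from the quote, and the np.ix_ line is added for comparison.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)
m = np.array([True, False])

# Current ("legacy") behaviour: nonzero() on each mask, then broadcast
legacy = x[m, m]
print(legacy.shape)  # (1,)

# Outer indexing via np.ix_ keeps one axis per mask
outer = x[np.ix_(m, m)]
print(outer.shape)   # (1, 1)
```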

Now, I emphasize that the NEP is very much work-in-progress, but one of the suggestions in the current state of the NEP is to forbid Boolean arrays in advanced indexing cases such as the above, and only allow them in "outer indexing" scenarios, i.e. exactly what np.ix_ would help you do with your Boolean array:

Boolean indexing is conceptionally outer indexing. Broadcasting together with other advanced indices in the manner of legacy indexing [i.e. the current behaviour] is generally not helpful or well defined. A user who wishes the "nonzero" plus broadcast behaviour can thus be expected to do this manually.

My point is that the behaviour of Boolean advanced indices and their deprecation status (or lack thereof) may change in the not-so-distant future.

Flesh answered 4/7, 2018 at 17:10 Comment(0)
