numpy shorthand for taking jagged slice

H

3

5

I have an operation that I do often, which I'm calling a "jagged slice" because I don't know the real name for it. It's best explained by example:

import numpy as np

a = np.random.randn(50, 10)
entries_of_interest = np.random.randint(10, size=50)  # Vector of 50 indices between 0 and 9
# Now I want the value in each row of a at the corresponding index in entries_of_interest
jagged_slice_of_a = a[np.arange(a.shape[0]), entries_of_interest]
# jagged_slice_of_a is now a vector with 50 elements.  Good.

The only problem is that this a[np.arange(a.shape[0]), entries_of_interest] indexing is a bit cumbersome (it seems silly to have to construct np.arange(a.shape[0]) just for this). I'd like something like the : operator here, but : does something else. Is there a more succinct way to do this operation?

Best answer:

No, there is no better way with native numpy. You can create a helper function for this if you want.
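A note for readers on newer releases: NumPy 1.15 and later, which came out after this answer was written, include np.take_along_axis. It expresses the same per-row pick without building the row index by hand. The sketch below reuses the question's variable names with arbitrary seeded data; whether it is actually shorter is debatable, since the index vector has to be reshaped into a column first.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration
a = rng.standard_normal((50, 10))
entries_of_interest = rng.integers(10, size=50)

# take_along_axis needs indices with the same number of dimensions as a,
# so turn the index vector into a (50, 1) column, then drop that axis again.
picked = np.take_along_axis(a, entries_of_interest[:, None], axis=1)[:, 0]

# Same result as the explicit fancy-indexing form from the question.
reference = a[np.arange(a.shape[0]), entries_of_interest]
assert np.array_equal(picked, reference)
```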

Harvest answered 16/11, 2015 at 12:58 Comment(0)

P

3

This is cumbersome only in the sense that it requires more typing for a task that seems so simple to you.

a[np.arange(a.shape[0]), entries_of_interest]

But as you note, the syntactically simpler a[:, entries_of_interest] has another interpretation in numpy. Choosing a subset of the columns of an array is a more common task than choosing one (random) item from each row.

Your case is just a specialized instance of

a[I, J]

where I and J are two arrays of the same shape. In the general case entries_of_interest could have fewer entries than a.shape[0] (not all the rows), or more (several items from some rows), or even be 2d. It could even select certain elements repeatedly.
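To make those general cases concrete, here is a small sketch on a toy array. The array and the index values are made up for illustration; only the a[I, J] indexing form comes from the discussion above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # a[i, j] == 4*i + j, easy to check by eye

# Fewer index pairs than rows: pick from rows 0 and 2 only.
subset = a[np.array([0, 2]), np.array([1, 3])]      # -> [1, 11]

# More index pairs than rows, with element (1, 2) selected twice.
repeated = a[np.array([0, 0, 1, 1, 1]), np.array([0, 3, 2, 2, 0])]  # -> [0, 3, 6, 6, 4]

# 2-D index arrays give a 2-D result with the same shape as the indices.
I = np.array([[0, 1], [2, 2]])
J = np.array([[3, 0], [1, 1]])
grid = a[I, J]                                      # -> [[3, 4], [9, 9]]
```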

I have found in other SO questions that performing this kind of element selection is faster when applied to a.flat. But that requires some math to construct the I*n+J kind of flat index.
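The flat-index arithmetic might look roughly like this. It is a sketch with arbitrary seeded data, where n is taken to be the number of columns; it makes no timing claim, since any speed difference will depend on the NumPy version and the array size.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration
a = rng.standard_normal((50, 10))
entries_of_interest = rng.integers(10, size=50)

n_rows, n_cols = a.shape
I = np.arange(n_rows)
J = entries_of_interest

# In row-major order, element (i, j) sits at position i * n_cols + j.
flat_index = I * n_cols + J
picked = a.flat[flat_index]

# Same values as the two-array fancy index.
assert np.array_equal(picked, a[I, J])
```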

With your special knowledge of J, constructing I seems like extra work, but numpy can't make that kind of assumption. If this selection were more common, someone could write a function that wraps your expression

def peter_selection(a, I):
    # I must hold one column index per row of a
    if a.shape[0] != I.shape[0]:
        raise ValueError("I needs one index per row of a")
    return a[np.arange(a.shape[0]), I]

Purpurin answered 16/11, 2015 at 17:27 Comment(1)

I thought it was short enough that I could get by without testing it in an Ipython session. :) – Purpurin 16/11, 2015 at 21:39

N

3

I think that your current method is probably the best way.

You can also use choose for this kind of selection. This is syntactically clearer, but is trickier to get right and potentially more limited. The equivalent with this method would be:

entries_of_interest.choose(a.T)
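A quick check that the two forms agree, using arbitrary seeded data with the question's shapes. Note that a.T has only 10 rows here, so this stays well within any limit on the number of choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration
a = rng.standard_normal((50, 10))
entries_of_interest = rng.integers(10, size=50)

# a.T has shape (10, 50): choose treats it as 10 candidate arrays of length 50
# and, at each position i, takes the value from candidate entries_of_interest[i].
chosen = entries_of_interest.choose(a.T)

# Same values as the explicit fancy-indexing form.
fancy = a[np.arange(a.shape[0]), entries_of_interest]
assert np.array_equal(chosen, fancy)
```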

Nitrosamine answered 16/11, 2015 at 13:6 Comment(4)

We discovered in another SO question that choose is limited to 32 items, that is a.T.shape[0] can't be larger than 32. e.g. N=32;np.arange(N).choose(np.arange(N*N).reshape(N,N)) – Purpurin 16/11, 2015 at 17:34

Ah, I was aware of some limitations with choose but couldn't quite remember what they were (I don't use it a great deal). I suppose this is because choose is implemented using the multi-dimensional iterators which place a limit on the number of dimensions an array can have. – Nitrosamine 16/11, 2015 at 17:59

I hadn't noticed this before, but there's a note at the end of the choose doc that says, in effect, this use of the function works, but it is considered to be an abuse. choices should be, and be thought of as, a sequence, a list or tuple. – Purpurin 16/11, 2015 at 18:57

I also hadn't noticed that subtlety. I guess then that choose is useful mainly for simplifying multi-dimensional fancy indexing on small arrays, or for cases where the indexing/selection needs to be modified using the mode kwarg. Straightforward fancy indexing remains the best solution for cases like the OP's. – Nitrosamine 16/11, 2015 at 22:2

A

1

The elements in jagged_slice_of_a are the diagonal elements of a[:, entries_of_interest].

A slightly less cumbersome way of doing this would therefore be to use np.diagonal to extract them.

jagged_slice_of_a = a[:, entries_of_interest].diagonal()
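A minimal sketch of this approach with arbitrary seeded data. It also inspects the shape of the intermediate array, which is relevant to memory use: for 50 rows the column selection produces a 50 x 50 array before the diagonal is taken.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration
a = rng.standard_normal((50, 10))
entries_of_interest = rng.integers(10, size=50)

# Selecting columns pairs every row with every requested index.
intermediate = a[:, entries_of_interest]
assert intermediate.shape == (50, 50)

# Element (i, i) of the intermediate is a[i, entries_of_interest[i]].
jagged_slice_of_a = intermediate.diagonal()
direct = a[np.arange(a.shape[0]), entries_of_interest]
assert np.array_equal(jagged_slice_of_a, direct)
```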

Apsis answered 16/11, 2015 at 13:5 Comment(2)

Ah. Nice, but it seems to come at a cost of building a huge array for the intermediate step. Or is it just a view? – Harvest 16/11, 2015 at 13:7

Ahh you are correct - it does create the intermediate array not just a view so this method isn't particularly efficient. – Apsis 16/11, 2015 at 13:12
