np.add.at indexing with array

I'm working on cs231n and I'm having a difficult time understanding how this indexing works. Given that

x = [[0,4,1], [3,2,4]]
dW = np.zeros((5,6))
dout = [[[  1.19034710e-01  -4.65005990e-01   8.93743168e-01  -9.78047129e-01
            -8.88672957e-01  -4.66605091e-01]
         [ -1.38617461e-03  -2.64569728e-01  -3.83712733e-01  -2.61360826e-01
            8.07072009e-01  -5.47607277e-01]
         [ -3.97087458e-01  -4.25187949e-02   2.57931759e-01   7.49565950e-01
           1.37707667e+00   1.77392240e+00]]

       [[ -1.20692745e+00  -8.28111550e-01   6.53041092e-01  -2.31247762e+00
         -1.72370321e+00   2.44308033e+00]
        [ -1.45191870e+00  -3.49328154e-01   6.15445782e-01  -2.84190582e-01
           4.85997687e-02   4.81590106e-01]
        [ -1.14828583e+00  -9.69055406e-01  -1.00773809e+00   3.63553835e-01
          -1.28078363e+00  -2.54448436e+00]]]

The operation they do is

np.add.at(dW, x, dout)

x is a two-dimensional array. How does indexing work here? I went through the np.ufunc.at documentation, but it only has simple examples with a 1-D array and a constant:

np.add.at(a, [0, 1, 2, 2], 1)
Planetary answered 3/8, 2017 at 2:36 Comment(1)
#44737879 is an example of add.at with a 2d array. - Dicot
In [226]: x = [[0,4,1], [3,2,4]]
     ...: dW = np.zeros((5,6),int)

In [227]: np.add.at(dW,x,1)
In [228]: dW
Out[228]: 
array([[0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0]])

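With this x there aren't any duplicate entries, so add.at gives the same result as indexing with +=. Note that x here is a plain list of two lists, which this NumPy version treats as a pair of (row, column) index lists, the same as the tuple (x[0], x[1]); newer NumPy releases may treat a nested list as a single index array instead, so passing the tuple explicitly is safer. Equivalently, we can read the changed values with: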

In [229]: dW[x[0], x[1]]
Out[229]: array([1, 1, 1])
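
As a minimal sketch of the point above (the names rows, cols, rows_dup and cols_dup are illustrative, and the index is passed as an explicit tuple so the result does not depend on how a nested list is interpreted), add.at and += agree when no index pair repeats and differ when one does:

```python
import numpy as np

# Explicit (row, column) index lists with no repeated pair.
rows, cols = [0, 4, 1], [3, 2, 4]

a = np.zeros((5, 6), dtype=int)
np.add.at(a, (rows, cols), 1)      # unbuffered add

b = np.zeros((5, 6), dtype=int)
b[rows, cols] += 1                 # ordinary fancy-index add

same_without_duplicates = np.array_equal(a, b)
print(same_without_duplicates)     # True

# Now repeat the pair (0, 3): only add.at counts it twice.
rows_dup, cols_dup = [0, 0, 1], [3, 3, 4]

c = np.zeros((5, 6), dtype=int)
np.add.at(c, (rows_dup, cols_dup), 1)

d = np.zeros((5, 6), dtype=int)
d[rows_dup, cols_dup] += 1

print(c[0, 3], d[0, 3])            # 2 1
```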

The indices work the same either way, including broadcasting:

In [234]: dW[...]=0
In [235]: np.add.at(dW,[[[1],[2]],[2,4,4]],1)
In [236]: dW
Out[236]: 
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 2, 0],
       [0, 0, 1, 0, 2, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])
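
One way to see why the output looks like this, sketched under the assumption that the index is written as an explicit tuple of arrays (row_idx and col_idx are illustrative names): broadcast the two index arrays against each other and list the (row, column) pairs that add.at visits.

```python
import numpy as np

row_idx = np.array([[1], [2]])   # shape (2, 1)
col_idx = np.array([2, 4, 4])    # shape (3,)

# The two index arrays broadcast to a common (2, 3) shape.
bshape = np.broadcast(row_idx, col_idx).shape
r, c = np.broadcast_arrays(row_idx, col_idx)
pairs = list(zip(r.ravel().tolist(), c.ravel().tolist()))
print(bshape)   # (2, 3)
print(pairs)    # [(1, 2), (1, 4), (1, 4), (2, 2), (2, 4), (2, 4)]

dW = np.zeros((5, 6), dtype=int)
np.add.at(dW, (row_idx, col_idx), 1)

# Adding 1 at each listed pair, one at a time, gives the same array.
expected = np.zeros((5, 6), dtype=int)
for i, j in pairs:
    expected[i, j] += 1

print(np.array_equal(dW, expected))   # True
```

Column 4 appears twice in col_idx, which is why those entries end up as 2.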

possible values

The values have to be broadcastable, with respect to the indexes:

In [112]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)))
...
In [114]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)).ravel())
...
ValueError: array is not broadcastable to correct shape
In [115]: np.add.at(dW,[[[1],[2]],[2,4,4]],[1,2,3])

In [117]: np.add.at(dW,[[[1],[2]],[2,4,4]],[[1],[2]])

In [118]: dW
Out[118]: 
array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  3,  0,  9,  0],
       [ 0,  0,  4,  0, 11,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0]])

In this case the indices define a (2,3) shape, so (2,3),(3,), (2,1), and scalar values work. (6,) does not.

In this case, add.at is mapping a (2,3) array onto a (2,2) subarray of dW.
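
The shape rule can be checked mechanically. The following is a small sketch (the candidates dictionary and its labels are illustrative) that tries each value shape against the same (2,3) index and records whether add.at accepts it:

```python
import numpy as np

# Index arrays that broadcast to shape (2, 3), passed as an explicit tuple.
index = (np.array([[1], [2]]), np.array([2, 4, 4]))

candidates = {
    "scalar": 1,
    "(2, 3)": np.ones((2, 3), dtype=int),
    "(3,)": np.array([1, 2, 3]),
    "(2, 1)": np.array([[1], [2]]),
    "(6,)": np.ones(6, dtype=int),   # same element count, wrong shape
}

accepted = {}
for name, values in candidates.items():
    dW = np.zeros((5, 6), dtype=int)   # fresh target for every trial
    try:
        np.add.at(dW, index, values)
        accepted[name] = True
    except ValueError:
        accepted[name] = False
    print(name, accepted[name])
```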

Dicot answered 3/8, 2017 at 4:13 Comment(3)
I get it when we add integers but I still have a difficult time understanding when adding arrays. For instance, I'm doing np.add.at(dW, x, dout), and dout is a 3 dimensional array. How does np.add.at work in that scenario? - Planetary
The indices in [235] broadcast to a (2,3) shape. I'd have to test when dout itself has to be (2,3), or whether it just has to have 6 elements. It might even work with a broadcastable shape (2,1) or (3,). The thing to do is experiment and see what works. - Dicot
Testing shows that broadcastability is the key. dout has to broadcast with the elements of x. - Dicot

Recently I also had a hard time understanding this line of code. I hope what I worked out helps you; correct me if I am wrong.

The three arrays in this line of code are the following:

x    --- (N,T)
dW   --- (V,D)
dout --- (N,T,D)

Then we come to the line of code whose behavior we want to figure out:

np.add.at(dW, x, dout)

If you don't want to follow the reasoning, the code above is equivalent to:

for row in range(N):
    for col in range(T):
        dW[x[row, col], :] += dout[row, col, :]
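
The equivalence can be checked numerically. This is a minimal sketch with made-up sizes and random data (rng, dW_loop and dW_at are illustrative names, not part of the assignment code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, V, D = 2, 3, 5, 6                   # arbitrary small sizes
x = rng.integers(0, V, size=(N, T))       # indices into the rows of dW
dout = rng.standard_normal((N, T, D))

# Explicit double loop: accumulate each dout[row, col] into row x[row, col].
dW_loop = np.zeros((V, D))
for row in range(N):
    for col in range(T):
        dW_loop[x[row, col], :] += dout[row, col, :]

# Same accumulation in a single call.
dW_at = np.zeros((V, D))
np.add.at(dW_at, x, dout)

print(np.allclose(dW_loop, dW_at))        # True
```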

This is the reasoning:

Referring to this doc

https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ufunc.at.html

We know that x is the index array, so the key is to understand dW[x]. This is the concept of indexing an array (dW) using another array (x). If you are not familiar with this concept, see this link:

https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html

Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed.

dW[x] gives us an array of shape (N,T,D): the (N,T) part comes from x, and the (D) part comes from dW, whose shape is (V,D). Note that every element of x must lie in the range [0, V).
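
A quick way to confirm the shape rule, sketched with the sizes from the question (V=5, D=6, N=2, T=3) and an arange-filled dW so that rows are easy to tell apart; picked is an illustrative name:

```python
import numpy as np

V, D = 5, 6
dW = np.arange(V * D).reshape(V, D)       # row v holds v*D ... v*D + D - 1
x = np.array([[0, 4, 1], [3, 2, 4]])      # shape (N, T) = (2, 3), values in [0, V)

picked = dW[x]                            # one row of dW per entry of x
print(picked.shape)                       # (2, 3, 6)

# Entry [n, t] of the result is the row of dW selected by x[n, t].
print(np.array_equal(picked[1, 2], dW[x[1, 2]]))   # True
```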

Let's take some numbers as a concrete example:

x:    np.array([[0,0],[0,0]]) ---- (2,2) N=2, T=2
dW:   np.array([[0,0],[2,2]]) ---- (2,2) V=2, D=2
dout: np.arange(1,9).reshape(2,2,2)  ----(2,2,2) N=2, T=2, D=2

dW[x] should be [ [[0 0]    # every entry is a copy of dW's first row
                   [0 0]]

                  [[0 0]
                   [0 0]] ]

Adding dout to dW[x] means adding element by element (there is a trick here, explained below).

np.add.at(dW, x, dout) gives 
 [ [16 20]
   [ 2  2] ]

Why? The procedure is:

It adds [1,2] = dout[0,0,:] to the first row of dW, which is [0,0].

Why the first row? Because x[0,0] = 0, which selects the first row of dW: dW[0] = dW[0,:].

Then it adds [3,4] = dout[0,1,:] to the first row of dW, because x[0,1] = 0 selects dW[0] again.

Then it adds [5,6] to the first row of dW.

Then it adds [7,8] to the first row of dW.

So the result is [1+3+5+7, 2+4+6+8] = [16,20]. We never touch the second row of dW, so it remains unchanged.

The trick is that the original row is counted only once: think of the operation as unbuffered, with every step accumulating in place on dW itself.
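
To make the "no buffer" point concrete, here is a sketch that runs the same numbers through ordinary fancy-index += and through np.add.at (buffered and unbuffered are illustrative names). With +=, NumPy computes dW[x] + dout into a temporary and writes it back, so repeated indices overwrite one another instead of accumulating:

```python
import numpy as np

x = np.array([[0, 0], [0, 0]])            # every entry points at row 0
dout = np.arange(1, 9).reshape(2, 2, 2)

# Buffered: the sum is formed once from the original row, then written back.
buffered = np.array([[0, 0], [2, 2]])
buffered[x] += dout
print(buffered)    # row 0 received only one of the four dout rows

# Unbuffered: each of the four additions lands on the current contents of row 0.
unbuffered = np.array([[0, 0], [2, 2]])
np.add.at(unbuffered, x, dout)
print(unbuffered)  # [[16 20]
                   #  [ 2  2]]
```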

Illuminating answered 25/1, 2018 at 9:2 Comment(0)

Let's consider an example based on this assignment from cs231n. When several dimensions are involved, it's much easier to reason about a concrete setting.

np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW_man = np.zeros((V, D))

dW_man[x].shape, x.shape
((2, 3, 6), (2, 3))

x
array([[5, 3, 4],
       [0, 1, 3]])

dout = np.arange(2*3*6).reshape(dW_man[x].shape)
dout
array([[[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17]],

       [[18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]]])

Which rows of dW_man should receive which rows of dout? [0, 1, ...] should be added to row 5, and [6, 7, ...] to row 3. [30, 31, ...] should also be added to row 3, because 3 appears twice in x. So let's compute it manually. See more examples and explanation in this GitHub gist: link.

dW_man[5] += dout[0, 0]
dW_man[3] += dout[0, 1]
dW_man[4] += dout[0, 2]

dW_man[0] += dout[1, 0]
dW_man[1] += dout[1, 1]
dW_man[3] += dout[1, 2]   # row 3 is hit a second time, so it accumulates

dW_man
array([[18., 19., 20., 21., 22., 23.],
       [24., 25., 26., 27., 28., 29.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [36., 38., 40., 42., 44., 46.],
       [12., 13., 14., 15., 16., 17.],
       [ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

Now let's use np.add.at.

np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW = np.zeros((V, D))
dout = np.arange(2*3*6).reshape(dW[x].shape)
np.add.at(dW, x, dout)

dW
array([[18., 19., 20., 21., 22., 23.],
       [24., 25., 26., 27., 28., 29.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [36., 38., 40., 42., 44., 46.],
       [12., 13., 14., 15., 16., 17.],
       [ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])
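
Another way to read this result, offered as a sketch rather than as part of the assignment: each row v of dW is the sum of all dout[n, t] for which x[n, t] == v. The x array below is typed in directly from the output shown above instead of being regenerated from the seed, and dW_grouped is an illustrative name.

```python
import numpy as np

V, D = 7, 6
x = np.array([[5, 3, 4], [0, 1, 3]])      # copied from the printed x above
dout = np.arange(2 * 3 * 6).reshape(2, 3, D)

dW = np.zeros((V, D))
np.add.at(dW, x, dout)

# Group-by view: sum the dout rows whose index equals v.
dW_grouped = np.zeros((V, D))
for v in range(V):
    dW_grouped[v] = dout[x == v].sum(axis=0)   # empty selection sums to zeros

print(np.array_equal(dW, dW_grouped))     # True
print(dW[3])                              # [36. 38. 40. 42. 44. 46.]
```

Row 3 is the only row that receives two contributions, dout[0, 1] and dout[1, 2].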
Rhachis answered 4/11, 2021 at 10:40 Comment(0)