How does pytorch broadcasting work?

torch.add(torch.ones(4,1), torch.randn(4))

produces a tensor of size torch.Size([4, 4]).

Can someone explain the logic behind this?

Unexacting answered 16/7, 2018 at 22:53

PyTorch broadcasting is based on NumPy's broadcasting semantics, which you can understand by reading the NumPy broadcasting rules or the PyTorch broadcasting guide. An example makes the concept much easier to grasp, so consider the one below:
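
The two tensors in the printouts below can be recreated with something like this (a minimal sketch; t_rand holds random values, so the exact numbers will differ from what is shown):

import torch

t_rand = torch.randn(3)     # shape (3,): three random values
t_ones = torch.ones(4, 1)   # shape (4, 1): a column of four ones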

In [27]: t_rand
Out[27]: tensor([ 0.23451,  0.34562,  0.45673])

In [28]: t_ones
Out[28]: 
tensor([[ 1.],
        [ 1.],
        [ 1.],
        [ 1.]])

Now, for torch.add(t_rand, t_ones), visualize it like this:

               # shape of (3,)
               tensor([ 0.23451,      0.34562,       0.45673])
      # (4, 1)          | | | |       | | | |        | | | |
      tensor([[ 1.],____+ | | |   ____+ | | |    ____+ | | |
              [ 1.],______+ | |   ______+ | |    ______+ | |
              [ 1.],________+ |   ________+ |    ________+ |
              [ 1.]])_________+   __________+    __________+

which gives an output tensor of shape (4, 3):

# shape of (4,3)
In [33]: torch.add(t_rand, t_ones)
Out[33]: 
tensor([[ 1.23451,  1.34562,  1.45673],
        [ 1.23451,  1.34562,  1.45673],
        [ 1.23451,  1.34562,  1.45673],
        [ 1.23451,  1.34562,  1.45673]])
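
One way to see what broadcasting does here: both operands are (virtually) expanded to the common shape (4, 3) and then added element-wise. A small sketch of that equivalence using expand, which creates a broadcast view without copying data:

# Expanding both operands by hand should give the same (4, 3) result
# as letting torch.add broadcast them implicitly.
manual = t_rand.expand(4, 3) + t_ones.expand(4, 3)
implicit = torch.add(t_rand, t_ones)
assert torch.equal(manual, implicit)
assert implicit.shape == torch.Size([4, 3])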

Also, note that we get exactly the same result even if we pass the arguments in reverse order:

# shape of (4, 3)
In [34]: torch.add(t_ones, t_rand)
Out[34]: 
tensor([[ 1.23451,  1.34562,  1.45673],
        [ 1.23451,  1.34562,  1.45673],
        [ 1.23451,  1.34562,  1.45673],
        [ 1.23451,  1.34562,  1.45673]])
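
As a quick programmatic check of that symmetry (a small sketch using torch.equal):

# Broadcasting addition does not depend on the argument order.
assert torch.equal(torch.add(t_rand, t_ones), torch.add(t_ones, t_rand))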

Anyway, I prefer the former way of looking at it, since the intuition is more straightforward.


For a pictorial understanding, here are a few more examples:

Example-1:

[image: broadcasting-1]


Example-2:

[image: theano broadcasting]

T and F stand for True and False respectively and indicate along which dimensions we allow broadcasting (source: Theano).


Example-3:

Here are some shapes where the array b is broadcast in an attempt to match the shape of the array a.

[image: broadcastable shapes]

As shown above, the broadcasted b may still not match the shape of a, and so the operation a + b will fail whenever the final broadcasted shapes do not match.
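
For instance, continuing the session above, the following pair of shapes cannot be broadcast together, because the trailing dimensions (3 and 2) are neither equal nor 1; PyTorch raises a RuntimeError in that case (the error text in the comment is indicative and may vary between versions):

a = torch.ones(4, 3)
b = torch.ones(4, 2)
try:
    a + b
except RuntimeError as err:
    # e.g. "The size of tensor a (3) must match the size of tensor b (2)
    # at non-singleton dimension 1"
    print(err)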

Conservative answered 17/7, 2018 at 0:4
Great answer, and example, especially the pictorial examples. – Osteoblast

Example for a + b

Let:

a.shape = (2, 3, 4, 5, 1, 1, 1)
b.shape = (      4, 1, 6, 7, 8)

Step 1: b will be padded on the left (only the left!) until both have the same number of axes:

a.shape = (2, 3, 4, 5, 1, 1, 1)
b.shape = (1, 1, 4, 1, 6, 7, 8)    <-- padded left with 1s

Step 2: Next, if an axis of b has length 1, that axis will be repeated until its length matches the corresponding axis of a:

a.shape = (2, 3, 4, 5, 1, 1, 1)
b.shape = (2, 3, 4, 5, 6, 7, 8)    <-- changed 1s to match a

Step 3: Next, if an axis of a has length 1, that axis will be repeated until its length matches the corresponding axis of b:

a.shape = (2, 3, 4, 5, 6, 7, 8)    <-- changed 1s to match b
b.shape = (2, 3, 4, 5, 6, 7, 8)

These shapes match, so a + b will run successfully. (If they had not matched, a + b would fail.)
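
The three steps above can also be checked in code. Recent PyTorch versions expose torch.broadcast_shapes, which computes the resulting shape without allocating any data, and the small helper below (broadcast_shape is my own name, not a PyTorch API) sketches the same left-pad-then-stretch rule:

import torch

def broadcast_shape(shape_a, shape_b):
    """Sketch of the padding-and-stretching rule described above."""
    # Step 1: left-pad the shorter shape with 1s.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Steps 2 and 3: along each axis, a length of 1 stretches to match the
    # other; any other mismatch means the shapes are not broadcastable.
    result = []
    for da, db in zip(a, b):
        if da == db or da == 1 or db == 1:
            result.append(max(da, db))
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
    return tuple(result)

print(broadcast_shape((2, 3, 4, 5, 1, 1, 1), (4, 1, 6, 7, 8)))
# (2, 3, 4, 5, 6, 7, 8)
print(torch.broadcast_shapes((2, 3, 4, 5, 1, 1, 1), (4, 1, 6, 7, 8)))
# torch.Size([2, 3, 4, 5, 6, 7, 8])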

Intoxicated answered 22/1, 2022 at 10:51
This answer is much better than the most voted one, since it clearly shows the routine of how we get the final result. – Honorary
