How the function dimshuffle works in Theano

I am having a tough time understanding what dimshuffle() does and how it works in Theano. I found the following set of examples in the official documentation but couldn't understand what they mean.

Can anyone explain what each of the following examples means?

('x') -> make a 0d (scalar) into a 1d vector
(0, 1) -> identity for 2d vectors
(1, 0) -> inverts the first and second dimensions
('x', 0) -> make a row out of a 1d vector (N to 1xN)
(0, 'x') -> make a column out of a 1d vector (N to Nx1)
(2, 0, 1) -> AxBxC to CxAxB
(0, 'x', 1) -> AxB to Ax1xB
(1, 'x', 0) -> AxB to Bx1xA
(1,) -> removes dimension 0, which must be a broadcastable dimension (1xA to A)

Please note that I am already familiar with the concept of broadcasting in NumPy.

Without 'x', dimshuffle is the same as transpose.

For the sake of explanation, let's pretend NumPy arrays have a dimshuffle method:

x = np.arange(60).reshape((3,4,5))
x.dimshuffle(0, 1, 2).shape # gives (3, 4, 5)
x.dimshuffle(2, 1, 0).shape # gives (5, 4, 3)

Since we have:

shp = (3,4,5)
(shp[2], shp[1], shp[0]) == (5, 4, 3)

The arguments 2, 1, 0 to dimshuffle are simply a permutation applied to the shape tuple.
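The same rule can be checked with real NumPy, where np.transpose with an explicit axis order plays the role of dimshuffle without 'x'. This is only a sketch for illustration; the variable names are my own.

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))
perm = (2, 1, 0)

# np.transpose with an explicit axis order is the NumPy counterpart
# of dimshuffle(2, 1, 0) when no 'x' is involved.
y = np.transpose(x, perm)

# The new shape is the old shape indexed by the permutation.
expected_shape = tuple(x.shape[i] for i in perm)
shape_matches = (y.shape == expected_shape)

# The data moves with the axes: y[i, j, k] is x[k, j, i] for this permutation.
element_matches = (y[4, 2, 1] == x[1, 2, 4])

print(y.shape, shape_matches, element_matches)
```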

Whenever 'x' is present, it adds a dimension of size 1 to the array at that position:

x = np.arange(60).reshape((3,4,5))
x.dimshuffle(2, 1, 0, 'x').shape # (5, 4, 3, 1)
x.dimshuffle(2, 1, 'x', 0).shape # (5, 4, 1, 3)
x.dimshuffle(2, 'x', 1, 0).shape # (5, 1, 4, 3)
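In real NumPy, each of these patterns can be reproduced by a transpose followed by inserting a length-1 axis. A minimal sketch, assuming np.newaxis or np.expand_dims as the stand-in for 'x':

```python
import numpy as np

x = np.arange(60).reshape((3, 4, 5))
permuted = np.transpose(x, (2, 1, 0))  # shape (5, 4, 3)

# dimshuffle(2, 1, 0, 'x'): append a length-1 axis at the end.
a = permuted[..., np.newaxis]

# dimshuffle(2, 1, 'x', 0): insert the length-1 axis at position 2.
b = np.expand_dims(permuted, axis=2)

# dimshuffle(2, 'x', 1, 0): insert the length-1 axis at position 1.
c = np.expand_dims(permuted, axis=1)

print(a.shape, b.shape, c.shape)
```

The position of 'x' in the pattern is the position of the new axis in the result.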

Whenever the permutation omits one or more indices, those dimensions are removed from the shape tuple, provided that each of them has size 1 (that is, it is broadcastable):

x = np.arange(2674).reshape(2, 1337, 1) # 2 * 1337 * 1 = 2674 elements
y = x.dimshuffle(1, 0) # this works since shape[2] is 1
y.shape # (1337, 2)
z = y.dimshuffle(1) # ERROR: this would drop dimension 0, whose size is 1337, not 1
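To make all three rules concrete (permute, insert with 'x', drop size-1 dimensions), here is a rough NumPy emulation. The dimshuffle function below is a hypothetical helper written for this answer, not part of NumPy or Theano, and it checks actual sizes where Theano would check the broadcastable flags.

```python
import numpy as np

def dimshuffle(arr, pattern):
    """Hypothetical NumPy stand-in for Theano's dimshuffle (illustration only)."""
    kept = [p for p in pattern if p != 'x']
    dropped = [ax for ax in range(arr.ndim) if ax not in kept]
    # An axis missing from the pattern may only be dropped if its length is 1.
    for ax in dropped:
        if arr.shape[ax] != 1:
            raise ValueError(
                "cannot drop axis %d of length %d" % (ax, arr.shape[ax]))
    # Put the kept axes first in the requested order, dropped axes last.
    arr = np.transpose(arr, kept + dropped)
    # Discard the trailing length-1 axes.
    arr = arr.reshape(arr.shape[:len(kept)])
    # Insert a length-1 axis wherever the pattern has 'x'.
    for pos, p in enumerate(pattern):
        if p == 'x':
            arr = np.expand_dims(arr, axis=pos)
    return arr

x = np.arange(2674).reshape(2, 1337, 1)
y = dimshuffle(x, (1, 0))       # axis 2 has length 1, so it can be dropped
print(y.shape)

error_message = None
try:
    dimshuffle(y, (1,))         # axis 0 has length 1337, so this fails
except ValueError as err:
    error_message = str(err)
print(error_message)
```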

Note that Theano generally cannot determine the shape of a symbolic tensor when the graph is built, so a dimshuffle that removes dimensions has to rely on the broadcastable attribute instead. (This differs from TensorFlow, where you can specify the shape at compile time.)

>>> import theano.tensor as T
>>> x = T.vector()
>>> x.broadcastable
(False,)
>>> y = x.dimshuffle('x', 0, 'x')
>>> y.broadcastable # the new dims are broadcastable because we added via 'x'
(True, False, True)
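The rule for the resulting broadcastable pattern can be written down in plain Python, without Theano installed. The helper name dimshuffle_broadcastable is made up for this sketch; it only mirrors the behaviour described above and is not Theano's actual implementation.

```python
def dimshuffle_broadcastable(broadcastable, pattern):
    """Hypothetical helper: predict the broadcastable pattern after dimshuffle."""
    kept = {p for p in pattern if p != 'x'}
    # A dimension left out of the pattern may only be removed
    # if it is marked broadcastable.
    for axis, flag in enumerate(broadcastable):
        if axis not in kept and not flag:
            raise ValueError(
                "cannot drop non-broadcastable dimension %d" % axis)
    # 'x' always yields a broadcastable dimension; an index keeps its flag.
    return tuple(True if p == 'x' else broadcastable[p] for p in pattern)

# A vector, as in the session above.
vector_result = dimshuffle_broadcastable((False,), ('x', 0, 'x'))

# A tensor with pattern (True, False): dropping dimension 0 is allowed.
row_result = dimshuffle_broadcastable((True, False), (1,))

# A tensor with pattern (False, False): dropping dimension 0 is rejected.
drop_failed = False
try:
    dimshuffle_broadcastable((False, False), (1,))
except ValueError:
    drop_failed = True

print(vector_result, row_result, drop_failed)
```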

With dimshuffle, a single call can replace multiple calls to transpose and expand_dims (note that Theano has no expand_dims).
