Keras Dropout with noise_shape
I have a question about the Keras Dropout function and its noise_shape argument.

Question 1:

What is the meaning of "if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features)", and what is the benefit of adding this argument?

Does it mean the set of neurons that gets dropped is the same at every timestep? In other words, at every timestep t, the same n neurons would be dropped?

Question 2: Do I have to include 'batch_size' in noise_shape when creating models? --> see the following example.

Suppose I have multivariate time series data with the shape (10000, 1, 100, 2) --> (number of samples, channels, timesteps, number of features).

Then I create batches with a batch size of 64 --> (64, 1, 100, 2)

If I want to create a CNN model with dropout, I use the Keras functional API:

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout

inp = Input([1, 100, 2])
conv1 = Conv2D(64, kernel_size=(11, 2), strides=(1, 1), data_format='channels_first')(inp)
max1 = MaxPooling2D((2, 1), data_format='channels_first')(conv1)
max1_shape = max1._keras_shape
drop1 = Dropout(0.1, noise_shape=[?, max1._keras_shape[1], 1, 1])(max1)

Because the output shape of layer max1 is (None, 64, 45, 1), I cannot assign None to the question mark (which corresponds to batch_size).

I wonder how I should cope with this. Should I just use (64, 1, 1) as the noise_shape? Or should I define a variable called batch_size and pass it to the argument as (batch_size, 64, 1, 1)?

Ulises answered 5/10, 2017 at 11:59

Question 1:

It's kind of like a NumPy broadcast, I think.

Imagine you have a batch of 2 samples with 3 timesteps and 4 features (a small example to make it easier to show): shape (2, 3, 4).

If you use a noise shape of (2, 1, 4), each sample gets its own dropout mask, and that mask is applied to all of its timesteps.

So let's say these are the input values, with shape (2, 3, 4):

array([[[  1,   2,   3,   4],
        [  5,   6,   7,   8],
        [ 10,  11,  12,  13]],

       [[ 14,  15,  16,  17],
        [ 18,  19,  20,  21],
        [ 22,  23,  24,  25]]])

And this would be a random dropout mask of shape (2, 1, 4) (1 means keep, 0 means drop):

array([[[ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1]]])

So you have these two masks (one per sample). They are then broadcast along the timestep axis:

array([[[ 1,  1,  1,  0],
        [ 1,  1,  1,  0],
        [ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1],
        [ 1,  0,  0,  1],
        [ 1,  0,  0,  1]]])

and applied to the inputs:

array([[[  1,   2,   3,   0],
        [  5,   6,   7,   0],
        [ 10,  11,  12,   0]],

       [[ 14,   0,   0,  17],
        [ 18,   0,   0,  21],
        [ 22,   0,   0,  25]]])
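You can reproduce this broadcast directly in NumPy. Here is a minimal sketch using the same arrays as above (note that real Keras dropout additionally scales the kept values by 1/(1 - rate) during training, which is omitted here):

import numpy as np

# Inputs of shape (2, 3, 4): batch of 2 samples, 3 timesteps, 4 features.
x = np.array([[[ 1,  2,  3,  4],
               [ 5,  6,  7,  8],
               [10, 11, 12, 13]],
              [[14, 15, 16, 17],
               [18, 19, 20, 21],
               [22, 23, 24, 25]]])

# Mask of shape (2, 1, 4): one row of keep/drop flags per sample.
mask = np.array([[[1, 1, 1, 0]],
                 [[1, 0, 0, 1]]])

# Broadcasting stretches the mask along the timestep axis (axis 1),
# so the same features are zeroed at every timestep of each sample.
print(x * mask)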

Question 2:

I'm not sure about your second question, to be honest.

Edit: What you can do is take the first dimension of the input's shape, which should be the batch_size, as proposed in this GitHub issue:

import tensorflow as tf

...

batch_size = tf.shape(inp)[0]
drop1 = Dropout(0.1, noise_shape=[batch_size, max1._keras_shape[1], 1, 1])(max1)

As you can see, I'm on the TensorFlow backend. I don't know whether Theano has the same problem, but if it does, you might be able to solve it with the Theano shape equivalent.
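For what it's worth, here is a minimal self-contained sketch in modern tf.keras (my assumption, not the OP's exact setup) that sidesteps the problem entirely: current versions accept None in noise_shape and fill it in with the runtime batch size, as the next answer also demonstrates.

import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input([1, 100, 2])
conv1 = layers.Conv2D(64, kernel_size=(11, 2), strides=(1, 1),
                      data_format='channels_first')(inp)
max1 = layers.MaxPooling2D((2, 1), data_format='channels_first')(conv1)

# None in the batch position of noise_shape is replaced with the
# actual batch size at runtime, so any batch size works.
num_channels = max1.shape[1]  # static channel count (64)
drop1 = layers.Dropout(0.1, noise_shape=[None, num_channels, 1, 1])(max1)

model = Model(inp, drop1)
model.summary()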

Cambric answered 11/1, 2018 at 14:39

Below is sample code to see exactly what is happening. The output log is self-explanatory.

If you are concerned about a dynamic batch_size, just set the first element of noise_shape to None, i.e. change

dl1 = tk.layers.Dropout(0.2, noise_shape=[_batch_size, 1, _num_features])

to

dl1 = tk.layers.Dropout(0.2, noise_shape=[None, 1, _num_features])

import tensorflow as tf
import tensorflow.keras as tk
import numpy as np

_batch_size = 5
_time_steps = 2
_num_features = 3
input = np.random.random((_batch_size, _time_steps, _num_features))

# Plain dropout: an independent mask for every element.
dl = tk.layers.Dropout(0.2)
# noise_shape with 1 on the timestep axis: one mask per sample,
# shared across all timesteps.
dl1 = tk.layers.Dropout(0.2, noise_shape=[_batch_size, 1, _num_features])

# training=True forces the dropout masks to be applied.
out = dl(input, training=True).numpy()
out1 = dl1(input, training=True).numpy()


for i in range(_batch_size):
    print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>", i)
    print("input")
    print(input[i])
    print("out")
    print(out[i])
    print("out1")
    print(out1[i])

The output is:

>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0
input
[[0.53853024 0.80089701 0.64374258]
 [0.06481775 0.31187039 0.5029061 ]]
out
[[0.6731628  1.0011213  0.        ]
 [0.08102219 0.38983798 0.6286326 ]]
out1
[[0.6731628  0.         0.8046782 ]
 [0.08102219 0.         0.6286326 ]]
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1
input
[[0.70746014 0.08990712 0.58195288]
 [0.75798534 0.50140453 0.04914242]]
out
[[0.8843252  0.11238389 0.        ]
 [0.9474817  0.62675565 0.        ]]
out1
[[0.         0.11238389 0.        ]
 [0.         0.62675565 0.        ]]
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2
input
[[0.85253707 0.55813084 0.70741476]
 [0.98812977 0.21565134 0.67909392]]
out
[[1.0656713  0.69766355 0.8842684 ]
 [0.         0.26956415 0.        ]]
out1
[[1.0656713  0.69766355 0.8842684 ]
 [1.2351623  0.26956415 0.84886736]]
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3
input
[[0.9837272  0.3504008  0.37425778]
 [0.67648931 0.74456052 0.6229444 ]]
out
[[1.2296591  0.438001   0.        ]
 [0.84561163 0.93070066 0.7786805 ]]
out1
[[0.         0.438001   0.46782222]
 [0.         0.93070066 0.7786805 ]]
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4
input
[[0.45599217 0.80992091 0.04458478]
 [0.12214568 0.09821599 0.51525869]]
out
[[0.5699902  1.0124011  0.        ]
 [0.1526821  0.         0.64407337]]
out1
[[0.5699902  1.0124011  0.05573097]
 [0.1526821  0.12276999 0.64407337]]
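To verify the shared mask programmatically, you can compare the zero patterns across timesteps (a small add-on of mine, assuming it runs right after the code above; zeros here can only come from dropped positions, since kept values are scaled by 1/0.8 and the random inputs are almost surely nonzero):

# For out1 (noise_shape=[batch, 1, features]) the dropped positions are
# identical across the two timesteps of each sample; for plain dropout
# (out) they generally are not.
shared = (out1 == 0)
print(np.all(shared[:, 0, :] == shared[:, 1, :]))  # True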
Immoral answered 4/1, 2022 at 19:13
