I have a question about the Keras `Dropout` function and its `noise_shape` argument.
Question 1:
What is the meaning of the documentation note: "if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features)"? And what is the benefit of adding this argument?
Does it mean that the set of dropped neurons is the same across timesteps, i.e. at every timestep t the same n neurons are dropped?
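My current understanding, sketched in NumPy below: the mask is drawn once with shape `noise_shape` and then broadcast over the timestep axis, so every timestep zeroes the same feature positions. (This broadcasting logic is my assumption about how it works, not the actual Keras internals.)

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.5
x = np.ones((2, 5, 4))  # (batch_size, timesteps, features)

# Mask drawn with shape noise_shape=(2, 1, 4), then broadcast
# across the timestep axis, so all timesteps share one mask.
mask = rng.random((2, 1, 4)) >= rate
y = np.where(mask, x / (1 - rate), 0.0)  # inverted-dropout scaling

# The same feature positions are zeroed at every timestep:
print(np.array_equal(y[:, 0, :], y[:, 3, :]))  # True
```

If that reading is right, the benefit would be a mask that is consistent in time, which matters for recurrent/temporal inputs.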
Question 2: Do I have to include `batch_size` in `noise_shape` when creating models? See the following example.
Suppose I have multivariate time-series data with shape (10000, 1, 100, 2) --> (number of samples, channels, timesteps, number of features).
Then I create batches with a batch size of 64 --> (64, 1, 100, 2).
To create a CNN model with dropout, I use the Keras functional API:
inp = Input([1, 100, 2])
conv1 = Conv2D(64, kernel_size=(11, 2), strides=(1, 1), data_format='channels_first')(inp)
max1 = MaxPooling2D((2, 1), data_format='channels_first')(conv1)
max1_shape = max1._keras_shape
drop1 = Dropout(0.1, noise_shape=[**?**, max1_shape[1], 1, 1])(max1)
The output shape of layer max1 should be (None, 64, 45, 1) (with the default 'valid' padding, 100 - 11 + 1 = 90 along the timestep axis, halved to 45 by pooling), and I cannot assign None to the question mark (which corresponds to batch_size).
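For reference, here is the valid-padding shape arithmetic I used (the helper name `conv_out` is just for illustration):

```python
# Output length of a 'valid' convolution/pooling along one axis:
# floor((n - k) / s) + 1
def conv_out(n, k, s=1):
    return (n - k) // s + 1

timesteps = conv_out(100, 11)       # 90, from Conv2D kernel_size=(11, 2)
features = conv_out(2, 2)           # 1
pooled = conv_out(timesteps, 2, 2)  # 45, from MaxPooling2D((2, 1))
print(timesteps, features, pooled)  # 90 1 45
```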
How should I cope with this? Should I just use (64, 1, 1) as `noise_shape`? Or should I define a variable called `batch_size` and pass it in, as in (batch_size, 64, 1, 1)?