understanding output shape of keras Conv2DTranspose

I am having a hard time understanding the output shape of keras.layers.Conv2DTranspose.

Here is the prototype:

keras.layers.Conv2DTranspose(
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    output_padding=None,
    data_format=None,
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None
)

In the documentation (https://keras.io/layers/convolutional/), I read:

If output_padding is set to None (default), the output shape is inferred.

In the code (https://github.com/keras-team/keras/blob/master/keras/layers/convolutional.py), I read:

out_height = conv_utils.deconv_length(height,
                                      stride_h, kernel_h,
                                      self.padding,
                                      out_pad_h,
                                      self.dilation_rate[0])
out_width = conv_utils.deconv_length(width,
                                     stride_w, kernel_w,
                                     self.padding,
                                     out_pad_w,
                                     self.dilation_rate[1])
if self.data_format == 'channels_first':
    output_shape = (batch_size, self.filters, out_height, out_width)
else:
    output_shape = (batch_size, out_height, out_width, self.filters)

and (https://github.com/keras-team/keras/blob/master/keras/utils/conv_utils.py):

def deconv_length(dim_size, stride_size, kernel_size, padding, output_padding, dilation=1):

    """Determines output length of a transposed convolution given input length.
    # Arguments
        dim_size: Integer, the input length.
        stride_size: Integer, the stride along the dimension of `dim_size`.
        kernel_size: Integer, the kernel size along the dimension of `dim_size`.
        padding: One of `"same"`, `"valid"`, `"full"`.
        output_padding: Integer, amount of padding along the output dimension, can be set to `None` in which case the output length is inferred.
        dilation: dilation rate, integer.
    # Returns
        The output length (integer).
    """

    assert padding in {'same', 'valid', 'full'}
    if dim_size is None:
        return None

    # Get the dilated kernel size
    kernel_size = kernel_size + (kernel_size - 1) * (dilation - 1)

    # Infer length if output padding is None, else compute the exact length
    if output_padding is None:
        if padding == 'valid':
            dim_size = dim_size * stride_size + max(kernel_size - stride_size, 0)
        elif padding == 'full':
            dim_size = dim_size * stride_size - (stride_size + kernel_size - 2)
        elif padding == 'same':
            dim_size = dim_size * stride_size
    else:
        if padding == 'same':
            pad = kernel_size // 2
        elif padding == 'valid':
            pad = 0
        elif padding == 'full':
            pad = kernel_size - 1

        dim_size = ((dim_size - 1) * stride_size + kernel_size - 2 * pad + output_padding)

    return dim_size
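
For instance, calling this function directly (assuming the definition above is in scope) reproduces both branches with the stride-10, padding='same' scenario discussed below:

print(deconv_length(20, 10, 3, 'same', None))  # 200: inferred as 20 * 10
print(deconv_length(20, 10, 3, 'same', 5))     # 196: (20-1)*10 + 3 - 2*(3//2) + 5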

I understand that Conv2DTranspose is kind of a Conv2D, but reversed.

Since applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 200x200 image will output a 20x20 image, I assume that applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 20x20 image will output a 200x200 image.

Also, applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 195x195 image will also output a 20x20 image.

So, I understand that there is kind of an ambiguity on the output shape when applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" (user might want output to be 195x195, or 200x200, or many other compatible shapes).
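
As a quick check of this ambiguity (a sketch, assuming TensorFlow 2.x with tf.keras; the layer arguments are the ones from the paragraphs above):

import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3),
                              strides=(10, 10), padding='same')
for size in (195, 200):
    x = tf.zeros((1, size, size, 1))  # a batch with one single-channel image
    print(size, '->', conv(x).shape)  # both print (1, 20, 20, 1)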

I assume that "the output shape is inferred" means that a default output shape is computed according to the parameters of the layer, and I assume that there is a mechanism to specify an output shape different from the default one, if necessary.

This said, I do not really understand

  • the meaning of the "output_padding" parameter

  • the interactions between parameters "padding" and "output_padding"

  • the various formulas in the function keras.conv_utils.deconv_length

Could someone explain this?

Many thanks,

Julien

Contuse answered 18/2, 2019 at 16:29 Comment(6)
I strongly believe "output_padding" is exactly the parameter you're looking for to create different output sizes.Enter
Yes, I suspect that. Now what I would like is (1) the specific meaning of the "output_padding" parameter, (2) the interactions between the parameters "padding" and "output_padding", and (3) an explanation of the various formulas in the function keras.conv_utils.deconv_length.Contuse
Does this not help? keras.io/layers/convolutional It appears to contain a decent amount of relevant documentation.Flamenco
@Flamenco this documentation (which I refer to in my question) provides a high-level, general idea of the transposed convolution. It does not offer the detailed explanation that I need, which is why I posted on SO in the first place.Contuse
ok, sorry not to be helpful.Flamenco
Cross-posted: stats.stackexchange.com/q/393114/2921, https://mcmap.net/q/1026795/-understanding-output-shape-of-keras-conv2dtranspose/781723. Please do not post the same question on multiple sites. Each community should have an honest shot at answering without anybody's time being wasted.Multivocal

I may have found a (partial) answer.

I found it in the Pytorch documentation, which appears to be much clearer than the Keras documentation on this topic.

When applying Conv2D with a stride greater than 1 to images whose dimensions are close, we get output images with the same dimensions.

For instance, when applying a Conv2D with a kernel size of 3x3, a stride of 7x7 and padding "same", the following image dimensions

22x22, 23x23, ..., 28x28, 22x28, 28x22, 27x24, etc. (7x7 = 49 combinations)

will ALL yield an output dimension of 4x4.

That is because output_dimension = ceiling(input_dimension / stride).
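
This is easy to verify in plain Python:

import math

# Every dimension from 22 to 28 maps to 4 under a stride of 7
print({n: math.ceil(n / 7) for n in range(22, 29)})
# {22: 4, 23: 4, 24: 4, 25: 4, 26: 4, 27: 4, 28: 4}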

As a consequence, when applying a Conv2DTranspose with kernel size of 3x3, stride of 7x7 and padding "same", there is an ambiguity about the output dimension.

Any of the 49 possible output dimensions would be correct.

The parameter output_padding is a way to resolve the ambiguity by choosing explicitly the output dimension.

In my example, the minimum output size is 22x22, and output_padding provides a number of rows (between 0 and 6) to add at the bottom of the output image and a number of columns (between 0 and 6) to add at the right of the output image.

So I can get output_dimensions = 24x25 if I use output_padding = (2, 3).
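
A sketch of this example (assuming tf.keras), starting from a 4x4 input; note that the inferred shape (output_padding=None) is 28x28, not the minimum 22x22:

import tensorflow as tf

x = tf.zeros((1, 4, 4, 1))

inferred = tf.keras.layers.Conv2DTranspose(1, (3, 3), strides=(7, 7),
                                           padding='same')
print(inferred(x).shape)  # (1, 28, 28, 1): inferred as 4 * 7

explicit = tf.keras.layers.Conv2DTranspose(1, (3, 3), strides=(7, 7),
                                           padding='same',
                                           output_padding=(2, 3))
print(explicit(x).shape)  # (1, 24, 25, 1): 22 + 2 rows, 22 + 3 columns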

What I still do not understand, however, is the logic that Keras uses to choose a certain output image dimension when output_padding is not specified (when it "infers" the output shape).

A few pointers:

https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d
https://discuss.pytorch.org/t/the-output-size-of-convtranspose2d-differs-from-the-expected-output-size/1876/5
https://discuss.pytorch.org/t/question-about-the-output-padding-in-nn-convtrasnpose2d/19740
https://discuss.pytorch.org/t/what-does-output-padding-exactly-do-in-convtranspose2d/2688

So to answer my own questions:

  • the meaning of the "output_padding" parameter: see above
  • the interactions between the parameters "padding" and "output_padding": these parameters are independent
  • the various formulas in the function keras.conv_utils.deconv_length
    • For now, I do not understand the part where output_padding is None;
    • I ignore the case where padding == 'full' (not supported by Conv2DTranspose);
    • The formula for padding == 'valid' seems correct (it can be obtained by reversing the formula of Conv2D; see the sketch after this list);
    • The formula for padding == 'same' seems incorrect to me when kernel_size is even. (As a matter of fact, Keras crashes when trying to build a Conv2DTranspose layer with input_dimension = 5x5, kernel_size = 2x2, stride = 7x7 and padding = 'same'. It appears to me that there is a bug in Keras; I will start another thread for this topic...)
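
A minimal sketch of that reversal in plain Python (the helper names are mine, not Keras code):

def conv_valid(in_size, kernel, stride):
    # Forward Conv2D with padding='valid'
    return (in_size - kernel) // stride + 1

def deconv_valid_inferred(in_size, kernel, stride):
    # The 'valid' branch of deconv_length when output_padding is None
    return in_size * stride + max(kernel - stride, 0)

# The inferred size is compatible with the forward pass, though not
# necessarily equal to the original input size (that is the ambiguity):
out = conv_valid(25, kernel=3, stride=10)                 # 3
back = deconv_valid_inferred(out, kernel=3, stride=10)    # 30
print(out, back, conv_valid(back, kernel=3, stride=10))   # 3 30 3
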
Contuse answered 26/2, 2019 at 17:27 Comment(2)
This is an insightful discussion. For autoencoders it is crucial to have the same output dimension. Toying with the output padding seems such an oblique way of specifying the output dimensions...Incretion
Hi, the output_padding is the right parameter to adjust the output dimension. It is its very purpose. However, there is a bug in Keras: the formula for the output dimension is incorrect.Contuse

Output padding in Conv2DTranspose is also what I was concerned about when designing an autoencoder.

Assume stride is always 1. Along the encoder path, for each convolution layer, I chose padding='valid', which means that if my input image is HxW and the filter is sized mxn, the output of the layer will be (H-(m-1))x(W-(n-1)).

In the corresponding Conv2DTranspose layer along the decoder path, if I use Theano, then in order to recover the input size of its corresponding Conv2D, I have to choose padding='full' and out_padding = None or 0 (no difference), which implies that the input will be expanded by [m-1, n-1] around it, that is, (m-1)/2 at the top and bottom and (n-1)/2 at the left and right.

If I use TensorFlow, I have to choose padding='same' and out_padding = 2*((filter_size-1)//2); I think that is Keras' intended behaviour.

If stride is not 1, then you will have to calculate carefully how many output paddings are to be added.

In Conv2D, out_size = floor((in_size + 2*padding_size - filter_size) / stride) + 1

If we choose padding='same', Keras will automatically set padding_size = (filter_size-1)/2, whilst if we choose 'valid', padding_size will be set to 0, which is the convention for any N-D convolution.

Conversely, in Conv2DTranspose, out_size = (in_size-1)*stride + filter_size - 2*padding_size

where padding_size refers to how many pixels are actually padded, determined by the 'padding' option and out_padding together. Based upon the discussion above, since there is no 'full' option on TensorFlow, we have to use out_padding to recover the input size of the corresponding Conv2D.

Could you try and see if it works properly and let me know, please?

So in summary, I think out_padding is used for facilitating different backends.
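
A round-trip sketch of these formulas on the TensorFlow backend (assuming tf.keras):

import tensorflow as tf

x = tf.zeros((1, 10, 10, 1))  # in_size = 10

# Conv2D, padding='valid': out = floor((10 + 0 - 3) / 2) + 1 = 4
down = tf.keras.layers.Conv2D(1, 3, strides=2, padding='valid')
y = down(x)
print(y.shape)                # (1, 4, 4, 1)

# Conv2DTranspose, padding='valid' (padding_size = 0):
# (4 - 1)*2 + 3 = 9, one pixel short of 10, so out_padding = 1 is
# needed to recover the original size exactly.
up = tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding='valid',
                                     output_padding=1)
print(up(y).shape)            # (1, 10, 10, 1)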

Janel answered 8/6, 2019 at 10:42 Comment(0)

When output_padding=None, Keras uses the deconv_output_length method to compute the output length, which sets it to:

if padding == 'valid':
    length = input_length * stride + max(filter_size - stride, 0)
elif padding == 'same':
    length = input_length * stride

Now in the documentation it says that if output_padding is set, the output length will be

(input_length - 1) * stride + filter_size - 2 * padding + output_padding

So using this we can figure out what the default output_padding is.

padding='valid'

In this case, padding = 0 in the above, so solving for output_padding:

  output_padding = max(stride - filter_size, 0)

and one can check that setting this explicitly gives the same output shape as setting it to None.
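
For example (a sketch, assuming tf.keras):

import tensorflow as tf

x = tf.zeros((1, 5, 5, 1))
stride, filter_size = 4, 3

inferred = tf.keras.layers.Conv2DTranspose(
    1, filter_size, strides=stride, padding='valid')
explicit = tf.keras.layers.Conv2DTranspose(
    1, filter_size, strides=stride, padding='valid',
    output_padding=max(stride - filter_size, 0))

# Both print (1, 20, 20, 1): 5*4 + max(3-4, 0) == (5-1)*4 + 3 + 1 == 20
print(inferred(x).shape, explicit(x).shape)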

padding = 'same'

This case is much more mysterious, and in fact it seems to be impossible to reproduce output_padding=None by setting it to any integer. For example, with strides=2 and kernel_size=2, an output_padding larger than 1 gives a warning that the stride must be larger than the output padding, and anything smaller than 1 gives a warning that the size of out_backprop doesn't match the computed size. So the only value that works is 1, but this results in a different output shape from None.

In fact, the inferred shape is not obtained by setting output_padding to some default value; output_padding is only used to compute the output shape, which is then passed to the convolution method.

Miller answered 17/9, 2021 at 10:38 Comment(0)

In addition to the answers above, I'll describe one numeric example for deeper understanding. Suppose the previous layer's output has shape H = (16, 48, 64) and we're adding a new Conv2DTranspose layer:

Conv2DTranspose(filters=3, kernel_size = 373, strides=5, padding='valid', activation='relu')

The formula for the output shape would be:

(H[0] - 1) * Conv2DTranspose.strides + Conv2DTranspose.kernel_size 
(H[1] - 1) * Conv2DTranspose.strides + Conv2DTranspose.kernel_size
Conv2DTranspose.filters

In numbers:

(16 - 1) * 5 + 373 = 448
(48 - 1) * 5 + 373 = 608
3

[Image: model summary for the described numbers]
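
A sketch reproducing these numbers (assuming tf.keras):

import tensorflow as tf

inputs = tf.keras.Input(shape=(16, 48, 64))
outputs = tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=373,
                                          strides=5, padding='valid',
                                          activation='relu')(inputs)
print(outputs.shape)  # (None, 448, 608, 3)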

Spangler answered 6/4 at 18:4 Comment(0)
