TensorFlow has an official implementation of ResNet on GitHub, and instead of calling `tf.layers.conv2d` directly it wraps the call in a "fixed padding" helper.
The relevant function looks like this:
```python
import tensorflow as tf  # TensorFlow 1.x API (tf.layers)

# `fixed_padding` is a helper defined elsewhere in the same file (not shown here).
def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format):
  """Strided 2-D convolution with explicit padding."""
  # The padding is consistent and is based only on `kernel_size`, not on the
  # dimensions of `inputs` (as opposed to using `tf.layers.conv2d` alone).
  if strides > 1:
    inputs = fixed_padding(inputs, kernel_size, data_format)
  return tf.layers.conv2d(
      inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides,
      padding=('SAME' if strides == 1 else 'VALID'), use_bias=False,
      kernel_initializer=tf.variance_scaling_initializer(),
      data_format=data_format)
```
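The `fixed_padding` helper itself is not shown above. Below is a minimal pure-Python sketch of what it is assumed to do: pad `kernel_size - 1` zeros in total along each spatial dimension, split as evenly as possible, with any odd pixel going to the end. This split is an assumption about the official helper, not a copy of it. The names `fixed_pad_amounts` and `fixed_padding_2d` are hypothetical, and the sketch works on a single-channel 2-D list rather than a 4-D tensor.

```python
def fixed_pad_amounts(kernel_size):
    """Return (pad_beg, pad_end) for one spatial dimension.

    Assumption: total padding is kernel_size - 1, independent of input size,
    split as evenly as possible with the extra pixel (if any) at the end.
    """
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return pad_beg, pad_end


def fixed_padding_2d(image, kernel_size):
    """Zero-pad a 2-D list of lists on all four sides (hypothetical helper)."""
    pad_beg, pad_end = fixed_pad_amounts(kernel_size)
    width = len(image[0]) + pad_beg + pad_end
    # Zero rows on top, padded original rows, then zero rows at the bottom.
    rows = [[0] * width for _ in range(pad_beg)]
    for row in image:
        rows.append([0] * pad_beg + list(row) + [0] * pad_end)
    rows.extend([0] * width for _ in range(pad_end))
    return rows


print(fixed_pad_amounts(3))                     # (1, 1)
print(fixed_padding_2d([[1, 2], [3, 4]], 3))    # 2x2 image becomes 4x4
```

With a 3x3 kernel, every input gets exactly one zero row/column on each side, whatever its height and width.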
What's the purpose of doing this? If we feed a 32x32 image into `tf.layers.conv2d` with `padding='SAME'` and stride 2, we already get a 16x16 feature map. The code above, however, when the stride is greater than 1, first pads zeros on both sides of the image and then convolves with `padding='VALID'`.
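To make the comparison concrete, here is a small pure-Python sketch of the padding arithmetic along one spatial dimension. It assumes TensorFlow's documented `'SAME'` rule (output size `ceil(in / stride)`, total padding chosen to reach that size, odd pixel placed at the end) and the fixed-padding split assumed earlier (`kernel_size - 1` total, split evenly). The function names are hypothetical.

```python
import math


def same_padding(in_size, kernel_size, stride):
    """(pad_beg, pad_end, out_size) for TensorFlow-style 'SAME' padding.

    The total padding depends on in_size; any odd pixel goes to the end.
    """
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg, out_size


def fixed_then_valid(in_size, kernel_size, stride):
    """(pad_beg, pad_end, out_size) for fixed padding followed by 'VALID'."""
    pad_total = kernel_size - 1          # depends only on kernel_size
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    out_size = (in_size + pad_total - kernel_size) // stride + 1
    return pad_beg, pad_end, out_size


for size in (32, 33):
    print(size, same_padding(size, 3, 2), fixed_then_valid(size, 3, 2))
```

For a 32x32 input with a 3x3 kernel and stride 2, both give 16 outputs, but `'SAME'` pads `(0, 1)` while fixed padding pads `(1, 1)`, so the sampled windows are shifted by one pixel. For a 33x33 input `'SAME'` switches to `(1, 1)`, whereas fixed padding stays `(1, 1)`. This is consistent with the code comment that the padding "is based only on `kernel_size`, not on the dimensions of `inputs`". With stride 1 the two schemes coincide, which matches the wrapper using `'SAME'` in that case.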