How does shift-and-stitch in a fully convolutional network work?

I

2

13

I am still struggling with the "shift and stitch" trick in FCN, even after rereading it many times.

Can someone give an intuitive explanation?

Industrial answered 19/11, 2016 at 8:51 Comment(1)

Please, next time, be aware of the fact that there are more appropriate sites to ask these questions, such as Artificial Intelligence Stack Exchange, Cross Validated SE and Data Science SE. – Interpretative 14/6, 2020 at 14:18

E

14

In an FCN, the final output (by default, without any upsampling tricks) has a lower resolution than the input. Suppose the input image is 100x100 and the network's output is 10x10. Mapping that output directly back to the input resolution will look patchy, even with high-order interpolation.

Now take the same input, shift it by a small offset, run it through the network again, and repeat this for several different shifts. You end up with a set of output images, each paired with the shift vector that produced it. These outputs can then be combined (stitched) according to their shift vectors to get better resolution in the final prediction map.

One can think of it as taking multiple shifted low-resolution images of an object and combining (stitching) them into a higher-resolution image.
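
To make the 100x100 to 10x10 case concrete, here is a minimal NumPy sketch. It is not a real FCN: `toy_net` is a hypothetical stand-in that just averages non-overlapping 10x10 blocks, and the shift convention (crop from offset `(dy, dx)`, pad at the bottom/right edge) is my own assumption for illustration. The point is only to show how one coarse output per shift gets interlaced into a full-resolution map.

```python
import numpy as np

def toy_net(x, f):
    """Hypothetical stand-in for an FCN with downsampling factor f:
    averages non-overlapping f x f blocks (assumes h and w divisible by f)."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def shift_and_stitch(x, f):
    """Run toy_net once per shift (f * f passes) and interlace the outputs."""
    h, w = x.shape
    # Pad bottom/right so every shifted crop still has shape (h, w).
    padded = np.pad(x, ((0, f - 1), (0, f - 1)), mode="edge")
    dense = np.empty((h, w), dtype=float)
    for dy in range(f):
        for dx in range(f):
            shifted = padded[dy:dy + h, dx:dx + w]   # input shifted by (dy, dx)
            coarse = toy_net(shifted, f)             # shape (h // f, w // f)
            # Output (i, j) of this pass belongs at pixel (f*i + dy, f*j + dx).
            dense[dy::f, dx::f] = coarse
    return dense

image = np.arange(100 * 100, dtype=float).reshape(100, 100)
coarse = toy_net(image, 10)
dense = shift_and_stitch(image, 10)
print(coarse.shape, dense.shape)  # (10, 10) (100, 100)
```

Note the cost: with a downsampling factor of 10 this needs 100 forward passes, one for each shift.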

Eddins answered 2/9, 2017 at 18:7 Comment(0)

T

15

While this question has already been answered, I found this image here that explains shift-and-stitch better. Just imagine your FCN is a single 2x2 max-pooling layer (the numbers represent pixel values, not index values). The values are max-pooled after each shift, and then we stitch the results back together at the original image's resolution:
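
The same toy case can be sketched in NumPy. This is only an illustration under my own assumptions: the pixel values are made up (they are not the ones from the image), the helper names are hypothetical, and the image is shifted by cropping from offset `(dy, dx)` with `-inf` padding at the bottom/right. It also checks that the stitched result matches a plain stride-1 2x2 max pool, which is a handy way to see what the trick computes.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def shift_and_stitch_pool(x):
    """Pool four shifted copies of x and interlace the four coarse outputs."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    # -inf padding never wins a max, so border windows only see real pixels.
    padded = np.pad(x, ((0, 1), (0, 1)), mode="constant", constant_values=-np.inf)
    out = np.empty((h, w), dtype=float)
    for dy in (0, 1):
        for dx in (0, 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            out[dy::2, dx::2] = max_pool_2x2(shifted)
    return out

def dense_max_pool(x):
    """Reference: 2x2 max pooling with stride 1 and the same -inf padding."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    p = np.pad(x, ((0, 1), (0, 1)), mode="constant", constant_values=-np.inf)
    return np.maximum.reduce([p[:h, :w], p[1:h + 1, :w],
                              p[:h, 1:w + 1], p[1:h + 1, 1:w + 1]])

# Made-up pixel values for illustration.
pixels = np.array([[1, 5, 2, 0],
                   [3, 4, 7, 1],
                   [9, 0, 6, 2],
                   [8, 3, 1, 4]])

stitched = shift_and_stitch_pool(pixels)
print(stitched)
print(np.array_equal(stitched, dense_max_pool(pixels)))
```

So in this toy setting, pooling the four shifted copies and interlacing them gives the same answer as sliding the 2x2 window one pixel at a time.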

Tangerine answered 1/8, 2018 at 13:58 Comment(1)

very intuitive interpretation. – Industrial 23/8, 2018 at 12:50

