In the ResNet architecture, why is the ReLU activation applied after the element-wise addition with the residual in a residual block, instead of before it?
Why is ReLU applied after residual connection in ResNet?
Because it was proposed this way. Residual connections were investigated further in https://arxiv.org/pdf/1603.05027.pdf, which found that the ordering Skip -> BN -> ReLU -> Conv -> BN -> ReLU -> Conv -> Add works best.
However, the differences in performance are negligible and therefore the original ResNet formulation prevailed. Still, you can read the paper if you want to know what works and what does not.
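To make the two orderings concrete, here is a minimal toy sketch in plain Python. It is not the paper's code: `scale` is a hypothetical stand-in for a Conv (+ BN) layer, BN is otherwise left out, and the weights `w1` and `w2` are made up for illustration. The only thing it is meant to show is where the ReLU sits relative to the addition.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def add(u, v):
    """Element-wise addition of two equally long lists."""
    return [a + b for a, b in zip(u, v)]


def scale(v, w):
    """Toy stand-in (assumption) for a Conv (+ BN) layer: multiply by w."""
    return [w * a for a in v]


def post_activation_block(x, w1, w2):
    """Original ResNet ordering: Conv -> ReLU -> Conv -> Add -> ReLU."""
    out = scale(x, w1)
    out = relu(out)
    out = scale(out, w2)
    return relu(add(x, out))  # ReLU is applied after the addition


def pre_activation_block(x, w1, w2):
    """Pre-activation ordering: ReLU -> Conv -> ReLU -> Conv -> Add."""
    out = relu(x)
    out = scale(out, w1)
    out = relu(out)
    out = scale(out, w2)
    return add(x, out)  # nothing is applied after the addition


x = [-2.0, 1.0]
print(post_activation_block(x, 1.0, 0.5))  # [0.0, 1.5]
print(pre_activation_block(x, 1.0, 0.5))   # [-2.0, 1.5]
```

In the first variant the final ReLU clips whatever comes out of the addition, so the block output is never negative and the skip path is not a pure identity. In the second variant `x` reaches the output unchanged, and with `w2 = 0.0` the whole block reduces to the identity.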
Thank you. So is it just an empirical result? Or are there any theoretical insights behind the design choice? –
Pyramidon
This is just an empirical result. I mean, the authors try to justify their choices with some hand-wavy arguments, but it's not a sound theory yet. There are not many theoretical works on skip connections. –
Montelongo