Residual networks always seem to be built with convolutional layers; I have never seen a residual network with only fully connected layers. Does building a residual network with only fully connected layers work?
Yes, you can use residual connections in fully connected networks. Skip connections help learning for fully connected layers too.
Here is a nice paper (not mine, unfortunately) where this is done and where the authors explain in detail why it helps learning: https://arxiv.org/pdf/1701.09175.pdf
I am using this myself in a paper I am writing, with 50 layers and skip connections.
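To make this concrete, here is a minimal sketch of a single fully connected residual block in C++ (my own illustration, not code from the linked paper); dense_relu and residual_block are hypothetical names, and the block assumes the layer keeps the input width so the element-wise addition lines up:

// Minimal fully connected residual block: y = ReLU(W*x + b) + x.
#include <vector>
#include <cstddef>
#include <algorithm>
#include <iostream>

// Dense layer followed by ReLU (hypothetical helper).
std::vector<double> dense_relu(const std::vector<std::vector<double>> &W,
                               const std::vector<double> &b,
                               const std::vector<double> &x)
{
    std::vector<double> y(b.size(), 0.0);
    for (std::size_t i = 0; i < W.size(); ++i)
    {
        double s = b[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            s += W[i][j] * x[j];
        y[i] = std::max(0.0, s); // ReLU
    }
    return y;
}

// Residual block: add the input back onto the transformed output.
std::vector<double> residual_block(const std::vector<std::vector<double>> &W,
                                   const std::vector<double> &b,
                                   const std::vector<double> &x)
{
    std::vector<double> f = dense_relu(W, b, x); // F(x)
    for (std::size_t i = 0; i < f.size(); ++i)
        f[i] += x[i]; // F(x) + x
    return f;
}

int main()
{
    // Toy 3-unit example with hand-picked weights, just to show the shapes.
    std::vector<std::vector<double>> W = {{0.1, 0.0, 0.0},
                                          {0.0, 0.1, 0.0},
                                          {0.0, 0.0, 0.1}};
    std::vector<double> b = {0.0, 0.0, 0.0};
    std::vector<double> x = {1.0, -2.0, 3.0};

    for (double v : residual_block(W, b, x))
        std::cout << v << ' '; // prints 1.1 -2 3.3
    std::cout << '\n';
    return 0;
}

If the weights in W shrink toward zero, the block degenerates to the identity, which is exactly the property that makes stacking many such blocks safe.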
Like Tapio, I also disagree with Giuseppe's conclusion. Residual connections are said to help improve performance in multiple ways: they let the gradient flow better, they might help with localization, etc. My guess is that some of these advantages, such as better gradient flow, also hold for networks consisting only of fully connected layers.
Other ideas, such as saying that we learn residuals F(X) - X (where F is a residual block), are more questionable due to the absence of spatial correlation. For CNNs, where residual connections are mostly used, we have some form of locality: if you take the feature map X of some layer (you can also think of X as the input) and the output F(X) of a residual block, then X and F(X) are correlated, i.e. the value at location X[i,j] is often similar to that at F(X)[i,j]. This does not hold for fully connected networks, since their neurons do not carry spatial information. However, to what extent this matters is probably an open problem :-) .
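For the gradient-flow point in particular, the argument does not depend on the layers being convolutional; a short chain-rule sketch (my addition, reusing the F(X) notation above):

\[
Y = F(X) + X
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial X}
= \frac{\partial \mathcal{L}}{\partial Y}\left(\frac{\partial F}{\partial X} + I\right)
= \frac{\partial \mathcal{L}}{\partial Y}\,\frac{\partial F}{\partial X}
+ \frac{\partial \mathcal{L}}{\partial Y}.
\]

Even if \(\partial F / \partial X\) is small, the second term carries the gradient straight through the skip connection, and nothing in this derivation requires the layers inside F to be convolutional.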
So, let's start with: what is the aim of ResNets?
Given an input X, which is propagated through a certain ensemble of layers, let's call F(X) the output of this ensemble. If we denote by H(X) the desired output (the ideal mapping, which in general differs from F(X)), a ResNet learns H(X) = F(X) + X, which can be rewritten as F(X) = H(X) - X, i.e. the residual, from which the name residual network.
So, what is the gain of a ResNet? In a ResNet, the mapping of a deeper block performs at least as well as that of the previous one. Why? Because, at the very least, the block can learn the identity mapping, by driving F(X) to zero so that H(X) = X.
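Spelled out with the formulas above:

\[
H(X) = F(X) + X \quad\Longleftrightarrow\quad F(X) = H(X) - X,
\]

so if the ideal mapping for a block happens to be the identity, \(H(X) = X\), the stacked layers only have to learn \(F(X) = 0\) (for instance by pushing their weights toward zero), which is easier than fitting the identity directly through a stack of nonlinear layers.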
This is a crucial point for convolutional networks. In principle, deeper nets should perform better than shallower ones, but in practice this does not always happen. From this arises the need to build a network that guarantees such behavior.
Is this also true for dense networks? No, it is not. There is a well-known result for dense nets, the Universal Approximation Theorem, which states that a network with a single hidden layer (i.e. two dense layers) and an adequate number of hidden units can approximate essentially any function. For this reason, it is not necessary to increase the depth of a dense net; rather, it is necessary to find the right number of hidden units.
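For reference, one common single-hidden-layer statement of the theorem (the exact conditions on the activation vary between versions): for any continuous function \(f\) on a compact set \(K\) and any \(\varepsilon > 0\), there exist a width \(N\) and parameters \(v_i, w_i, b_i\) such that

\[
\sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} v_i \,\sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon,
\]

where \(\sigma\) is a suitable (e.g. sigmoidal) activation.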
If you want, you can explore the original ResNet paper by He et al. (2015).
In fact, I am doing this very thing right now.
https://github.com/ollewelin/Nerual_Netwok_CPP
You can download my C/C++ code: a CPU-only, fully connected neural network with residual skip connections implemented, and no libraries involved. Change the Makefile to
SRCS = residual_net.cpp fc_m_resnet.cpp
PROG = residual_net
make
./residual_net
--> press Y, Y, Y
and the MNIST digits are downloaded to your disk, after which residual_net runs. The residual block can be connected to an arbitrary number of input/output connections and an arbitrary number of mid-block objects. The trick I used so that skip connections work regardless of differing input/output sizes between mid blocks is these lines in the C++ code:
For the forward pass:
void fc_m_resnet::forward_pass(void)
....
output_layer[src_n_cnt % dst_nodes] += input_layer[src_n_cnt]; // Input nodes are > output nodes
.... or
output_layer[dst_n_cnt] += input_layer[dst_n_cnt % src_nodes]; // Input nodes are < output nodes
For backpropagation:
void fc_m_resnet::backpropagtion_and_update(void)
....
i_layer_delta[src_n_cnt] += o_layer_delta[src_n_cnt % dst_nodes]; // Input nodes are > output nodes
.... or
i_layer_delta[dst_n_cnt] += o_layer_delta[dst_n_cnt]; // Input nodes are the same as output nodes: simple add operation at the output side
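To show the wrap-around idea outside the repository, here is a minimal standalone sketch (my own illustration, not code copied from fc_m_resnet): the skip addition indexes the smaller side modulo its size, so blocks of different widths can still be connected.

// Standalone illustration of the modulo skip-connection trick: add a source
// vector onto a destination vector even when the layer widths differ.
#include <vector>
#include <cstddef>
#include <iostream>

// Hypothetical helper mirroring the lines quoted above.
void add_skip(const std::vector<double> &input_layer,
              std::vector<double> &output_layer)
{
    const std::size_t src_nodes = input_layer.size();
    const std::size_t dst_nodes = output_layer.size();

    if (src_nodes >= dst_nodes)
    {
        // Input nodes are >= output nodes: fold the input onto the output.
        for (std::size_t src_n_cnt = 0; src_n_cnt < src_nodes; ++src_n_cnt)
            output_layer[src_n_cnt % dst_nodes] += input_layer[src_n_cnt];
    }
    else
    {
        // Input nodes are < output nodes: reuse input values cyclically.
        for (std::size_t dst_n_cnt = 0; dst_n_cnt < dst_nodes; ++dst_n_cnt)
            output_layer[dst_n_cnt] += input_layer[dst_n_cnt % src_nodes];
    }
}

int main()
{
    std::vector<double> in = {1, 2, 3, 4, 5}; // 5 source nodes
    std::vector<double> out = {0, 0, 0};      // 3 destination nodes

    add_skip(in, out); // out becomes {1+4, 2+5, 3} = {5, 7, 3}
    for (double v : out)
        std::cout << v << ' ';
    std::cout << '\n';
    return 0;
}

The backward pass mirrors this with the same modulo pattern, so each node's delta is routed back to the node(s) whose activation it received.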