Residual networks always seem to be built with convolutional layers; I have never seen a residual network with only fully connected layers. Does building a residual network with only fully connected layers work?
Yes, you can use residual connections in fully connected networks. Skip connections help learning for fully connected layers too.
Here is a nice paper (not mine, unfortunately) where this is done and where the authors explain in detail why it helps learning: https://arxiv.org/pdf/1701.09175.pdf
I am using this myself in a paper I am writing, with 50 layers and skip connections.
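To make this concrete, here is a minimal sketch of a single fully connected residual block in C++ (my own illustration, not code from the linked paper); dense_relu and residual_block are hypothetical names, and the block assumes the layer keeps the input width so the element-wise addition lines up:

// Minimal fully connected residual block: y = ReLU(W*x + b) + x.
#include <vector>
#include <cstddef>
#include <algorithm>
#include <iostream>

// Dense layer followed by ReLU (hypothetical helper).
std::vector<double> dense_relu(const std::vector<std::vector<double>> &W,
                               const std::vector<double> &b,
                               const std::vector<double> &x)
{
    std::vector<double> y(b.size(), 0.0);
    for (std::size_t i = 0; i < W.size(); ++i)
    {
        double s = b[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            s += W[i][j] * x[j];
        y[i] = std::max(0.0, s); // ReLU
    }
    return y;
}

// Residual block: add the input back onto the transformed output.
std::vector<double> residual_block(const std::vector<std::vector<double>> &W,
                                   const std::vector<double> &b,
                                   const std::vector<double> &x)
{
    std::vector<double> f = dense_relu(W, b, x); // F(x)
    for (std::size_t i = 0; i < f.size(); ++i)
        f[i] += x[i]; // F(x) + x
    return f;
}

int main()
{
    // Toy 3-unit example with hand-picked weights, just to show the shapes.
    std::vector<std::vector<double>> W = {{0.1, 0.0, 0.0},
                                          {0.0, 0.1, 0.0},
                                          {0.0, 0.0, 0.1}};
    std::vector<double> b = {0.0, 0.0, 0.0};
    std::vector<double> x = {1.0, -2.0, 3.0};

    for (double v : residual_block(W, b, x))
        std::cout << v << ' '; // prints 1.1 -2 3.3
    std::cout << '\n';
    return 0;
}

If the weights in W shrink toward zero, the block degenerates to the identity, which is exactly the property that makes stacking many such blocks safe.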
Like Tapio, I also disagree with Giuseppe's conclusion. Residual connections are said to help improve performance in multiple ways: they let the gradient flow better, they might help with localization, etc. My guess is that some of these advantages, such as better gradient flow, also hold for networks consisting only of fully connected layers.
Other ideas, such as saying that we learn residuals F(X) - X (where F is a residual block), are more questionable due to the absence of spatial correlation. For CNNs, where residual connections are mostly used, we have some form of locality: if you take the feature map X of some layer (you can also think of X as the input) and the output F(X) of a residual block, then X and F(X) are correlated, i.e. the value at location X[i,j] is often similar to that at F(X)[i,j]. This does not hold for fully connected networks, since their neurons do not carry spatial information. However, to what extent this matters is probably an open problem :-) .
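For the gradient-flow point in particular, the argument does not depend on the layers being convolutional; a short chain-rule sketch (my addition, reusing the F(X) notation above):

\[
Y = F(X) + X
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial X}
= \frac{\partial \mathcal{L}}{\partial Y}\left(\frac{\partial F}{\partial X} + I\right)
= \frac{\partial \mathcal{L}}{\partial Y}\,\frac{\partial F}{\partial X}
+ \frac{\partial \mathcal{L}}{\partial Y}.
\]

Even if \(\partial F / \partial X\) is small, the second term carries the gradient straight through the skip connection, and nothing in this derivation requires the layers inside F to be convolutional.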
So, let's start with: what is the aim of ResNets?
Given an input X, which is propagated through a certain ensemble of layers, let's call F(X) the output of this ensemble. If we denote by H(X) the desired output (the ideal mapping, which in general differs from F(X)), a ResNet learns H(X) = F(X) + X, which can be rewritten as F(X) = H(X) - X, i.e. the residual, from which the name residual network.
So, what is the gain of a ResNet? In a ResNet, the mapping of a deeper block performs at least as well as that of the previous one. Why? Because, at the very least, the block can learn the identity mapping, by driving F(X) to zero so that H(X) = X.
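Spelled out with the formulas above:

\[
H(X) = F(X) + X \quad\Longleftrightarrow\quad F(X) = H(X) - X,
\]

so if the ideal mapping for a block happens to be the identity, \(H(X) = X\), the stacked layers only have to learn \(F(X) = 0\) (for instance by pushing their weights toward zero), which is easier than fitting the identity directly through a stack of nonlinear layers.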
This is a crucial point for convolutional networks. In principle, deeper nets should perform better than shallower ones, but in practice this does not always happen. From this arises the need to build a network that guarantees such behavior.
Is this also true for dense networks? No, it is not. There is a well-known result for dense nets, the Universal Approximation Theorem, which states that a network with a single hidden layer (i.e. two dense layers) and an adequate number of hidden units can approximate essentially any function. For this reason, it is not necessary to increase the depth of a dense net; rather, it is necessary to find the right number of hidden units.
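For reference, one common single-hidden-layer statement of the theorem (the exact conditions on the activation vary between versions): for any continuous function \(f\) on a compact set \(K\) and any \(\varepsilon > 0\), there exist a width \(N\) and parameters \(v_i, w_i, b_i\) such that

\[
\sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} v_i \,\sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon,
\]

where \(\sigma\) is a suitable (e.g. sigmoidal) activation.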
If you want, you can explore the original ResNet paper by He et al. (2015).
In fact, I am doing this very thing right now.
https://github.com/ollewelin/Nerual_Netwok_CPP
You can download my C/C++ code: a CPU-only, fully connected neural network with residual skip connections implemented, and no libraries involved. Change the Makefile to
SRCS = residual_net.cpp fc_m_resnet.cpp
PROG = residual_net
make
./residual_net
--> press Y, Y, Y
and the MNIST digits are downloaded to your disk, after which residual_net runs. The residual block can be connected to an arbitrary number of input/output connections and an arbitrary number of mid-block objects. The trick I used so that skip connections work regardless of differing input/output sizes between mid blocks is these lines in the C++ code:
For the forward pass:
void fc_m_resnet::forward_pass(void)
....
output_layer[src_n_cnt % dst_nodes] += input_layer[src_n_cnt]; // Input nodes are > output nodes
.... or
output_layer[dst_n_cnt] += input_layer[dst_n_cnt % src_nodes]; // Input nodes are < output nodes
For backpropagation:
void fc_m_resnet::backpropagtion_and_update(void)
....
i_layer_delta[src_n_cnt] += o_layer_delta[src_n_cnt % dst_nodes]; // Input nodes are > output nodes
.... or
i_layer_delta[dst_n_cnt] += o_layer_delta[dst_n_cnt]; // Input nodes are the same as output nodes: simple add operation at the output side
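To show the wrap-around idea outside the repository, here is a minimal standalone sketch (my own illustration, not code copied from fc_m_resnet): the skip addition indexes the smaller side modulo its size, so blocks of different widths can still be connected.

// Standalone illustration of the modulo skip-connection trick: add a source
// vector onto a destination vector even when the layer widths differ.
#include <vector>
#include <cstddef>
#include <iostream>

// Hypothetical helper mirroring the lines quoted above.
void add_skip(const std::vector<double> &input_layer,
              std::vector<double> &output_layer)
{
    const std::size_t src_nodes = input_layer.size();
    const std::size_t dst_nodes = output_layer.size();

    if (src_nodes >= dst_nodes)
    {
        // Input nodes are >= output nodes: fold the input onto the output.
        for (std::size_t src_n_cnt = 0; src_n_cnt < src_nodes; ++src_n_cnt)
            output_layer[src_n_cnt % dst_nodes] += input_layer[src_n_cnt];
    }
    else
    {
        // Input nodes are < output nodes: reuse input values cyclically.
        for (std::size_t dst_n_cnt = 0; dst_n_cnt < dst_nodes; ++dst_n_cnt)
            output_layer[dst_n_cnt] += input_layer[dst_n_cnt % src_nodes];
    }
}

int main()
{
    std::vector<double> in = {1, 2, 3, 4, 5}; // 5 source nodes
    std::vector<double> out = {0, 0, 0};      // 3 destination nodes

    add_skip(in, out); // out becomes {1+4, 2+5, 3} = {5, 7, 3}
    for (double v : out)
        std::cout << v << ' ';
    std::cout << '\n';
    return 0;
}

The backward pass mirrors this with the same modulo pattern, so each node's delta is routed back to the node(s) whose activation it received.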