Siamese networks: Why does the network need to be duplicated?

The DeepFace paper from Facebook uses a Siamese network to learn a metric. They say that the DNN that extracts the 4096-dimensional face embedding has to be duplicated in a Siamese network, but both duplicates share weights. But if they share weights, every update to one of them will also change the other. So why do we need to duplicate them?

Why can't we just apply one DNN to two faces and then do backpropagation using the metric loss? Do they maybe mean this and just talk about duplicated networks for "better" understanding? (A sketch of what I mean is below, after the quote.)

Quote from the paper:

We have also tested an end-to-end metric learning approach, known as Siamese network [8]: once learned, the face recognition network (without the top layer) is replicated twice (one for each input image) and the features are used to directly predict whether the two input images belong to the same person. This is accomplished by: a) taking the absolute difference between the features, followed by b) a top fully connected layer that maps into a single logistic unit (same/not same). The network has roughly the same number of parameters as the original one, since much of it is shared between the two replicas, but requires twice the computation. Notice that in order to prevent overfitting on the face verification task, we enable training for only the two topmost layers.

Paper: https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf
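
To make what I mean concrete, here is a rough PyTorch-style sketch of the quoted setup with a single network: the layer sizes, the tiny placeholder CNN and the random inputs are my own illustration, not taken from the paper.

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for the face recognition network without its top layer;
    # the real DeepFace CNN maps a 152x152 aligned face to a 4096-d feature.
    embed = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Linear(16 * 4 * 4, 4096),
    )
    top = nn.Linear(4096, 1)  # maps |f(x1) - f(x2)| to a single logistic unit

    x1 = torch.randn(8, 3, 152, 152)            # first face of each pair
    x2 = torch.randn(8, 3, 152, 152)            # second face of each pair
    same = torch.randint(0, 2, (8, 1)).float()  # 1 = same person, 0 = not same

    f1 = embed(x1)                # ONE network, applied to the first face
    f2 = embed(x2)                # ... and applied again to the second face
    logit = top((f1 - f2).abs())  # a) absolute difference, b) top FC layer
    loss = nn.functional.binary_cross_entropy_with_logits(logit, same)
    loss.backward()               # a single backward pass; no duplicate weights anywhere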

Trotta answered 8/2, 2018 at 19:21

The short answer is yes: looking at the architecture of the network will help you understand what is going on. You have two networks that are "joined at the hip", i.e. sharing weights, and that is what makes it a "Siamese network". The trick is that you want the two images you feed into the network to pass through the same embedding function, and to ensure that this happens, both branches of the network need to share weights.

Then we combine the two embeddings into a metric loss (called "contrastive loss" in the image below) and back-propagate as normal; we simply have two input branches available so that we can feed in two images at a time.

I think a picture is worth a thousand words. So check out how a Siamese network is constructed (at least conceptually) below.

A Siamese Network Architecture
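
Here is a minimal sketch of that idea in PyTorch (the layer sizes, the contrastive margin and the toy inputs are made up for illustration, not taken from DeepFace). Note that `self.embed` exists only once; "both branches" are just two calls to the same module.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SiameseNet(nn.Module):
        def __init__(self):
            super().__init__()
            # ONE embedding function; conceptually this is "both branches"
            self.embed = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(16 * 4 * 4, 128),
            )

        def forward(self, x1, x2):
            # Both inputs pass through the same weights ("joined at the hip").
            return self.embed(x1), self.embed(x2)

    def contrastive_loss(z1, z2, same, margin=1.0):
        d = F.pairwise_distance(z1, z2)
        return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

    net = SiameseNet()
    x1 = torch.randn(4, 3, 64, 64)
    x2 = torch.randn(4, 3, 64, 64)
    same = torch.tensor([1., 0., 1., 0.])
    loss = contrastive_loss(*net(x1, x2), same)
    loss.backward()  # the single set of weights receives gradients from both branches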

Darlenadarlene answered 30/4, 2018 at 14:52 Comment(3)
I was actually going to ask "why the name Siamese Neural Network?", but this thread already answers it (Siamese as in "Siamese twins", i.e. conjoined twins sharing common parts; in the case of a Siamese network it is sharing common weights). Thanks for the clear explanation. Barnie
What would be the difference between feeding input_1 and input_2 to the same embedding? I don't get why there are two replicated networks; if they are the same, wouldn't one be enough? Querida
There isn't any difference. The trick here is that by having two branches I can feed two images to the network at the same time, and I want to be able to do that. Darlenadarlene

The gradients depend on the activation values, so the gradients flowing back through each branch will be different. Because the weights are shared, the contributions from both branches are accumulated (summed) on the same parameters before the update, as the sketch below shows.
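
A tiny PyTorch illustration of this (my own example, not from the paper): a single shared layer used on two inputs, whose gradient ends up being the sum of both branches' contributions.

    import torch
    import torch.nn as nn

    shared = nn.Linear(4, 2)                  # the shared embedding layer
    x1, x2 = torch.randn(1, 4), torch.randn(1, 4)

    # The loss depends on both branches, which use the SAME parameters.
    loss = (shared(x1) - shared(x2)).pow(2).sum()
    loss.backward()

    # shared.weight.grad now holds the sum of the gradient contributions
    # from the x1 branch and the x2 branch; no explicit weight copying is needed.
    print(shared.weight.grad)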

Cuthbertson answered 13/3, 2021 at 11:7
