In my opinion, the aim of metric learning is to learn an embedding function such that two samples that are conceptually (or semantically, i.e. at a high level, not at the level of pixels, for example) similar are also close in the embedding space, where an embedding is usually a d-dimensional vector.
If the model has correctly captured the similarity function, you should be able to "compare" samples by reasoning on something as simple as a Euclidean distance in the embedding space.
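As a minimal sketch of this idea (the embeddings below are made-up placeholders standing in for the output of some trained encoder, not real model outputs):

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 distance between two d-dimensional embeddings."""
    return float(np.linalg.norm(a - b))

# Hypothetical embeddings produced by some trained encoder f(x):
anchor   = np.array([0.1, 0.9, -0.3])
positive = np.array([0.2, 0.8, -0.25])  # semantically similar sample
negative = np.array([-0.7, 0.1, 0.9])   # semantically different sample

# If the metric was learned well, the positive sits closer to the anchor:
assert euclidean_distance(anchor, positive) < euclidean_distance(anchor, negative)
```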
A popular approach to metric learning is the Siamese Network, in which you have two neural networks where the second is a copy (i.e. same layers and weights) of the first. During training you provide pairs of data samples in the form (anchor, positive) and (anchor, negative): basically, you force positive pairs to share a common embedding, while negatives are pushed apart from the anchor. Indeed, variations of this idea exist, like the triplet loss and the introduction of one or more "margins" (to prevent collapsing embeddings and trivial solutions).
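Here is a hedged triplet-loss sketch in PyTorch; the toy encoder and input sizes are assumptions for illustration, while `nn.TripletMarginLoss` is the actual PyTorch loss implementing the margin idea:

```python
import torch
import torch.nn as nn

# A toy shared encoder: in a Siamese/triplet setup the *same* network
# (same weights) embeds anchor, positive, and negative.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

# The margin keeps negatives at least `margin` farther than positives,
# which discourages the trivial solution of mapping everything to one point.
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor_x, positive_x, negative_x = (torch.randn(8, 128) for _ in range(3))

loss = triplet_loss(encoder(anchor_x), encoder(positive_x), encoder(negative_x))
loss.backward()  # gradients flow through the single shared encoder
```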
The main motivation for metric learning is that comparing two data points in input space is often meaningless and ambiguous (e.g. two images of airplanes can be found to be similar because of the blue sky and not the planes themselves), because raw inputs don't capture the high-level (or semantic) features of the data.
Instead, contrastive learning tries to constrain the model to learn a suitable representation of the input data.
- Also in this case you have pairs of inputs, but the difference is that the second input is usually a "variation" of the first, typically obtained via data augmentation. In some cases, you start from the same image and augment it twice (but differently!) to get two views of it (see the sketch after this list).
- The goal is to enable the model to learn features that represent conceptually similar data in a meaningful way: e.g., you can teach the model rotation/translation invariance.
- The applications of contrastive learning are usually about pre-training for later fine-tuning, aimed at improving (classification) performance, ensuring properties (like invariances) and robustness, but also reducing the amount of labeled data needed, and even improving in low-shot scenarios, where you want to correctly predict a new class even if the model has seen zero or very few samples of it.
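To make the "two augmented views" idea concrete, here is a simplified sketch of an NT-Xent (SimCLR-style) contrastive loss; the batch size, embedding dimension, and temperature are illustrative assumptions, and the encoder producing `z1`/`z2` is left out:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Simplified NT-Xent loss for two batches of views.

    z1[i] and z2[i] are embeddings of two different augmentations of the
    same image; every other sample in the batch acts as a negative.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)          # (2n, d)
    sim = z @ z.t() / temperature           # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))       # exclude self-similarity
    # The positive for row i is the other view of the same image:
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Stand-ins for the outputs of an encoder applied to two augmented views:
z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
loss = nt_xent(z1, z2)
```

Intuitively, the loss pulls the two views of the same image together while pushing them away from every other sample in the batch, which is what forces the representation to ignore the augmentation nuisances.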
To conclude, metric learning is used to compare data to understand their similarity (like in face recognition), while contrastive learning deals with learning better representations to improve the model in various respects. I can add that, to me, both fields fall under what is called representation learning, which is a broader and more general concept.