What are differences between AutoModelForSequenceClassification vs AutoModel
We can create a model with the AutoModel (or TFAutoModel) class:

from transformers import AutoModel
model = AutoModel.from_pretrained('distilbert-base-uncased')

On the other hand, a model can be created with AutoModelForSequenceClassification (TFAutoModelForSequenceClassification):

from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased')

As far as I know, both models are created from the distilbert-base-uncased checkpoint. Judging by the name, the second class (AutoModelForSequenceClassification) is intended for sequence classification.

But what are the actual differences between the two classes? And how should each be used correctly?

(I searched on Hugging Face but it is not clear.)

Aeolic answered 10/11, 2021 at 3:33 Comment(1)
I found one difference: with AutoModel, we can use last_hidden_state to get the [CLS] token. AutoModelForSequenceClassification has no last_hidden_state attribute.Aeolic

The difference between AutoModel and AutoModelForSequenceClassification is that AutoModelForSequenceClassification has a classification head on top of the base model's outputs, which can easily be trained together with the base model.
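To make "classification head" concrete, here is a minimal stdlib-only sketch of what the head adds on top of the base model's output. The 768 matches DistilBERT's hidden size; the random vector and weights are placeholders for illustration, not real model weights:

```python
import random

random.seed(0)
hidden_size, num_labels = 768, 2  # DistilBERT hidden size; binary labels

# Stand-in for the base model's [CLS] hidden state (what AutoModel returns).
cls_hidden = [random.gauss(0, 1) for _ in range(hidden_size)]

# The classification head: a single linear layer (weights + bias) on top.
weights = [[random.gauss(0, 0.02) for _ in range(hidden_size)]
           for _ in range(num_labels)]
bias = [0.0] * num_labels

# Logits: one score per label, computed from the pooled hidden state.
logits = [sum(w * h for w, h in zip(row, cls_hidden)) + b
          for row, b in zip(weights, bias)]
print(len(logits))  # 2 -- one logit per label
```

These head weights are randomly initialized when the checkpoint lacks them, which is why the head needs fine-tuning before its predictions mean anything.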

Denys answered 5/12, 2021 at 9:7 Comment(6)
So can I use AutoModel for classification purposes?Aeolic
No. From the Hugging Face course: "For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won't actually use the AutoModel class, but AutoModelForSequenceClassification." huggingface.co/course/chapter2/2?fw=ptDenys
Does that mean AutoModel has frozen weights while AutoModelForSequenceClassification has trainable weights? Actually, I have a requirement where I only want the model to act as a feature extractor, not as a trainable model.Pritchett
@Denys What is a classification head?Teenateenage
@subho: what is the classification head? Is it a linear layer with D_in=number_of_classes? Any online pointer would be very useful!Ottavia
It would depend on the architecture; this is the head used for RoBERTa, for example: github.com/huggingface/transformers/blob/…Edgeways

To add more information:

Both classes instantiate a model from a checkpoint; the difference is in what they return: features, or logits to be processed further.

AutoModel class:

Returns hidden_states (features), i.e., the model's contextual understanding of the input sentences.

AutoModelForSequenceClassification class (for the sequence classification task):

The output of AutoModel is fed into a classifier head (usually one or a few linear layers), which outputs logits for the input sequence(s). The softmax of the logits is interpreted as probabilities. The entire pipeline is illustrated below:

Illustration of an entire pipeline
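The last step of that pipeline, turning logits into probabilities, can be sketched with a plain softmax (the logit values here are made up for illustration):

```python
import math

logits = [1.2, -0.8]  # hypothetical logits for (positive, negative)

# Softmax: exponentiate each logit, then normalize so they sum to 1.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(probs)  # two probabilities that sum to 1.0
```

In practice torch.nn.functional.softmax is applied to the logits tensor, but the arithmetic is exactly this.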

The different tasks could be performed with the same architecture, but each of these tasks has a different head associated with it (as mentioned on Hugging Face):

  • model + sequence classification head --> AutoModelForSequenceClassification
  • model + question answering head --> AutoModelForQuestionAnswering
  • model + token classification head --> AutoModelForTokenClassification

We can customize these heads for our use case (e.g., adding dropout/dense layers, modifying the last layer from 5 to 2 nodes, or converting a question-answering head to a text-classification head) on top of the base model.
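The body-plus-swappable-head idea can be sketched in a few lines of stdlib Python. The base_model function and make_head factory below are illustrative stand-ins, not the transformers API; they just show that the same body output can feed heads of different sizes (e.g., changing the last layer from 5 to 2 nodes):

```python
import random

random.seed(0)
hidden_size = 8  # small stand-in for DistilBERT's 768

def base_model(token_ids):
    # Stand-in for the pretrained body: one feature vector per sequence.
    return [random.gauss(0, 1) for _ in range(hidden_size)]

def make_head(num_labels):
    # A fresh linear head; this is the part you customize per task.
    weights = [[random.gauss(0, 0.02) for _ in range(hidden_size)]
               for _ in range(num_labels)]
    return lambda feats: [sum(w * f for w, f in zip(row, feats))
                          for row in weights]

features = base_model([101, 2023, 102])  # hypothetical token ids
five_way = make_head(5)  # e.g., a 5-class head
binary = make_head(2)    # the same body with a 2-node last layer
print(len(five_way(features)), len(binary(features)))  # 5 2
```

With the real library, passing num_labels to AutoModelForSequenceClassification.from_pretrained achieves the same resizing of the head.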

A nice blog on customizing heads.

Source: HuggingFace

Facilitation answered 1/12, 2023 at 21:55 Comment(1)
Very interesting, thank you! I have a small question: neither of the two models returns the attentions. Is there any way to get them? I used output_attentions=True, but output.attentions is still None for AutoModelForSequenceClassification, and AutoModel doesn't have that attribute.Theca
