What are differences between AutoModelForSequenceClassification vs AutoModel
We can create a model with the AutoModel (or TFAutoModel) class:

from transformers import AutoModel
model = AutoModel.from_pretrained('distilbert-base-uncased')

On the other hand, a model can be created with AutoModelForSequenceClassification (TFAutoModelForSequenceClassification):

from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased')

As far as I know, both models are created from the distilbert-base-uncased checkpoint. Judging by the name, the second class (AutoModelForSequenceClassification) is intended for sequence classification.

But what are the actual differences between the two classes? And how should each be used correctly?

(I searched on Hugging Face but it is not clear.)

Aeolic answered 10/11, 2021 at 3:33 Comment(1)
I found one difference: with AutoModel, we can use last_hidden_state to get the [CLS] token. AutoModelForSequenceClassification has no last_hidden_state attribute.Aeolic

The difference between AutoModel and AutoModelForSequenceClassification is that AutoModelForSequenceClassification has a classification head on top of the base model's outputs, which can easily be trained together with the base model.
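To make "classification head" concrete, here is a minimal stdlib-only sketch of what the head adds on top of the base model's output. The 768 matches DistilBERT's hidden size; the random vector and weights are placeholders for illustration, not real model weights:

```python
import random

random.seed(0)
hidden_size, num_labels = 768, 2  # DistilBERT hidden size; binary labels

# Stand-in for the base model's [CLS] hidden state (what AutoModel returns).
cls_hidden = [random.gauss(0, 1) for _ in range(hidden_size)]

# The classification head: a single linear layer (weights + bias) on top.
weights = [[random.gauss(0, 0.02) for _ in range(hidden_size)]
           for _ in range(num_labels)]
bias = [0.0] * num_labels

# Logits: one score per label, computed from the pooled hidden state.
logits = [sum(w * h for w, h in zip(row, cls_hidden)) + b
          for row, b in zip(weights, bias)]
print(len(logits))  # 2 -- one logit per label
```

These head weights are randomly initialized when the checkpoint lacks them, which is why the head needs fine-tuning before its predictions mean anything.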

Denys answered 5/12, 2021 at 9:7 Comment(6)
So can I use AutoModel for classification purposes?Aeolic
No. From the Hugging Face course: "For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won't actually use the AutoModel class, but AutoModelForSequenceClassification." huggingface.co/course/chapter2/2?fw=ptDenys
Does that mean AutoModel has frozen weights while AutoModelForSequenceClassification has trainable weights? Actually, I have a requirement where I only want the model to act as a feature extractor, not as a trainable model.Pritchett
@Denys What is a classification head?Teenateenage
@subho: what is the classification head? Is it a linear layer with D_in=number_of_classes? Any online pointer would be very useful!Ottavia
It would depend on the architecture; this is the head used for RoBERTa, for example: github.com/huggingface/transformers/blob/…Edgeways

To add more information:

Both classes instantiate a model from a checkpoint; the difference is in what they return: features, or logits to be processed further.

AutoModel class:

Returns hidden_states (features), i.e., the model's contextual understanding of the input sentences.

AutoModelForSequenceClassification class (for the sequence classification task):

The output of AutoModel is fed into a classifier head (usually one or a few linear layers), which outputs logits for the input sequence(s). The softmax of the logits is interpreted as probabilities. The entire pipeline is illustrated below:

Illustration of an entire pipeline
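The last step of that pipeline, turning logits into probabilities, can be sketched with a plain softmax (the logit values here are made up for illustration):

```python
import math

logits = [1.2, -0.8]  # hypothetical logits for (positive, negative)

# Softmax: exponentiate each logit, then normalize so they sum to 1.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(probs)  # two probabilities that sum to 1.0
```

In practice torch.nn.functional.softmax is applied to the logits tensor, but the arithmetic is exactly this.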

The different tasks could be performed with the same architecture, but each of these tasks has a different head associated with it (as mentioned on Hugging Face):

  • model + sequence classification head --> AutoModelForSequenceClassification
  • model + question answering head --> AutoModelForQuestionAnswering
  • model + token classification head --> AutoModelForTokenClassification

We can customize these heads for our use case (e.g., adding dropout/dense layers, modifying the last layer from 5 to 2 nodes, or converting a question-answering head to a text-classification head) on top of the base model.
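The body-plus-swappable-head idea can be sketched in a few lines of stdlib Python. The base_model function and make_head factory below are illustrative stand-ins, not the transformers API; they just show that the same body output can feed heads of different sizes (e.g., changing the last layer from 5 to 2 nodes):

```python
import random

random.seed(0)
hidden_size = 8  # small stand-in for DistilBERT's 768

def base_model(token_ids):
    # Stand-in for the pretrained body: one feature vector per sequence.
    return [random.gauss(0, 1) for _ in range(hidden_size)]

def make_head(num_labels):
    # A fresh linear head; this is the part you customize per task.
    weights = [[random.gauss(0, 0.02) for _ in range(hidden_size)]
               for _ in range(num_labels)]
    return lambda feats: [sum(w * f for w, f in zip(row, feats))
                          for row in weights]

features = base_model([101, 2023, 102])  # hypothetical token ids
five_way = make_head(5)  # e.g., a 5-class head
binary = make_head(2)    # the same body with a 2-node last layer
print(len(five_way(features)), len(binary(features)))  # 5 2
```

With the real library, passing num_labels to AutoModelForSequenceClassification.from_pretrained achieves the same resizing of the head.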

A nice blog on customizing heads.

Source: HuggingFace

Facilitation answered 1/12, 2023 at 21:55 Comment(1)
Very interesting, thank you! I have a small question: neither of the two models returns the attentions. Is there any way to get them? I used output_attentions=True, but output.attentions is still None for AutoModelForSequenceClassification, and AutoModel doesn't have that attribute.Theca
