Fine-tuning a model's classifier layer with a new label

I would like to further fine-tune an already fine-tuned BertForSequenceClassification model on a new dataset containing just 1 additional label which the model hasn't seen before.

In other words, I would like to add 1 new label to the set of labels that the model is currently able to classify properly.

Moreover, I don't want the existing classifier weights to be randomly initialized; I'd like to keep them intact and just update them according to the dataset examples, while increasing the size of the classifier layer by 1.

The dataset used for further fine-tuning could look like this:

sentence,label
intent example 1,new_label
intent example 2,new_label
...
intent example 10,new_label

My model's current classifier layer looks like this:

Linear(in_features=768, out_features=135, bias=True)

How could I achieve it?
Is it even a good approach?

Unclinch answered 19/4, 2021 at 8:32 Comment(0)

You can just extend the weights and bias of your model with new values. Please have a look at the commented example below:

#This is the section that loads your model.
#I will just use a pretrained model for this example.
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jpcorb20/toxic-detector-distilroberta")
model = AutoModelForSequenceClassification.from_pretrained("jpcorb20/toxic-detector-distilroberta")

#We check the output of one sample to compare it later with the extended layer,
#to verify that we kept the previously learned "knowledge".
f = tokenizer.encode_plus("This is an example", return_tensors='pt')
print(model(**f).logits)

#Now we need to find out the name of the linear layer you want to extend.
#The layers on top of distilroberta are wrapped inside a classifier section.
#This name can differ for your model, so inspect the model (e.g. via
#model.parameters or by printing it) to find the classification layer.
print(model.classifier)

#The output shows us that the classification layer is called `out_proj`.
#We can now extend the weights by creating a new tensor that consists of the
#old weights and a randomly initialized row for the new label.
model.classifier.out_proj.weight = nn.Parameter(torch.cat((model.classifier.out_proj.weight, torch.randn(1, 768)), 0))

#We do the same for the bias:
model.classifier.out_proj.bias = nn.Parameter(torch.cat((model.classifier.out_proj.bias, torch.randn(1)), 0))

#And be happy when we compare the output with our expectation:
print(model(**f).logits)

Output:

tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895]],
       grad_fn=<AddmmBackward>)
RobertaClassificationHead(
  (dense): Linear(in_features=768, out_features=768, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
  (out_proj): Linear(in_features=768, out_features=6, bias=True)
)
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895,  2.2124]],
       grad_fn=<AddmmBackward>)

Please note that you should fine-tune your model: the new weights are randomly initialized and will therefore negatively impact the performance until you do.
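
For completeness, a rough sketch of that fine-tuning step on the new examples could look like the following (the sentences and the labels tensor are placeholders; index 6 is the row appended in this example, and it would be 135 in the question's setup). The loss is computed outside the model so that the extra output does not clash with the model's internal num_labels bookkeeping:

import torch.nn.functional as F

# Placeholder data for the new class; index 6 is the row we just appended.
new_examples = ["intent example 1", "intent example 2"]
labels = torch.tensor([6, 6])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the tiny dataset
    batch = tokenizer(new_examples, padding=True, truncation=True, return_tensors='pt')
    logits = model(**batch).logits          # shape (batch, 7) after the extension
    loss = F.cross_entropy(logits, labels)  # computed here instead of inside the model
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Mixing a few examples of the old labels into such a dataset usually helps against catastrophic forgetting.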

Uncork answered 21/4, 2021 at 0:19 Comment(5)
Thank you for helping me again. I was able to extend the classification layer and verify that the old weights remained intact. However, with the randomly initialized new weight-bias pair, the model's overall accuracy dropped considerably. Also, after fine-tuning the extended model with a dataset like the one shown above, the general classification abilities seem to be heavily impacted, which I wouldn't expect since the model was previously fine-tuned on thousands of examples and here we have just a couple. Could you point me to some topics I should look into regarding this?Unclinch
@Unclinch I am not surprised by that. When you check the results of the extended model before fine-tuning, your sentences will probably all get labeled with the new class. The linear layer applies a simple transformation y = xA^T + b, and you later apply something like argmax to select the class of your sentence. While the weights of the other classes are reasonably well tuned from your previous fine-tuning, the newly introduced class is not, and it therefore either overlaps every other class or is never predicted at all.Uncork
@Unclinch In case your main objective was saving some time, you could try to freeze all layers except the classification head and fine-tune this model (see the sketch after these comments).Uncork
there was a discussion about freezing layers for fine-tuning transformer-based models, and the conclusion seems to be that freezing isn't a good idea.Magdalenmagdalena
@RameshArvind Saying that it isn't a good idea is not what sgugger meant. He said that transformers are usually fully trained to get the best results, but that doesn't mean it isn't a good idea to freeze some layers. For example, training BERT on MRPC without freezing takes 2:36 minutes and achieves 88%, while training only the classification head takes 0:53 minutes and already achieves 81%.Uncork
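
Regarding the freezing suggestion, a minimal sketch of training only the classification head could look like this (base_model is the generic Hugging Face attribute for the underlying encoder; the optimizer setup is just one common choice):

# Freeze the encoder so only the classification head receives gradient updates.
for param in model.base_model.parameters():
    param.requires_grad = False

# Hand only the remaining trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)

This is mainly a speed/compute trade-off, as the timings quoted in the last comment suggest.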

I worked through cronoik's answer and found a few changes that may be DeBERTa-specific. In my case:

  • The classifier weights are accessed directly on the layer (e.g. model.classifier.weight and model.classifier.weight.size()).
  • You need to update the underlying model config to expect the additional classes, otherwise it raises an error at training time.

The resulting code to update DeBERTa to add as many additional classification labels as exist in the id2label mapping was:

import torch
from torch import nn
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(trained_model_path)
std = torch.std(model.classifier.weight)

# Scale the new rows to the std of the current weights.
new_tensor = torch.randn(len(id2label) - model.classifier.weight.size(dim=0),
                         model.classifier.weight.size(dim=1)
                         ) * std
weight_with_new_output = nn.Parameter(torch.cat((model.classifier.weight, new_tensor), 0))

# Now reload the model, but with the new id2label/label2id args. This sets up the
# rest of the model (loss, etc.) to expect more outputs.
model = AutoModelForSequenceClassification.from_pretrained(
    trained_model_path,
    num_labels=len(global_var_for_categories),
    id2label=id2label,
    label2id=label2id,
    problem_type="multi_label_classification",
    ignore_mismatched_sizes=True
)

# Replace the randomly re-initialized model.classifier.weight with the previous
# weights plus the new random rows for the new classes.
model.classifier.weight = weight_with_new_output
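
One caveat: reloading with ignore_mismatched_sizes=True also re-initializes model.classifier.bias, so the previously learned per-class biases are lost. If you want to keep them as well, a minimal sketch would be the following, where old_bias is an assumption: the bias captured before the second from_pretrained call, e.g. old_bias = model.classifier.bias.detach().clone():

# Keep the previously learned biases and append scaled random values for the
# new classes; old_bias must be saved before the reload above.
new_bias_values = torch.randn(len(id2label) - old_bias.size(0)) * std
model.classifier.bias = nn.Parameter(torch.cat((old_bias, new_bias_values), 0))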
Casino answered 24/6 at 22:1 Comment(0)
