I am fine-tuning the Hugging Face facebook/bart-large-mnli model to suit my needs. I use the following parameters:
training_args = TrainingArguments(
    output_dir=model_directory,        # output directory
    num_train_epochs=30,               # total number of training epochs
    per_device_train_batch_size=1,     # batch size per device during training - was 16; anything above 1 runs out of memory
    per_device_eval_batch_size=2,      # batch size for evaluation - was 64; anything above 2 runs out of memory
    warmup_steps=500,                  # number of warmup steps for the learning rate scheduler
    weight_decay=0.01,                 # strength of weight decay
)
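Since I can't raise per_device_train_batch_size above 1 without running out of memory, one thing I'm considering (an assumption on my side, not verified on my hardware) is gradient accumulation to simulate a larger effective batch:

# Sketch: gradients are summed over 16 steps before each optimizer update,
# giving an effective batch size of 1 * 16 = 16 at roughly the same memory cost.
training_args = TrainingArguments(
    output_dir=model_directory,
    num_train_epochs=30,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    per_device_eval_batch_size=2,
    warmup_steps=500,
    weight_decay=0.01,
)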
model = BartForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
trainer = Trainer(
    model=model,                      # the instantiated 🤗 Transformers model to be trained
    args=training_args,               # training arguments, defined above
    compute_metrics=compute_metrics,  # a function to compute the metrics
    train_dataset=train_dataset,      # training dataset
    eval_dataset=test_dataset         # evaluation dataset
)
# Train the model
trainer.train()
The compute_metrics function I use is:
import numpy as np
from datasets import load_metric
from transformers import EvalPrediction

def compute_metrics(p: EvalPrediction):
    metric_acc = load_metric("accuracy")
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    result = {}
    result["accuracy"] = metric_acc.compute(predictions=preds, references=p.label_ids)["accuracy"]
    return result
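(Side note: as far as I know, load_metric is deprecated in newer datasets releases in favour of the separate evaluate package; the equivalent call, if the version matters, would be:)

import evaluate

# Same accuracy metric, loaded through the newer evaluate package.
metric_acc = evaluate.load("accuracy")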
But no matter how much training or test data I use, or how many epochs, when I use trainer.evaluate() I get an accuracy of 0.5.
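For what it's worth, here is a sanity-check sketch I could run. My assumption: facebook/bart-large-mnli ships with a 3-way classification head (contradiction / neutral / entailment), so if my data only uses two of the three label ids, comparing the label and prediction distributions should show whether the model has collapsed to a single class:

import numpy as np

# Sketch: compare the distribution of gold labels with the distribution of
# predictions on the test set.
pred_output = trainer.predict(test_dataset)
logits = pred_output.predictions[0] if isinstance(pred_output.predictions, tuple) else pred_output.predictions
preds = np.argmax(logits, axis=1)
print("model num_labels:", model.config.num_labels)
print("label counts:", np.unique(pred_output.label_ids, return_counts=True))
print("pred counts:", np.unique(preds, return_counts=True))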
My questions are:
- How do I improve the accuracy?
- How do I implement other metrics for the evaluation, for example the F1 score?
I tried changing compute_metrics by adding the F1 metric, like this:
def compute_metrics(p: EvalPrediction):
    load_accuracy = load_metric("accuracy")
    load_f1 = load_metric("f1")
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    result = {}
    result["accuracy"] = load_accuracy.compute(predictions=preds, references=p.label_ids)["accuracy"]
    result["f1"] = load_f1.compute(predictions=preds, references=p.label_ids)["f1"]
    return result
But then I got this error while running trainer.evaluate():
ValueError: pos_label=1 is not a valid label. It should be one of [0, 2]
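My current guess (not verified) is that the f1 metric defaults to binary averaging with pos_label=1, which cannot work when my label ids are 0 and 2. A sketch of the fix I would try, passing an explicit averaging mode:

# Sketch: an explicit average mode sidesteps the binary default (pos_label=1).
result["f1"] = load_f1.compute(
    predictions=preds,
    references=p.label_ids,
    average="weighted",  # or "macro" / "micro", depending on what I want to report
)["f1"]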
You can refer to my previous question for more details about my fine-tuning here.
Update:
This is the tokenizer I used:
from transformers import BartTokenizerFast
tokenizer = BartTokenizerFast.from_pretrained('facebook/bart-large-mnli')
As stated in my other linked questions, this is what I used to create and convert my dataset. As I wrote above, you can refer to the linked questions for more details about my whole process; I feel it's unnecessary to repeat everything in every single question, but correct me if I'm wrong.
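If it helps, my understanding (possibly wrong) is that Trainer accepts an optional tokenizer argument, which its default data collator uses to pad each batch dynamically; a sketch of wiring it in:

from transformers import Trainer

# Sketch: same Trainer setup as above, with the tokenizer passed in so the
# default data collator can pad batches.
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
)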
Does the Trainer class not include a tokenizer? – Southdown