Why doesn't trainer report evaluation metrics while training in the tutorial?

I am following this tutorial to learn about the Trainer API: https://huggingface.co/transformers/training.html

I copied the code as shown below:

from datasets import load_dataset

import numpy as np
from datasets import load_metric

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

print('Download dataset ...')
raw_datasets = load_dataset("imdb")
from transformers import AutoTokenizer

print('Tokenize text ...')
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

print('Prepare data ...')
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(500))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(500))
full_train_dataset = tokenized_datasets["train"]
full_eval_dataset = tokenized_datasets["test"]

print('Define model ...')
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

print('Define trainer ...')
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments("test_trainer", evaluation_strategy="epoch")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

print('Fine-tune train ...')
trainer.evaluate()

However, it doesn't report any training or evaluation metrics; it only prints the following output:

Download dataset ...
Reusing dataset imdb (/Users/congminmin/.cache/huggingface/datasets/imdb/plain_text/1.0.0/4ea52f2e58a08dbc12c2bd52d0d92b30b88c00230b4522801b3636782f625c5b)
Tokenize text ...
100%|██████████| 25/25 [00:06<00:00,  4.01ba/s]
100%|██████████| 25/25 [00:06<00:00,  3.99ba/s]
100%|██████████| 50/50 [00:13<00:00,  3.73ba/s]
Prepare data ...
Define model ...
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Define trainer ...
Fine-tune train ...
100%|██████████| 63/63 [08:35<00:00,  8.19s/it]

Process finished with exit code 0

Is the tutorial out of date? Should I make some configuration changes to get the metrics reported?

Farflung answered 20/5, 2021 at 17:31 Comment(0)

I think you need to tell the Trainer how often to evaluate performance, using evaluation_strategy and eval_steps in TrainingArguments.
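For example, a minimal sketch (the eval_steps value here is illustrative, not from the question):

from transformers import TrainingArguments

training_args = TrainingArguments(
    "test_trainer",
    evaluation_strategy="steps",  # run evaluation every eval_steps training steps
    eval_steps=50,                # illustrative value; tune to your dataset size
)

With evaluation_strategy="epoch", evaluation instead runs once at the end of each epoch and eval_steps is not needed.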

Naturopathy answered 23/11, 2021 at 21:49 Comment(0)

The evaluate function returns the metrics; it doesn't print them. Does

metrics=trainer.evaluate()
print(metrics)

work? Also, the message is saying that bert-base-cased is the base language model, which was not pretrained for sequence classification. Its classification head therefore has newly initialized weights, and the model should be fine-tuned on the downstream task before its predictions are meaningful.

Senior answered 21/5, 2021 at 11:17 Comment(1)
Hi Samer, but this tutorial is exactly about fine-tuning a pretrained language model, and it should work for sentence classification, right? The code I pasted from the tutorial does fine-tune on sentence classification. – Farflung

Why are you calling trainer.evaluate()? That only runs evaluation on the validation set. If you want to fine-tune or train, you need to call:

trainer.train()
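As a minimal sketch of the intended flow, reusing the trainer defined in the question (with evaluation_strategy="epoch" already set there, train() logs the eval loss and the accuracy from compute_metrics at the end of every epoch):

trainer.train()               # fine-tune; eval metrics are logged each epoch
metrics = trainer.evaluate()  # returns a dict, e.g. {"eval_loss": ..., "eval_accuracy": ...}
print(metrics)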
Kellner answered 26/5, 2021 at 20:30 Comment(0)

The key point is the label column: the BERT model expects the label field to be named "labels", so you have to rename the column.

# check fields
print(next(iter(small_train_dataset)).keys())

# rename field
small_train_dataset = small_train_dataset.rename_column("label", "labels")
small_eval_dataset = small_eval_dataset.rename_column("label", "labels")

From the docs:

rename the column "label" to "labels" (because the model expects the argument to be named labels)

Halothane answered 22/1, 2022 at 6:51 Comment(0)

You should add evaluation_strategy='epoch' or evaluation_strategy='steps' to your TrainingArguments. The default is 'no', i.e. no evaluation is run during training.

Militia answered 7/12, 2022 at 18:18 Comment(0)
