Currently, I'm building a new transformer-based model with huggingface-transformers, where the attention layer differs from the original one. I used run_glue.py
to check my model's performance on the GLUE benchmark. However, I found that the Trainer class of huggingface-transformers saves all the checkpoints, where I can only set the maximum number of checkpoints to keep. I want to save only the weights (or other state such as the optimizer) with the best performance on the validation dataset, and the current Trainer class doesn't seem to provide such a thing. (If we set the maximum number of checkpoints, it removes the older ones, not the ones with worse performance.) Someone already asked the same question on GitHub, but I can't figure out how to modify the script to do what I want. Currently, I'm thinking about making a custom Trainer class that inherits from the original one and changes the train()
method, and it would be great if there's an easy and simple way to do this. Thanks in advance.
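For reference, the kind of subclass I have in mind would look roughly like this (an untested sketch; hooking evaluate() rather than train() seemed simpler, and the BestOnlyTrainer name and the eval_loss key are just placeholders):

import os
from transformers import Trainer

class BestOnlyTrainer(Trainer):
    # Untested sketch: after every evaluation, keep a copy of the weights
    # only if the eval loss improved on the best value seen so far.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.best_metric = None  # best eval_loss observed so far

    def evaluate(self, *args, **kwargs):
        metrics = super().evaluate(*args, **kwargs)
        eval_loss = metrics.get("eval_loss")
        if eval_loss is not None and (self.best_metric is None or eval_loss < self.best_metric):
            self.best_metric = eval_loss
            # overwrite a single "best" directory instead of keeping numbered checkpoints
            self.save_model(os.path.join(self.args.output_dir, "best"))
        return metrics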
You may try the following parameters from the Trainer in huggingface:
training_args = TrainingArguments(
    output_dir='/content/drive/results',  # output directory
    do_predict=True,
    num_train_epochs=3,                   # total number of training epochs
    per_device_train_batch_size=4,        # batch size per device during training
    per_device_eval_batch_size=2,         # batch size for evaluation
    warmup_steps=1000,                    # number of warmup steps for the learning rate scheduler
    save_steps=1000,
    save_total_limit=10,
    load_best_model_at_end=True,
    weight_decay=0.01,                    # strength of weight decay
    logging_dir='./logs',                 # directory for storing logs
    logging_steps=0,
    evaluate_during_training=True)
There may be better ways to avoid keeping too many checkpoints while selecting the best model. So far you cannot save only the best model, but you can check whether an evaluation yields better results than the previous one.
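If it helps, the selection criterion can also be made explicit with metric_for_best_model and greater_is_better; a minimal sketch, assuming the eval loss is the metric you care about:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='steps',        # evaluate periodically so there is something to compare
    eval_steps=500,
    save_steps=500,                     # save at the same cadence as evaluation
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',  # which eval metric defines "best"
    greater_is_better=False)            # lower loss is better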
I have not seen any parameter for that. However, there is a workaround.
Use the following combination:
evaluation_strategy='steps',
eval_steps=10,                 # Evaluation and Save happens every 10 steps
save_total_limit=5,            # Only last 5 models are saved. Older ones are deleted.
load_best_model_at_end=True,
When I tried the above combination, at any time the 5 previous models are saved in the output directory, but if the best model is not one among them, it keeps the best model as well. So it will be 1 + 5 models. You can change save_total_limit = 1 so it will serve your purpose.
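Put together, a minimal usage sketch might look like this (the model, datasets, and output directory are placeholders):

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./out',            # placeholder
    evaluation_strategy='steps',
    eval_steps=10,
    save_steps=10,                 # keep saving in step with evaluation
    save_total_limit=1,            # only the latest (plus the best) checkpoint is kept
    load_best_model_at_end=True)

trainer = Trainer(
    model=model,                   # your model
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset)

trainer.train()                    # the best checkpoint is reloaded at the end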
This answer could be useful
training_args = TrainingArguments(
    output_dir=repo_name,
    group_by_length=True,
    length_column_name='input_length',
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=20,
    fp16=True,
    save_steps=1000,
    save_strategy='steps',        # we cannot set it to "no"; otherwise, the model cannot guess the best checkpoint
    eval_steps=1000,
    logging_steps=1000,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=3,
    load_best_model_at_end=True   # this will let the trainer save the best checkpoint
)
As indicated here as well, there are different ways to save the best checkpoint.
If you use save_total_limit=2 and load_best_model_at_end=True, then the latest and the best models will be saved. From the numbers in the names of these directories, one can infer which checkpoint is which. Even if save_total_limit=1, it is likely that two models will be saved again, the best and the latest (to resume training), if they are not the same.
When load_best_model_at_end=True, reading trainer.state.best_model_checkpoint after training can be used to get the best model.
If the best model is loaded at the end of training, then trainer.save_model(output_dir=custom_path) can also save the best model in a separate directory.
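For example, a short sketch after trainer.train() has finished (custom_path is a placeholder):

print(trainer.state.best_model_checkpoint)  # e.g. './out/checkpoint-3000', the directory of the best checkpoint
trainer.save_model(output_dir=custom_path)  # with load_best_model_at_end=True this writes the best weights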
If load_best_model_at_end = True, then save_steps and, obviously, save_total_limit will be ignored – Nonce