I have a training dataset with 4107 rows:
DatasetDict({
    train: Dataset({
        features: ['input_ids'],
        num_rows: 4107
    })
    valid: Dataset({
        features: ['input_ids'],
        num_rows: 498
    })
})
In my training arguments, the batch size is 8 and the number of epochs is 2:
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="code_gen_epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    eval_steps=100,
    logging_steps=100,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=3.0e-4,
    # save_steps=200,
    # fp16=True,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["valid"],
)
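
One sanity check I can run on the objects defined above (just a sketch, not part of my original script) is to look at how many batches the Trainer's train dataloader yields per epoch:

# Number of per-device batches (of size per_device_train_batch_size)
# the Trainer iterates over in one epoch
train_dataloader = trainer.get_train_dataloader()
print(len(train_dataloader))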
When I start training, the Trainer reports a total of 128 steps.
My assumption was that one epoch should take roughly 4107 / 8 ≈ 514 steps, so about 1028 steps for 2 epochs.
I don't understand how the total comes out to 128.
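
For reference, this is the back-of-the-envelope calculation behind my expectation (the variable names are only for illustration):

import math

num_train_samples = 4107        # num_rows of the train split
per_device_train_batch_size = 8
num_train_epochs = 2

# Steps I expected: one optimizer step per batch of 8 samples
steps_per_epoch = math.ceil(num_train_samples / per_device_train_batch_size)  # 514
expected_total_steps = steps_per_epoch * num_train_epochs                     # 1028
print(steps_per_epoch, expected_total_steps)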