How is the number of steps calculated in HuggingFace trainer?
I have a train dataset of size 4107.

DatasetDict({
    train: Dataset({
        features: ['input_ids'],
        num_rows: 4107
    })
    valid: Dataset({
        features: ['input_ids'],
        num_rows: 498
    })
})

In my training arguments, the batch size is 8 and number of epochs is 2.

from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="code_gen_epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    eval_steps=100,
    logging_steps=100,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=3.0e-4,
    # save_steps=200,
    # fp16=True,
    load_best_model_at_end = True,
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["valid"],
)

When I start the training, I can see that the number of steps is 128.

(Screenshot of the training progress bar showing 128 total optimization steps.)

My assumption is that the number of steps should have been roughly 4107/8 ≈ 513 per epoch, so about 1026 for 2 epochs.

I don't understand how it came to be 128.

Superstition answered 13/4, 2023 at 7:6
Since you're specifying gradient_accumulation_steps=8, the number of optimizer steps is also divided by 8. The Trainer does not perform an optimizer update on every batch, but once per 8 accumulated batches.

Hence, the number of update steps per epoch is roughly: 4107 instances ÷ 8 batch size ÷ 8 gradient accumulation ≈ 64, and with num_train_epochs=2 that gives the 128 total steps you observed. With gradient accumulation disabled (gradient_accumulation_steps=1) you would get about 513 steps per epoch (4107 ÷ 8), i.e. roughly 1026 in total.
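The arithmetic can be sketched in a few lines of plain Python. This mirrors the calculation described above, not the Trainer's internal code; note the floor division on the accumulation step, which drops a leftover partial accumulation and is what reproduces the 128 you saw (some Transformers versions round up instead, which would give 65 steps per epoch):

```python
import math

num_examples = 4107
per_device_train_batch_size = 8
gradient_accumulation_steps = 8
num_train_epochs = 2

# Batches the DataLoader yields per epoch (the last batch may be smaller).
batches_per_epoch = math.ceil(num_examples / per_device_train_batch_size)  # 514

# One optimizer update per `gradient_accumulation_steps` batches;
# floor division drops the trailing partial accumulation.
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps  # 64

total_steps = updates_per_epoch * num_train_epochs
print(total_steps)  # 128
```

Setting gradient_accumulation_steps = 1 in the same sketch yields 514 updates per epoch and 1028 in total, matching the "no accumulation" case up to rounding of the last partial batch.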

Dominga answered 13/4, 2023 at 12:11 Comment(1)
Check out the HuggingFace documentation for more details on gradient accumulation: huggingface.co/docs/transformers/v4.19.2/en/… – Endowment
