Suppress HuggingFace logging warning: "Setting `pad_token_id` to `eos_token_id`:{eos_token_id} for open-end generation."
In HuggingFace, every time I call a pipeline() object, I get a warning:

"Setting `pad_token_id` to `eos_token_id`:{eos_token_id} for open-end generation."

How do I suppress this warning without suppressing all logging warnings? I want to keep other warnings; I just don't want this one.

Adolfoadolph answered 17/10, 2021 at 23:40 Comment(0)
This warning appears for any text-generation task run through HuggingFace Transformers when no pad_token_id is set. This is explained here, and you can see the code here.

You can avoid the warning by setting pad_token_id manually (e.g., to the tokenizer's pad_token_id, or to eos_token_id).

Set the pad_token_id in the generation_config with:

model.generation_config.pad_token_id = tokenizer.pad_token_id

Alternatively, if you only need to make a single call to generate:

When you call

model.generate(**encoded_input)

just change it to

model.generate(**encoded_input, pad_token_id=tokenizer.eos_token_id)
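If you would rather silence just this log message instead of setting pad_token_id, a different technique is to attach a standard-library logging filter to the logger that emits it. This is a minimal sketch, assuming the warning is routed through Python's logging under a transformers.* logger name (recent versions emit it from transformers.generation.utils; older versions used transformers.generation_utils):

```python
import logging

class PadTokenWarningFilter(logging.Filter):
    """Drop only the pad_token_id/eos_token_id open-end generation warning."""
    def filter(self, record: logging.LogRecord) -> bool:
        # Return False to drop the record, True to keep every other message.
        return "Setting `pad_token_id` to `eos_token_id`" not in record.getMessage()

# Assumption: recent transformers versions emit the warning from this module path.
logging.getLogger("transformers.generation.utils").addFilter(PadTokenWarningFilter())
```

Note that unlike setting pad_token_id, this only hides the message; generation still falls back to eos_token_id for padding under the hood.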
Handstand answered 8/3, 2022 at 15:42 Comment(3)
What other effects would changing the PAD token to the EOS token have?Adolfoadolph
Why doesn't this just get set automatically from the model's configuration?Mohur
I'm not sure if anything has changed since the post was made, but model.generation_config.pad_token_ids = tokenizer.pad_token_id appears to have a typo; removing the "s" to give model.generation_config.pad_token_id = tokenizer.pad_token_id worked for me.Geezer
For a text-generation pipeline, you need to set the pad_token_id in the generator call to suppress the warning:

from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')
sample = generator('test test', pad_token_id=generator.tokenizer.eos_token_id)
Outfight answered 8/3, 2023 at 21:3 Comment(2)
What do you mean by "to suppress the output"? @OutfightReggi
I meant not having the generator print out the warning mentioned in the question: "Setting pad_token_id to eos_token_id:{eos_token_id} for open-end generation." I'll edit the answer to clarify thatOutfight
If you cannot access the generate call to add the pad_token_id parameter as suggested in this answer, you can instead set it in the model's generation config like this:

model.generation_config.pad_token_id = model.generation_config.eos_token_id

Example for GPT-2:

import transformers

gpt2 = transformers.AutoModelForCausalLM.from_pretrained('gpt2')
tokenizer = transformers.AutoTokenizer.from_pretrained('gpt2')

gpt2.generation_config.pad_token_id = gpt2.generation_config.eos_token_id

model_inputs = tokenizer("foo", return_tensors='pt')
model_outputs = gpt2.generate(**model_inputs, max_new_tokens=20)
print(tokenizer.decode(model_outputs[0]))
Concentration answered 19/3 at 10:31 Comment(1)
this is the way! +1 for gpt2.generation_config.pad_token_id = gpt2.generation_config.eos_token_idSpectacled

© 2022 - 2024 — McMap. All rights reserved.