Suppress HuggingFace logging warning: "Setting `pad_token_id` to `eos_token_id`:{eos_token_id} for open-end generation."
In HuggingFace, every time I call a pipeline() object, I get a warning:

"Setting `pad_token_id` to `eos_token_id`:{eos_token_id} for open-end generation."

How do I suppress this warning without suppressing all logging warnings? I want to keep other warnings; I just don't want this one.

Adolfoadolph answered 17/10, 2021 at 23:40 Comment(0)
This warning appears for any text-generation task run through HuggingFace Transformers when no pad_token_id is set. This is explained here, and you can see the code here.

You can avoid the warning by setting pad_token_id manually (e.g., to the tokenizer's pad_token_id, or to eos_token_id).

Set the pad_token_id in the generation_config with:

model.generation_config.pad_token_id = tokenizer.pad_token_id

Alternatively, if you only need to make a single call to generate:

When you call

model.generate(**encoded_input)

just change it to

model.generate(**encoded_input, pad_token_id=tokenizer.eos_token_id)
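If you would rather silence just this log message instead of setting pad_token_id, a different technique is to attach a standard-library logging filter to the logger that emits it. This is a minimal sketch, assuming the warning is routed through Python's logging under a transformers.* logger name (recent versions emit it from transformers.generation.utils; older versions used transformers.generation_utils):

```python
import logging

class PadTokenWarningFilter(logging.Filter):
    """Drop only the pad_token_id/eos_token_id open-end generation warning."""
    def filter(self, record: logging.LogRecord) -> bool:
        # Return False to drop the record, True to keep every other message.
        return "Setting `pad_token_id` to `eos_token_id`" not in record.getMessage()

# Assumption: recent transformers versions emit the warning from this module path.
logging.getLogger("transformers.generation.utils").addFilter(PadTokenWarningFilter())
```

Note that unlike setting pad_token_id, this only hides the message; generation still falls back to eos_token_id for padding under the hood.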
Handstand answered 8/3, 2022 at 15:42 Comment(3)
What other effects would changing the PAD token to the EOS token have?Adolfoadolph
Why doesn't this just get set automatically from the model's configuration?Mohur
I'm not sure if anything has changed since the post was made, but model.generation_config.pad_token_ids = tokenizer.pad_token_id appears to have a typo; removing the "s" to give model.generation_config.pad_token_id = tokenizer.pad_token_id worked for me.Geezer
For a text-generation pipeline, you need to set the pad_token_id in the generator call to suppress the warning:

from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')
sample = generator('test test', pad_token_id=generator.tokenizer.eos_token_id)
Outfight answered 8/3, 2023 at 21:3 Comment(2)
What do you mean by "to suppress the output"? @OutfightReggi
I meant not having the generator print out the warning mentioned in the question: "Setting pad_token_id to eos_token_id:{eos_token_id} for open-end generation." I'll edit the answer to clarify thatOutfight
If you cannot access the generate call to add the pad_token_id parameter as suggested in this answer, you can instead set it in the model's generation config like this:

model.generation_config.pad_token_id = model.generation_config.eos_token_id

Example for GPT-2:

import transformers

gpt2 = transformers.AutoModelForCausalLM.from_pretrained('gpt2')
tokenizer = transformers.AutoTokenizer.from_pretrained('gpt2')

gpt2.generation_config.pad_token_id = gpt2.generation_config.eos_token_id

model_inputs = tokenizer("foo", return_tensors='pt')
model_outputs = gpt2.generate(**model_inputs, max_new_tokens=20)
print(tokenizer.decode(model_outputs[0]))
Concentration answered 19/3 at 10:31 Comment(1)
this is the way! +1 for gpt2.generation_config.pad_token_id = gpt2.generation_config.eos_token_idSpectacled

© 2022 - 2024 — McMap. All rights reserved.