from transformers import BertTokenizer, BertForMaskedLM
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
outputs = model(input_ids, masked_lm_labels=input_ids)
loss, prediction_scores = outputs[:2]
This code is from the Hugging Face Transformers documentation page for BertForMaskedLM: https://huggingface.co/transformers/model_doc/bert.html#bertformaskedlm
I cannot understand the masked_lm_labels=input_ids argument that is passed to the model. How does it work? Does it mean that the model will automatically mask some of the tokens when input_ids is passed?
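My current guess (from skimming the modeling source, so I may be wrong) is that masked_lm_labels is just used as the target of a cross-entropy loss with ignore_index=-100, so positions labeled -100 are skipped and nothing is masked automatically. Here is a toy sketch of that loss behavior in plain PyTorch (the vocab size, shapes, and label values are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Toy setup: batch of 1, sequence of 5 positions, vocabulary of 10 tokens
logits = torch.randn(1, 5, 10)

# -100 marks a position to be ignored by the loss; the other values are
# the "correct" token ids the model should predict at each position
labels = torch.tensor([[2, 5, -100, 7, 1]])

# If my reading is right, the masked-LM loss is essentially this:
# cross-entropy over all positions, ignoring those labeled -100
loss = F.cross_entropy(logits.view(-1, 10), labels.view(-1), ignore_index=-100)
print(loss)
```

If that is correct, then passing masked_lm_labels=input_ids would compute the loss over every token (none are -100), without any automatic masking.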