I am trying to train an NER model with the HuggingFace transformers library on a Colab cloud GPU, pickle it, and then load the model on my own CPU to make predictions.
Code
The model is the following:
from transformers import BertForTokenClassification

model = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=NUM_LABELS,
    output_attentions=False,
    output_hidden_states=False,
)
I am using this snippet to save the model on Colab:
import torch
torch.save(model.state_dict(), FILENAME)
Then I load it on my local CPU using:
# Instantiate a model of the same type
model_reload = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(tag2idx),
    output_attentions=False,
    output_hidden_states=False,
)
# Load the saved weights onto the CPU
model_reload.load_state_dict(torch.load(FILENAME, map_location='cpu'))
model_reload.eval()
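One thing that may be worth ruling out: the state dict stores only weights, not the mapping from tags to label ids. If `tag2idx` is rebuilt on each machine (for example from a `set`, whose iteration order can differ between Python processes), the ids may not mean the same thing in both places. The sketch below is only an assumption about a possible cause, not a confirmed diagnosis; `save_label_map` and `load_label_map` are hypothetical helper names, and the file path is up to you.

```python
import json


def save_label_map(tag2idx, path):
    """Write the tag -> id mapping used at training time to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tag2idx, f, indent=2, sort_keys=True)


def load_label_map(path):
    """Read the mapping back and build the inverse id -> tag lookup."""
    with open(path, encoding="utf-8") as f:
        tag2idx = json.load(f)
    idx2tag = {idx: tag for tag, idx in tag2idx.items()}
    return tag2idx, idx2tag
```

The idea would be to call `save_label_map(tag2idx, ...)` on Colab next to `torch.save(...)`, copy both files, and use the loaded `idx2tag` for decoding predictions locally instead of a freshly built mapping.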
The code used to tokenize the text and make the actual predictions is identical in the Colab GPU notebook and in my local CPU notebook.
Expected Behavior
On the Colab GPU, the trained model behaves correctly and classifies the following tokens perfectly:
O [CLS]
O Good
O morning
O ,
O my
O name
O is
B-per John
I-per Kennedy
O and
O I
O am
O working
O at
B-org Apple
O in
O the
O headquarters
O of
B-geo Cupertino
O [SEP]
Actual Behavior
When I load the model and use it to make predictions on my CPU, the predictions are completely wrong:
I-eve [CLS]
I-eve Good
I-eve morning
I-eve ,
I-eve my
I-eve name
I-eve is
I-geo John
B-eve Kennedy
I-eve and
I-eve I
I-eve am
I-eve working
I-eve at
I-gpe Apple
I-eve in
I-eve the
I-eve headquarters
I-eve of
B-org Cupertino
I-eve [SEP]
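For illustration only, here is a minimal sketch of how identical predicted ids can decode to different tags when the id-to-tag mapping differs between two environments. The tag list and the predicted ids are made up for the example, and whether this is what happens in my setup is an assumption I have not verified.

```python
def decode(pred_ids, idx2tag):
    """Translate predicted label ids into tag strings."""
    return [idx2tag[i] for i in pred_ids]


# Hypothetical tag list; the real one comes from the training data.
tags = ["O", "B-per", "I-per", "B-org"]

# Mapping as it might have been built at training time.
idx2tag_train = dict(enumerate(tags))
# Same tags, but enumerated in a different order on another machine.
idx2tag_local = dict(enumerate(reversed(tags)))

# Identical model output in both environments (made-up ids).
pred_ids = [0, 1, 2, 0, 3]

train_labels = decode(pred_ids, idx2tag_train)
local_labels = decode(pred_ids, idx2tag_local)

print(train_labels)  # ['O', 'B-per', 'I-per', 'O', 'B-org']
print(local_labels)  # ['B-org', 'I-per', 'B-per', 'B-org', 'O']
```

In this toy case the most frequent id decodes to `O` with one mapping and to `B-org` with the other, so almost every token ends up with the same wrong tag.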
Does anyone have ideas why it doesn't work? Did I miss something?