I need to use pipeline to get both tokenization and inference from the distilbert-base-uncased-finetuned-sst-2-english model over my dataset.
My data is a list of sentences; for reproducibility, we can assume it is:
texts = ["this is the first sentence", "of my data.", "In fact, thats not true,", "but we are going to assume it", "is"]
Before using pipeline, I was getting the logits from the model outputs like this:
with torch.no_grad():
    logits = model(**tokenized_test).logits
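For completeness, here is a self-contained sketch of that earlier approach; tokenized_test (which the snippet above assumes already exists) is just the tokenizer applied to the whole texts list with padding enabled:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

selected_model = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(selected_model)
model = AutoModelForSequenceClassification.from_pretrained(selected_model)

texts = ["this is the first sentence", "of my data.", "In fact, thats not true,", "but we are going to assume it", "is"]

# Tokenize the whole batch at once; padding makes all sentences the same length
tokenized_test = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**tokenized_test).logits

# One row per sentence, one column per label (NEGATIVE, POSITIVE)
print(logits.shape)
```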
Now I have to use pipeline, so this is how I'm getting the model's output:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

selected_model = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(selected_model)
model = AutoModelForSequenceClassification.from_pretrained(selected_model, num_labels=2)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(classifier(texts))
which gives me:
[{'label': 'POSITIVE', 'score': 0.9746173024177551}, {'label': 'NEGATIVE', 'score': 0.5020197629928589}, {'label': 'NEGATIVE', 'score': 0.9995120763778687}, {'label': 'NEGATIVE', 'score': 0.9802979826927185}, {'label': 'POSITIVE', 'score': 0.9274746775627136}]
And I can't get the logits anymore.
Is there a way to get the logits instead of the label and score? Would a custom pipeline be the best and/or easiest way to do it?
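For context, one direction I'm considering, based on my reading of the TextClassificationPipeline documentation (so treat this as an assumption, not a confirmed solution): the pipeline accepts call-time arguments function_to_apply and top_k, and function_to_apply="none" is supposed to skip the softmax so that each returned score is a raw logit:

```python
from transformers import pipeline

selected_model = "distilbert-base-uncased-finetuned-sst-2-english"
classifier = pipeline("sentiment-analysis", model=selected_model)

texts = ["this is the first sentence", "of my data.", "In fact, thats not true,", "but we are going to assume it", "is"]

# function_to_apply="none" skips the softmax, so 'score' should be the raw logit;
# top_k=None returns an entry for every label instead of only the top one
raw = classifier(texts, function_to_apply="none", top_k=None)
print(raw[0])
```

But I'm not sure whether this is the intended way or whether subclassing the pipeline is cleaner.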