How to get the logits of the model with a text classification pipeline from HuggingFace?

I need to use pipeline to handle both tokenization and inference with the distilbert-base-uncased-finetuned-sst-2-english model over my dataset.

My data is a list of sentences; for reproduction purposes we can assume it is:

texts = ["this is the first sentence", "of my data.", "In fact, thats not true,", "but we are going to assume it", "is"]

Before using pipeline, I was getting the logits from the model outputs like this:

import torch

with torch.no_grad():
    logits = model(**tokenized_test).logits
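
where tokenized_test is the tokenizer output for my dataset, e.g.:

# batch-encode the sentences into tensors the model accepts
tokenized_test = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")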

Now I have to use pipeline, so this is the way I'm getting the model's output:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

selected_model = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(selected_model)
model = AutoModelForSequenceClassification.from_pretrained(selected_model, num_labels=2)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(classifier(texts))

which gives me:

[{'label': 'POSITIVE', 'score': 0.9746173024177551}, {'label': 'NEGATIVE', 'score': 0.5020197629928589}, {'label': 'NEGATIVE', 'score': 0.9995120763778687}, {'label': 'NEGATIVE', 'score': 0.9802979826927185}, {'label': 'POSITIVE', 'score': 0.9274746775627136}]

And I can't get the 'logits' field anymore.

Is there a way to get the logits instead of the label and score? Would a custom pipeline be the best and/or easiest way to do it?

Gosser answered 8/6, 2023 at 17:26 Comment(1)
I'm going to answer, but I hope you'll be patient if I need more clarification. This question is good and the context is clear, since I know the other question.Loutish

When you use the default pipeline, the postprocess function usually applies a softmax to the logits and returns only a label and score, e.g.

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

text = ['hello this is a test',
 'that transforms a list of sentences',
 'into a list of list of sentences',
 'in order to emulate, in this case, two batches of the same lenght',
 'to be tokenized by the hf tokenizer for the defined model']

classifier(text, batch_size=2, truncation="only_first")

[out]:

[{'label': 'NEGATIVE', 'score': 0.9379090666770935},
 {'label': 'POSITIVE', 'score': 0.9990271329879761},
 {'label': 'NEGATIVE', 'score': 0.9726701378822327},
 {'label': 'NEGATIVE', 'score': 0.9965035915374756},
 {'label': 'NEGATIVE', 'score': 0.9913086891174316}]
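
For reference, this is roughly what that postprocess step does under the hood: softmax the logits and pick the argmax label. A minimal sketch, using the model and tokenizer defined above:

import torch

inputs = tokenizer('hello this is a test', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits        # raw, unnormalized scores
probs = torch.softmax(logits, dim=-1)      # this is where the pipeline's 'score' comes from
label_id = probs.argmax(dim=-1).item()
print(model.config.id2label[label_id], probs[0, label_id].item())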

So what you want is to override the postprocess logic by subclassing the pipeline.

To check which pipeline class the classifier is an instance of, do this:

classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
type(classifier)

[out]:

transformers.pipelines.text_classification.TextClassificationPipeline

Now that you know the parent class of the task pipeline you want to use, you can do this and still enjoy the perks of the prebuilt batching from TextClassificationPipeline:

from transformers import TextClassificationPipeline

class MarioThePlumber(TextClassificationPipeline):
    def postprocess(self, model_outputs):
        # skip the softmax and return the raw logits instead
        logits = model_outputs["logits"]
        return logits

pipe = MarioThePlumber(model=model, tokenizer=tokenizer)

pipe(text, batch_size=2, truncation="only_first")

[out]:

[tensor([[ 1.5094, -1.2056]]),
 tensor([[-3.4114,  3.5229]]),
 tensor([[ 1.8835, -1.6886]]),
 tensor([[ 3.0780, -2.5745]]),
 tensor([[ 2.5383, -2.1984]])]
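
If you need one tensor rather than a list of per-sentence tensors, you can concatenate the results; applying a softmax then recovers the scores the default pipeline reports. A small usage sketch, assuming torch is imported:

logits = torch.cat(pipe(text, batch_size=2, truncation="only_first"))  # shape (5, 2)
probs = torch.softmax(logits, dim=-1)  # same numbers as the default pipeline's 'score'
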
Loutish answered 8/6, 2023 at 20:13 Comment(1)
That was it! Thanks! Nice class name :)Gosser
