I currently use a Hugging Face pipeline for sentiment-analysis like so:
from transformers import pipeline
classifier = pipeline('sentiment-analysis', device=0)
The problem is that when I pass texts longer than 512 tokens, it crashes with an error saying that the input is too long. Is there any way to pass the tokenizer's max_length and truncation parameters directly to the pipeline?
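For reference, this is roughly how the failure shows up (a sketch; the repeated sentence just stands in for any input that tokenizes to more than 512 tokens):

# any text beyond the model's 512-token limit triggers the error
long_text = "This movie was absolutely wonderful. " * 200
classifier(long_text)  # crashes complaining the input is too long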
My workaround is to do:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, device=0)
And then I call the tokenizer myself:
pt_batch = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
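For completeness, I then run the truncated batch through the model by hand, roughly like this (a sketch; the id2label mapping comes from the model config):

import torch

with torch.no_grad():
    outputs = model(**pt_batch)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
# map each winning class index back to a label such as "5 stars"
labels = [model.config.id2label[i] for i in probs.argmax(dim=-1).tolist()]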
But it would be much nicer if I could simply call the pipeline directly, like so:
classifier(text, padding=True, truncation=True, max_length=512)