Transformers v4.x: Convert slow tokenizer to fast tokenizer

I'm following the Transformers pretrained model xlm-roberta-large-xnli example:

from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")

and I get the following error:

ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

I'm using Transformers version 4.1.1.

Featherstone asked 23/12, 2020 at 22:44

According to the Transformers v4.0.0 release notes, sentencepiece was removed as a required dependency. This means that:

"The tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation"

including XLMRobertaTokenizer. However, sentencepiece can be installed as an extra dependency:

pip install transformers[sentencepiece]

or

pip install sentencepiece

if you have transformers already installed.
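
Once sentencepiece is installed (and the kernel restarted, if you're in a notebook), the original pipeline should load. Here's a minimal sketch to verify that a fast tokenizer is now built; the is_fast check and the expected class name are my assumptions based on standard Transformers behavior:

from transformers import AutoTokenizer

# With sentencepiece installed, the slow SentencePiece-based tokenizer
# can be converted into its fast (Rust-backed) counterpart automatically.
tokenizer = AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli")
print(type(tokenizer).__name__)  # expected: XLMRobertaTokenizerFast
print(tokenizer.is_fast)         # expected: True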

Featherstone answered 23/12, 2020 at 22:44
Comment: pip install sentencepiece followed by a kernel/runtime restart solves the issue. – Lucilelucilia

If you are in Google Colab:

  1. Factory reset the runtime.
  2. Upgrade pip with: !pip install --upgrade pip
  3. Install sentencepiece with: !pip install sentencepiece (both commands are combined into a single cell below)
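
A sketch of steps 2 and 3 as one Colab cell (in a notebook, the ! prefix runs shell commands):

!pip install --upgrade pip
!pip install sentencepiece
# Then restart the runtime (Runtime > Restart runtime) before importing transformers again.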
Samantha answered 1/4, 2021 at 18:43
Comment: Yes, that works; the most important thing is to restart the session. – Pearlene

The code below worked for me in a Colab notebook:

!pip install transformers[sentencepiece]
Vive answered 24/8, 2021 at 4:37

Tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation.

You should install sentencepiece in addition to transformers:

pip install transformers[sentencepiece]

This is needed for the slow versions of XLNetTokenizer, AlbertTokenizer, CamembertTokenizer, MBartTokenizer, PegasusTokenizer, T5Tokenizer, ReformerTokenizer, and XLMRobertaTokenizer.

Source: the Transformers repository on GitHub.

Theda answered 4/7, 2023 at 13:09

Alternatively, you can set the use_fast=False parameter in AutoTokenizer.from_pretrained().
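
A minimal sketch of that workaround with the model from the question; note that for XLM-R the slow tokenizer itself still requires sentencepiece, so this only skips the slow-to-fast conversion step:

from transformers import AutoTokenizer, pipeline

# Request the slow (pure-Python) tokenizer instead of converting it to a fast one.
tokenizer = AutoTokenizer.from_pretrained(
    "joeddav/xlm-roberta-large-xnli", use_fast=False
)

# The pipeline accepts an explicit tokenizer, so the slow one can be passed in.
classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",
    tokenizer=tokenizer,
)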

Theodor answered 5/3 at 16:41
