Huggingface AlBert tokenizer NoneType error with Colab
F

6

21

I tried the sample code from the Hugging Face website: https://huggingface.co/albert-base-v2

from transformers import AlbertTokenizer, AlbertModel
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')

Then I got the following error at the tokenizer step, encoded_input = tokenizer(text, return_tensors='pt'):

TypeError: 'NoneType' object is not callable

I tried the same code on my local machine and it worked without any problem, so the issue seems to be specific to Colab. However, I need to run this model on a Colab GPU.

My Python version on Colab is 3.6.9.

Flee answered 23/1, 2021 at 1:0 Comment(1)
Can you please add the version of the transformers library you are using (i.e. transformers.__version__)? Tauro
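A minimal sketch of one way to collect the versions the comment asks for. The helper name installed_version is hypothetical, and the sketch assumes Python 3.8 or newer, where importlib.metadata is available; on the Python 3.6.9 runtime from the question, printing transformers.__version__ after import transformers is the simpler route.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages discussed in this thread without importing them.
for name in ("transformers", "sentencepiece"):
    print(name, installed_version(name))
```

A value of None for sentencepiece would point to the missing dependency described in the answers.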
F
44

I found the answer. After installing transformers, importing AlbertTokenizer, and running tokenizer = ..., I received an error asking me to install the SentencePiece package. After I installed that package and ran the tokenizer cell again, I started getting the error above. So I opened a brand-new Colab session and installed everything, including SentencePiece, before creating the tokenizer, and this time it worked. The NoneType error most likely means that the tokenizer object is None, i.e. it was never actually created in that session. Installing the packages first, so that SentencePiece is already available when transformers is imported, avoids this. In short, for this to work in Colab:

  1. Open a new Colab session.
  2. Install Transformers and SentencePiece.
  3. Import AlbertTokenizer.
  4. Create the tokenizer.
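The steps above can be sketched as follows. This is only an illustration under stated assumptions: the helper names missing_packages and load_albert_tokenizer are hypothetical, and the check uses the standard-library importlib.util.find_spec to confirm that both packages are installed before transformers is imported.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that are not installed in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def load_albert_tokenizer(model_name="albert-base-v2"):
    """Create the tokenizer only after confirming its dependencies are present."""
    missing = missing_packages(["sentencepiece", "transformers"])
    if missing:
        raise RuntimeError(
            "Install " + ", ".join(missing)
            + " in a fresh session before importing transformers."
        )
    # Imported here so transformers is loaded only after the check above passes.
    from transformers import AlbertTokenizer

    return AlbertTokenizer.from_pretrained(model_name)
```

In Colab, the first cell of a new session would be !pip install transformers sentencepiece, and a later cell would call tokenizer = load_albert_tokenizer().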
Flee answered 7/2, 2021 at 15:52 Comment(1)
Thanks! This helped. Sometimes it's the good old "did you try turning it off and on again?" Tajuanatak
P
11

MeiNan Zhu's answer is correct.

Installing or importing SentencePiece before transformers works.

!pip install sentencepiece
!pip install transformers

from transformers import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased', do_lower_case=True)

type(tokenizer)

transformers.models.xlnet.tokenization_xlnet.XLNetTokenizer

Planarian answered 15/2, 2021 at 17:43 Comment(0)
R
4

I just restarted the kernel and ran the code again after installing the package (!pip install sentencepiece), and it worked.

Roseline answered 16/3, 2022 at 14:14 Comment(0)
P
4

If you have installed the transformers and sentencepiece libraries and still get the NoneType error, restart your Colab runtime with the shortcut Ctrl+M . (the period is part of the shortcut) or from the Runtime menu, then rerun all imports.

Note: don't rerun the library installation cells (cells that contain pip install xxx)
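After the restart, a small guard can make a stale session easier to spot than the bare TypeError. This is a hedged sketch, not part of transformers: require_tokenizer is a hypothetical helper that only checks whether the object it receives is None.

```python
def require_tokenizer(tokenizer):
    """Return `tokenizer` unchanged, or raise a descriptive error if it is None."""
    if tokenizer is None:
        raise RuntimeError(
            "tokenizer is None: restart the runtime, then rerun the import cells "
            "(not the pip install cells) and create the tokenizer again."
        )
    return tokenizer
```

It would be used as tokenizer = require_tokenizer(AlbertTokenizer.from_pretrained('albert-base-v2')) in the cell that creates the tokenizer.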

Plant answered 1/7, 2022 at 18:14 Comment(0)
P
1

I had the exact same problem. Restarting the Jupyter Notebook kernel after installing the sentencepiece library did the trick for me.

Pigpen answered 24/3, 2022 at 9:29 Comment(0)
D
0

I was having this issue with LlamaTokenizer.from_pretrained(MODEL_NAME).

Tokenizers that are built on SentencePiece, such as LlamaTokenizer, need the sentencepiece library to be installed. Here is my code:

!pip install -qqq transformers==4.28.1 --progress-bar off
!pip install -qqq bitsandbytes==0.38.1 --progress-bar off
!pip install -qqq accelerate==0.18.0 --progress-bar off
!pip install -qqq sentencepiece==0.1.99 --progress-bar off

Once you install the libraries, make sure you import them in order:

import textwrap
import torch
import sentencepiece
from transformers import LlamaForCausalLM, LlamaTokenizer, GenerationConfig

Then make sure you restart your kernel after installing, wherever you are writing code (Google Colab, VS Code, etc.), and rerun the imports. Otherwise, the newly installed libraries will not be recognized and you will get the weird NoneType error.
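One way to judge whether a restart is needed is to check, before running pip install, which of the relevant libraries have already been imported in the current process. The sketch below uses only the standard library; the helper name already_imported is hypothetical, and treating an already imported transformers as a reason to restart is an assumption based on the behaviour reported in this thread.

```python
import sys


def already_imported(names, modules=None):
    """Return the names from `names` that are already loaded in this process."""
    modules = sys.modules if modules is None else modules
    return [name for name in names if name in modules]


# If transformers is listed here before sentencepiece gets installed,
# restarting the kernel after the install is the safer choice.
print(already_imported(["transformers", "sentencepiece"]))
```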

Delapaz answered 10/5, 2023 at 6:29 Comment(0)
