spaCy Tokenizer LEMMA and ORTH Exceptions Not Working

I'm following an example from Chapter 2 of the book Natural Language Processing with Python and spaCy by Yuli Vasiliev (2020).

The example is supposed to produce the following tokens and lemmas:

['I', 'am', 'flying', 'to', 'Frisco']

['-PRON-', 'be', 'fly', 'to', 'San Francisco']

I get the following error:

nlp.tokenizer.add_special_case(u'Frisco', sf_special_case)
  File "spacy\tokenizer.pyx", line 601, in spacy.tokenizer.Tokenizer.add_special_case
  File "spacy\tokenizer.pyx", line 589, in spacy.tokenizer.Tokenizer._validate_special_case
ValueError: [E1005] Unable to set attribute 'LEMMA' in tokenizer exception for 'Frisco'. Tokenizer exceptions are only allowed to specify ORTH and NORM.

Could someone please suggest a workaround? I'm using spaCy 3.0.3; was it changed so that LEMMA is no longer allowed in a tokenizer exception? Thanks!

Meadowlark answered 25/2, 2021 at 0:2 Comment(3)
Yeah, this was changed 6 months ago. - Wilford
To clarify, this was changed in spaCy v3.0, which was released less than a month ago. The book is most likely using spaCy v2 (I'd guess v2.2 or v2.3; hopefully the author provides the exact version somewhere). To run these examples, downgrade to spaCy v2, e.g. pip install spacy==2.2.4 or pip install spacy==2.3.5. - Sausage
Thanks for clarifying and for the suggestion. - Meadowlark

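In spaCy v3, tokenizer exceptions can only set ORTH and NORM, so a lemma override has to go through the attribute_ruler component instead. See https://github.com/explosion/spaCy/issues/7014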

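import spacy

nlp = spacy.load('en_core_web_sm')

# Override the lemma via the attribute_ruler component of the pretrained pipeline.
nlp.get_pipe("attribute_ruler").add([[{"TEXT": "Frisco"}]], {"LEMMA": "San Francisco"})

# TEXT matches case-sensitively: "Frisco" is remapped, "frisco" is not.
doc = nlp('I am flying to Frisco and after to frisco')
print(['token:%s lemma:%s' % (t.text, t.lemma_) for t in doc])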
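
If the lowercase spelling should be covered as well, matching on LOWER instead of TEXT is one option. This is only a minimal sketch, assuming spaCy v3 is installed. It uses a blank English pipeline (my assumption, so that no model download is needed), which means only the overridden tokens get a lemma; the variable name frisco_lemmas is just for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, so no pretrained model is assumed.
nlp = spacy.blank("en")

# A blank pipeline has no attribute_ruler yet, so add one here.
# With en_core_web_sm you would call nlp.get_pipe("attribute_ruler") instead.
ruler = nlp.add_pipe("attribute_ruler")

# LOWER compares against the lowercased token text, so both spellings match.
ruler.add([[{"LOWER": "frisco"}]], {"LEMMA": "San Francisco"})

doc = nlp("I am flying to Frisco and after to frisco")

# Collect the lemmas of the two tokens we expect the rule to have touched.
frisco_lemmas = [t.lemma_ for t in doc if t.lower_ == "frisco"]
print(frisco_lemmas)
```

With this pattern both "Frisco" and "frisco" should report the lemma "San Francisco"; the other tokens stay without a lemma because the blank pipeline has no lemmatizer.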
Leavetaking answered 21/4, 2021 at 8:7 Comment(1)
Thanks, but please explain the issue succinctly in words here, not just via a link. The text of answers here on SO gets indexed for search, retrieval, similarity, dupe-finding etc.; links don't. "Lemmas for Contractions have changed with SpaCy 3.0 ... some code breakage; they plan to have some updates in the v3.1 models to partially fix" - Cabernet

Try it in Google Colab. The book's code works fine there because, at the time of writing, Colab uses spaCy 2.2.4.
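
Since the outcome depends on which spaCy version is installed, it may help to check the version and pick the matching approach. The sketch below is not from the book: set_lemma_override is a hypothetical helper name, and the blank English pipeline is an assumption made only to keep the example free of model downloads.

```python
import spacy


def set_lemma_override(nlp, text, lemma):
    """Hypothetical helper: force `lemma` for tokens whose text equals `text`."""
    major = int(spacy.__version__.split(".")[0])
    if major < 3:
        # spaCy v2: a tokenizer exception may still carry LEMMA, as in the book.
        from spacy.symbols import ORTH, LEMMA
        nlp.tokenizer.add_special_case(text, [{ORTH: text, LEMMA: lemma}])
    else:
        # spaCy v3+: reuse the attribute_ruler if present, otherwise add one.
        if nlp.has_pipe("attribute_ruler"):
            ruler = nlp.get_pipe("attribute_ruler")
        else:
            ruler = nlp.add_pipe("attribute_ruler")
        ruler.add([[{"TEXT": text}]], {"LEMMA": lemma})
    return nlp


print("spaCy version:", spacy.__version__)
nlp = set_lemma_override(spacy.blank("en"), "Frisco", "San Francisco")
lemmas = [t.lemma_ for t in nlp("I am flying to Frisco")]
print(lemmas)
```

Either branch should leave the last token with the lemma "San Francisco". Lemmas for the remaining tokens depend on the pipeline, so with a blank pipeline they are not meaningful.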

Bazan answered 2/8, 2021 at 4:21 Comment(0)
