Custom NER training with spaCy 3 throws ValueError

I am trying to add a custom NER label using spaCy 3. I found tutorials for older versions and adjusted them for spaCy 3. Here is the whole code I am using:

import random
import spacy
from spacy.training import Example

LABEL = 'ANIMAL'
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("Do they bite?", {'entities': []}),
    ("horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {'entities': [(48, 54, LABEL)]}),
    ("horses?", {'entities': [(0, 6, LABEL)]})
]
nlp = spacy.load('en_core_web_sm')  # load existing spaCy model
ner = nlp.get_pipe('ner')
ner.add_label(LABEL)
print(ner.move_names)  # Here I see that the new label was added
optimizer = nlp.create_optimizer()
# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
        print(losses)
# test the trained model # add some dummy sentences with many NERs

test_text = 'Do you like horses?'
doc = nlp(test_text)
print("Entities in '%s'" % test_text)
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)

This code raises a ValueError, but only after two iterations; notice the first two loss lines:

{'ner': 9.862242701536594}
{'ner': 8.169456698315201}
Traceback (most recent call last):
  File ".\custom_ner_training.py", line 46, in <module>
    nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
  File "C:\ogr\moje\python\spacy_pg\myvenv\lib\site-packages\spacy\language.py", line 1106, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
  File "spacy\pipeline\transition_parser.pyx", line 366, in spacy.pipeline.transition_parser.Parser.update
  File "spacy\pipeline\transition_parser.pyx", line 478, in spacy.pipeline.transition_parser.Parser.get_batch_loss
  File "spacy\pipeline\_parser_internals\ner.pyx", line 310, in spacy.pipeline._parser_internals.ner.BiluoPushDown.set_costs
ValueError

I can see the ANIMAL label was added by inspecting ner.move_names.

When I change the value to LABEL = 'PERSON', the code runs successfully and recognizes horses as PERSON on the new data. This is why I assume there is no error in the code itself.

Is there something I am missing? What am I doing wrong? Could someone reproduce this, please?

NOTE: This is my first question ever here. I hope I provided all information. If not, let me know in the comments.

Reddy answered 22/2, 2021 at 7:3 Comment(0)

You need to change the following line in the for loop:

doc = nlp(text)

to

doc = nlp.make_doc(text)
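The likely reason: nlp(text) runs the full pipeline, so the Doc handed to Example.from_dict already carries predicted annotations (entities, sentence boundaries) that can conflict with the gold offsets when the transition costs are computed; nlp.make_doc(text) only runs the tokenizer. With that one change, the inner loop from the question becomes:

for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)  # tokenize only; no pipeline components run
    example = Example.from_dict(doc, annotations)
    nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)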

The code should work and produce the following results:

{'ner': 9.60289144264557}
{'ner': 8.875474230820478}
{'ner': 6.370401408220459}
{'ner': 6.687456469517201}
... 
{'ner': 1.3796682589133492e-05}
{'ner': 1.7709562613218738e-05}

Entities in 'Do you like horses?'
ANIMAL  --  horses
Wheeler answered 24/2, 2021 at 18:0 Comment(3)
making this change did not work for me with spacy 3.0.3 – Pearsall
I am using the exact version: Name: spacy - Version: 3.0.3 – Wheeler
I had to add spacy-lookups-data to my requirements. Your solution works for me now. – Pearsall

One more potential reason could be misaligned label information in the corpus. Check whether there are extra spaces in the training data. If so, you can first remove the extra spaces from the text and then recalculate the start and end offsets of the label within the text.
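As a quick sanity check (a minimal sketch, assuming TRAIN_DATA in the format from the question), spaCy 3's offsets_to_biluo_tags returns '-' for spans whose character offsets do not line up with token boundaries:

import spacy
from spacy.training import offsets_to_biluo_tags

nlp = spacy.blank("en")  # the tokenizer alone is enough for this check
for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    tags = offsets_to_biluo_tags(doc, annotations["entities"])
    if "-" in tags:  # '-' marks entity offsets misaligned with tokens
        print("Misaligned entity in:", repr(text))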

Piselli answered 21/7, 2022 at 9:46 Comment(0)
