spaCy: Trying to set conflicting doc.ents: A token can only be part of one entity, so make sure the entities you're setting don't overlap
I am trying to use spaCy to extract custom entities from text.

import spacy
from spacy_lookup import Entity
data = {0:["count"],1:["unique count","unique"]}

def processText(text):
    nlp = spacy.blank('en')
    for i,arr in data.items():
        fLabel = "test:"+str(i)
        fEntitty = Entity(keywords_list=list(set(arr)),label=fLabel)
        fEntitty.name = fLabel
        nlp.add_pipe(fEntitty)
    match_doc = nlp(text)
    print(match_doc.ents)
processText("unique count of city")

But the above code throws the following error:

ValueError: [E103] Trying to set conflicting doc.ents: '(1, 2, 'test:0')' and '(0, 2, 'test:1')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

This is not limited to this case. The same issue occurs with person names, for example "Karthik" vs "Karthik Reddy" or "Jon" vs "Jon Allen". Could anyone please help me resolve this issue?

Thanks in advance!!

Housekeeping answered 27/8, 2020 at 16:49 Comment(0)

In spaCy, named entities can never overlap. If "Jon Allen" is a name, you shouldn't also annotate "Jon" as a name. So before training, you'll have to fix these overlapping/conflicting cases.

EDIT after discussion in the comments: You'll want to implement an on_match function to filter out the matches to a non-overlapping set.
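As a minimal sketch of the filtering step, assuming matches are available as (start, end, label) token-offset tuples like the ones shown in the error message: the hypothetical helper `filter_overlapping` below keeps the longest match and drops any shorter match that shares a token with it. It is plain Python and is not part of spaCy or spacy_lookup; recent spaCy versions also provide `spacy.util.filter_spans`, which applies a similar longest-first rule to Span objects.

```python
def filter_overlapping(matches):
    """Return a non-overlapping subset of (start, end, label) matches.

    Hypothetical helper for illustration: longer matches win, and ties
    are broken in favour of the match that starts earlier.
    """
    # Longest first; among equal lengths, the earliest start first.
    ordered = sorted(matches, key=lambda m: (m[1] - m[0], -m[0]), reverse=True)
    kept = []
    used_tokens = set()
    for start, end, label in ordered:
        # Skip the match if any of its tokens is already claimed.
        if any(i in used_tokens for i in range(start, end)):
            continue
        kept.append((start, end, label))
        used_tokens.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


# The two conflicting matches from the error message above.
matches = [(1, 2, "test:0"), (0, 2, "test:1")]
print(filter_overlapping(matches))
```

With the two matches from the error, only the longer "unique count" match (0, 2, 'test:1') survives, so setting doc.ents from the filtered result would no longer conflict.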

Ephraimite answered 28/8, 2020 at 7:36 Comment(5)
Thanks for the response. But in real use the words may overlap, and in that case we need to extract the best match. Let's say "Karthik" and "Karthik Reddy" are two different persons. Then the text might be "how much salary for Karthik?", but it will throw the same error. - Housekeeping
I'm not sure what you mean. Whether or not entities overlap is determined within a sentence. You can definitely have "Karthik" annotated as an entity in one sentence, and "Karthik Reddy" in another sentence. But within the same sentence, you can't annotate one specific word twice. - Ephraimite
The error occurs when we call nlp("How much salary for karthik"), where nlp has been set up with both "karthik" and "karthik reddy". Check the example posted above: I am asking "unique count of city", but it throws an error because the entities "count" and "unique count" overlap. If you run the above code on your system, you will see what I mean. - Housekeeping
You've defined two patterns: one is "count", and another is "unique count". Then you run a dictionary-matching search over a text, which includes the words "unique count". As a result, both your patterns will match and will try to set their label on the word "count". However, "count" can only get one label as a named entity in spaCy. (Note that there is no training here, only pattern matching.) - Ephraimite
Here are more relevant docs on pattern/dictionary-based matching: spacy.io/usage/rule-based-matching. And here are more docs on actually training an NER system: spacy.io/usage/training#ner - Ephraimite