Perhaps I've skipped over a part of the docs, but what I am trying to determine is a unique ID for each entity in the standard NER toolset. For example:
import spacy
from spacy import displacy
import en_core_web_sm
nlp = en_core_web_sm.load()
text = "This is a text about Apple Inc based in San Fransisco. "\
"And here is some text about Samsung Corp. "\
"Now, here is some more text about Apple and its products for customers in Norway"
doc = nlp(text)
for ent in doc.ents:
print('ID:{}\t{}\t"{}"\t'.format(ent.label,ent.label_,ent.text,))
displacy.render(doc, jupyter=True, style='ent')
returns:
ID:381 ORG "Apple Inc" ID:382 GPE "San Fransisco" ID:381 ORG "Samsung Corp." ID:381 ORG "Apple" ID:382 GPE "Norway"
I have been looking at ent.ent_id
and ent.ent_id_
but these are inactive according to the docs. I couldn't find anything in ent.root
either.
For example, in GCP NLP each entity is returned with an ⟨entity⟩number that enables you to identify multiple instances of the same entity within a text.
This is a ⟨text⟩2 about ⟨Apple Inc⟩1 based in ⟨San Fransisco⟩4. And here is some ⟨text⟩3 about ⟨Samsung Corp⟩6. Now, here is some more ⟨text⟩8 about ⟨Apple⟩1 and its ⟨products⟩5 for ⟨customers⟩7 in ⟨Norway⟩9"
Does spaCy support something similar? Or is there a way using NLTK or Stanford?