How to get a description for each Spacy NER entity?

I am using a spaCy NER model to extract named entities relevant to my problem from text, such as DATE, TIME, and GPE, among others.

For example, I need to recognize the Time Zone in the following sentence:

"Australian Central Time"

With Spacy model en_core_web_lg, I got the following result:

doc = nlp("Australian Central Time")
print([(ent.label_, ent.text) for ent in doc.ents])
    
>> [('NORP', 'Australian')]

My problem is that I don't have a clear idea of what exactly the NORP entity means and, more generally, what each spaCy NER entity means (leaving aside the intuitive ones, of course).

I found the following snippet to get the complete entities list, but after that I'm blocked:

import spacy
nlp = spacy.load("en_core_web_lg")
nlp.get_pipe("ner").labels

I'm pretty new to spaCy and didn't find what I'm looking for in the official documentation, so any help will be appreciated!

BTW, I'm using Spacy version 3.2.1.

Then answered 24/1, 2022 at 15:2 Comment(0)

Most labels have definitions you can access using spacy.explain(label).

For NORP: "Nationalities or religious or political groups"

For more details you would need to look into the annotation guidelines for the resources listed in the model documentation under https://spacy.io/models/.
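As a minimal sketch of how this could be combined with the (label, text) pairs from the question: the helper name `annotate` and its `explain` parameter are made up for illustration and are not part of spaCy. By default it uses spacy.explain, which needs spaCy installed but no loaded model.

```python
def annotate(pairs, explain=None):
    """Attach a description to each (label, text) pair.

    `explain` is any callable that maps a label to a description.
    It defaults to spacy.explain (hypothetical helper, real spaCy call).
    """
    if explain is None:
        import spacy  # imported lazily: only needed for the default

        explain = spacy.explain
    return [(label, text, explain(label)) for label, text in pairs]
```

With a processed `doc`, something like `annotate([(ent.label_, ent.text) for ent in doc.ents])` should then give each entity together with its description. Labels missing from spaCy's glossary come back as None, which is why the answer says "most labels".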

Thomasenathomasin answered 24/1, 2022 at 16:1 Comment(3)
Thanks, it is exactly what I was looking for! - Then
Is there a list of explanations online? I'd like to avoid installing spacy just to get the explanations. - Doorstop
@Doorstop See my answer below. Hope that helps! - Mercado

The full list is below. As of February 2023, there are 18 labels in the English model.

PERSON:      People, including fictional.
NORP:        Nationalities or religious or political groups.
FAC:         Buildings, airports, highways, bridges, etc.
ORG:         Companies, agencies, institutions, etc.
GPE:         Countries, cities, states.
LOC:         Non-GPE locations, mountain ranges, bodies of water.
PRODUCT:     Objects, vehicles, foods, etc. (Not services.)
EVENT:       Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART: Titles of books, songs, etc.
LAW:         Named documents made into laws.
LANGUAGE:    Any named language.
DATE:        Absolute or relative dates or periods.
TIME:        Times smaller than a day.
PERCENT:     Percentage, including "%".
MONEY:       Monetary values, including unit.
QUANTITY:    Measurements, as of weight or distance.
ORDINAL:     “first”, “second”, etc.
CARDINAL:    Numerals that do not fall under another type.

Source: Mikael Davidsson on Medium.
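If you want to look labels up without installing spaCy at all, one option is to keep the list above as a plain dictionary. This is only a sketch: the names `NER_LABEL_DESCRIPTIONS` and `describe` are my own, and the text is copied from the list above, so it may drift from what spacy.explain returns in other versions or for other models.

```python
# Descriptions copied from the list above (English model, as of February 2023).
# This is a hand-maintained copy, not something spaCy ships in this form.
NER_LABEL_DESCRIPTIONS = {
    "PERSON": "People, including fictional.",
    "NORP": "Nationalities or religious or political groups.",
    "FAC": "Buildings, airports, highways, bridges, etc.",
    "ORG": "Companies, agencies, institutions, etc.",
    "GPE": "Countries, cities, states.",
    "LOC": "Non-GPE locations, mountain ranges, bodies of water.",
    "PRODUCT": "Objects, vehicles, foods, etc. (Not services.)",
    "EVENT": "Named hurricanes, battles, wars, sports events, etc.",
    "WORK_OF_ART": "Titles of books, songs, etc.",
    "LAW": "Named documents made into laws.",
    "LANGUAGE": "Any named language.",
    "DATE": "Absolute or relative dates or periods.",
    "TIME": "Times smaller than a day.",
    "PERCENT": 'Percentage, including "%".',
    "MONEY": "Monetary values, including unit.",
    "QUANTITY": "Measurements, as of weight or distance.",
    "ORDINAL": '"first", "second", etc.',
    "CARDINAL": "Numerals that do not fall under another type.",
}


def describe(label):
    """Return the description for a label, or None if it is not in the table."""
    return NER_LABEL_DESCRIPTIONS.get(label)


print(describe("NORP"))
```

Unknown labels return None rather than raising, so the lookup can be used on labels from other models without extra checks.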

Mercado answered 1/2, 2023 at 22:53 Comment(0)

This will give each label and description:

import spacy

# Disable the components that are not needed to list the NER labels.
nlp = spacy.load("en_core_web_trf", disable=["tagger", "parser", "attribute_ruler", "lemmatizer"])
for label in nlp.get_pipe("ner").labels:
    print(f"{label}: {spacy.explain(label)}")

returns:

CARDINAL: Numerals that do not fall under another type
DATE: Absolute or relative dates or periods
EVENT: Named hurricanes, battles, wars, sports events, etc.
FAC: Buildings, airports, highways, bridges, etc.
GPE: Countries, cities, states
LANGUAGE: Any named language
LAW: Named documents made into laws.
LOC: Non-GPE locations, mountain ranges, bodies of water
MONEY: Monetary values, including unit
NORP: Nationalities or religious or political groups
ORDINAL: "first", "second", etc.
ORG: Companies, agencies, institutions, etc.
PERCENT: Percentage, including "%"
PERSON: People, including fictional
PRODUCT: Objects, vehicles, foods, etc. (not services)
QUANTITY: Measurements, as of weight or distance
TIME: Times smaller than a day
WORK_OF_ART: Titles of books, songs, etc.
Vail answered 14/5 at 1:9 Comment(0)
