Is it possible to get a confidence score on spaCy named-entity recognition?

I need to get a confidence score on the predictions done by Spacy NER.

CSV file

Text,Amount & Nature,Percent of Class
"T. Rowe Price Associates, Inc.","28,223,360 (1)",8.7% (1)
100 E. Pratt Street,Not Listed,Not Listed
"Baltimore, MD 21202",Not Listed,Not Listed
"BlackRock, Inc.","21,871,854 (2)",6.8% (2)
55 East 52nd Street,Not Listed,Not Listed
"New York, NY 10022",Not Listed,Not Listed
The Vanguard Group,"21,380,085 (3)",6.64% (3)
100 Vanguard Blvd.,Not Listed,Not Listed
"Malvern, PA 19355",Not Listed,Not Listed
FMR LLC,"20,784,414 (4)",6.459% (4)
245 Summer Street,Not Listed,Not Listed
"Boston, MA 02210",Not Listed,Not Listed

Code

import csv

import pandas as pd
import spacy

# Load the model once, not once per row.
nlp = spacy.load('en_core_web_sm')

rows = []
with open('/path/table.csv', newline='') as csvfile:
    for row in csv.DictReader(csvfile):
        doc1 = nlp(row["Text"])
        # Keep the label of the last entity found, as the original loop did;
        # use an empty string when no entity is detected in the row.
        label1 = doc1.ents[-1].label_ if doc1.ents else ""
        rows.append([row["Text"], row["Amount & Nature"], label1])

my_df1 = pd.DataFrame(rows, columns=["Text", "Amount & Nature", "Prediction"])
my_df1.to_csv('/path/output.csv', index=False)

Output CSV

Text,Amount & Nature,Prediction
"T. Rowe Price Associates, Inc.","28,223,360 (1)",ORG
100 E. Pratt Street,Not Listed,FAC
"Baltimore, MD 21202",Not Listed,CARDINAL
"BlackRock, Inc.","21,871,854 (2)",ORG
55 East 52nd Street,Not Listed,LOC
"New York, NY 10022",Not Listed,DATE
The Vanguard Group,"21,380,085 (3)",ORG
100 Vanguard Blvd.,Not Listed,FAC
"Malvern, PA 19355",Not Listed,DATE
FMR LLC,"20,784,414 (4)",ORG
245 Summer Street,Not Listed,CARDINAL
"Boston, MA 02210",Not Listed,GPE

For the output above, is it possible to get a confidence score for each spaCy NER prediction? If yes, how do I achieve that?

Can someone please help me with this?

Twopence answered 8/1, 2019 at 6:2 Comment(1)
Hi, any progress in finding the confidence scores? Peloria

No, spaCy does not expose a per-prediction confidence score for its models directly (unfortunately). As discussed in issue #881, scores can be obtained through beam search if get_beam_parses is used, although that approach seems to come with its own set of issues, as mentioned in the thread.

While F1 scores are good for an overall evaluation, I would prefer that spaCy provide individual confidence scores for its predictions, which it does not do at the moment.
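
As a rough sketch of the beam idea: each beam parse carries a probability and a list of entities, and an entity's score is the sum of the probabilities of the parses that contain it. The snippet below shows only that aggregation step on hand-written parses. The function name aggregate_entity_scores, the (score, entities) input shape, and the sample numbers are assumptions for illustration, and the spaCy calls named in the comment are what the thread reportedly uses on spaCy 2.x, not something verified here.

```python
from collections import defaultdict


def aggregate_entity_scores(parses):
    """Sum parse probabilities per (start, end, label) entity.

    `parses` is an iterable of (parse_score, entities) pairs, where entities
    is a list of (start_token, end_token, label) tuples. This mirrors what
    nlp.entity.moves.get_beam_parses(beam) is reported to yield in spaCy 2.x
    (an assumption; the API differs between spaCy versions).
    """
    scores = defaultdict(float)
    for parse_score, entities in parses:
        for start, end, label in entities:
            scores[(start, end, label)] += parse_score
    return dict(scores)


# Hypothetical beam parses for one text; the numbers are made up.
parses = [
    (0.6, [(0, 2, "GPE")]),
    (0.3, [(0, 2, "GPE"), (3, 4, "DATE")]),
    (0.1, [(3, 4, "CARDINAL")]),
]
entity_scores = aggregate_entity_scores(parses)
for key, score in sorted(entity_scores.items()):
    print(key, round(score, 3))
```

An entity that appears in most of the high-probability parses ends up with a score close to 1, while one that appears only in unlikely parses stays close to 0.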

Unerring answered 8/3, 2019 at 16:8 Comment(0)

Either get a fully annotated dataset or manually annotate it yourself (seeing as you have a CSV file, this might be your preferred option). That way you can distinguish the ground truth from what spaCy predicted. Based on that, you can calculate a confusion matrix. I recommend using the F1 score as a measure of confidence in the model as a whole; note that it does not score individual predictions.
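
A minimal sketch of that evaluation, assuming one label per row as in the question's output CSV. The predicted labels are the first six rows of that output; the gold labels are hypothetical annotations invented for the example, and per_label_f1 is an illustrative helper, not a spaCy function.

```python
from collections import Counter


def per_label_f1(gold, predicted):
    """Return {label: F1} for row-level labels, one gold/predicted pair per row."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong for this row
            fn[g] += 1  # gold label was missed for this row
    f1 = {}
    for label in set(gold) | set(predicted):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        total = precision + recall
        f1[label] = 2 * precision * recall / total if total else 0.0
    return f1


# Predictions taken from the first six rows of the question's output CSV.
predicted = ["ORG", "FAC", "CARDINAL", "ORG", "LOC", "DATE"]
# Hypothetical manual annotations for the same rows (assumed, not from the post).
gold = ["ORG", "FAC", "GPE", "ORG", "FAC", "GPE"]

f1 = per_label_f1(gold, predicted)
for label in sorted(f1):
    print(label, round(f1[label], 3))
```

With real annotations, a low F1 for a label such as GPE would indicate that predictions of that label deserve less trust overall, which is the sense in which F1 acts as a coarse confidence measure.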

Here are some great links talking about various publicly available datasets and annotation methods (including CRF).

Auston answered 8/1, 2019 at 7:18 Comment(0)

I created a package, concise_concepts, which provides a solution: it scores entities by word-embedding similarity to a few examples per label.

import spacy
import concise_concepts

data = {
    "ORG": ["Google", "Apple", "Amazon"],
    "GPE": ["Netherlands", "France", "China"],
}

text = """Sony was founded in Japan."""

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True})
doc = nlp(text)

print([(ent.text, ent.label_, ent._.ent_score) for ent in doc.ents])
# output
#
# [('Sony', 'ORG', 0.63740385), ('Japan', 'GPE', 0.5896993)]
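
To illustrate the general idea of scoring by embedding similarity, here is a small sketch that compares a candidate vector to the average vector of each label's examples using cosine similarity. It is not necessarily how concise_concepts computes ent_score; the two-dimensional vectors and the helper names (cosine, centroid, score_label) are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def score_label(candidate, examples_by_label):
    """Return (best_label, similarity) for a candidate vector."""
    scores = {
        label: cosine(candidate, centroid(vectors))
        for label, vectors in examples_by_label.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 2-d "embeddings" standing in for real word vectors (assumed values).
examples = {
    "ORG": [[1.0, 0.0], [0.8, 0.2]],
    "GPE": [[0.0, 1.0], [0.2, 0.8]],
}
label, score = score_label([0.9, 0.1], examples)
print(label, round(score, 3))
```

Because a score of this kind depends entirely on the vectors in use, different models or package versions can give different numbers for the same text.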
Jemadar answered 1/4, 2022 at 11:7 Comment(1)
I ran your code above and got this output: [('Sony', 'ORG', 0.1920157), ('Japan', 'GPE', 0.062257014)] Grazia
