I'm evaluating a custom NER model that I built with spaCy, scoring it on my training sets using spaCy's Scorer class.
import spacy
from spacy.scorer import Scorer
from spacy.gold import GoldParse

def Eval(examples):
    # Test the saved model
    print("Loading from", './model6/')
    ner_model = spacy.load('./model6/')
    scorer = Scorer()
    try:
        for input_, annot in examples:
            # Tokenize the raw text to build the gold-standard doc
            doc_gold_text = ner_model.make_doc(input_)
            gold = GoldParse(doc_gold_text, entities=annot['entities'])
            # Run the model and score the predictions against the gold parse
            pred_value = ner_model(input_)
            scorer.score(pred_value, gold)
    except Exception as e:
        print(e)
    print(scorer.scores)
It works fine but I don't understand the output. Here's what I get for each training set.
{'uas': 0.0, 'las': 0.0, 'ents_p': 90.14084507042254, 'ents_r': 92.7536231884058, 'ents_f': 91.42857142857143, 'tags_acc': 0.0, 'token_acc': 100.0}
{'uas': 0.0, 'las': 0.0, 'ents_p': 91.12227805695142, 'ents_r': 93.47079037800687, 'ents_f': 92.28159457167091, 'tags_acc': 0.0, 'token_acc': 100.0}
{'uas': 0.0, 'las': 0.0, 'ents_p': 92.45614035087719, 'ents_r': 92.9453262786596, 'ents_f': 92.70008795074759, 'tags_acc': 0.0, 'token_acc': 100.0}
{'uas': 0.0, 'las': 0.0, 'ents_p': 94.5993031358885, 'ents_r': 94.93006993006993, 'ents_f': 94.76439790575917, 'tags_acc': 0.0, 'token_acc': 100.0}
{'uas': 0.0, 'las': 0.0, 'ents_p': 92.07920792079209, 'ents_r': 93.15525876460768, 'ents_f': 92.61410788381743, 'tags_acc': 0.0, 'token_acc': 100.0}
Does anyone know what the keys mean? I've looked through spaCy's documentation and couldn't find anything.
Thanks!
ents_p, ents_r, and ents_f are calculated on a per-entity basis. That is to say, spaCy considers all entities in your document(s) when counting true positives, false positives, and false negatives. My initial impression was that as long as a predicted entity in a sentence matched the gold set, it would add one to the true positive count, but I was wrong. For those interested in digging in, look into language.py, scorer.py, and evaluate.py to trace the calculations. – Lyns
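To make the per-entity counting concrete, here is a minimal sketch (not spaCy's actual code) of how precision, recall, and F-score fall out of exact span matches. Entities are modeled as hypothetical (start, end, label) tuples; the function name `prf` and the sample spans are illustrative only.

```python
def prf(gold_entities, predicted_entities):
    """Per-entity precision, recall, and F1 from exact span+label matches."""
    gold = set(gold_entities)
    pred = set(predicted_entities)
    tp = len(gold & pred)   # predicted spans that exactly match a gold span
    fp = len(pred - gold)   # predicted spans with no matching gold span
    fn = len(gold - pred)   # gold spans the model missed entirely
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of p and r
    return p, r, f

# Illustrative data: two exact matches, one miss, one spurious prediction
gold = [(0, 5, "PERSON"), (10, 16, "ORG"), (20, 25, "GPE")]
pred = [(0, 5, "PERSON"), (10, 16, "ORG"), (30, 34, "DATE")]
print(prf(gold, pred))  # each of p, r, f is 2/3 here
```

Note that ents_f is just the harmonic mean of ents_p and ents_r, which you can verify against the score dictionaries above (spaCy reports them scaled to 0-100).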