Understanding spaCy's Scorer Output

I'm evaluating a custom NER model that I built using spaCy. I'm scoring it against my training sets using spaCy's Scorer class.

    import spacy
    from spacy.gold import GoldParse
    from spacy.scorer import Scorer

    def Eval(examples):
        # test the saved model
        print("Loading from", './model6/')
        ner_model = spacy.load('./model6/')

        scorer = Scorer()
        try:
            for input_, annot in examples:
                # tokenize the raw text and attach the gold entity annotations
                doc_gold_text = ner_model.make_doc(input_)
                gold = GoldParse(doc_gold_text, entities=annot['entities'])
                # run the model and accumulate scores against the gold parse
                pred_value = ner_model(input_)
                scorer.score(pred_value, gold)
        except Exception as e:
            print(e)

        print(scorer.scores)

It works fine, but I don't understand the output. Here's what I get for each training set.

{'uas': 0.0, 'las': 0.0, 'ents_p': 90.14084507042254, 'ents_r': 92.7536231884058, 'ents_f': 91.42857142857143, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 91.12227805695142, 'ents_r': 93.47079037800687, 'ents_f': 92.28159457167091, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 92.45614035087719, 'ents_r': 92.9453262786596, 'ents_f': 92.70008795074759, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 94.5993031358885, 'ents_r': 94.93006993006993, 'ents_f': 94.76439790575917, 'tags_acc': 0.0, 'token_acc': 100.0}

{'uas': 0.0, 'las': 0.0, 'ents_p': 92.07920792079209, 'ents_r': 93.15525876460768, 'ents_f': 92.61410788381743, 'tags_acc': 0.0, 'token_acc': 100.0}

Does anyone know what the keys mean? I've looked through spaCy's documentation and could not find anything.

Thanks!

Webfoot asked 1/6, 2018 at 13:41
  • UAS (Unlabelled Attachment Score) and LAS (Labelled Attachment Score) are standard metrics for evaluating dependency parsing. UAS is the proportion of tokens whose head has been correctly assigned; LAS is the proportion of tokens whose head has been correctly assigned with the right dependency label (subject, object, etc.).
  • ents_p, ents_r, ents_f are the precision, recall and F-score for the NER task (see the sketch after this list).
  • tags_acc is the POS tagging accuracy.
  • token_acc seems to be the precision for token segmentation.
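
To make the relationship between the ents_* values concrete, here is a minimal sketch (not spaCy's actual scorer code) of how precision, recall and F-score follow from entity-level true positive (TP), false positive (FP) and false negative (FN) counts. The counts TP=64, FP=7, FN=5 are just one combination that reproduces the first result in the question.

    # Minimal sketch (not spaCy's implementation): how ents_p, ents_r and
    # ents_f follow from entity-level TP/FP/FN counts.
    def ner_prf(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0   # share of predicted entities that are correct
        recall = tp / (tp + fn) if tp + fn else 0.0      # share of gold entities that were found
        f_score = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # harmonic mean of the two
        # The scorer reports these as percentages.
        return precision * 100, recall * 100, f_score * 100

    # One set of counts consistent with the first output in the question:
    # (90.14084507042254, 92.7536231884058, 91.42857142857143)
    print(ner_prf(tp=64, fp=7, fn=5))
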
Ingoing answered 1/6, 2018 at 14:02
To add on, ents_p, ents_r and ents_f are calculated on a per-entity basis. That is to say, spaCy considers all entities in your document(s) to find the true positives, false positives and false negatives. I had an initial impression that as long as a predicted entity in a sentence matched the gold set, it would add +1 to the true positive count, but I was wrong. For those interested in digging in, look into language.py, scorer.py and evaluate.py to run through the calculations. – Lyns
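
Here is a minimal sketch of that per-entity bookkeeping, assuming entities are compared as exact (start, end, label) spans. It only illustrates the idea described in the comment above; it is not spaCy's actual scorer code.

    # Illustration only: per-entity counting with exact (start, end, label) matching.
    def entity_counts(gold_entities, predicted_entities):
        gold = set(gold_entities)     # e.g. {(0, 5, 'PERSON'), (16, 22, 'GPE')}
        pred = set(predicted_entities)
        tp = len(gold & pred)         # exact span-and-label matches
        fp = len(pred - gold)         # predicted entities not in the gold set
        fn = len(gold - pred)         # gold entities the model missed
        return tp, fp, fn

    gold = [(0, 5, 'PERSON'), (16, 22, 'GPE')]
    pred = [(0, 5, 'PERSON'), (16, 22, 'ORG')]   # right span, wrong label
    print(entity_counts(gold, pred))             # (1, 1, 1)

Under this exact-match scheme, a wrong label or a partially overlapping span costs both a false positive and a false negative, which is why a prediction that merely touches a gold entity does not add to the true positive count.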
