FastText 0.9.2 - why is recall 'nan'?
I trained a supervised model in FastText using the Python interface and I'm getting weird results for precision and recall.

First, I trained a model:

import fasttext
model = fasttext.train_supervised("train.txt", wordNgrams=3, epoch=100, pretrainedVectors=pretrained_model)

Then I evaluated it on the test data:

def print_results(N, p, r):
    print("N\t" + str(N))
    print("P@{}\t{:.3f}".format(1, p))
    print("R@{}\t{:.3f}".format(1, r))

print_results(*model.test('test.txt'))

But the results look odd: precision and recall at 1 are always identical, even across different datasets. For example, one output is:

N   46425
P@1 0.917
R@1 0.917
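One plausible reason for the identical P@1 and R@1, assuming every line in test.txt carries exactly one label: P@1 divides the number of correct top-1 predictions by the number of predictions, while R@1 divides the same count by the number of gold labels. With one prediction and one gold label per line, both denominators equal N. The sketch below recomputes these micro-averaged metrics in plain Python on hypothetical toy data; precision_recall_at_1 is an illustrative helper, not part of the fasttext API.

```python
def precision_recall_at_1(gold, predicted):
    """Micro-averaged P@1 and R@1.

    gold: list of sets of gold labels, one set per test example.
    predicted: list of top-1 predicted labels, one per test example.
    """
    correct = sum(1 for g, p in zip(gold, predicted) if p in g)
    n_predictions = len(predicted)      # exactly one prediction per example at k=1
    n_gold = sum(len(g) for g in gold)  # total number of gold labels
    return correct / n_predictions, correct / n_gold

# Hypothetical single-label data: every example has exactly one gold label.
gold = [{"__label__1"}, {"__label__5"}, {"__label__5"}, {"__label__1"}]
predicted = ["__label__1", "__label__5", "__label__1", "__label__1"]
print(precision_recall_at_1(gold, predicted))  # (0.75, 0.75)

# Hypothetical multi-label data: the first example has two gold labels.
gold_multi = [{"__label__1", "__label__5"}, {"__label__5"}, {"__label__5"}, {"__label__1"}]
print(precision_recall_at_1(gold_multi, predicted))  # (0.75, 0.6)
```

Only when some examples carry more than one label do the two denominators differ, so identical P@1 and R@1 on single-label data would not by itself indicate a bug.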

When I look at the precision and recall for each label, recall is always 'nan':

print(model.test_label('test.txt'))

And the output is:

{'__label__1': {'precision': 0.9202150724134941, 'recall': nan, 'f1score': 1.8404301448269882}, '__label__5': {'precision': 0.9134956983264135, 'recall': nan, 'f1score': 1.826991396652827}}

Does anyone know why this might be happening?

P.S.: For a reproducible example of this behavior, see https://github.com/facebookresearch/fastText/issues/1072 and run it with FastText 0.9.2.

Asked by Miche on 14/5, 2020 at 0:21. Comments:
Cufic: Maybe github.com/facebookresearch/fastText/issues/192 can help.
Miche: Thanks for the link. But that seems to be a label-format issue, and I'm fairly sure my labels are fine, because I get a number for N and a real value for precision, just not for recall. For now I'll write my own functions to calculate precision and recall. :/
Rutter: For the record, this is tracked in github.com/facebookresearch/fastText/issues/1072, which has been accidentally closed (FastText doesn't seem to be actively maintained anymore, by the way).
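The workaround mentioned in the comments, computing precision and recall yourself, could look roughly like the sketch below. It assumes single-label examples and takes plain lists of gold and top-1 predicted labels; per_label_metrics and the toy data are hypothetical. With a real model, the predicted list could presumably be built from the first return value of model.predict called on the test texts (labels stripped from each line, no newlines in the text).

```python
from collections import Counter

def per_label_metrics(gold, predicted):
    """Per-label precision, recall and F1 for single-label data.

    gold, predicted: equal-length lists of labels, one per example.
    """
    tp = Counter()          # examples where predicted == gold == label
    pred_count = Counter()  # how often each label was predicted
    gold_count = Counter()  # how often each label is the gold label
    for g, p in zip(gold, predicted):
        pred_count[p] += 1
        gold_count[g] += 1
        if g == p:
            tp[g] += 1
    results = {}
    for label in set(gold_count) | set(pred_count):
        # Zero denominators give 0.0 here instead of nan.
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / gold_count[label] if gold_count[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {"precision": precision, "recall": recall, "f1score": f1}
    return results

# Hypothetical toy data
gold = ["__label__1", "__label__5", "__label__5", "__label__1"]
predicted = ["__label__1", "__label__5", "__label__1", "__label__1"]
print(per_label_metrics(gold, predicted))
```

The output dict mirrors the shape of test_label() so the two can be compared side by side; the choice of 0.0 for empty denominators is a convention of this sketch, not of fasttext.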
It looks like FastText 0.9.2 has a bug in the recall computation, which should be fixed by this commit.

Installing a "bleeding edge" version of FastText, e.g. with

pip install git+https://github.com/facebookresearch/fastText.git@b64e359d5485dda4b4b5074494155d18e25c8d13 --quiet

and rerunning your code should get rid of the nan values in the recall computation.
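To confirm that the reinstall worked, one option is to scan the test_label() output for NaN recall values. A small sketch; labels_with_nan_recall is a hypothetical helper, and the sample dict is the 0.9.2 output from the question with nan written as float("nan").

```python
import math

def labels_with_nan_recall(results):
    """Return the labels whose recall is NaN in a test_label()-style dict."""
    return sorted(label for label, metrics in results.items()
                  if math.isnan(metrics["recall"]))

# Output reported in the question for FastText 0.9.2
buggy = {
    "__label__1": {"precision": 0.9202150724134941, "recall": float("nan"), "f1score": 1.8404301448269882},
    "__label__5": {"precision": 0.9134956983264135, "recall": float("nan"), "f1score": 1.826991396652827},
}
print(labels_with_nan_recall(buggy))  # ['__label__1', '__label__5']
```

After upgrading, labels_with_nan_recall(model.test_label('test.txt')) would be expected to return an empty list.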

Answered by Rutter on 20/5, 2020 at 21:34. Comments:
Miche: This fixed the problem! Thanks for investigating.
