CRF++ allows us to get marginal probabilities for each tag (a kind of confidence measure for each output tag) and a conditional probability for the output (a confidence measure for the entire output).
% crf_test -v2 -m model test.data
# 0.478113
Rockwell NNP B B/0.992465 B/0.992465 I/0.00144946 O/0.00608594
International NNP I I/0.979089 B/0.0105273 I/0.979089 O/0.0103833
Corp. NNP I I/0.954883 B/0.00477976 I/0.954883 O/0.040337
's POS B B/0.986396 B/0.986396 I/0.00655976 O/0.00704426
Tulsa NNP I I/0.991966 B/0.00787494 I/0.991966 O/0.00015949
unit NN I I/0.996169 B/0.00283111 I/0.996169 O/0.000999975
..
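For reference, per-tag marginals and the sequence probability of the kind CRF++ prints can be computed for a linear-chain CRF with the forward-backward algorithm. The following is only a minimal NumPy sketch, not CRF++'s or TensorFlow's code. It assumes the model is given as a `unary` score matrix of shape [seq_len, num_tags] and a `trans` matrix of shape [num_tags, num_tags]; the function names `crf_marginals` and `sequence_log_prob` are made up for illustration, and start/stop transitions are ignored.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def crf_marginals(unary, trans):
    """Forward-backward for a linear-chain CRF (illustrative helper).

    unary: [T, K] per-position tag scores (assumed input layout).
    trans: [K, K], trans[i, j] is the score of moving from tag i to tag j.
    Returns (marginals of shape [T, K], log partition function log_z).
    """
    T, K = unary.shape
    alpha = np.zeros((T, K))  # log-score of all prefixes ending in each tag
    beta = np.zeros((T, K))   # log-score of all suffixes following each tag
    alpha[0] = unary[0]
    for t in range(1, T):
        alpha[t] = unary[t] + _logsumexp(alpha[t - 1][:, None] + trans, axis=0)
    for t in range(T - 2, -1, -1):
        beta[t] = _logsumexp(trans + (unary[t + 1] + beta[t + 1])[None, :], axis=1)
    log_z = _logsumexp(alpha[-1], axis=0)
    return np.exp(alpha + beta - log_z), log_z


def sequence_log_prob(unary, trans, tags, log_z):
    # Unnormalized path score minus log_z gives log p(tags | input).
    score = unary[0, tags[0]]
    for t in range(1, len(tags)):
        score += trans[tags[t - 1], tags[t]] + unary[t, tags[t]]
    return score - log_z


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    unary, trans = rng.randn(6, 3), rng.randn(3, 3)  # random stand-in scores
    marginals, log_z = crf_marginals(unary, trans)
    best = marginals.argmax(axis=1)  # most probable tag at each position
    print(marginals.round(4))
    print(np.exp(sequence_log_prob(unary, trans, best, log_z)))
```

Each row of `marginals` sums to 1 and corresponds to the `B/… I/… O/…` columns above; the exponentiated `sequence_log_prob` corresponds to the `# 0.478113` line.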
TensorFlow has its own CRF implementation. After training a CRF model, we can get the best tag sequence y and its unnormalized score for each test input sequence x through tf.contrib.crf.viterbi_decode() or tf.contrib.crf.crf_decode().
However, a single best sequence is not sufficient for me. The top-k best sequences and their corresponding scores would all be useful. As far as I can tell, neither of the two functions above provides this information. Hence, I am wondering whether it is possible to get the following with minor modifications to the TensorFlow source code:
- top-k tag sequences and their corresponding unnormalized scores.
- marginal probabilities for each tag (as in CRF++)
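To make the first item concrete, here is a rough sketch of a k-best Viterbi decoder written in plain NumPy rather than as a patch to tf.contrib.crf. It assumes the same kind of inputs that viterbi_decode() takes, namely a [seq_len, num_tags] score matrix and a [num_tags, num_tags] transition matrix; the name `viterbi_top_k` is hypothetical. Instead of one back-pointer per tag, it keeps the k best partial paths ending in each tag at every step.

```python
import heapq

import numpy as np


def viterbi_top_k(unary, trans, k):
    """Return up to k (unnormalized_score, tag_tuple) pairs, best first.

    unary: [T, K] per-position tag scores (assumed input layout).
    trans: [K, K], trans[i, j] is the score of moving from tag i to tag j.
    """
    T, K = unary.shape
    # beams[j] holds the k best (score, path) pairs for prefixes ending in tag j.
    beams = [[(float(unary[0, j]), (j,))] for j in range(K)]
    for t in range(1, T):
        new_beams = []
        for j in range(K):
            # Extend every kept prefix with tag j, then keep the k best.
            cands = [
                (s + float(trans[i, j]) + float(unary[t, j]), path + (j,))
                for i in range(K)
                for (s, path) in beams[i]
            ]
            new_beams.append(heapq.nlargest(k, cands, key=lambda c: c[0]))
        beams = new_beams
    final = [c for beam in beams for c in beam]
    return heapq.nlargest(k, final, key=lambda c: c[0])


if __name__ == "__main__":
    rng = np.random.RandomState(1)
    unary, trans = rng.randn(6, 3), rng.randn(3, 3)  # random stand-in scores
    for score, tags in viterbi_top_k(unary, trans, k=3):
        print(round(score, 4), tags)
```

With k=1 this reduces to ordinary Viterbi decoding. The scores are unnormalized; turning them into probabilities additionally requires the log partition function from the forward pass.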