I'm trying to work out what's the best model to adapt for an open named entity recognition problem (biology/chemistry, so no dictionary of entities exists but they have to be identified by context).
Currently my best guess is to adapt Syntaxnet so that instead of tagging words as N, V, ADJ etc, it learns to tag as BEGINNING, INSIDE, OUT (IOB notation).
However I am not sure which of these approaches is the best?
- Syntaxnet
- word2vec
- seq2seq (I think this is not the right one as I need it to learn on two aligned sequences, whereas seq2seq is designed for sequences of differing lengths as in translation)
Would be grateful for a pointer to the right method! thanks!