I am trying to understand and learn SyntaxNet. I am trying to figure out whether there is any way to use SyntaxNet for Named Entity Recognition on a corpus. Any sample code or helpful links would be appreciated.
While SyntaxNet does not explicitly offer any Named Entity Recognition functionality, Parsey McParseface does part-of-speech tagging and produces its output as a CoNLL table.
Any proper noun is tagged as NNP and I have found that a simple regex identifier like so: <NNP>+
i.e. one or more proper nouns put together, gives a fairly good yield of named entities within a document. It is of course rudimentary and rule-based but effective nonetheless.
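As a rough illustration of the <NNP>+ idea, here is a minimal Python sketch that groups consecutive NNP tokens into candidate entities. It assumes the tagger output has already been reduced to (word, tag) pairs; the function name proper_noun_chunks and the sample sentence are made up for this example and are not part of SyntaxNet.

```python
from itertools import groupby


def proper_noun_chunks(tagged_tokens):
    """Return maximal runs of consecutive NNP tokens, joined into strings.

    tagged_tokens is a list of (word, pos_tag) pairs (an assumed input shape).
    """
    chunks = []
    # groupby splits the sequence into runs where "is this NNP?" stays constant.
    for is_proper, group in groupby(tagged_tokens, key=lambda t: t[1] == "NNP"):
        if is_proper:
            chunks.append(" ".join(word for word, _ in group))
    return chunks


# Hypothetical tagged sentence, only for demonstration.
sample = [
    ("Barack", "NNP"),
    ("Obama", "NNP"),
    ("visited", "VBD"),
    ("Paris", "NNP"),
    (".", "."),
]
print(proper_noun_chunks(sample))
```

Plural proper nouns are tagged NNPS in the Penn Treebank tag set, so you may want to widen the test to t[1] in ("NNP", "NNPS") depending on your data.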
To write the CoNLL data to an output file from the demo.sh script (located in "/opt/tensorflow/models/syntaxnet/syntaxnet"), comment out the section of the code that pipes it to conll2ascii.py so that the script looks like so:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=$INPUT_FORMAT \
--output=stdout-conll \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=stdin-conll \
--output=sample-param \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr
You will also notice that the output parameter was changed in the above file to sample-param. We will now set this. Make your way to the context.pbtxt file (located in "/opt/tensorflow/models/syntaxnet/syntaxnet/models/parsey_mcparseface") and create an input parameter to point to your output file. It should look something like so:
input {
name: 'sample-param'
record_format: 'conll-sentence'
Part {
file_pattern: "directory/prepoutput.txt"
}
}
Save and close the file, return to "/opt/tensorflow/models/syntaxnet", and run syntaxnet/demo.sh as given in the SyntaxNet tutorial. On completion, go to the specified output folder and you should have a table in CoNLL format. You can then run a simple program that iterates over each entry, reads the POS tags, and tries variations of the pattern I suggested for entity recognition.
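For the iteration step, a minimal Python sketch of reading the CoNLL table might look like the following. It assumes tab-separated columns with the word in the second column and the fine-grained POS tag in the fifth, and blank lines between sentences; check this against your actual output. The function name read_conll and the sample rows are illustrative only.

```python
import io


def read_conll(lines):
    """Yield each sentence as a list of (word, pos_tag) pairs.

    Assumes tab-separated CoNLL rows: column 2 is the word form and
    column 5 is the fine-grained POS tag. Blank lines end a sentence.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            continue  # skip rows that do not match the assumed layout
        sentence.append((cols[1], cols[4]))
    if sentence:
        yield sentence


# Hypothetical two-sentence sample standing in for the real output file.
sample_text = (
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\truns\t_\tVERB\tVBZ\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tAlice\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tsleeps\t_\tVERB\tVBZ\t_\t0\tROOT\t_\t_\n"
)
sentences = list(read_conll(io.StringIO(sample_text)))
for sentence in sentences:
    print(sentence)
```

To read the real file, pass an open file handle instead of the StringIO object, for example open("directory/prepoutput.txt", encoding="utf-8") with whatever path you set in context.pbtxt.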
Hope this helped!
I have used GATE, which can perform Named Entity Recognition (NER) and does not require parsing to do so. Although the part-of-speech tagger in SyntaxNet can identify nouns, noun modifiers, etc. (which makes it a more powerful tool for specifying the different roles of named entities), I am not sure how fast it would be at identifying named entities.
No, I have never come across any tool or approach that uses or requires parsing for Named Entity Recognition (NER).
Although NER may benefit marginally from features related to the parse tree, it is a roundabout way to do it, since parsing is very slow compared to typical implementations of NER. This is also the reason why even part-of-speech tags are not used as features in an NER system.
Hope this helps.