I started playing with Syntaxnet two days ago and I'm wondering how to use/export the output (ascii tree or conll ) in a format that is easy to parse (ie : Json, XML, python graph).
Thanks for your help !
I started playing with Syntaxnet two days ago and I'm wondering how to use/export the output (ascii tree or conll ) in a format that is easy to parse (ie : Json, XML, python graph).
Thanks for your help !
Before going to ascii tree(I think you are following demo.sh), the input goes through tagging and parsing. Remove the last step in the command pipeline.
Your modified demo.sh file will look like this :-
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=$INPUT_FORMAT \
--output=stdout-conll \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=stdin-conll \
--output=stdout-conll \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
You can then run:-
$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh 1>sample.txt 2>dev/null
You result will be stored in sample.txt and it looks like this :-
1 Bob _ NOUN NNP _ 2 nsubj _ _
2 brought _ VERB VBD _ 0 ROOT _ _
3 the _ DET DT _ 4 det _ _
4 pizza _ NOUN NN _ 2 dobj _ _
5 to _ ADP IN _ 2 prep _ _
6 Alice _ NOUN NNP _ 5 pobj _ _
7 . _ . . _ 2 punct _ _
From, here you can easily get information about head of each word, parts of speech and type of node by splitting data with \n
The ascii tree by itself is build by using above.
I came here looking for a legend for the output parts of speech. It was shared in a deleted answer--which other users may not be able to see.
The parts of speech abbreviations seem to match up with the Penn Parts of Speech Tags for my sentences so far. Quoting that table here in case the page goes down or changes:
- CC Coordinating conjunction
- CD Cardinal number
- DT Determiner
- EX Existential there
- FW Foreign word
- IN Preposition or subordinating conjunction
- JJ Adjective
- JJR Adjective, comparative
- JJS Adjective, superlative
- LS List item marker
- MD Modal
- NN Noun, singular or mass
- NNS Noun, plural
- NNP Proper noun, singular
- NNPS Proper noun, plural
- PDT Predeterminer
- POS Possessive ending
- PRP Personal pronoun
- PRP$ Possessive pronoun
- RB Adverb
- RBR Adverb, comparative
- RBS Adverb, superlative
- RP Particle
- SYM Symbol
- TO to
- UH Interjection
- VB Verb, base form
- VBD Verb, past tense
- VBG Verb, gerund or present participle
- VBN Verb, past participle
- VBP Verb, non-3rd person singular present
- VBZ Verb, 3rd person singular present
- WDT Wh-determiner
- WP Wh-pronoun
- WP$ Possessive wh-pronoun
- WRB Wh-adverb
I wrote a blog post explaining how to get the output of SyntaxNet for any given language, into Python, specifically into NLTK, and use it's output with Dependency Graph and Tree classes.
You can check it here: http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/
© 2022 - 2024 — McMap. All rights reserved.