How to use syntaxnet output

I started playing with SyntaxNet two days ago, and I'm wondering how to use or export the output (ASCII tree or CoNLL) in a format that is easy to parse (e.g. JSON, XML, or a Python graph).

Thanks for your help !

Iaea answered 17/6, 2016 at 7:29 Comment(0)

Before the ASCII tree is drawn (I assume you are following demo.sh), the input goes through tagging and parsing, and both steps emit CoNLL. Remove the last step in the command pipeline, the conll2tree call, so the script stops at the parser output.

Your modified demo.sh file will look like this:

PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin

$PARSER_EVAL \
  --input=$INPUT_FORMAT \
  --output=stdout-conll \
  --hidden_layer_sizes=64 \
  --arg_prefix=brain_tagger \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/tagger-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr \
   | \
  $PARSER_EVAL \
  --input=stdin-conll \
  --output=stdout-conll \
  --hidden_layer_sizes=512,512 \
  --arg_prefix=brain_parser \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/parser-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr

You can then run:

$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh 1>sample.txt 2>/dev/null

Your result will be stored in sample.txt, and it looks like this:

1   Bob _   NOUN    NNP _   2   nsubj   _   _
2   brought _   VERB    VBD _   0   ROOT    _   _
3   the _   DET DT  _   4   det _   _
4   pizza   _   NOUN    NN  _   2   dobj    _   _
5   to  _   ADP IN  _   2   prep    _   _
6   Alice   _   NOUN    NNP _   5   pobj    _   _
7   .   _   .   .   _   2   punct   _   _

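From here, you can easily get the head of each word, its part of speech, and its dependency relation: split the data on \n to get one token per line, then split each line on whitespace to get the columns. Column 1 is the token index, column 2 the word, columns 4 and 5 the coarse and fine part-of-speech tags, column 7 the index of the head (0 for the root), and column 8 the relation to that head.

The ASCII tree itself is built from this output.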
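Since the question asks for JSON or a Python graph, here is a minimal sketch of that splitting step in plain Python. The column names and the helper names `parse_conll` and `to_tree` are my own choices for illustration and are not part of SyntaxNet. It assumes every row has exactly ten whitespace-separated fields with integers in the index and head columns, as in the sample above.

```python
import json

# Names for the ten CoNLL columns; the names are illustrative, not from SyntaxNet.
COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
           "feats", "head", "deprel", "phead", "pdeprel"]

# The sample output from above, inlined so the sketch runs without sample.txt.
SAMPLE = """\
1 Bob _ NOUN NNP _ 2 nsubj _ _
2 brought _ VERB VBD _ 0 ROOT _ _
3 the _ DET DT _ 4 det _ _
4 pizza _ NOUN NN _ 2 dobj _ _
5 to _ ADP IN _ 2 prep _ _
6 Alice _ NOUN NNP _ 5 pobj _ _
7 . _ . . _ 2 punct _ _
"""


def parse_conll(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()  # tabs or spaces; assumes no spaces inside a field
        if len(fields) != len(COLUMNS):
            raise ValueError("expected 10 columns, got %d: %r" % (len(fields), line))
        token = dict(zip(COLUMNS, fields))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


def to_tree(tokens):
    """Nest each token under its head; the token with head 0 is the root."""
    nodes = {t["id"]: {"word": t["form"], "tag": t["postag"],
                       "rel": t["deprel"], "children": []}
             for t in tokens}
    root = None
    for t in tokens:
        if t["head"] == 0:
            root = nodes[t["id"]]
        else:
            nodes[t["head"]]["children"].append(nodes[t["id"]])
    return root


if __name__ == "__main__":
    sentence = parse_conll(SAMPLE)[0]
    print(json.dumps(to_tree(sentence), indent=2))
```

To use it on real output, replace `SAMPLE` with the contents of sample.txt, for example `open("sample.txt").read()`. The nested dicts serialize directly with `json.dumps`, and the flat token list is usually enough if you only need heads and tags.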

Oilbird answered 24/6, 2016 at 11:38 Comment(3)
Thanks for your answer. To parse the output file, I used the CoNLL file reader available in the Python NLTK library. - Iaea
Hey, does anyone know where I can get an explanation of what each column stands for? I'm guessing there must be some main "categories". It's also strange that there never seems to be anything in the last 2 columns. - Smaragdine
@Smaragdine you can find the explanation of the CoNLL format here: ilk.uvt.nl/conll/#dataformat - Immunotherapy

I came here looking for a legend for the part-of-speech tags in the output. It was shared in a deleted answer, which other users may not be able to see.

The part-of-speech abbreviations seem to match the Penn Treebank part-of-speech tags for my sentences so far. Quoting that table here in case the page goes down or changes:

  1. CC Coordinating conjunction
  2. CD Cardinal number
  3. DT Determiner
  4. EX Existential there
  5. FW Foreign word
  6. IN Preposition or subordinating conjunction
  7. JJ Adjective
  8. JJR Adjective, comparative
  9. JJS Adjective, superlative
  10. LS List item marker
  11. MD Modal
  12. NN Noun, singular or mass
  13. NNS Noun, plural
  14. NNP Proper noun, singular
  15. NNPS Proper noun, plural
  16. PDT Predeterminer
  17. POS Possessive ending
  18. PRP Personal pronoun
  19. PRP$ Possessive pronoun
  20. RB Adverb
  21. RBR Adverb, comparative
  22. RBS Adverb, superlative
  23. RP Particle
  24. SYM Symbol
  25. TO to
  26. UH Interjection
  27. VB Verb, base form
  28. VBD Verb, past tense
  29. VBG Verb, gerund or present participle
  30. VBN Verb, past participle
  31. VBP Verb, non-3rd person singular present
  32. VBZ Verb, 3rd person singular present
  33. WDT Wh-determiner
  34. WP Wh-pronoun
  35. WP$ Possessive wh-pronoun
  36. WRB Wh-adverb
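If you want the descriptions next to the parsed rows, a small lookup table is enough. This is only a sketch: `PENN_TAGS` holds an excerpt of the table above (extend it with the remaining entries as needed), and `describe` is a hypothetical helper name. Punctuation tags such as `.` are not in the table, so they fall through to a default.

```python
# Excerpt of the Penn tag table quoted above; add the remaining entries as needed.
PENN_TAGS = {
    "DT": "Determiner",
    "IN": "Preposition or subordinating conjunction",
    "NN": "Noun, singular or mass",
    "NNP": "Proper noun, singular",
    "VBD": "Verb, past tense",
}


def describe(tag):
    """Return the description for a tag, or a placeholder if it is not listed."""
    return PENN_TAGS.get(tag, "(not in the table)")


if __name__ == "__main__":
    # One row of the CoNLL output; the fine-grained tag is the fifth column.
    row = "1 Bob _ NOUN NNP _ 2 nsubj _ _".split()
    print(row[1], row[4], describe(row[4]))
```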
Ginter answered 2/1, 2017 at 15:11 Comment(0)

I wrote a blog post explaining how to get the output of SyntaxNet, for any given language, into Python (specifically into NLTK) and how to use its output with the Dependency Graph and Tree classes.

You can check it here: http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/

Jacquijacquie answered 1/4, 2017 at 13:34 Comment(0)
