How to use syntaxnet output

Asked 17/6, 2016 at 7:29 Answered 1/4, 2017 at 13:34

I started playing with SyntaxNet two days ago, and I'm wondering how to use or export the output (ASCII tree or CoNLL) in a format that is easy to parse (e.g., JSON, XML, or a Python graph).

Thanks for your help!

Iaea answered 17/6, 2016 at 7:29 Comment(0)

Before the ASCII tree is produced (I assume you are following demo.sh), the input goes through tagging and parsing. Remove the last step in the command pipeline, so that the parser's CoNLL output is printed instead of the tree.

Your modified demo.sh file will look like this:

PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin

$PARSER_EVAL \
  --input=$INPUT_FORMAT \
  --output=stdout-conll \
  --hidden_layer_sizes=64 \
  --arg_prefix=brain_tagger \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/tagger-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr \
   | \
  $PARSER_EVAL \
  --input=stdin-conll \
  --output=stdout-conll \
  --hidden_layer_sizes=512,512 \
  --arg_prefix=brain_parser \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/parser-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr

You can then run:

$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh 1>sample.txt 2>/dev/null

Your result will be stored in sample.txt, and it looks like this:

1   Bob _   NOUN    NNP _   2   nsubj   _   _
2   brought _   VERB    VBD _   0   ROOT    _   _
3   the _   DET DT  _   4   det _   _
4   pizza   _   NOUN    NN  _   2   dobj    _   _
5   to  _   ADP IN  _   2   prep    _   _
6   Alice   _   NOUN    NNP _   5   pobj    _   _
7   .   _   .   .   _   2   punct   _   _

From here, you can easily get the head of each word, its part of speech, and its dependency relation (the type of node) by splitting the data on \n to get one token per line, and then splitting each line into its columns.
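As a minimal sketch of that splitting step, assuming the ten whitespace-separated columns shown above, the rows can be turned into dictionaries and dumped as JSON. The column names follow the CoNLL-X format description; the helper name parse_conll is made up for illustration and is not part of SyntaxNet.

```python
import json

# Column names as given in the CoNLL-X format description.
COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
           "feats", "head", "deprel", "phead", "pdeprel"]


def parse_conll(text):
    """Turn CoNLL output into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        # Fields never contain whitespace; empty fields are written as "_".
        token = dict(zip(COLUMNS, line.split()))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])  # 0 means the token is the root
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# The sample output from above, rebuilt with tab separators.
sample = "\n".join([
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_",
    "2\tbrought\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_",
    "3\tthe\t_\tDET\tDT\t_\t4\tdet\t_\t_",
    "4\tpizza\t_\tNOUN\tNN\t_\t2\tdobj\t_\t_",
    "5\tto\t_\tADP\tIN\t_\t2\tprep\t_\t_",
    "6\tAlice\t_\tNOUN\tNNP\t_\t5\tpobj\t_\t_",
    "7\t.\t_\t.\t.\t_\t2\tpunct\t_\t_",
])

sentences = parse_conll(sample)
print(json.dumps(sentences, indent=2))
```

To use it on the real output, replace sample with the contents of sample.txt, for example open("sample.txt").read().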

The ASCII tree itself is built from this same CoNLL output.
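To illustrate how a tree can be recovered from the head column, here is a rough sketch that groups tokens under their heads and prints them with indentation. This is not SyntaxNet's own tree renderer, only an assumption-laden illustration; the names TOKENS, build_children, and render are hypothetical, and the token tuples are copied by hand from the sample output.

```python
from collections import defaultdict

# (id, form, head, deprel) copied from the sample CoNLL output.
TOKENS = [
    (1, "Bob", 2, "nsubj"),
    (2, "brought", 0, "ROOT"),
    (3, "the", 4, "det"),
    (4, "pizza", 2, "dobj"),
    (5, "to", 2, "prep"),
    (6, "Alice", 5, "pobj"),
    (7, ".", 2, "punct"),
]


def build_children(tokens):
    """Map each head id to the list of token ids that depend on it."""
    children = defaultdict(list)
    for token_id, _form, head, _deprel in tokens:
        children[head].append(token_id)
    return children


def render(tokens, children, node_id=0, depth=0):
    """Return indented lines, one per token, depth-first from the root."""
    by_id = {t[0]: t for t in tokens}
    lines = []
    for child_id in children[node_id]:
        _id, form, _head, deprel = by_id[child_id]
        lines.append("  " * depth + f"{form} ({deprel})")
        lines.extend(render(tokens, children, child_id, depth + 1))
    return lines


tree_lines = render(TOKENS, build_children(TOKENS))
print("\n".join(tree_lines))
# brought (ROOT)
#   Bob (nsubj)
#   pizza (dobj)
#     the (det)
#   to (prep)
#     Alice (pobj)
#   . (punct)
```

The same head-to-children mapping can serve as a simple Python graph if a nested structure is needed instead of printed lines.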

Oilbird answered 24/6, 2016 at 11:38 Comment(3)

Thanks for your answer. In order to parse the output file, I used the CONLL file reader available in the Python NLTK library. – Iaea 27/6, 2016 at 8:47

Hey, does anyone know where I can get an explanation of what each column stands for? I'm guessing there must be some main "categories". It's also strange that there never seems to be anything in the last two columns. – Smaragdine 9/8, 2016 at 3:12

@Smaragdine you can find the explanation of the CoNLL format here: ilk.uvt.nl/conll/#dataformat – Immunotherapy 16/9, 2016 at 23:1

I came here looking for a legend for the part-of-speech tags in the output. It was shared in a deleted answer, which other users may not be able to see.

The part-of-speech abbreviations seem to match the Penn Treebank part-of-speech tags for my sentences so far. Quoting that table here in case the page goes down or changes:

CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun
WRB Wh-adverb

Ginter answered 2/1, 2017 at 15:11 Comment(0)

I wrote a blog post explaining how to get the output of SyntaxNet for any given language into Python, specifically into NLTK, and how to use its output with the DependencyGraph and Tree classes.

You can check it here: http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/

Jacquijacquie answered 1/4, 2017 at 13:34 Comment(0)
