Syntaxnet / Parsey McParseface Python API

I've installed syntaxnet and am able to run the parser with the provided demo script. Ideally, I would like to run it directly from Python. The only code I found was this:

import os
import subprocess

os.chdir("../models/syntaxnet")
# With shell=True, pass the command as a single string, not a list.
subprocess.call(
    "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh",
    shell=True,
)

which is a complete disaster: inefficient and over-complex (calling Python from Python should be done in Python).

How can I call the python APIs directly, without going through shell scripts, standard I/O, etc?

EDIT - Why isn't this as easy as opening syntaxnet/demo.sh and reading it?

This shell script calls two Python programs (parser_eval and conll2tree) that are written as standalone scripts and can't be imported into a Python module without raising multiple errors. A closer look reveals additional script-like layers and native code. These upper layers would need to be refactored in order to run the whole thing in a Python context. Hasn't anyone forked syntaxnet with such a modification, or does anyone intend to do so?

Gangway answered 22/8, 2016 at 9:4 Comment(8)
Hint: Open the syntaxnet/demo.sh file and read it.Bosley
@Bosley hasn't anyone done that already?Gangway
Did you try to open demo.sh? It's a very small shell script. It uses parser_eval and conll2tree. You can just import and call these files with the required parameters.Bosley
@Gangway did you find a solution?Undergrowth
@Ngeunpo I have not. What I do for now is send the script batches of sentences (say ~500) separated by "\n", so that I pay the calling overhead only once per hundreds of sentences; see the sketch after these comments. This is still very weak since 1. there's a limit on the length of a shell command, so I can't use much larger batches when needed; 2. when processing tens of millions of sentences, even this optimized process can still consume days, even on a fairly strong server; 3. the batch optimization complicates and breaks the modularity of the code.Gangway
@Bosley since you brought up a common mistake, I'll add a more detailed explanation of why this is not as easy as opening syntaxnet/demo.sh.Gangway
@Gangway can you please provide a gist of your code? I'm running into the same wall as you here. Also, are you parsing the output?Bonaventura
Another problem is that the model/net is loaded into memory each time we do a query, and we want it to reside in memory.Anthocyanin
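
A minimal sketch of the batching workaround described in the comments above: pipe the batch through the child process's stdin instead of building an `echo ... |` command line, which avoids the shell's limit on command length. The helper name and paths are illustrative, not from the original post:

import subprocess

def parse_batch(sentences, demo_script="syntaxnet/demo.sh",
                cwd="../models/syntaxnet"):
    # Writing to stdin sidesteps the shell's command-length limit,
    # but the model is still reloaded on every call, so use batches
    # as large as memory allows.
    proc = subprocess.Popen(
        ["/bin/bash", demo_script],
        cwd=cwd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate("\n".join(sentences))
    return out

# Pay the startup cost once per batch, not once per sentence.
print(parse_batch(["Bob brought the pizza to Alice.", "Alice thanked him."]))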

All in all, it doesn't look like it would be a problem to refactor the two scripts demo.sh runs (https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/parser_eval.py and https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/conll2tree.py) into a Python module that exposes a Python API you can call.

Both scripts use TensorFlow's tf.app.flags API (described in this SO question: What's the purpose of tf.app.flags in TensorFlow?), so those would have to be refactored into regular arguments, as tf.app.flags is a process-level singleton.

So yeah, you'd just have to do the work to make these callable as a Python API :)
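
For illustration, a minimal sketch of what that refactor could look like; the function name and argument list are hypothetical, and the real parser_eval defines many more flags and builds the full parser graph:

import tensorflow as tf

# Script style (what parser_eval does today): flags live in a
# process-level singleton, so importing two such scripts clashes:
#
#   tf.app.flags.DEFINE_string("task_context", "", "Path to task context.")
#   FLAGS = tf.app.flags.FLAGS

def build_parser_session(task_context, arg_prefix="brain_parser"):
    # Library style: every value the script read from FLAGS becomes an
    # explicit argument, so this can be imported and called repeatedly.
    graph = tf.Graph()
    with graph.as_default():
        # ... construct the parser ops here, as parser_eval.Eval() does,
        # reading task_context/arg_prefix from the arguments instead of
        # from FLAGS ...
        pass
    return tf.Session(graph=graph)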

Centrum answered 2/10, 2016 at 9:8 Comment(2)
do you know for a fact that this wasn't done by someone else and is already available online?Gangway
@Gangway: No. Googling for "Syntaxnet Python API" finds this GH issue github.com/tensorflow/models/issues/148 ... to me, that looks like they're saying "do it yourself".Centrum

There is a REST API here for both syntaxnet and dragnn.

I ran them successfully on my cloud server. Some points I want to share:

  1. Build the Docker image

    sudo docker build - < ./Dockerfile

    Some errors may occur when building syntaxnet; if so, just follow the ./Dockerfile and build the image manually, it's easy to follow.

  2. Download the pre-trained models

    The model for syntaxnet is here, e.g. the Chinese model: http://download.tensorflow.org/models/parsey_universal/Chinese.zip

    The model for dragnn is located here.

    Unzip them into folders, e.g. ./syntaxnet_data, so you have something like ./syntaxnet_data/Chinese

  3. run and test

    3.1 Syntaxnet

    run (note that docker -v requires an absolute host path, hence $(pwd))

        docker run -p 9000:9000 -v $(pwd)/syntaxnet_data/:/models ljm625/syntaxnet-rest-api

    test

        curl -X POST -d '{ "strings": [["今天天气很好","猴子爱吃 桃子"]] }' -H "Content-Type: application/json" http://xxx.xxx.xxx.xxx:9000/api/v1/query/Chinese
    

    3.2 dragnn

    run
    
        sudo docker run -p 9001:9000 -v $(pwd)/dragnn_data:/models ljm625/syntaxnet-rest-api:dragnn
    
    test
    
        http://Yourip:9001/api/v1/use/Chinese
    
        curl -X POST -d '{ "strings": ["今天 天气 很好","猴子 爱  吃 桃子"],"tree":true }' -H "Content-Type: application/json" http://xxx.xx.xx.xx:9001/api/v1/query
    

  4. Test results and problems

From my testing with the Chinese model, syntaxnet is slow: it spends 3 seconds processing a single query, and 9 seconds for a batch of 50 queries, so there is a fixed cost for loading the model.

For the dragnn model, it's fast, but I'm not satisfied with the parsing results (only tested with Chinese).

PS: I don't like the way syntaxnet works (e.g. using bazel and reading data from stdin); if you want to customize it, you can find some info here.

Another resource that helps: https://github.com/dsindex/syntaxnet/blob/master/README_api.md
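
If you want to call the service directly from Python (as the question asks), here is a minimal client sketch using the requests library; the host, port, endpoint, and payload shape are taken from the curl examples above, and BASE_URL is an assumption:

import requests

# Assumed host/port; match whatever you passed to `docker run -p`.
BASE_URL = "http://localhost:9000"

def parse(sentences, language="Chinese"):
    # POST a batch of sentences to the syntaxnet REST API.
    resp = requests.post(
        "%s/api/v1/query/%s" % (BASE_URL, language),
        json={"strings": [sentences]},  # nested list, as in the curl example
    )
    resp.raise_for_status()
    return resp.json()

print(parse([u"今天天气很好"]))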

Vaporetto answered 4/7, 2017 at 10:7 Comment(1)
Does your server load the neural net into memory every time you query, or does it stay in memory?Anthocyanin

The best way to integrate SyntaxNet with your own code is to run it as a web service. I did that to parse Portuguese text.

I started by adapting an existing Docker container with SyntaxNet and TensorFlow Serving to run only for Portuguese, in order to keep memory usage low. It runs really fast and is easy to integrate with your code.

I did a blog post about it, and you can easily adapt it to any other language:

http://davidsbatista.net/blog/2017/07/22/SyntaxNet-API-Portuguese/

Gibbeon answered 5/9, 2017 at 9:55 Comment(0)

From what I can tell, the currently recommended way to use syntaxnet from Python is via DRAGNN.

Gravitate answered 1/5, 2017 at 2:2 Comment(0)
