Syntaxnet / Parsey McParseface Python API

I've installed syntaxnet and am able to run the parser with the provided demo script. Ideally, I would like to run it directly from Python. The only code I found was this:

import os
import subprocess

os.chdir("../models/syntaxnet")
# With shell=True, pass the command as a single string, not a list.
subprocess.call(
    "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh",
    shell=True,
)

which is a complete disaster: inefficient and over-complex (calling Python from Python should be done in Python).

How can I call the python APIs directly, without going through shell scripts, standard I/O, etc?

EDIT - Why isn't this as easy as opening syntaxnet/demo.sh and reading it?

This shell script calls two Python programs (parser_eval and conll2tree) that are written as standalone scripts and can't be imported into a Python module without raising multiple errors. A closer look reveals additional script-like layers and native code. These upper layers would need to be refactored in order to run the whole thing in a Python context. Hasn't anyone forked syntaxnet with such a modification, or does anyone intend to do so?

Gangway answered 22/8, 2016 at 9:4 Comment(8)
Hint: Open the syntaxnet/demo.sh file and read it.Bosley
@Bosley hasn't anyone done that already?Gangway
Did you try to open demo.sh? It's a very small shell script. It uses parser_eval and conll2tree. You can just import and call these files with the required parameters.Bosley
@Gangway did you find a solution?Undergrowth
@Ngeunpo I have not. What I do for now is send the script batches of sentences (say ~500) separated by "\n", so that I pay the calling overhead only once per hundreds of sentences; see the sketch after these comments. This is still very weak since 1. there's a limit on the length of a shell command, so I can't use much larger batches when needed; 2. when processing tens of millions of sentences, even this optimized process can still consume days, even on a fairly strong server; 3. the batch optimization complicates and breaks the modularity of the code.Gangway
@Bosley since you brought up a common mistake, I'll add a more detailed explanation of why this is not as easy as opening syntaxnet/demo.sh.Gangway
@Gangway can you please provide a gist of your code? I'm running into the same wall as you here. Also, are you parsing the output?Bonaventura
Another problem is that the model/net is loaded into memory each time we do a query, and we want it to reside in memory.Anthocyanin
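
A minimal sketch of the batching workaround described in the comments above: pipe the batch through the child process's stdin instead of building an `echo ... |` command line, which avoids the shell's limit on command length. The helper name and paths are illustrative, not from the original post:

import subprocess

def parse_batch(sentences, demo_script="syntaxnet/demo.sh",
                cwd="../models/syntaxnet"):
    # Writing to stdin sidesteps the shell's command-length limit,
    # but the model is still reloaded on every call, so use batches
    # as large as memory allows.
    proc = subprocess.Popen(
        ["/bin/bash", demo_script],
        cwd=cwd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate("\n".join(sentences))
    return out

# Pay the startup cost once per batch, not once per sentence.
print(parse_batch(["Bob brought the pizza to Alice.", "Alice thanked him."]))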

All in all, it doesn't look like it would be a problem to refactor the two scripts demo.sh runs (https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/parser_eval.py and https://github.com/tensorflow/models/blob/master/syntaxnet/syntaxnet/conll2tree.py) into a Python module that exposes a Python API you can call.

Both scripts use TensorFlow's tf.app.flags API (described in this SO question: What's the purpose of tf.app.flags in TensorFlow?), so those would have to be refactored into regular arguments, as tf.app.flags is a process-level singleton.

So yeah, you'd just have to do the work to make these callable as a Python API :)
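
For illustration, a minimal sketch of what that refactor could look like; the function name and argument list are hypothetical, and the real parser_eval defines many more flags and builds the full parser graph:

import tensorflow as tf

# Script style (what parser_eval does today): flags live in a
# process-level singleton, so importing two such scripts clashes:
#
#   tf.app.flags.DEFINE_string("task_context", "", "Path to task context.")
#   FLAGS = tf.app.flags.FLAGS

def build_parser_session(task_context, arg_prefix="brain_parser"):
    # Library style: every value the script read from FLAGS becomes an
    # explicit argument, so this can be imported and called repeatedly.
    graph = tf.Graph()
    with graph.as_default():
        # ... construct the parser ops here, as parser_eval.Eval() does,
        # reading task_context/arg_prefix from the arguments instead of
        # from FLAGS ...
        pass
    return tf.Session(graph=graph)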

Centrum answered 2/10, 2016 at 9:8 Comment(2)
do you know for a fact that this wasn't done by someone else and is already available online?Gangway
@Gangway: No. Googling for "Syntaxnet Python API" finds this GH issue github.com/tensorflow/models/issues/148 ... to me, that looks like they're saying "do it yourself".Centrum

There is a REST API here for both syntaxnet and dragnn.

I ran them successfully on my cloud server. Some points I want to share:

  1. Build the Docker image

    sudo docker build - < ./Dockerfile

    Some errors may occur when building syntaxnet; if so, just follow the ./Dockerfile and build the image manually, it's easy to follow.

  2. Download the pre-trained models

    The model for syntaxnet is here, e.g. the Chinese model: http://download.tensorflow.org/models/parsey_universal/Chinese.zip

    The model for dragnn is located here.

    Unzip them into folders, e.g. ./syntaxnet_data, so you have something like ./syntaxnet_data/Chinese

  3. run and test

    3.1 Syntaxnet

    run (note that docker -v requires an absolute host path, hence $(pwd))

        docker run -p 9000:9000 -v $(pwd)/syntaxnet_data/:/models ljm625/syntaxnet-rest-api

    test

        curl -X POST -d '{ "strings": [["今天天气很好","猴子爱吃 桃子"]] }' -H "Content-Type: application/json" http://xxx.xxx.xxx.xxx:9000/api/v1/query/Chinese
    

    3.2 dragnn

    run
    
        sudo docker run -p 9001:9000 -v $(pwd)/dragnn_data:/models ljm625/syntaxnet-rest-api:dragnn
    
    test
    
        http://Yourip:9001/api/v1/use/Chinese
    
        curl -X POST -d '{ "strings": ["今天 天气 很好","猴子 爱  吃 桃子"],"tree":true }' -H "Content-Type: application/json" http://xxx.xx.xx.xx:9001/api/v1/query
    

  4. Test results and problems

From my testing with the Chinese model, syntaxnet is slow: it spends 3 seconds processing a single query, and 9 seconds for a batch of 50 queries, so there is a fixed cost for loading the model.

For the dragnn model, it's fast, but I'm not satisfied with the parsing results (only tested with Chinese).

PS: I don't like the way syntaxnet works (e.g. using bazel and reading data from stdin); if you want to customize it, you can find some info here.

Another resource that helps: https://github.com/dsindex/syntaxnet/blob/master/README_api.md
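
If you want to call the service directly from Python (as the question asks), here is a minimal client sketch using the requests library; the host, port, endpoint, and payload shape are taken from the curl examples above, and BASE_URL is an assumption:

import requests

# Assumed host/port; match whatever you passed to `docker run -p`.
BASE_URL = "http://localhost:9000"

def parse(sentences, language="Chinese"):
    # POST a batch of sentences to the syntaxnet REST API.
    resp = requests.post(
        "%s/api/v1/query/%s" % (BASE_URL, language),
        json={"strings": [sentences]},  # nested list, as in the curl example
    )
    resp.raise_for_status()
    return resp.json()

print(parse([u"今天天气很好"]))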

Vaporetto answered 4/7, 2017 at 10:7 Comment(1)
Does your server load the neural net into memory every time you query, or does it stay in memory?Anthocyanin

The best way to integrate SyntaxNet with your own code is to run it as a web service. I did that to parse Portuguese text.

I started by adapting an existing Docker container with SyntaxNet and TensorFlow Serving to run only for Portuguese, in order to keep memory usage low. It runs really fast and is easy to integrate with your code.

I did a blog post about it, and you can easily adapt it to any other language:

http://davidsbatista.net/blog/2017/07/22/SyntaxNet-API-Portuguese/

Gibbeon answered 5/9, 2017 at 9:55 Comment(0)

From what I can tell, the currently recommended way to use syntaxnet from Python is via DRAGNN.

Gravitate answered 1/5, 2017 at 2:2 Comment(0)
