How to use Stanford Parser in NLTK using Python

99

Is it possible to use Stanford Parser in NLTK? (I am not talking about Stanford POS.)

Southwestwardly answered 14/12, 2012 at 17:12 Comment(4)
See also: gist.github.com/alvations/e1df0ba227e542955a8a – Charter
This link needs to be more visible. Maybe the top answer should be edited to mention this? – Lemcke
Just a side note here guys. Make sure your Java is up-to-date for Stanford NLP and JAVA_HOME is set up properly. Sometimes folks might get "weird" errors which might be due to this. – Quadrangular
For NLTK v3.3, see https://mcmap.net/q/215937/-how-to-use-stanford-parser-in-nltk-using-python – Charter
96

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Sure, try the following in Python:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = '/path/to/stanford/jars'
os.environ['STANFORD_MODELS'] = '/path/to/stanford/jars'

parser = stanford.StanfordParser(model_path="/location/of/the/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print sentences

# GUI
for line in sentences:
    for sentence in line:
        sentence.draw()

Output:

[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]

Note 1: In this example both the parser & model jars are in the same folder.

Note 2:

  • File name of stanford parser is: stanford-parser.jar
  • File name of stanford models is: stanford-parser-x.x.x-models.jar

Note 3: The englishPCFG.ser.gz file can be found inside the models.jar file (/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz). Please use some archive manager to 'unzip' the models.jar file.
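
If you prefer to pull the model out of the jar from Python rather than an archive manager, here is a minimal sketch using the standard zipfile module (jar files are just zip archives; the paths are placeholders for your own setup):

import zipfile

# Extract englishPCFG.ser.gz from the models jar into a folder of your choice.
with zipfile.ZipFile('/path/to/jars/stanford-parser-3.x.x-models.jar') as jar:
    jar.extract('edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz',
                '/path/to/extracted')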

Note 4: Be sure you are using Java JRE (Runtime Environment) 1.8, also known as Oracle JDK 8. Otherwise you will get: Unsupported major.minor version 52.0.
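
A quick sketch for checking which Java your system will actually run (you should see 1.8.x; note that java -version prints its banner on stderr, hence the redirect):

import subprocess

# Java writes its version banner to stderr, so fold stderr into stdout.
print(subprocess.check_output(['java', '-version'],
                              stderr=subprocess.STDOUT).decode())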

Installation

  1. Download NLTK v3 from https://github.com/nltk/nltk and install it:

    sudo python setup.py install

  2. You can use the NLTK downloader to get Stanford Parser, using Python:

    import nltk
    nltk.download()
    
  3. Try my example! (don't forget to change the jar paths and the model path to the ser.gz location)

OR:

  1. Download and install NLTK v3, same as above.

  2. Download the latest version from (current version filename is stanford-parser-full-2015-01-29.zip): http://nlp.stanford.edu/software/lex-parser.shtml#Download

  3. Extract the stanford-parser-full-20xx-xx-xx.zip.

  4. Create a new folder ('jars' in my example). Place the extracted files into this jars folder: stanford-parser-3.x.x-models.jar and stanford-parser.jar.

    As shown above you can use the environment variables (STANFORD_PARSER & STANFORD_MODELS) to point to this 'jars' folder. I'm using Linux, so if you use Windows please use something like: C:/folder/jars.

  5. Open the stanford-parser-3.x.x-models.jar using an Archive manager (7zip).

  6. Browse inside the jar file; edu/stanford/nlp/models/lexparser. Again, extract the file called 'englishPCFG.ser.gz'. Remember the location where you extract this ser.gz file.

  7. When creating a StanfordParser instance, you can provide the model path as parameter. This is the complete path to the model, in our case /location/of/englishPCFG.ser.gz.

  8. Try my example! (don't forget to change the jar paths and the model path to the ser.gz location)

Thinnish answered 8/3, 2014 at 13:3 Comment(15)
It's giving an error to me: import name stanford not found. – Gabble
Which version of the nltk added nltk.parse.stanford? I only have nltk.tag.stanford in NLTK 2.0.4. – Spit
AttributeError: 'StanfordParser' object has no attribute 'raw_batch_parse' – Fiery
I can't find the module nltk.parse.stanford either. – Commonly
@alexis: download nltk 3.0 from here. @Nick Retallack: it should be changed to raw_parse_sents() – Bema
It gives me the following: "Error: Could not find or load main class edu.stanford.nlp.parser.lexparser.LexicalizedParser" – Parting
I added a more detailed how-to. See the updated version above. You don't need to go to the github site. And I can still use raw_batch_parse, which allows you to parse multiple sentences in one call. – Thinnish
@danger89 where would you find the draw method? I'm using the latest version of NLTK but it doesn't seem to have this one implemented. Do you know of any alternatives? – Multinational
Ok, you are right. NLTK changed the function to raw_parse_sents(). See the documentation: nltk.org/_modules/nltk/parse/stanford.html. If you use raw_parse() you'll retrieve an iter(Tree) as return value, meaning the above sample of draw() should work. If you use raw_parse_sents(), you apparently need a double loop; it's returning an iter(iter(Tree)). So, code example: for line in sentences: for sentence in line: sentence.draw(). You can only execute draw() on a Tree object ;) – Thinnish
@danger89, sorry for overwriting your answer with the EDITED note. Recently people have been complaining that the Stanford Dependency parser was only added in NLTK v3.1, and I think they were duplicating some snippets of code here and there from the deprecated answers here. So to minimize confusion, I thought it best to add disclaimers to all the answers here with regards to following the instructions from NLTK's official 3rd party tools documentation. – Charter
Maybe I should update the answer accordingly; it now uses environment variables? – Thinnish
Yes, it's using environment variables but they're different. For the parser, it needs STANFORDTOOLSDIR to be in CLASSPATH for the parser jarfile and the parser_model jarfile, e.g. export CLASSPATH=$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar – Charter
This is the answer which actually worked! I don't know why it's not the accepted answer. Thanks @danger89 – Medic
Hi, thanks maq. I also don't know why this is not the accepted answer >< – Thinnish
If stanford.py is not able to pick up englishPCFG.ser.gz even after providing model_path, then try passing the argument path_to_models_jar instead of model_path, with the absolute path to englishPCFG.ser.gz, i.e. parser = StanfordParser(path_to_models_jar="/path/to/englishPCFG.ser.gz") – Navigable
78

Deprecated Answer

The answer below is deprecated, please use the solution on https://mcmap.net/q/215937/-how-to-use-stanford-parser-in-nltk-using-python for NLTK v3.3 and above.


EDITED

Note: The following answer will only work on:

  • NLTK version >=3.2.4
  • Stanford Tools compiled since 2015-04-20
  • Python 2.7, 3.4 and 3.5 (Python 3.6 is not yet officially supported)

As both tools change rather quickly, the API might look very different 3-6 months later. Please treat the following answer as a temporary fix, not an eternal one.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface Stanford NLP tools with NLTK!!


TL;DR

cd $HOME

# Update / Install NLTK
pip install -U nltk

# Download the Stanford NLP tools
wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
# Extract the zip file.
unzip stanford-ner-2015-04-20.zip 
unzip stanford-parser-full-2015-04-20.zip 
unzip stanford-postagger-full-2015-04-20.zip


export STANFORDTOOLSDIR=$HOME

export CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/stanford-ner.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar

export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/classifiers

Then:

>>> from nltk.tag.stanford import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]

>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser=StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> print [parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")]
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy'])])]

In Long:


Firstly, one must note that the Stanford NLP tools are written in Java and NLTK is written in Python. The way NLTK interfaces with them is by calling the Java tools through the command-line interface.

Secondly, the NLTK API to the Stanford NLP tools changed quite a lot in version 3.1, so it is advisable to update your NLTK package to v3.1.

Thirdly, the NLTK API to Stanford NLP Tools wraps around the individual NLP tools, e.g. Stanford POS tagger, Stanford NER Tagger, Stanford Parser.

For the POS and NER tagger, it DOES NOT wrap around the Stanford Core NLP package.

For the Stanford Parser, it's a special case where it wraps around both the Stanford Parser and the Stanford Core NLP (personally, I have not used the latter with NLTK; I would rather follow @dimazest's demonstration on http://www.eecs.qmul.ac.uk/~dm303/stanford-dependency-parser-nltk-and-anaconda.html )

Note that as of NLTK v3.1, the STANFORD_JAR and STANFORD_PARSER variables are deprecated and NO LONGER used


In Longer:


STEP 1

Assuming that you have installed Java appropriately on your OS.

Now, install/update your NLTK version (see http://www.nltk.org/install.html):

  • Using pip: sudo pip install -U nltk
  • Debian distro (using apt-get): sudo apt-get install python-nltk

For Windows (Use the 32-bit binary installation):

  1. Install Python 3.4: http://www.python.org/downloads/ (avoid the 64-bit versions)
  2. Install Numpy (optional): http://sourceforge.net/projects/numpy/files/NumPy/ (the version that specifies python3.4)
  3. Install NLTK: http://pypi.python.org/pypi/nltk
  4. Test installation: Start>Python34, then type import nltk

(Why not 64 bit? See https://github.com/nltk/nltk/issues/1079)


Then out of paranoia, recheck your nltk version inside python:

from __future__ import print_function
import nltk
print(nltk.__version__)

Or on the command line:

python3 -c "import nltk; print(nltk.__version__)"

Make sure that you see 3.1 as the output.

For even more paranoia, check that all your favorite Stanford NLP tools APIs are available:

from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser
from nltk.parse.stanford import StanfordNeuralDependencyParser
from nltk.tag.stanford import StanfordPOSTagger, StanfordNERTagger
from nltk.tokenize.stanford import StanfordTokenizer

(Note: The imports above will ONLY ensure that you are using a correct NLTK version that contains these APIs. Not seeing errors in the import doesn't mean that you have successfully configured the NLTK API to use the Stanford Tools)


STEP 2

Now that you have checked that you have the correct version of NLTK containing the necessary Stanford NLP tools interface, you need to download and extract all the necessary Stanford NLP tools.

TL;DR, in Unix:

cd $HOME

# Download the Stanford NLP tools
wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
# Extract the zip file.
unzip stanford-ner-2015-04-20.zip 
unzip stanford-parser-full-2015-04-20.zip 
unzip stanford-postagger-full-2015-04-20.zip

In Windows / Mac: download the same three zip files from the URLs above and extract them with an archive manager.


STEP 3

Set up the environment variables so that NLTK can find the relevant file paths automatically. You have to set the following variables:

  • Add the appropriate Stanford NLP .jar file to the CLASSPATH environment variable.

    • e.g. for the NER, it will be stanford-ner-2015-04-20/stanford-ner.jar
    • e.g. for the POS, it will be stanford-postagger-full-2015-04-20/stanford-postagger.jar
    • e.g. for the parser, it will be stanford-parser-full-2015-04-20/stanford-parser.jar and the parser model jar file, stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar
  • Add the appropriate model directory to the STANFORD_MODELS variable (i.e. the directory where you can find where the pre-trained models are saved)

    • e.g. for the NER, it will be in stanford-ner-2015-04-20/classifiers/
    • e.g. for the POS, it will be in stanford-postagger-full-2015-04-20/models/
    • e.g. for the Parser, there won't be a model directory.

In the code, see that it searches for the STANFORD_MODELS directory before appending the model name. Also note that the API automatically tries to search the OS environment for the CLASSPATH.

Note that as of NLTK v3.1, the STANFORD_JAR variables are deprecated and NO LONGER used. Code snippets found in older Stack Overflow answers that rely on them might not work.

TL;DR for STEP 3 on Ubuntu

export STANFORDTOOLSDIR=/home/path/to/stanford/tools/

export CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/stanford-ner.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar

export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/classifiers

(For Windows: See https://mcmap.net/q/79960/-how-to-add-to-the-pythonpath-in-windows-so-it-finds-my-modules-packages-duplicate for instructions for setting environment variables)

You MUST set the variables as above before starting python, then:

>>> from nltk.tag.stanford import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]

Alternatively, you could try adding the environment variables inside Python, as the previous answers have suggested, but you can also directly tell the parser/tagger to initialize to the direct path where you kept the .jar files and your models.

There is NO need to set the environment variables if you use the following method, BUT when the API changes its parameter names, you will need to change your code accordingly. That is why it is MORE advisable to set the environment variables than to modify your Python code to suit the NLTK version.
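
For the first alternative, a minimal sketch of setting the variables from inside Python, mirroring the exports above (the paths are placeholders for your own unzipped folders):

import os

# Must be set BEFORE constructing any Stanford* object from nltk.
os.environ['CLASSPATH'] = ':'.join([
    '/path/to/stanford-parser-full-2015-04-20/stanford-parser.jar',
    '/path/to/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar',
])
os.environ['STANFORD_MODELS'] = '/path/to/stanford-postagger-full-2015-04-20/models'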

For example (without setting any environment variables):

# POS tagging:

from nltk.tag import StanfordPOSTagger

stanford_pos_dir = '/home/alvas/stanford-postagger-full-2015-04-20/'
eng_model_filename= stanford_pos_dir + 'models/english-left3words-distsim.tagger'
my_path_to_jar= stanford_pos_dir + 'stanford-postagger.jar'

st = StanfordPOSTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
st.tag('What is the airspeed of an unladen swallow ?'.split())


# NER Tagging:
from nltk.tag import StanfordNERTagger

stanford_ner_dir = '/home/alvas/stanford-ner/'
eng_model_filename= stanford_ner_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz'
my_path_to_jar= stanford_ner_dir + 'stanford-ner.jar'

st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

# Parsing:
from nltk.parse.stanford import StanfordParser

stanford_parser_dir = '/home/alvas/stanford-parser/'
eng_model_path = stanford_parser_dir  + "edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
my_path_to_models_jar = stanford_parser_dir  + "stanford-parser-3.5.2-models.jar"
my_path_to_jar = stanford_parser_dir  + "stanford-parser.jar"

parser=StanfordParser(model_path=eng_model_path, path_to_models_jar=my_path_to_models_jar, path_to_jar=my_path_to_jar)
Charter answered 6/12, 2015 at 0:45 Comment(0)
27

As of NLTK v3.3, users should avoid the Stanford NER or POS taggers from nltk.tag, and avoid Stanford tokenizer/segmenter from nltk.tokenize.

Instead use the new nltk.parse.corenlp.CoreNLPParser API.

Please see https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK


(To avoid a link-only answer, I've pasted the docs from the NLTK GitHub wiki below.)

First, update your NLTK

pip3 install -U nltk # Make sure it's >=3.3

Then download the necessary CoreNLP packages:

cd ~
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27

# Get the Chinese model 
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties 

# Get the Arabic model
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties 

# Get the French model
wget http://nlp.stanford.edu/software/stanford-french-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-french.properties 

# Get the German model
wget http://nlp.stanford.edu/software/stanford-german-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-german.properties 


# Get the Spanish model
wget http://nlp.stanford.edu/software/stanford-spanish-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-spanish.properties 

English

Still in the stanford-corenlp-full-2018-02-27 directory, start the server:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000 & 

Then in Python:

>>> from nltk.parse import CoreNLPParser

# Lexical Parser
>>> parser = CoreNLPParser(url='http://localhost:9000')

# Parse tokenized text.
>>> list(parser.parse('What is the airspeed of an unladen swallow ?'.split()))
[Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NN', ['airspeed'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('DT', ['an']), Tree('JJ', ['unladen'])])]), Tree('S', [Tree('VP', [Tree('VB', ['swallow'])])])])]), Tree('.', ['?'])])])]

# Parse raw string.
>>> list(parser.raw_parse('What is the airspeed of an unladen swallow ?'))
[Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NN', ['airspeed'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('DT', ['an']), Tree('JJ', ['unladen'])])]), Tree('S', [Tree('VP', [Tree('VB', ['swallow'])])])])]), Tree('.', ['?'])])])]

# Neural Dependency Parser
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parses = dep_parser.parse('What is the airspeed of an unladen swallow ?'.split())
>>> [[(governor, dep, dependent) for governor, dep, dependent in parse.triples()] for parse in parses]
[[(('What', 'WP'), 'cop', ('is', 'VBZ')), (('What', 'WP'), 'nsubj', ('airspeed', 'NN')), (('airspeed', 'NN'), 'det', ('the', 'DT')), (('airspeed', 'NN'), 'nmod', ('swallow', 'VB')), (('swallow', 'VB'), 'case', ('of', 'IN')), (('swallow', 'VB'), 'det', ('an', 'DT')), (('swallow', 'VB'), 'amod', ('unladen', 'JJ')), (('What', 'WP'), 'punct', ('?', '.'))]]


# Tokenizer
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> list(parser.tokenize('What is the airspeed of an unladen swallow?'))
['What', 'is', 'the', 'airspeed', 'of', 'an', 'unladen', 'swallow', '?']

# POS Tagger
>>> pos_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> list(pos_tagger.tag('What is the airspeed of an unladen swallow ?'.split()))
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

# NER Tagger
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> list(ner_tagger.tag(('Rami Eid is studying at Stony Brook University in NY'.split())))
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'STATE_OR_PROVINCE')]

Chinese

Start the server a little differently, still from the stanford-corenlp-full-2018-02-27 directory:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000

In Python:

>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'我家没有电脑。'))
['我家', '没有', '电脑', '。']

>>> list(parser.parse(parser.tokenize(u'我家没有电脑。')))
[Tree('ROOT', [Tree('IP', [Tree('IP', [Tree('NP', [Tree('NN', ['我家'])]), Tree('VP', [Tree('VE', ['没有']), Tree('NP', [Tree('NN', ['电脑'])])])]), Tree('PU', ['。'])])])]

Arabic

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000

In Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9005')
>>> text = u'انا حامل'

# Parser.
>>> parser.raw_parse(text)
<list_iterator object at 0x7f0d894c9940>
>>> list(parser.raw_parse(text))
[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['انا'])]), Tree('NP', [Tree('NN', ['حامل'])])])])]
>>> list(parser.parse(parser.tokenize(text)))
[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['انا'])]), Tree('NP', [Tree('NN', ['حامل'])])])])]

# Tokenizer / Segmenter.
>>> list(parser.tokenize(text))
['انا', 'حامل']

# POS Tagger
>>> pos_tagger = CoreNLPParser('http://localhost:9005', tagtype='pos')
>>> list(pos_tagger.tag(parser.tokenize(text)))
[('انا', 'PRP'), ('حامل', 'NN')]


# NER Tagger
>>> ner_tagger = CoreNLPParser('http://localhost:9005', tagtype='ner')
>>> list(ner_tagger.tag(parser.tokenize(text)))
[('انا', 'O'), ('حامل', 'O')]

French

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-french.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9004  -port 9004 -timeout 15000

In Python:

>>> parser = CoreNLPParser('http://localhost:9004')
>>> list(parser.parse('Je suis enceinte'.split()))
[Tree('ROOT', [Tree('SENT', [Tree('NP', [Tree('PRON', ['Je']), Tree('VERB', ['suis']), Tree('AP', [Tree('ADJ', ['enceinte'])])])])])]
>>> pos_tagger = CoreNLPParser('http://localhost:9004', tagtype='pos')
>>> pos_tagger.tag('Je suis enceinte'.split())
[('Je', 'PRON'), ('suis', 'VERB'), ('enceinte', 'ADJ')]

German

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-german.properties \
-preload tokenize,ssplit,pos,ner,parse \
-status_port 9002  -port 9002 -timeout 15000

In Python:

>>> parser = CoreNLPParser('http://localhost:9002')
>>> list(parser.raw_parse('Ich bin schwanger'))
[Tree('ROOT', [Tree('NUR', [Tree('S', [Tree('PPER', ['Ich']), Tree('VAFIN', ['bin']), Tree('AP', [Tree('ADJD', ['schwanger'])])])])])]
>>> list(parser.parse('Ich bin schwanger'.split()))
[Tree('ROOT', [Tree('NUR', [Tree('S', [Tree('PPER', ['Ich']), Tree('VAFIN', ['bin']), Tree('AP', [Tree('ADJD', ['schwanger'])])])])])]


>>> pos_tagger = CoreNLPParser('http://localhost:9002', tagtype='pos')
>>> pos_tagger.tag('Ich bin schwanger'.split())
[('Ich', 'PPER'), ('bin', 'VAFIN'), ('schwanger', 'ADJD')]


>>> ner_tagger = CoreNLPParser('http://localhost:9002', tagtype='ner')
>>> ner_tagger.tag('Donald Trump besuchte Angela Merkel in Berlin.'.split())
[('Donald', 'PERSON'), ('Trump', 'PERSON'), ('besuchte', 'O'), ('Angela', 'PERSON'), ('Merkel', 'PERSON'), ('in', 'O'), ('Berlin', 'LOCATION'), ('.', 'O')]

Spanish

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-spanish.properties \
-preload tokenize,ssplit,pos,ner,parse \
-status_port 9003  -port 9003 -timeout 15000

In Python:

>>> pos_tagger = CoreNLPParser('http://localhost:9003', tagtype='pos')
>>> pos_tagger.tag(u'Barack Obama salió con Michael Jackson .'.split())
[('Barack', 'PROPN'), ('Obama', 'PROPN'), ('salió', 'VERB'), ('con', 'ADP'), ('Michael', 'PROPN'), ('Jackson', 'PROPN'), ('.', 'PUNCT')]
>>> ner_tagger = CoreNLPParser('http://localhost:9003', tagtype='ner')
>>> ner_tagger.tag(u'Barack Obama salió con Michael Jackson .'.split())
[('Barack', 'PERSON'), ('Obama', 'PERSON'), ('salió', 'O'), ('con', 'O'), ('Michael', 'PERSON'), ('Jackson', 'PERSON'), ('.', 'O')]
Charter answered 23/8, 2018 at 8:39 Comment(6)
Excellent answer. Thank you – Ly
Thanks, this is very useful. The Arabic parsing is not correct though; it is splitting the text into letters instead of words. – Terrorize
Use list(parser.raw_parse(text)) or list(parser.parse(parser.tokenize(text))). Corrected the example ;) – Charter
Can't believe this isn't advertised more!! – Zootechnics
Sadly, NLTK doesn't have enough people going around meetups to give talks or the resources to host a snazzy dev conference to promote the tool =( Feel free to introduce this feature or NLTK to the community around you. – Charter
How can we specify the parser file in this API? – Conduplicate
23

Deprecated Answer

The answer below is deprecated, please use the solution on https://mcmap.net/q/215937/-how-to-use-stanford-parser-in-nltk-using-python for NLTK v3.3 and above.


Edited

As of the current Stanford parser (2015-04-20), the default output of lexparser.sh has changed, so the script below will not work.

But this answer is kept for legacy's sake; it will still work with http://nlp.stanford.edu/software/stanford-parser-2012-11-12.zip though.


Original Answer

I suggest you don't mess with Jython or JPype. Let Python do Python stuff and let Java do Java stuff; get the Stanford Parser output through the console.

After you've installed the Stanford Parser in your home directory ~/, just use this python recipe to get the flat bracketed parse:

import os
sentence = "this is a foo bar i want to parse."

os.popen("echo '"+sentence+"' > ~/stanfordtemp.txt")
parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh ~/stanfordtemp.txt").readlines()

bracketed_parse = " ".join( [i.strip() for i in parser_out if i.strip()[0] == "("] )
print bracketed_parse
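
A minimal variant of the same recipe using subprocess instead of os.popen, so that quote characters in the sentence cannot break the shell command (it assumes the same ~/stanford-parser-2012-11-12 install location):

import os
import subprocess
import tempfile

sentence = "this is a foo bar i want to parse."

# Write the sentence to a temp file instead of echoing it through a shell.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write(sentence)

# lexparser.sh prints the parse on stdout (and progress chatter on stderr).
parser_out = subprocess.check_output(
    [os.path.expanduser('~/stanford-parser-2012-11-12/lexparser.sh'), f.name],
    universal_newlines=True).splitlines()
os.unlink(f.name)

bracketed_parse = " ".join(line.strip() for line in parser_out
                           if line.strip().startswith("("))
print(bracketed_parse)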
Charter answered 17/1, 2013 at 9:58 Comment(3)
This worked for me except I needed to add a condition to check len(i.strip()) > 0, otherwise I got an index error. I guess my parser output had at least one line that was purely whitespace. – Cargile
Alternatively, use this Python wrapper for the Stanford CoreNLP tools: bitbucket.org/torotoki/corenlp-python – Charter
Be careful with this. If your input contains any 's, you will get some strange errors. There are better ways to call things on the command line. – Pademelon
7

There is a Python interface for the Stanford parser:

http://projects.csail.mit.edu/spatial/Stanford_Parser

Cotswold answered 18/12, 2012 at 18:16 Comment(0)
7

The Stanford Core NLP software page has a list of python wrappers:

http://nlp.stanford.edu/software/corenlp.shtml#Extensions

Annoy answered 21/8, 2013 at 19:28 Comment(0)
6

If I remember correctly, the Stanford parser is a Java library; therefore you must have a Java runtime on your server/computer.

I used it once on a server, combined with a PHP script. The script used PHP's exec() function to make a command-line call to the parser like so:

<?php

exec( "java -cp /pathTo/stanford-parser.jar -mx100m edu.stanford.nlp.process.DocumentPreprocessor /pathTo/fileToParse > /pathTo/resultFile 2>/dev/null" );

?>

I don't remember all the details of this command; it basically opened the fileToParse, parsed it, and wrote the output to the resultFile. PHP would then open the result file for further use.

The end of the command directs the parser's verbose output to NULL, to prevent unnecessary command-line information from disturbing the script.

I don't know much about Python, but there might be a way to make command line calls.
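
For what it's worth, a rough Python equivalent of that PHP call, sketched with subprocess (the /pathTo/... placeholders are kept from the example above):

import subprocess

# Run DocumentPreprocessor on a file, write its output to a result file,
# and discard the parser's stderr chatter, mirroring the exec() call.
with open('/pathTo/resultFile', 'w') as out:
    subprocess.call(
        ['java', '-cp', '/pathTo/stanford-parser.jar', '-mx100m',
         'edu.stanford.nlp.process.DocumentPreprocessor', '/pathTo/fileToParse'],
        stdout=out, stderr=subprocess.DEVNULL)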

It might not be the exact route you were hoping for, but hopefully it'll give you some inspiration. Best of luck.

Sibling answered 14/12, 2012 at 17:25 Comment(0)
6

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Here is an adaptation of danger98's code that works with nltk3.0.0 on windoze, and presumably the other platforms as well, adjust directory names as appropriate for your setup:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = 'd:/stanford-parser'
os.environ['STANFORD_MODELS'] = 'd:/stanford-parser'
os.environ['JAVAHOME'] = 'c:/Program Files/java/jre7/bin'

parser = stanford.StanfordParser(model_path="d:/stanford-grammars/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print sentences

Note that the parsing command has changed (see the source code at www.nltk.org/_modules/nltk/parse/stanford.html), and that you need to define the JAVAHOME variable. I tried to get it to read the grammar file in situ in the jar, but have so far failed to do that.

Cottonseed answered 11/9, 2014 at 2:54 Comment(1)
I'm from 1989 not 98, but thanks for your example ;) – Thinnish
4

You can use the Stanford Parser's output to create a Tree in nltk (nltk.tree.Tree).

Assuming the Stanford parser gives you a file in which there is exactly one parse tree for every sentence, this example works, though it might not look very pythonic:

import sys
import nltk

f = open(sys.argv[1] + ".output" + ".30" + ".stp", "r")
parse_trees_text = []
tree = ""
for line in f:
  if line.isspace():
    parse_trees_text.append(tree)
    tree = ""
  elif "(. ...))" in line:
    tree = tree + ')'
    parse_trees_text.append(tree)
    tree = ""
  else:
    tree = tree + line

parse_trees = []
for t in parse_trees_text:
  tree = nltk.Tree(t)
  tree.__delitem__(len(tree) - 1)  # delete "(. .))" from tree (you don't need that)
  s = traverse(tree)  # traverse() is the author's own helper, not shown here
  parse_trees.append(tree)
Laticialaticiferous answered 1/8, 2013 at 16:11 Comment(1)
+1 for letting java do java stuff and python do python stuff. Depending on how you call the java command and which options, parsing the output file from the Stanford parser might be different. It would be good if you also added details on how you called the Stanford Parser to get your output file. – Charter
4

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Since nobody really mentioned it and it somehow troubled me a lot, here is an alternative way to use the Stanford parser in Python:

from nltk.parse.stanford import StanfordParser

stanford_parser_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser.jar'
stanford_model_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar'
parser = StanfordParser(path_to_jar=stanford_parser_jar,
                        path_to_models_jar=stanford_model_jar)

This way, you don't need to worry about the path thing anymore.

This is useful for those who cannot get it working properly on Ubuntu or who run the code in Eclipse.
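
A quick usage check for this setup (a minimal sketch; the sentence is arbitrary, and raw_parse returns an iterator of trees):

# Parse one sentence and print the first tree in bracketed notation.
print(next(parser.raw_parse("the quick brown fox jumps over the lazy dog")))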

Possibly answered 8/4, 2016 at 16:41 Comment(0)
3

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Here is the Windows version of alvas's answer:

import os
from nltk.tree import ParentedTree

sentences = ('. '.join(['this is sentence one without a period',
                        'this is another foo bar sentence ']) + '.').encode('ascii', errors='ignore')
catpath = r"YOUR CURRENT FILE PATH"

f = open('stanfordtemp.txt', 'w')
f.write(sentences)
f.close()

parse_out = os.popen(catpath + r"\nlp_tools\stanford-parser-2010-08-20\lexparser.bat " + catpath + r"\stanfordtemp.txt").readlines()

bracketed_parse = " ".join([i.strip() for i in parse_out if i.strip() if i.strip()[0] == "("])
bracketed_parse = "\n(ROOT".join(bracketed_parse.split(" (ROOT")).split('\n')
aa = map(lambda x: ParentedTree.fromstring(x), bracketed_parse)

NOTES:

  • In lexparser.bat you need to change all the paths into absolute paths to avoid java errors such as "class not found"

  • I strongly recommend applying this method under Windows, since I tried several answers on this page and all the methods that communicate between Python and Java failed.

  • I would like to hear from you if you succeed on Windows, and I wish you could tell me how you overcame all these problems.

  • Search for the Python wrapper for Stanford CoreNLP to get the Python version.


Newly answered 11/12, 2014 at 3:46 Comment(0)
3

I am on a Windows machine and you can simply run the parser normally as you do from the command line, but from a different directory, so you don't need to edit the lexparser.bat file. Just put in the full path.

import os

cmd = r'java -cp \Documents\stanford_nlp\stanford-parser-full-2015-01-30 edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "typedDependencies" \Documents\stanford_nlp\stanford-parser-full-2015-01-30\stanford-parser-3.5.1-models\edu\stanford\nlp\models\lexparser\englishFactored.ser.gz stanfordtemp.txt'
parse_out = os.popen(cmd).readlines()

The tricky part for me was realizing how to run a java program from a different path. There must be a better way, but this works.

Cabbage answered 6/3, 2015 at 16:30 Comment(0)
3

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

A slight update (or simply an alternative) to danger89's comprehensive answer on using the Stanford Parser in NLTK and Python.

With stanford-parser-full-2015-04-20, JRE 1.8 and nltk 3.0.4 (python 2.7.6), it appears that you no longer need to extract the englishPCFG.ser.gz from stanford-parser-x.x.x-models.jar or to set up any os.environ

from nltk.parse.stanford import StanfordParser

english_parser = StanfordParser('path/stanford-parser.jar', 'path/stanford-parser-3.5.2-models.jar')

s = "The real voyage of discovery consists not in seeking new landscapes, but in having new eyes."

sentences = english_parser.raw_parse_sents((s,))
print sentences #only print <listiterator object> for this version

#draw the tree
for line in sentences:
    for sentence in line:
        sentence.draw()
Carduaceous answered 28/8, 2015 at 10:31 Comment(0)
2

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

I cannot leave this as a comment because of reputation, but since I spent (wasted?) some time solving this, I would rather share my problem/solution to get this parser to work in NLTK.

In the excellent answer from alvas, it is mentioned that:

e.g. for the Parser, there won't be a model directory.

This led me wrongly to:

  • not be careful about the value I put in STANFORD_MODELS (and only care about my CLASSPATH)
  • leave the ../path/to/stanford-parser-full-2015-12-09/models directory *virtually empty* (or with a jar file whose name did not match the nltk regex)!

If the OP, like me, just wanted to use the parser, it may be confusing that, when not downloading anything else (no POS tagger, no NER, ...) and following all these instructions, we still get an error.

Eventually, for any CLASSPATH given (following examples and explanations in answers from this thread) I would still get the error:

NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar! Set the CLASSPATH environment variable. For more information, on stanford-parser-(\d+)(.(\d+))+-models.jar,

see: http://nlp.stanford.edu/software/lex-parser.shtml

OR:

NLTK was unable to find stanford-parser.jar! Set the CLASSPATH environment variable. For more information, on stanford-parser.jar, see: http://nlp.stanford.edu/software/lex-parser.shtml

Though, importantly, I could correctly load and use the parser if I called the function with all arguments and paths fully specified, as in:

stanford_parser_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser.jar'
stanford_model_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar'
parser = StanfordParser(path_to_jar=stanford_parser_jar,
                        path_to_models_jar=stanford_model_jar)

Solution for Parser alone:

Therefore the error came from NLTK and how it looks for jars using the supplied STANFORD_MODELS and CLASSPATH environment variables. To solve this, the *-models.jar, with the correct name (matching the regex in the NLTK code, so no -corenlp-....jar), must be located in the folder designated by STANFORD_MODELS.

Namely, I first created:

mkdir stanford-parser-full-2015-12-09/models

Then added in .bashrc:

export STANFORD_MODELS=/path/to/stanford-parser-full-2015-12-09/models

And finally, I copied stanford-parser-3.6.0-models.jar (or the corresponding version) into:

path/to/stanford-parser-full-2015-12-09/models/
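
That is, assuming the models jar sits in the unzipped parser folder (adjust the version number to yours):

cp /path/to/stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar /path/to/stanford-parser-full-2015-12-09/models/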

With that in place, I could get StanfordParser to load smoothly in Python with the classic CLASSPATH that points to stanford-parser.jar. Actually, as such, you can call StanfordParser with no parameters; the defaults will just work.

Eben answered 18/10, 2016 at 0:45 Comment(0)
2

It took me many hours to finally find a simple solution for Windows users. Basically it's a summarized version of an existing answer by alvas, but made easy to follow (hopefully) for those who are new to Stanford NLP and are Windows users.

1) Download the module you want to use, such as NER, POS etc. In my case I wanted to use NER, so I downloaded the module from http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip

2) Unzip the file.

3) Set the environment variables (CLASSPATH and STANFORD_MODELS) to point at the unzipped folder.

import os
os.environ['CLASSPATH'] = "C:/Users/Downloads/stanford-ner-2015-04-20/stanford-ner.jar"
os.environ['STANFORD_MODELS'] = "C:/Users/Downloads/stanford-ner-2015-04-20/classifiers/"

4) Set the environment variable for Java, i.e. where you have Java installed. For me it was:

os.environ['JAVAHOME'] = "C:/Program Files/Java/jdk1.8.0_102/bin/java.exe"

5) Import the module you want:

from nltk.tag import StanfordNERTagger

6) Call the pretrained model, which is present in the classifiers folder of the unzipped folder. Add ".gz" at the end for the file extension. For me, the model I wanted to use was english.all.3class.distsim.crf.ser

st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')

7) Now run the tagger and we are done!!

st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
Apparition answered 16/12, 2016 at 9:20 Comment(1)
2

I am using NLTK version 3.2.4, and the following code worked for me.

from nltk.internals import find_jars_within_path
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize

# Alternatively to setting the CLASSPATH, add the jar and model via their path:
jar = '/home/ubuntu/stanford-postagger-full-2017-06-09/stanford-postagger.jar'
model = '/home/ubuntu/stanford-postagger-full-2017-06-09/models/english-left3words-distsim.tagger'

pos_tagger = StanfordPOSTagger(model, jar)

# Add other jars from Stanford directory
stanford_dir = pos_tagger._stanford_jar.rpartition('/')[0]
stanford_jars = find_jars_within_path(stanford_dir)
pos_tagger._stanford_jar = ':'.join(stanford_jars)

text = pos_tagger.tag(word_tokenize("Open app and play movie"))
print(text)

Output:

[('Open', 'VB'), ('app', 'NN'), ('and', 'CC'), ('play', 'VB'), ('movie', 'NN')]
Fixed answered 14/9, 2017 at 9:39 Comment(1)
I think this is the tagger and not the parser. – Ningpo
2

Deprecated Answer

The answer below is deprecated, please use the solution on https://mcmap.net/q/215937/-how-to-use-stanford-parser-in-nltk-using-python for NLTK v3.3 and above.


EDITED

Note: The following answer will only work on:

  • NLTK version ==3.2.5
  • Stanford Tools compiled since 2016-10-31
  • Python 2.7, 3.5 and 3.6

As both tools change rather quickly, the API might look very different 3-6 months later. Please treat the following answer as a temporary fix, not an eternal one.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface Stanford NLP tools with NLTK!!

TL;DR

The following code comes from https://github.com/nltk/nltk/pull/1735#issuecomment-306091826

In terminal:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000

In Python:

>>> from nltk.tag.stanford import CoreNLPPOSTagger, CoreNLPNERTagger
>>> from nltk.parse.corenlp import CoreNLPParser

>>> stpos, stner = CoreNLPPOSTagger(), CoreNLPNERTagger()

>>> stpos.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> parser = CoreNLPParser(url='http://localhost:9000')

>>> next(
...     parser.raw_parse('The quick brown fox jumps over the lazy dog.')
... ).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )

>>> parse_fox.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> parse_wolf.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|_________      |    |     _______|____    |
 DT   JJ   JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |    |    |     |    |    |       |    |   |
The quick grey wolf jumps over the     lazy fox  .

>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )

>>> parse_dog.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
        ROOT
         |
         S
  _______|____
 |            VP
 |    ________|___
 NP  |            NP
 |   |         ___|___
PRP VBP       DT      NN
 |   |        |       |
 I   'm       a      dog

Please take a look at http://www.nltk.org/_modules/nltk/parse/corenlp.html for more information on the Stanford API. Take a look at the docstrings!

Charter answered 18/3, 2018 at 8:21 Comment(0)
1

A new development of the Stanford parser, based on a neural model trained using TensorFlow, was very recently made available as a Python API. This model is supposed to be far more accurate than the Java-based model. You can certainly integrate it with an NLTK pipeline.

Link to the parser. The repository contains pre-trained parser models for 53 languages.

Arboreous answered 3/9, 2019 at 8:4 Comment(0)
