How do I do dependency parsing in NLTK?

Going through the NLTK book, I couldn't find a clear way to generate a dependency tree from a given sentence.

The relevant section of the book, the sub-chapter on dependency grammar, gives an example figure, but it doesn't show how to parse a sentence to come up with those relationships. Or maybe I'm missing something fundamental in NLP?

EDIT: I want something similar to what the Stanford parser does: given the sentence "I shot an elephant in my sleep", it should return something like:

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
Garage answered 16/9, 2011 at 10:26 Comment(1)
You can check this example. NLTK does not support this type of dependency parsing.Utilitarian

We can use the Stanford Parser from NLTK.

Requirements

You need to download two things from their website:

  1. The Stanford CoreNLP parser.
  2. The language model for your desired language (e.g. the English language model).

Warning!

Make sure that your language model version matches your Stanford CoreNLP parser version!

The current CoreNLP version as of May 22, 2018 is 3.9.1.

After downloading the two files, extract the zip file anywhere you like.

Python Code

Next, load the model and use it through NLTK:

from nltk.parse.stanford import StanfordDependencyParser

path_to_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_to_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'

dependency_parser = StanfordDependencyParser(path_to_jar=path_to_jar, path_to_models_jar=path_to_models_jar)

# raw_parse returns an iterator of DependencyGraph objects
result = dependency_parser.raw_parse('I shot an elephant in my sleep')
dep = next(result)  # result.next() works only on Python 2

list(dep.triples())

Output

The output of the last line is:

[((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')),
 ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')),
 ((u'elephant', u'NN'), u'det', (u'an', u'DT')),
 ((u'shot', u'VBD'), u'prep', (u'in', u'IN')),
 ((u'in', u'IN'), u'pobj', (u'sleep', u'NN')),
 ((u'sleep', u'NN'), u'poss', (u'my', u'PRP$'))]

I think this is what you want.

Cambrian answered 19/11, 2015 at 15:36 Comment(11)
Should be accepted answer, works for me, thanks ywatTotally
If you are using Python 3, use result.__next__() instead of result.next()Theressa
next() on the list iterator throws an error; installing graphviz solved the problemClothesbasket
@Cambrian AttributeError: 'Tree' object has no attribute 'triples'Springfield
@Cambrian I also get the error 'Tree' object has no attribute 'triples'Rauch
@ywat, Is there a way to maintain the word order in the results?Forborne
@Cambrian There is no parser related file (stanford-parser.jar or stanford-parser-3.4.1-models.jar) in downloaded zip from nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip. Am I using incorrect link ?Bookbinding
@Bookbinding I clarified which files you needed (parser jar and models) and where you should download them from.Evert
You should add you need 3 things: the Stanford Parser, the Language Model and also the JDKBalough
It cannot recognize punctuation such as "?".Continuance
Worth noting is that StanfordDependencyParser seems to be deprecated and they recommend using the nltk.parse.corenlp.CoreNLPDependencyParser insteadPoirier

I think you could use a corpus-based dependency parser instead of the grammar-based one NLTK provides.

Doing corpus-based dependency parsing on even a small amount of text in Python is not ideal performance-wise, so NLTK provides a wrapper for MaltParser, a corpus-based dependency parser (see the sketch below).
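
Here is a minimal sketch of the nltk.parse.malt wrapper, assuming you have downloaded MaltParser and a pre-trained model; the version numbers and paths below are placeholders to adapt to your setup:

from nltk.parse.malt import MaltParser

# Placeholder paths: point these at your MaltParser installation and
# a pre-trained model (e.g. engmalt.linear) for your language.
mp = MaltParser('path_to/maltparser-1.7.2', 'path_to/engmalt.linear-1.7.mco')

# parse_one takes a pre-tokenized sentence and returns a DependencyGraph
graph = mp.parse_one('I shot an elephant in my sleep .'.split())
print(graph.tree())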

You might find this other question about RDF representation of sentences relevant.

Difficulty answered 16/9, 2011 at 14:59 Comment(0)

If you need better performance, then spaCy (https://spacy.io/) is the best choice. Usage is very simple:

import spacy

nlp = spacy.load('en')  # in newer spaCy versions, use spacy.load('en_core_web_sm')
sents = nlp(u'A woman is walking through the door.')

You'll get a dependency tree as output, and you can easily dig out any information you need. You can also define your own custom pipelines. See more on their website:

https://spacy.io/docs/usage/
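
For example, each token in the parsed result exposes its dependency label and its syntactic head; a minimal sketch using the standard token attributes, reusing the sents object from above:

for token in sents:
    # token.dep_ is the dependency label, token.head is the governing token
    print(token.text, token.dep_, token.head.text)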

Abseil answered 22/3, 2017 at 11:9 Comment(2)
@Aleksander Jovanovic, I tried this briefly, and the accuracy was terrible.Forborne
@Forborne Accuracy on what? I found spaCy to be pretty satisfying with not-too-complex sentences, i.e. most of the sentences you can expect to work with. The models are also improving from time to time, so you could give it a shot again.Abseil

To use the Stanford Parser from NLTK:

1) Run the CoreNLP server at localhost
Download Stanford CoreNLP here (and also the model file for your language). The server can be started by running the following command (more details here):

# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

or from Python via Stanford's stanford-corenlp wrapper (you need to set the CORENLP_HOME environment variable first):

import os
import corenlp  # from the stanford-corenlp pip package

os.environ["CORENLP_HOME"] = "dir"  # path to your unpacked CoreNLP directory
client = corenlp.CoreNLPClient()
# do something
client.stop()

2) Call the dependency parser from NLTK

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parse, = dep_parser.raw_parse(
...     'The quick brown fox jumps over the lazy dog.'
... )
>>> print(parse.to_conll(4))  
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       nmod
.       .       5       punct

See the detailed documentation here, and also this related question: NLTK CoreNLPDependencyParser: Failed to establish connection.

Vasodilator answered 13/6, 2018 at 8:28 Comment(0)

If you want to be serious about dependency parsing, don't use NLTK: all the algorithms are dated and slow. Try something like this: https://spacy.io/

Distrustful answered 6/3, 2015 at 2:44 Comment(1)
404 for me - must be really fast, and zoomed out of thereTotally

From the Stanford Parser documentation: "the dependencies can be obtained using our software [...] on phrase-structure trees using the EnglishGrammaticalStructure class available in the parser package." http://nlp.stanford.edu/software/stanford-dependencies.shtml

The dependencies manual also mentions: "Or our conversion tool can convert the output of other constituency parsers to the Stanford Dependencies representation." http://nlp.stanford.edu/software/dependencies_manual.pdf
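
For reference, that conversion tool is invoked from the command line roughly like this; the class name comes from the documentation quoted above, but the exact flags and file names here are assumptions to verify against your parser version:

# Convert phrase-structure trees (Penn Treebank format) to Stanford Dependencies
java -cp "stanford-parser.jar" edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile trees.mrg -basic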

Neither functionality seems to be implemented in NLTK currently.

Epigeous answered 24/7, 2013 at 22:51 Comment(0)

A little late to the party, but I wanted to add some example code with spaCy that gets you your desired output:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("I shot an elephant in my sleep")
for token in doc:
    print("{2}({3}-{6}, {0}-{5})".format(token.text, token.tag_, token.dep_, token.head.text, token.head.tag_, token.i+1, token.head.i+1))

And here's the result, very similar to the desired output:

nsubj(shot-2, I-1)
ROOT(shot-2, shot-2)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)

Hope that helps!

Springclean answered 30/4, 2019 at 1:48 Comment(1)
For better readability, we can rearrange the print statement: print(f"{token.dep_}({token.head.text}-{token.head.i+1}, {token.text}-{token.i+1})")Jarnagin
