How to read a constituency-based parse tree

I have a corpus of sentences that were preprocessed by Stanford's CoreNLP systems. One of the things it provides is the sentence's Parse Tree (Constituency-based). While I can understand a parse tree when it's drawn (like a tree), I'm not sure how to read it in this format:

E.g.:

          (ROOT
            (FRAG
              (NP (NN sent28))
              (: :)
              (S
                (NP (NNP Rome))
                (VP (VBZ is)
                  (PP (IN in)
                    (NP
                      (NP (NNP Lazio) (NN province))
                      (CC and)
                      (NP
                        (NP (NNP Naples))
                        (PP (IN in)
                          (NP (NNP Campania))))))))
              (. .)))

The original sentence is:

sent28: Rome is in Lazio province and Naples in Campania .

How am I supposed to read this tree? Alternatively, is there code (in Python) that reads it properly? Thanks.

Stephanstephana answered 23/2, 2015 at 13:0 Comment(0)

NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is Tree.fromstring, which takes the bracketed string and returns a Tree object. You can then iterate over its subtrees, leaves, and so on.
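As a minimal sketch of that approach (assuming NLTK is installed; the variable names are only illustrative), the tree from the question can be loaded and inspected like this:

```python
from nltk.tree import Tree

# The bracketed parse from the question, pasted as a plain string
bracketed = """
(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))
"""

tree = Tree.fromstring(bracketed)

print(tree.label())              # label of the top node
print(" ".join(tree.leaves()))   # the words, left to right
print(tree.pos()[:3])            # first few (word, POS tag) pairs

# Walk every noun phrase (NP) in the tree and print the words it covers
for np in tree.subtrees(lambda t: t.label() == "NP"):
    print(" ".join(np.leaves()))
```

No model or corpus download should be needed for this, since the string is only being read, not parsed from raw text.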

As an aside: you might want to remove the bit that says sent28: because it confuses the parser (it's also not part of the sentence). With it, you are not getting a full parse of the sentence: the parser labels the whole input as a fragment (the FRAG node directly under ROOT).
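If you would rather read the notation by hand: every opening parenthesis starts a node, the first token after it is that node's label (ROOT, NP, VBZ, ...), and everything up to the matching closing parenthesis is its children, which are either further nodes or the words themselves. Here is a rough sketch of that reading in plain Python, without NLTK. The helpers read_tree and words are made-up names for illustration, not part of any library, and the sample string is just a shortened piece of the tree above.

```python
import re


def read_tree(text):
    """Turn a bracketed parse into nested (label, children) tuples."""
    # Split into "(", ")" and everything else (labels and words)
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]           # first token is the node label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())        # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return node()


def words(tree):
    """Collect the leaf words of a (label, children) tree, left to right."""
    _, children = tree
    result = []
    for child in children:
        if isinstance(child, str):
            result.append(child)
        else:
            result.extend(words(child))
    return result


sample = "(S (NP (NNP Rome)) (VP (VBZ is) (PP (IN in) (NP (NNP Lazio) (NN province)))))"
parsed = read_tree(sample)
print(parsed[0])                 # label of the top node
print(" ".join(words(parsed)))   # the words under it
```

This does no error handling for unbalanced brackets, so for real data the NLTK class is the safer choice.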

Norge answered 23/2, 2015 at 13:14 Comment(2)
Thank you! I also added from nltk.draw.tree import draw_trees and then called draw_trees(tree) to visualize it as a real tree :-) [Oh, and I can't take off sent28, it's part of an assignment...] - Stephanstephana
Forgot to say: Tree.pprint is also very handy - Norge

I know that this post is quite old, but I am convinced that my solution could be relevant for others as well.

I have written a library called Constituent Treelib that offers a convenient way to parse sentences into constituent trees, modify them according to their structure, as well as visualize and export them into various file formats. In addition, one can extract phrases according to their phrasal categories (which can be used e.g., as features for various NLP tasks), validate already parsed sentences in bracket notation or convert them back into sentences. The latter is what the OP asked for. Here are the steps to achieve this:

First, install the library via:

pip install constituent-treelib

Next, load the respective components from the library and create the constituent tree from the sentence's bracketed tree representation:

from constituent_treelib import ConstituentTree, BracketedTree, Language

# Define the language for the sentence as well as for the spaCy and benepar models
language = Language.English

# Define which specific spaCy model should be used (default is Medium)
spacy_model_size = ConstituentTree.SpacyModelSize.Medium

# Create the pipeline (note, the required models will be downloaded and installed automatically)
nlp = ConstituentTree.create_pipeline(language, spacy_model_size)

# Your sentence
bracketed_tree_string = """(ROOT
(FRAG
(NP (NN sent28))
(: :)
(S
(NP (NNP Rome))
(VP (VBZ is)
(PP (IN in)
(NP
(NP (NNP Lazio) (NN province))
(CC and)
(NP
(NP (NNP Naples))
(PP (IN in)
(NP (NNP Campania))))))))
(. .)))""".splitlines()

bracketed_tree_string = " ".join(bracketed_tree_string)
sentence = BracketedTree(bracketed_tree_string)

# Create a constituent tree from which the original sentence will be recovered
tree = ConstituentTree(sentence, nlp) 

Finally, we recover the original sentence from the constituent tree using the following:

tree.leaves(tree.nltk_tree, ConstituentTree.NodeContent.Text)

Result:

'sent28 : Rome is in Lazio province and Naples in Campania .'
Wrapper answered 10/4 at 13:33 Comment(0)

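You can just use the Stanford parser, like this:

# Assumes `parser` is an already configured Stanford parser object, presumably NLTK's (setup not shown here)
# raw_parse_sents takes a list of raw strings; raw_parse (a single string) or
# parse_sents (a list of sentences that have already been split into tokens) would probably work too
sentences = parser.raw_parse_sents(["Hello, My name is Melroy.", "What is your name?"])
for line in sentences:
    for sentence in line:
        sentence.draw()  # draws the parse tree of each sentence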
Demavend answered 21/4, 2017 at 13:40 Comment(0)
