I am trying to figure out how to use NLTK's cascading chunker as per Chapter 7 of the NLTK book. Unfortunately, I'm running into a few issues when performing non-trivial chunking measures.
Let's start with this phrase:
"adventure movies between 2000 and 2015 featuring performances by daniel craig"
I am able to find all the relevant NPs when I use the following grammar:
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
However, I am not sure how to build nested structures with NLTK. The book gives the following format, but there are clearly a few things missing (e.g. How does one actually specify multiple rules?):
grammar = r"""
NP: {<DT|JJ|NN.*>+} # Chunk sequences of DT, JJ, NN
PP: {<IN><NP>} # Chunk prepositions followed by NP
VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>} # Chunk NP, VP
"""
In my case, I'd like to do something like the following:
grammar = r"""
MEDIA: {<DT>?<JJ>*<NN.*>+}
RELATION: {<V.*>}{<DT>?<JJ>*<NN.*>+}
ENTITY: {<NN.*>}
"""
Assuming that I'd like to use a cascaded chunker for my task, what syntax would I need to use? Additionally, is it possible for me to specify specific words (e.g. "directed" or "acted") when using a chunker?