Chunking with nltk

Asked 4/2, 2013 at 17:51 Answered 5/2, 2013 at 0:11

How can I obtain all the chunks from a sentence given a pattern? Example:

NP:{<NN><NN>}

Sentence tagged:

[("money", "NN"), ("market", "NN"), ("fund", "NN")]

If I parse it, I obtain:

(S (NP money/NN market/NN) fund/NN)

I would also like to get the other alternative, which is:

(S money/NN (NP market/NN fund/NN))

Phototelegraphy answered 4/2, 2013 at 17:51 Comment(3)

this isn't chunking, it's called parsing – Eats 4/2, 2013 at 17:53

Isn't parsing still more computationally expensive than chunking, even if I look for all possible chunkings? – Phototelegraphy 5/2, 2013 at 8:54

Chunking is also known as shallow parsing. Shallow parsing is when you only care about the big NPs and disregard the order and POS of what is inside them; in that case a normal regex chunker might work. But your question is about the internal structure of the NPs (i.e. deep parsing), so a parser would be necessary. – Juarez 5/2, 2013 at 10:8

I think your question is about getting the n most likely parses of a sentence. Am I right? If yes, see the nbest_parse(sent, n=None) function in the 2.0 documentation.

Barbel answered 4/2, 2013 at 17:56 Comment(1)

It seems to give the same answer for RegexpParser even if I parse with iter_parse. – Phototelegraphy 5/2, 2013 at 8:59
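That is consistent with how a regexp chunker appears to work: the rule is applied deterministically and a single tree comes back, so the second bracketing never shows up. A minimal sketch of enumerating every alternative by hand follows. It is plain Python, not an NLTK API; `alternative_chunkings` is a hypothetical helper, and it assumes the pattern is a fixed tag sequence such as <NN><NN>.

```python
def alternative_chunkings(tagged, pattern, label="NP"):
    """Return one bracketed string per position where `pattern` matches.

    `tagged` is a list of (word, tag) pairs; `pattern` is a list of tags.
    Hypothetical helper for illustration, not part of NLTK.
    """
    tags = [tag for _, tag in tagged]
    tokens = ["%s/%s" % pair for pair in tagged]
    n = len(pattern)
    results = []
    # Slide a window of len(pattern) tags over the sentence.
    for i in range(len(tags) - n + 1):
        if tags[i:i + n] == list(pattern):
            chunk = "(%s %s)" % (label, " ".join(tokens[i:i + n]))
            parts = tokens[:i] + [chunk] + tokens[i + n:]
            results.append("(S %s)" % " ".join(parts))
    return results


tagged = [("money", "NN"), ("market", "NN"), ("fund", "NN")]
chunkings = alternative_chunkings(tagged, ["NN", "NN"])
for chunking in chunkings:
    print(chunking)
# (S (NP money/NN market/NN) fund/NN)
# (S money/NN (NP market/NN fund/NN))
```

Each result contains exactly one chunk; combining several non-overlapping chunks in one sentence would need an extra step on top of this.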

@mbatchkarov is right about the nbest_parse documentation. For the sake of a code example, see:

import nltk
# Define the CFG grammar (NLTK 2.0 API; in NLTK 3 this call is nltk.CFG.fromstring).
grammar = nltk.parse_cfg("""
S -> NP
S -> NN NP
S -> NP NN
NP -> NN NN
NN -> 'market'
NN -> 'money'
NN -> 'fund'
""")

# Make your string into a list of tokens.
sentence = "money market fund".split(" ")

# Load the grammar into the ChartParser.
cp = nltk.ChartParser(grammar)

# Generate and print the parses the grammar allows for the sentence tokens
# (in NLTK 3, nbest_parse was replaced by cp.parse, which returns an iterator).
for tree in cp.nbest_parse(sentence):
    print(tree)

Juarez answered 5/2, 2013 at 0:11 Comment(2)

First you create the CFG grammar you need. The terminal nodes (i.e. your vocabulary/words) also need to be in the grammar. Then you load the grammar you've defined into the ChartParser. Then you get the parses for the token list you pass into nbest_parse. – Juarez 5/2, 2013 at 10:3

I was thinking of just using a regular expression, without a grammar. With a regexp, if you have a string like nnn and you are looking for the expression nn, the regexp lets you get the list of index pairs that match the pattern, in this case (0, 2) and (1, 3). – Phototelegraphy 5/2, 2013 at 10:51
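For reference, Python's `re` module returns only non-overlapping matches by default, so getting both (0, 2) and (1, 3) needs a lookahead. A small sketch of that idea, using only the standard library; `overlapping_spans` is a hypothetical helper name:

```python
import re


def overlapping_spans(pattern, text):
    """Return (start, end) for every match of `pattern`, overlaps included."""
    # A lookahead consumes no characters, so the scan advances one position
    # at a time; the capturing group inside it records the actual match.
    lookahead = re.compile("(?=(%s))" % pattern)
    return [(m.start(1), m.end(1)) for m in lookahead.finditer(text)]


spans = overlapping_spans("nn", "nnn")
print(spans)  # [(0, 2), (1, 3)]
```

This reports at most one match per start position, which is enough for a fixed-length pattern like nn; a pattern with repetition (for example n+) would still yield only the greedy match at each position.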
