What is the default chunker for NLTK toolkit in Python?
I am using their default POS tagging and default tokenization, and they seem sufficient. I'd like to use their default chunker too.

I am reading the NLTK book, but it does not seem to mention a default chunker. Is there one?

Broomrape answered 6/11, 2009 at 13:10

You can get out-of-the-box named-entity chunking with the nltk.ne_chunk() function. It takes a list of POS-tagged tuples:

import nltk  # also requires the NE chunker model, e.g. nltk.download('maxent_ne_chunker') and nltk.download('words')

nltk.ne_chunk([('Barack', 'NNP'), ('Obama', 'NNP'), ('lives', 'NNS'), ('in', 'IN'), ('Washington', 'NNP')])

results in:

Tree('S', [Tree('PERSON', [('Barack', 'NNP')]), Tree('ORGANIZATION', [('Obama', 'NNP')]), ('lives', 'NNS'), ('in', 'IN'), Tree('GPE', [('Washington', 'NNP')])])

It identifies Barack as a person but Obama as an organization, so it is not perfect.

Gaullist answered 6/11, 2009 at 13:49
What if I am not very concerned about named entities, but chunking in general? For example, "the yellow dog" is a chunk, and "is running" is a chunk. – Broomrape
Yeah, for that there's no default to my knowledge (though I don't know everything about NLTK, to be sure). You could use a RegexpChunkParser, though you'll have to develop the rules yourself. There's an example here: gnosis.cx/publish/programming/charming_python_b18.txt – Gaullist
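Following the RegexpChunkParser suggestion above, here is a minimal sketch using nltk.RegexpParser, the higher-level wrapper around regexp chunk rules. The grammar and the pre-tagged sentence are illustrative assumptions (real text would need tuned rules and a tagger):

```python
import nltk

# Hand-written chunk grammar -- the patterns below are assumptions
# you would need to tune for your own text.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>}   # noun phrase: optional determiner, adjectives, noun
  VP: {<VB.*>+}            # verb phrase: one or more consecutive verbs
"""
chunker = nltk.RegexpParser(grammar)

# Pre-tagged tokens, so the example does not depend on a tagger model.
tagged = [('the', 'DT'), ('yellow', 'JJ'), ('dog', 'NN'),
          ('is', 'VBZ'), ('running', 'VBG')]
tree = chunker.parse(tagged)
print(tree)
```

This groups "the yellow dog" into an NP chunk and "is running" into a VP chunk, matching the example in the comment above.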

I couldn't find a default chunker/shallow parser either, although the book describes how to build and train one with example features. Coming up with additional features to get good performance shouldn't be too difficult.

See Chapter 7's section on Training Classifier-based Chunkers.

Giffin answered 7/11, 2009 at 4:10

© 2022 - 2024 — McMap. All rights reserved.