Strategies for recognizing proper nouns in NLP

I'm interested in learning more about Natural Language Processing (NLP) and am curious if there are currently any strategies for recognizing proper nouns in a text that aren't based on dictionary recognition? Also, could anyone explain or link to resources that explain the current dictionary-based methods? Who are the authoritative experts on NLP or what are the definitive resources on the subject?

Asher answered 3/3, 2009 at 23:56 Comment(0)
12

The task of determining the part of speech of each word in a text is called part-of-speech (POS) tagging. The Brill tagger, for example, uses a mixture of dictionary (vocabulary) words and contextual rules. I believe that some of the important initial dictionary words for this task are the stop words. Once you have (mostly correct) parts of speech for your words, you can start building larger structures. This industry-oriented book differentiates between recognizing noun phrases (NPs) and recognizing named entities.

About textbooks: Allen's Natural Language Understanding is a good, if somewhat dated, book. Foundations of Statistical Natural Language Processing is a nice introduction to statistical NLP. Speech and Language Processing is a bit more rigorous and maybe more authoritative. The Association for Computational Linguistics is a leading scientific community on computational linguistics.
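
To make the "dictionary words plus contextual rules" idea concrete, here is a minimal sketch in the spirit of the Brill tagger. It is only an illustration: the tiny `LEXICON` and the single rule in `RULES` are made up for this example (a real Brill tagger learns its rules from a tagged corpus), and the tag names follow the Penn Treebank convention.

```python
# Hypothetical toy lexicon: word -> most likely tag (illustrative only).
LEXICON = {
    "the": "DT", "a": "DT", "to": "TO",
    "she": "PRP", "wants": "VBZ", "race": "NN",
}

# Hypothetical contextual rules: (from_tag, to_tag, required_previous_tag).
# "Change NN to VB when the previous tag is TO."
RULES = [("NN", "VB", "TO")]


def tag(tokens):
    """Tag tokens with a dictionary lookup, then patch tags with contextual rules."""
    tags = []
    for token in tokens:
        if token.lower() in LEXICON:
            tags.append(LEXICON[token.lower()])
        elif token[0].isupper():
            tags.append("NNP")  # unknown capitalized word: guess proper noun
        else:
            tags.append("NN")   # unknown lowercase word: guess common noun
    # Second pass: apply each contextual rule from left to right.
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(tokens, tags))


print(tag("Mary wants to race".split()))
# [('Mary', 'NNP'), ('wants', 'VBZ'), ('to', 'TO'), ('race', 'VB')]
```

Note that "Mary" is tagged as a proper noun only because it is capitalized and missing from the lexicon, which is exactly the kind of guess that the contextual rules are meant to refine.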

Undertrick answered 4/3, 2009 at 10:10 Comment(0)
6

Besides the dictionary-based approach, two others come to my mind:

  • Pattern-based approaches (in a simple form: anything that is capitalized is a proper noun)
  • Machine learning approaches (mark proper nouns in a training corpus and train a classifier)
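
The simple pattern-based approach from the first bullet could look roughly like the sketch below. The function name `capitalized_candidates` and the rule of skipping the first word of each sentence are my own assumptions for illustration, not an established algorithm.

```python
import re


def capitalized_candidates(text):
    """Return capitalized words that do not start a sentence."""
    candidates = []
    # Rough sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # Skip the first word: it is capitalized whether or not it is a name.
        for word in words[1:]:
            if word[0].isupper():
                candidates.append(word)
    return candidates


print(capitalized_candidates(
    "Yesterday Alice met Bob in Paris. They talked about Python."
))
# ['Alice', 'Bob', 'Paris', 'Python']
```

This misses proper nouns at the start of a sentence and wrongly flags words such as the pronoun "I", which is why the machine learning approaches in the second bullet are usually preferred in practice.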

The field is usually called named-entity extraction and is often considered a subfield of information extraction. A good starting point for the different fields of NLP is usually the corresponding chapter in the Oxford Handbook of Computational Linguistics:

Oxford Handbook of Computational Linguistics
(source: oup.com)

Vallievalliere answered 4/3, 2009 at 0:8 Comment(1)
Ah, thanks for the "named-entity extraction" term. Sometimes figuring out the correct terms is the hardest part when you're just starting to learn about something. - Asher
5

Try searching for "named entity recognition"--that's the term that's used in the NLP literature for this sort of thing.

Delacruz answered 16/3, 2009 at 5:53 Comment(0)
2

It depends on what you mean by dictionary-based.

For example, one strategy would be to take things that aren't in a dictionary and try to proceed on the assumption that they're proper nouns. If this leads to a sensible parse, consider the assumption provisionally validated and keep going, otherwise conclude that they aren't.

Other ideas:

  • In subject position, any simple subject without a determiner is a good candidate.
  • Ditto in prepositional phrases
  • In any position, the basis of a possessive determiner (e.g. Bob in "Bob's sister") is a good candidate
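
Two of these heuristics (words missing from the dictionary, and the base of a possessive) can be sketched as follows. The `KNOWN_WORDS` set is a hypothetical stand-in for a real dictionary, and the sketch does no parsing, so it cannot do the "does this lead to a sensible parse" validation step or check subject position.

```python
# Hypothetical tiny word list; a real system would load a full dictionary.
KNOWN_WORDS = {"the", "a", "sister", "lives", "in", "near", "river"}


def proper_noun_candidates(sentence):
    """Return (word, reason) pairs for tokens that look like proper nouns."""
    candidates = []
    for raw in sentence.split():
        token = raw.strip(".,;:!?\"()")
        if not token:
            continue
        if token.endswith("'s") and len(token) > 2:
            base, reason = token[:-2], "possessive"
        else:
            base, reason = token, "unknown word"
        # Skip tokens containing digits or symbols; they are unlikely to be names.
        if not base.isalpha():
            continue
        if reason == "possessive" or base.lower() not in KNOWN_WORDS:
            candidates.append((base, reason))
    return candidates


print(proper_noun_candidates("Bob's sister lives in Zurich near the river."))
# [('Bob', 'possessive'), ('Zurich', 'unknown word')]
```

The possessive rule will also flag common nouns ("the dog's tail"), so these are candidates to be confirmed by other evidence, not final answers.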

-- MarkusQ

Grenadier answered 4/3, 2009 at 0:21 Comment(2)
Interesting idea about things that aren't in the dictionary. - Asher
Things that aren't in the dictionary can be gibberish as well, e.g. "jbbdsabf", "12jbbsajf". - Ronrona
0

Some suggested toolkits:

  1. OpenNLP: has a Named Entity Recognition (NER) component for this task
  2. LingPipe: also has an NER component
  3. Stanford NLP package: an excellent package for academic use, though maybe not commercial-friendly
  4. NLTK: a Python NLP package

Bertine answered 18/12, 2012 at 18:0 Comment(0)
0

If you have a sentence such as "who is bill gates" and you apply a part-of-speech tagger to it, it will give this answer:

"who/WP is/VBZ bill/NN gates/NNS ?/. "

You can try this online at http://cst.dk/online/pos_tagger/uk/

This gives you all the nouns in the sentence, which you can then easily extract with a simple algorithm. I suggest using Python if you are doing natural language processing; it has NLTK (the Natural Language Toolkit) to work with.
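
As a rough sketch of that extraction step, the snippet below parses tagger output in the `word/TAG` format shown above and keeps every token whose tag starts with `NN` (in the Penn Treebank tag set this covers NN, NNS, NNP and NNPS). The function name `extract_nouns` is just an illustrative choice, and only the standard library is used.

```python
def extract_nouns(tagged_text):
    """Return the words whose tag starts with 'NN' from 'word/TAG' output."""
    nouns = []
    for pair in tagged_text.split():
        # Split on the last '/', so words that contain '/' stay intact.
        word, _, tag = pair.rpartition("/")
        if tag.startswith("NN"):
            nouns.append(word)
    return nouns


print(extract_nouns("who/WP is/VBZ bill/NN gates/NNS ?/."))
# ['bill', 'gates']
```

In this example the lowercase input was tagged NN and NNS, so the tags alone do not tell you that "bill gates" is a name; that distinction needs a further step.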

Halfwit answered 29/8, 2013 at 3:45 Comment(0)
0

If you're interested in implementing natural language processing and Python is your programming language, then this can be a very informative resource: http://www.youtube.com/watch?v=kKe4M4iSclc

Anissa answered 11/10, 2013 at 11:27 Comment(0)
0

Although this paper is about the Bengali language, it may suggest a general procedure for identifying proper nouns, so I hope it will be helpful to you. Please check the following link: http://www.mecs-press.org/ijmecs/ijmecs-v6-n8/v6n8-1.html

Waring answered 18/1, 2015 at 18:16 Comment(0)
