Algorithms for named entity recognition
H

6

22

I would like to use named entity recognition (NER) to find adequate tags for texts in a database.

I know there is a Wikipedia article about this and lots of other pages describing NER, but I would prefer to hear about this topic from you:

  • What experience have you had with the various algorithms?
  • Which algorithm would you recommend?
  • Which algorithm is the easiest to implement (PHP/Python)?
  • How do the algorithms work? Is manual training necessary?

Example:

"Last year, I was in London where I saw Barack Obama." => Tags: London, Barack Obama

I hope you can help me. Thank you very much in advance!

Hindi answered 22/6, 2009 at 12:26 Comment(0)
L
12

To start with, check out http://www.nltk.org/ if you plan to work with Python. As far as I know the code isn't "industrial strength", but it will get you started.

Check out section 7.5 from http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html, but to understand the algorithms you will probably have to read through a lot of the book.

Also check out http://nlp.stanford.edu/software/CRF-NER.shtml. It's written in Java.

NER isn't an easy subject, and probably nobody will tell you "this is the best algorithm"; most of them have their pros and cons.

My 0.05 of a dollar.

Cheers,

Leery answered 22/6, 2009 at 13:53 Comment(4)
NLTK sounds good but it requires installation via shell, doesn't it? I can't install anything via shell.Hindi
What do you mean by installation via shell? Check out nltk.org/download, it's enough if you just add nltk to your PYTHONPATH.Leery
I battled with Stanford's NER (and may have to return to it if I can find nothing else). It's kind of a mess. I would avoid it if possible.Undersexed
I would say Stanford has better results than nltk, at least in my case.Pellegrini
H
2

It depends on whether you want:

To learn about NER: An excellent place to start is with NLTK, and the associated book.

To implement the best solution: Here you're going to need to look for the state of the art. Have a look at publications in TREC. A more specialised meeting is Biocreative (a good example of NER applied to a narrow field).

To implement the easiest solution: In this case you basically just want to do simple tagging, and pull out the words tagged as nouns. You could use a tagger from nltk, or even just look up each word in PyWordnet and tag it with the most common wordsense.
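As a rough illustration of the "tag, then pull out the nouns" idea, here is a minimal sketch. It assumes the text has already been part-of-speech tagged with Penn Treebank style tags (for example by a tagger from NLTK); the tagged sentence below is hand-written for the example, and the function name `proper_noun_tags` is hypothetical. Consecutive proper nouns are joined so that multiword names stay together.

```python
from itertools import groupby

# Penn Treebank tags for singular and plural proper nouns.
PROPER_NOUN_TAGS = ("NNP", "NNPS")


def proper_noun_tags(tagged_tokens):
    """Join each run of consecutive proper-noun tokens into one candidate tag."""
    tags = []
    for is_proper, group in groupby(
        tagged_tokens, key=lambda pair: pair[1] in PROPER_NOUN_TAGS
    ):
        if is_proper:
            tags.append(" ".join(word for word, _ in group))
    return tags


# Hand-tagged example sentence (an assumption, not real tagger output).
tagged = [
    ("Last", "JJ"), ("year", "NN"), (",", ","), ("I", "PRP"), ("was", "VBD"),
    ("in", "IN"), ("London", "NNP"), ("where", "WRB"), ("I", "PRP"),
    ("saw", "VBD"), ("Barack", "NNP"), ("Obama", "NNP"), (".", "."),
]

print(proper_noun_tags(tagged))  # ['London', 'Barack Obama']
```

The quality of the result depends entirely on the tagger: a proper noun mistagged as a common noun is silently dropped.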


Most algorithms require some sort of training, and perform best when they're trained on content that represents what you're going to be asking them to tag.

Herv answered 22/6, 2009 at 14:5 Comment(3)
I think even the easiest solution would need to do some n-gram analysis to try to find multiword entities.Sagittal
osteele.com/projects/pywordnet says "This is the old version of PyWordNet. PyWordNet was contributed to the NLTK project in 2006."Shaia
@Triptych: You'll find lots of n-grams that are "I love" and "of which"Shaia
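To make the n-gram point concrete, here is a small sketch that counts bigrams with and without a stopword filter. The sample text and the tiny stopword list are made up for the example; a real list would be much longer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption for this example).
STOPWORDS = {"i", "the", "of", "which", "has", "many"}


def bigram_counts(text, stopwords=frozenset()):
    """Count adjacent word pairs, skipping pairs that contain a stopword."""
    tokens = re.findall(r"[A-Za-z]+", text)
    pairs = zip(tokens, tokens[1:])
    return Counter(
        (a, b)
        for a, b in pairs
        if a.lower() not in stopwords and b.lower() not in stopwords
    )


text = "I love New York. I love the parks of which New York has many."

raw = bigram_counts(text)
filtered = bigram_counts(text, STOPWORDS)

print(raw.most_common(2))       # [(('I', 'love'), 2), (('New', 'York'), 2)]
print(filtered.most_common(2))  # [(('New', 'York'), 2), (('love', 'New'), 1)]
```

Raw frequency ranks "I love" as high as "New York". The stopword filter removes it, but noise such as "love New" still gets through, so frequency alone is not enough to identify entities.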
H
1

There are a few tools and APIs out there.

There's a tool built on top of DBPedia called DBPedia Spotlight (https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki). You can use their REST interface or download and install your own server. The great thing is it maps entities to their DBPedia presence, which means you can extract interesting linked data.

AlchemyAPI (www.alchemyapi.com) have an API that will do this via REST as well, and they use a freemium model.

I think most techniques rely on a bit of NLP to find entities, then use an underlying database like Wikipedia, DBPedia, Freebase, etc to do disambiguation and relevance (so for instance, trying to decide whether an article that mentions Apple is about the fruit or the company... we would choose the company if the article includes other entities that are linked to Apple the company).
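The disambiguation step can be sketched roughly as follows. This is a toy version under loud assumptions: the knowledge base is a hand-written dictionary standing in for something like Wikipedia or DBPedia, and the function `disambiguate` is hypothetical. It simply picks the candidate whose linked entities overlap most with the other entities found in the article.

```python
# Hypothetical mini knowledge base: surface form -> candidate meanings,
# each with a set of entities it is linked to. All data is made up.
KNOWLEDGE_BASE = {
    "Apple": {
        "Apple Inc.": {"iPhone", "Steve Jobs", "Microsoft", "Cupertino"},
        "Apple (fruit)": {"orchard", "cider", "pear", "tree"},
    },
}


def disambiguate(mention, context_entities, knowledge_base=KNOWLEDGE_BASE):
    """Return the candidate sharing the most linked entities with the context.

    Returns None for unknown mentions. On a tie, the first candidate wins.
    """
    candidates = knowledge_base.get(mention)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda name: len(candidates[name] & set(context_entities)),
    )


print(disambiguate("Apple", {"Steve Jobs", "iPhone", "London"}))  # Apple Inc.
print(disambiguate("Apple", {"cider", "orchard"}))                # Apple (fruit)
```

Real systems usually weight the overlap (for example by link counts or prior popularity of each meaning) instead of counting shared entities equally.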

Hb answered 16/7, 2014 at 20:1 Comment(0)
R
0

You may want to try Yahoo Research's latest Fast entity Linking system - the paper also has updated references to new approaches to NER using neural network based embeddings:

https://research.yahoo.com/publications/8810/lightweight-multilingual-entity-extraction-and-linking

Ritzy answered 8/12, 2016 at 21:56 Comment(0)
S
0

One can use artificial neural networks to perform named-entity recognition.

Here is an implementation of a bi-directional LSTM + CRF Network in TensorFlow (python) to perform named-entity recognition: https://github.com/Franck-Dernoncourt/NeuroNER (works on Linux/Mac/Windows).

It gives state-of-the-art results (or close to it) on several named-entity recognition datasets. As Ale mentions, each named-entity recognition algorithm has its own downsides and upsides.
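To give an idea of what a CRF output layer adds on top of per-token scores, here is a minimal Viterbi decoding sketch in plain Python. It is not NeuroNER's code: the function `viterbi`, the label set, and all scores are made up for illustration. In a BiLSTM + CRF model the per-token scores would come from the LSTM, and the transition scores would be learned.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: one dict per token, mapping label -> score.
    transitions: dict mapping (previous_label, label) -> score; missing pairs count as 0.
    """
    # best[i][y] is the best score of any path ending at token i with label y.
    best = [dict(emissions[0])]
    backpointers = []
    for scores in emissions[1:]:
        current, pointer = {}, {}
        for y in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, y), 0.0),
            )
            current[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + scores[y]
            pointer[y] = prev
        best.append(current)
        backpointers.append(pointer)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


labels = ["O", "B-PER", "I-PER"]
# Made-up scores for the tokens "saw", "Barack", "Obama".
emissions = [
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.1},
    {"O": 0.5, "B-PER": 1.5, "I-PER": 1.0},
    {"O": 1.2, "B-PER": 0.2, "I-PER": 1.0},
]
# Reward B-PER -> I-PER and penalise the invalid O -> I-PER.
transitions = {("B-PER", "I-PER"): 1.0, ("O", "I-PER"): -5.0}

print(viterbi(emissions, transitions, labels))  # ['O', 'B-PER', 'I-PER']
```

Picking the best label for each token independently would tag "Obama" as O here (1.2 beats 1.0). The transition scores are what pull the last token into the person entity.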

[Figure: ANN architecture]

[Figure: the network as viewed in TensorBoard]

Sharitasharity answered 4/5, 2017 at 23:48 Comment(0)
M
-11

I don't really know about NER, but judging from that example, you could make an algorithm that searches for capital letters in the words or something like that. For that I would recommend regex as the easiest solution to implement if you're thinking small.
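A minimal sketch of that capital-letter heuristic, to show what it catches and where it breaks. The pattern and the function name `capitalized_candidates` are illustrative choices, not an established method.

```python
import re

# One or more consecutive capitalized words, e.g. "Barack Obama".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def capitalized_candidates(text):
    """Return runs of capitalized words as naive entity candidates."""
    return CAPITALIZED_RUN.findall(text)


print(capitalized_candidates(
    "Last year, I was in London where I saw Barack Obama."
))
# ['Last', 'London', 'Barack Obama']

print(capitalized_candidates("i wuz n lunden n i sore barrack ohbama"))
# []
```

The sentence-initial "Last" is a false positive, and text without reliable capitalization yields nothing at all.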

Another option is to compare the texts with a database, which would match strings pre-identified as tags of interest.
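The database lookup could be sketched like this, with a small in-memory set standing in for the database. The set contents and the function name `gazetteer_tags` are assumptions for the example.

```python
import re

# Stand-in for a database table of known tags (made-up contents).
KNOWN_TAGS = {"London", "Barack Obama", "Paris"}


def gazetteer_tags(text, known_tags=KNOWN_TAGS):
    """Return known tags that occur in the text, in order of first appearance."""
    found = []
    for tag in known_tags:
        # Whole-word, case-insensitive match of the tag.
        match = re.search(r"\b" + re.escape(tag) + r"\b", text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), tag))
    return [tag for _, tag in sorted(found)]


print(gazetteer_tags("Last year, I was in London where I saw Barack Obama."))
# ['London', 'Barack Obama']

print(gazetteer_tags("last year i was in london where i saw barack obama"))
# ['London', 'Barack Obama']
```

This copes with missing capitalization, but it only ever finds names that are already in the list, and misspellings are missed.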

my 5 cents.

Miamiami answered 22/6, 2009 at 12:38 Comment(3)
This doesn't work. First, it only works in correct English texts. In addition to that, it doesn't work if there's no case sensitivity.Hindi
yup ... i wuz n lunden n i sore barrack ohbamaAurore
This is the worst suggestion. NER is a very wide field and there has been a lot of research on it. I would suggest using the Stanford NER system.Janitajanith
