Entity Recognition and Sentiment Analysis using NLP
So, this question might be a little naive, but I thought asking the friendly people of Stackoverflow wouldn't hurt.

My current company has been using a third-party API for NLP for a while now. We basically URL-encode a string and send it over, and they extract certain entities for us (we have a list of entities that we're looking for) and return a JSON mapping of entity : sentiment. We've recently decided to bring this project in-house instead.

I've been studying NLTK, Stanford NLP and lingpipe for the past 2 days now, and can't figure out if I'm basically reinventing the wheel doing this project.

We already have massive tables containing the original unstructured text and another table containing the extracted entities from that text and their sentiment. The entities are single words. For example:

Unstructured text : Now for the bed. It wasn't the best.

Entity : Bed

Sentiment : Negative

I believe that implies we have training data (unstructured text) as well as entities and sentiments. Now, how can I go about using this training data on one of the NLP frameworks to get what we want? No clue. I've sort of got the steps, but I'm not sure:

  1. Tokenize sentences
  2. Tokenize words
  3. Find the noun in the sentence (POS tagging)
  4. Find the sentiment of that sentence.

But that should fail for the case I mentioned above, since the text talks about the bed across two different sentences?
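The four steps above can be sketched without any framework at all. This is a toy, dependency-free illustration only: the entity list and sentiment word lists are made-up placeholders standing in for a real POS tagger and sentiment model. It also carries the most recently seen entity forward across sentences, so a follow-up like "It wasn't the best." still gets attributed to the bed:

```python
import re

# Toy lexicons -- stand-ins for a real entity list and sentiment model.
ENTITIES = {"bed", "pillow", "room"}          # entities we are looking for
NEGATIVE = {"wasn't", "not", "worst", "bad"}  # naive negative cues
POSITIVE = {"best", "great", "good"}

def sentences(text):
    # Step 1: tokenize sentences (naive split on . ! ?).
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def words(sentence):
    # Step 2: tokenize words.
    return re.findall(r"[\w']+", sentence.lower())

def extract(text):
    results = {}
    current = None  # last entity seen, carried across sentences
    for sent in sentences(text):
        toks = words(sent)
        # Step 3: find a known entity (stand-in for POS tagging / NER).
        for tok in toks:
            if tok in ENTITIES:
                current = tok
        if current is None:
            continue
        # Step 4: score the sentiment of the sentence.
        neg = sum(tok in NEGATIVE for tok in toks)
        pos = sum(tok in POSITIVE for tok in toks)
        # A negation cue wins over a positive cue ("wasn't the best").
        if neg:
            results[current] = "negative"
        elif pos:
            results[current] = "positive"
    return results

print(extract("Now for the bed. It wasn't the best."))  # → {'bed': 'negative'}
```

A real system would replace the lexicons with a trained NER model and sentiment classifier, but the control flow (sentence split, tokenize, attach sentiment to the active entity) stays the same.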

So the question: does anyone know what the best framework would be for accomplishing the above tasks, and any tutorials on the same? (Note: I'm not asking for a solution.) If you've done this stuff before, is this task too large to take on? I've looked up some commercial APIs, but they're absurdly expensive to use (we're a tiny startup).

Thanks stackoverflow!

Porphyry answered 25/3, 2014 at 20:56 Comment(0)

OpenNLP may also be a library to look at. At least they have a small tutorial on training the name finder and on using the document categorizer to do sentiment analysis. To train the name finder you have to prepare training data by tagging the entities in your text with SGML tags.

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training
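For the name finder, the training data is one whitespace-tokenized sentence per line, with each entity wrapped in `<START:type> … <END>` tags. A sketch using the asker's example (the `furniture` type name is an assumption, not from the thread):

```
Now for the <START:furniture> bed <END> .
It was n't the best .
```

Note that punctuation and contractions are split into their own tokens, matching OpenNLP's tokenizer output.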

Chromogenic answered 29/4, 2014 at 9:13 Comment(1)
Thanks Kai! I actually ended up writing my own library for training with tagged entities in NLTK. It worked, urm, okayish. Trying to do it in Stanford NLP now, since I'm finding their NER and sentiment extraction a lot nicer to work with/friendlier. I'll update this question with a link to the code soon. – Porphyry

NLTK provides a naive NER tagger along with resources, but it doesn't fit all cases (including finding dates). NLTK does, however, allow you to modify and customize the NER tagger to your requirements. This link might give you some ideas with basic examples on how to customize. Also, if you are comfortable with Scala and functional programming, this is one tool you cannot afford to miss.

Cheers...!

Splore answered 17/12, 2014 at 7:48 Comment(0)

I have discovered spaCy lately and it's just great! In the link you can find comparisons of performance in terms of speed and accuracy against NLTK and CoreNLP, and it does really well!

That said, solving your task is not a matter of picking one framework. You can have two different systems, one for NER and one for sentiment, and they can be completely independent. The hype these days is to use neural networks, and if you are willing to, you can train a recurrent neural network (which has shown strong performance on NLP tasks) with an attention mechanism to find the entity and the sentiment too.
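The point about two independent systems can be made concrete with a minimal sketch: the NER component and the sentiment component are separate callables, so either one can later be swapped for a trained model (a spaCy NER pipeline, an RNN sentiment classifier) without touching the other. The lexicon-based stand-ins below are illustrative placeholders, not real models:

```python
from typing import Callable, Dict, List

def lexicon_ner(text: str) -> List[str]:
    # Stand-in NER: look up tokens in an assumed entity list.
    known = {"bed", "pillow"}
    return [w.strip(".,!?") for w in text.lower().split()
            if w.strip(".,!?") in known]

def lexicon_sentiment(text: str) -> str:
    # Stand-in sentiment: a few naive negative cues.
    cues = ("wasn't", "not", "worst")
    return "negative" if any(c in text.lower() for c in cues) else "positive"

def analyze(text: str,
            ner: Callable[[str], List[str]] = lexicon_ner,
            sentiment: Callable[[str], str] = lexicon_sentiment) -> Dict[str, str]:
    # Thin driver combining the two independent systems: label every
    # entity the NER step found with the sentence-level sentiment.
    label = sentiment(text)
    return {entity: label for entity in ner(text)}

print(analyze("The bed wasn't the best."))  # → {'bed': 'negative'}
```

Because `analyze` takes the two components as parameters, upgrading either side (say, to a neural model) is a one-line change at the call site.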

There are great demos everywhere on the internet; the last two I have read and found interesting are [1] and [2].

Teddytedeschi answered 21/4, 2017 at 8:36 Comment(0)

Similar to spaCy, TextBlob is another fast and easy package that can accomplish many of these tasks.

I use NLTK, spaCy, and TextBlob frequently. If the corpus is simple, generic, and straightforward, spaCy and TextBlob work well OOTB. If the corpus is highly customized, domain-specific, or messy (incorrect spelling or grammar), I'll use NLTK and spend more time customizing my NLP text-processing pipeline with scrubbing, lemmatizing, etc.
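What "scrubbing and lemmatizing" might look like for a messy corpus, as a hypothetical pre-processing step (the misspelling and lemma maps here are tiny illustrative placeholders; a real pipeline would use a proper lemmatizer such as NLTK's):

```python
import re

# Illustrative placeholder maps -- a real pipeline would use a spelling
# corrector and a trained lemmatizer instead of hand-written tables.
MISSPELLINGS = {"teh": "the", "recieve": "receive"}
LEMMAS = {"beds": "bed", "pillows": "pillow"}

def scrub(text: str) -> list[str]:
    text = text.lower()
    text = re.sub(r"[^\w'\s]", " ", text)          # drop punctuation
    tokens = text.split()                          # collapse whitespace
    tokens = [MISSPELLINGS.get(t, t) for t in tokens]  # fix known typos
    tokens = [LEMMAS.get(t, t) for t in tokens]        # crude lemmatizing
    return tokens

print(scrub("Teh BEDS  were great!"))  # → ['the', 'bed', 'were', 'great']
```

The payoff is that the downstream NER and sentiment steps only ever see normalized tokens, which matters most on the messy, domain-specific corpora described above.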

NLTK Tutorial: http://www.nltk.org/book/

Spacy Quickstart: https://spacy.io/usage/

Textblob Quickstart: http://textblob.readthedocs.io/en/dev/quickstart.html

Aluminize answered 31/1, 2018 at 19:52 Comment(0)
