Good algorithm for sentiment analysis
A

4

7

I tried a Naive Bayes classifier and it works very badly. An SVM works a little better, but is still horrible. Most of the papers I have read are about SVM and Naive Bayes with some variations (n-grams, POS tags, etc.), but all of them give me results close to 50% (the authors report 80% and higher, but I cannot get the same accuracy on real data).

Are there any more powerful methods besides lexical analysis? SVM and Naive Bayes assume that words are independent. This approach is called "bag of words". What if we assume that words are associated?

For example: use the Apriori algorithm to detect that if a sentence contains "bad" and "horrible", there is a 70% probability that the sentence is negative. We could also use the distance between words, and so on.
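As a rough illustration of that idea, the sketch below counts how often each pair of words co-occurs in a sentence and what fraction of those sentences are negative. It is only the counting step for 2-item sets, not a full Apriori implementation (which prunes candidate itemsets level by level). The function name `pair_rules`, the `min_support` threshold and the toy sentences are all assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(labeled_sentences, min_support=2):
    """Map each word pair to (support, share of negative sentences).

    Hypothetical helper: counts only pairs of distinct words that
    occur together in at least `min_support` sentences.
    """
    pair_total = Counter()
    pair_negative = Counter()
    for text, label in labeled_sentences:
        # Sort the unique words so each pair has one canonical order.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            pair_total[pair] += 1
            if label == "neg":
                pair_negative[pair] += 1
    return {
        pair: (count, pair_negative[pair] / count)
        for pair, count in pair_total.items()
        if count >= min_support
    }


# Toy data, invented for illustration only.
data = [
    ("bad and horrible service", "neg"),
    ("horrible food and bad staff", "neg"),
    ("not bad and not horrible at all", "pos"),
    ("great food", "pos"),
]

rules = pair_rules(data)
support, negative_share = rules[("bad", "horrible")]
print(support, round(negative_share, 2))  # 3 0.67
```

Note how the third sentence shows the weakness of pure co-occurrence: negation flips the polarity while the word pair stays the same.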

Is this a good idea, or am I reinventing the wheel?

Adorne answered 11/6, 2012 at 14:0 Comment(0)
Q
6

You're confusing a couple of concepts here. Neither Naive Bayes nor SVMs are tied to the bag-of-words approach. Neither SVMs nor the BOW approach makes an independence assumption between terms.

Here are some things you can try:

  • include punctuation marks in your bags of words; esp. ! and ? can be helpful for sentiment analysis, while many feature extractors geared toward document classification throw them away
  • same for stop words: words like "I" and "my" may be indicative of subjective text
  • build a two-stage classifier; first determine whether any opinion is expressed, then whether it's positive or negative
  • try a quadratic kernel SVM instead of a linear one to capture interactions between features.
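The first, second and last suggestions can be sketched with the standard library alone. The tokenizer below keeps `!` and `?` and does not filter stop words, and the kernel is the usual degree-2 polynomial `(x·y + c)^2`; expanding that square produces products of pairs of term counts, which is how a quadratic kernel picks up feature interactions. The names `bag_of_words`, `linear_kernel` and `quadratic_kernel`, the regular expression and the constant `c` are assumptions chosen for this example; in practice you would hand the kernel choice to an SVM library rather than compute it yourself.

```python
import re
from collections import Counter

# Words (with apostrophes) plus the punctuation marks ! and ?.
# No stop-word list is applied, so "i" and "my" survive.
TOKEN_RE = re.compile(r"[\w']+|[!?]")


def bag_of_words(text):
    """Return token counts for one document."""
    return Counter(TOKEN_RE.findall(text.lower()))


def linear_kernel(a, b):
    """Dot product of two sparse count vectors."""
    return sum(count * b[token] for token, count in a.items())


def quadratic_kernel(a, b, c=1.0):
    """Degree-2 polynomial kernel: (a . b + c) ** 2."""
    return (linear_kernel(a, b) + c) ** 2


x = bag_of_words("I love my phone!")
y = bag_of_words("I love it!")

print(sorted(x))               # ['!', 'i', 'love', 'my', 'phone']
print(linear_kernel(x, y))     # 3
print(quadratic_kernel(x, y))  # 16.0
```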
Queensland answered 11/6, 2012 at 14:7 Comment(4)
What do you think about the Apriori algorithm and associations between words? Adorne
@Neir0: I don't immediately see how you'd want to apply it. I've also never seen attempts to do sentiment analysis with it. I know that some people use it to construct approximations to the quadratic kernel (roughly what you call "word associations"), but then I'd try a vanilla kernel SVM first. Queensland
A straightforward way is to feed in the tokens together with a neg or pos tag. For example: "pos i love my mom". As output I get something like "if a sentence contains love and mom, then there is a 70% chance that it carries the pos tag". Of course, we can modify this approach for better results. Adorne
@Neir0: sure, that's an approach you could try. It does seem overkill, though -- IIUC, Apriori is intended to find arbitrary associations between items in its input, while this is a classification task, where you know which property of the input you want to predict (polarity); it seems like you're throwing away knowledge about the task. Queensland
T
5

Algorithms like SVM, Naive Bayes and maximum entropy models are supervised machine learning algorithms, so the output of your program depends on the training set you provide. For large-scale sentiment analysis I prefer an unsupervised learning method, in which one determines the sentiment of adjectives by clustering documents into same-oriented parts and labeling the clusters positive or negative. More information can be found in this paper: http://icwsm.org/papers/3--Godbole-Srinivasaiah-Skiena.pdf

Hope this helps you in your work :)

Titustityus answered 24/11, 2012 at 6:7 Comment(0)
D
2

You can find some useful material on sentiment analysis using Python. This presentation summarizes sentiment analysis as 3 simple steps:

  • Labeling data
  • Preprocessing
  • Model learning
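
A minimal sketch of how those three steps might fit together, using only the standard library. The model here is a small multinomial Naive Bayes with add-one smoothing, picked only because it is short to write; the presentation may well use something else. The toy sentences and the names `preprocess`, `train` and `predict` are assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Step 1: labeled data (toy examples, invented for illustration).
labeled = [
    ("I love this movie", "pos"),
    ("great acting and a great story", "pos"),
    ("I hate this movie", "neg"),
    ("horrible plot and bad acting", "neg"),
]


# Step 2: preprocessing (lowercase, keep alphabetic tokens).
def preprocess(text):
    return re.findall(r"[a-z']+", text.lower())


# Step 3: model learning (per-label word counts).
def train(examples):
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for token in preprocess(text):
            if token in vocab:  # ignore words never seen in training
                smoothed = word_counts[label][token] + 1
                score += math.log(smoothed / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(labeled)
print(predict(model, "great story"))  # pos
print(predict(model, "bad plot"))     # neg
```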
Deviation answered 4/6, 2015 at 17:38 Comment(0)
C
0

Sentiment analysis is an area of active, ongoing research. For an overview of the most recent and most successful approaches, I would generally advise you to have a look at the shared tasks of SemEval. Usually, every year they run a competition on sentiment analysis in Twitter. You can find the paper describing the task and the results for 2016 here (it might be a bit technical, though): http://alt.qcri.org/semeval2016/task4/data/uploads/semeval2016_task4_report.pdf

Starting from there, you can have a look at the papers describing the individual systems (as referenced there).

Console answered 5/9, 2016 at 8:15 Comment(0)
