Sentiment Analysis in R

I am new to sentiment analysis and have no idea how to go about it using R, so I would like to ask for help and guidance.

I have a set of data consisting of opinions and would like to analyse those opinions.

Title      Date            Content    
Boy        May 13 2015     "She is pretty", Tom said. 
Animal     June 14 2015    The penguin is cute, lion added.
Human      March 09 2015   Mr Koh predicted that every human is smart..
Monster    Jan 22 2015     Ms May, a student, said that John has $10.80. 

Thank you.

Underside asked 16/9, 2015 at 2:45
How is it different from your previous question? – Ponton

Sentiment analysis covers a broad range of methods designed to measure positive versus negative sentiment in text, which makes this a difficult question to answer simply. But here is a simple answer: apply a dictionary to your document-term matrix, then combine the counts for the positive and negative keys of that dictionary into a sentiment measure (for example, positive minus negative).
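To make the counting step concrete without any package, here is a minimal base-R sketch. It assumes a crude tokeniser (lowercase the text, split on anything that is not a letter, digit or apostrophe) and treats a trailing * as a prefix wildcard, similar to the dictionary entries in the quanteda example in this answer. count_dictionary is a hypothetical helper for illustration, not a quanteda function.

```r
# Hypothetical helper: count, for each text, how many tokens match each
# dictionary key. Entries ending in "*" are prefix matches ("bad*" matches
# "badly"); other entries must match the whole token.
count_dictionary <- function(texts, dict) {
  # crude tokenisation: lowercase, split on non-alphanumeric characters
  toks <- strsplit(tolower(texts), "[^a-z0-9']+")
  counts <- sapply(names(dict), function(key) {
    entries <- dict[[key]]
    # "bad*" becomes "^bad", "good" becomes "^good$"
    patterns <- ifelse(grepl("\\*$", entries),
                       paste0("^", sub("\\*$", "", entries)),
                       paste0("^", entries, "$"))
    regex <- paste(patterns, collapse = "|")
    vapply(toks, function(tok) sum(grepl(regex, tok)), integer(1))
  })
  # sapply returns a plain vector for a single text, so force a matrix
  matrix(counts, nrow = length(texts), dimnames = list(NULL, names(dict)))
}

mydict <- list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
               positive = c("good", "great", "super*", "excellent"))
texts <- c("A good, great day with superb food.",
           "Terrible service and a bad, awful meal.")
counts <- count_dictionary(texts, mydict)
counts
```

Each row is a text and each column a dictionary key; quanteda does the same job with proper tokenisation and sparse matrices, which matters once the corpus grows.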

I suggest trying this in the text analysis package quanteda, which handles a variety of existing dictionary formats and allows you to create very flexible custom dictionaries.

For example:

# quanteda API as of 2015 (v0.x); later releases renamed several of these functions
require(quanteda)
# keep only the inaugural speeches delivered after 1980
mycorpus <- subset(inaugCorpus, Year>1980)
# a small custom dictionary; "*" is a wildcard, so "bad*" also matches "badly"
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          positive = c("good", "great", "super*", "excellent")))
# count dictionary matches per document instead of individual words
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 2 keys
##    ... created a 9 x 2 sparse dfm
##    ... complete. 
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##               features
## docs           negative positive
##   1981-Reagan         0        6
##   1985-Reagan         0        6
##   1989-Bush           0       18
##   1993-Clinton        1        2
##   1997-Clinton        2        8
##   2001-Bush           1        6
##   2005-Bush           0        8
##   2009-Obama          2        3
##   2013-Obama          1        3
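Combining the two keys into a single score is then simple arithmetic. The sketch below copies the counts from the dfm printed above into a data.frame and computes two common measures: a net count (positive minus negative) and a relative tone, (positive - negative) / (positive + negative), which lies between -1 and 1 and is scaled by the number of matched words, so a speech does not score higher just because it contains more dictionary words. Both formulas are illustrative choices, not something dfm() computes for you.

```r
# Counts copied by hand from the dfm printed above
scores <- data.frame(
  doc      = c("1981-Reagan", "1985-Reagan", "1989-Bush", "1993-Clinton",
               "1997-Clinton", "2001-Bush", "2005-Bush", "2009-Obama", "2013-Obama"),
  negative = c(0, 0, 0, 1, 2, 1, 0, 2, 1),
  positive = c(6, 6, 18, 2, 8, 6, 8, 3, 3)
)
# Net count: positive matches minus negative matches
scores$net <- scores$positive - scores$negative
# Relative tone in [-1, 1]; NA when a document has no dictionary matches at all
total <- scores$positive + scores$negative
scores$tone <- ifelse(total > 0, scores$net / total, NA)
# Rank the speeches from most to least positive
ranked <- scores[order(-scores$tone), ]
ranked
```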

# use a LIWC dictionary (you need your own copy of this file)
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 68 keys
##    ... created a 9 x 68 sparse dfm
##    ... complete. 
## Elapsed time: 1.844 seconds.
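# keep the categories whose names start with "Pos" or "Neg"; note that Negate
# counts negations such as "no" and "not", not negative emotion (that is Negemo)
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]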
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##               features
## docs           Negate Posemo Posfeel Negemo
##   1981-Reagan      46     89       5     24
##   1985-Reagan      28    104       7     33
##   1989-Bush        40    102      10      8
##   1993-Clinton     25     51       3     23
##   1997-Clinton     27     64       5     22
##   2001-Bush        40     80       6     27
##   2005-Bush        25    117       5     31
##   2009-Obama       40     83       5     46
##   2013-Obama       42     80      13     22

For your corpus, assuming that you get it into a data.frame called data, you can create a quanteda corpus using:

mycorpus <- corpus(data$Content, docvars = data[, 1:2])
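If you are starting from the table in the question, one way to build that data.frame by hand is sketched below. The column names Title, Date and Content come from the question; storing the dates as Date objects is an assumption that makes it easy to order the opinions over time later. If the dates arrive as strings such as "May 13 2015", as.Date(x, format = "%B %d %Y") should parse them in an English locale.

```r
# The four rows from the question, typed in by hand
data <- data.frame(
  Title   = c("Boy", "Animal", "Human", "Monster"),
  Date    = as.Date(c("2015-05-13", "2015-06-14", "2015-03-09", "2015-01-22")),
  Content = c("\"She is pretty\", Tom said.",
              "The penguin is cute, lion added.",
              "Mr Koh predicted that every human is smart..",
              "Ms May, a student, said that John has $10.80."),
  stringsAsFactors = FALSE
)
# Titles in chronological order, useful for analysing opinions over time
data$Title[order(data$Date)]
```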

See also ?textfile for loading content from files in one command. This works with .csv files, for instance, although you would have problems with that file unless the Content field is quoted, because it contains commas.
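The comma problem goes away if every Content value is wrapped in double quotes, with any double quote inside a field written twice (the CSV convention described in RFC 4180). A quick base-R check with read.csv(text = ...) on two of the question's rows shows the commas inside quotes surviving intact. Note that textfile belonged to the 2015 quanteda; in later versions, reading files moved to the separate readtext package.

```r
# Two of the question's rows as CSV: Content is quoted, and the quotes
# around "She is pretty" are doubled inside the quoted field
csv_text <- 'Title,Date,Content
Boy,May 13 2015,"""She is pretty"", Tom said."
Animal,June 14 2015,"The penguin is cute, lion added."'
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)
# the commas inside Content no longer split the field
data$Content
```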

There are many other ways to measure sentiment, of course, but if you are new to sentiment mining and R, that should get you started. You can read more on sentiment mining methods (apologies if you have already encountered them) from:

Nevus answered 16/9, 2015 at 10:18
Hi, thanks. I am trying the code, and I received the error Error in validObject(.Object): invalid class "dfmSprase" object when running the line dictDFM <- dfm(mycorpus, dictionary=mydict) @KenBenoit – Underside
I was able to do it after downloading version 3.2. However, I am not able to open the LIWC dictionary. @KenBenoit – Underside
I have also read about many sentiment mining methods, but I don't know how to apply them. Could you please guide me? Thank you so much. @KenBenoit – Underside
The LIWC dictionary is available for purchase from liwc.net. There are also many free dictionaries on the Provalis Research website. As for the different methods, if you are more specific, I can try to help. – Nevus
Thank you for your response. I am actually doing a project to extract opinions, and I would like to analyse them over time and across events. So I would like to remove the parts that are not opinions, such as "Mr Tan" or "a student", and keep only what people said or felt. After that, I would like to categorize the opinions (positive/negative) and also rank them. @KenBenoit – Underside
Also, my data is not like "She is happy" or "He is extremely sad"; it is more like "Mr Tan expects 0.8% growth". @KenBenoit – Underside
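A rough way to separate the opinion from its attribution is pattern matching on quotation marks and reporting verbs. The sketch below is a hypothetical helper, strip_attribution, built only around the example sentences in this thread; the verb list and the three rules are assumptions, and a real project would more likely need a syntactic parser or a much larger rule set.

```r
# Hypothetical helper (not part of quanteda): keep what was said and drop
# who said it. The rules are assumptions tuned to this thread's examples.
strip_attribution <- function(x) {
  verbs <- "(said|says|predicted|predicts|expected|expects|added|noted)"
  out <- x
  # Rule 1: if the text contains a double-quoted passage, keep only that passage
  quoted <- grepl('"[^"]+"', x, perl = TRUE)
  out[quoted] <- sub('^[^"]*"([^"]+)".*$', "\\1", x[quoted], perl = TRUE)
  # Rule 2: drop everything up to the first reporting verb (plus an optional "that")
  reported_pattern <- paste0("^.*?\\b", verbs, "( that)?\\s")
  reported <- !quoted & grepl(reported_pattern, x, perl = TRUE)
  out[reported] <- sub(reported_pattern, "", x[reported], perl = TRUE)
  # Rule 3: drop a trailing ", <speaker> <verb>." clause
  rest <- !quoted & !reported
  out[rest] <- sub(paste0(",\\s*[^,]+\\s", verbs, "\\.?$"), "", x[rest], perl = TRUE)
  trimws(out)
}

x <- c('"She is pretty", Tom said.',
       "The penguin is cute, lion added.",
       "Mr Koh predicted that every human is smart..",
       "Ms May, a student, said that John has $10.80.",
       "Mr Tan expects 0.8% growth")
strip_attribution(x)
```

Sentences that match none of the rules come back unchanged, and sentences that report several speakers may be cut at the wrong place, so the output is worth spot-checking before any sentiment scoring.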
You should look at Loughran, Tim, and Bill McDonald. 2011. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks.” The Journal of Finance 66(1): 35–65. They tried to apply the Harvard-IV positive/negative categories to corporate 10-K filings and found that the dictionary needed significant adjustment for their application. You can download their dictionary in WordStat format (which quanteda supports) from the Provalis link above. – Nevus
Thank you, I have read it. Can you guide me on how to write the code for the sentiment analysis? Thanks. @KenBenoit – Underside
