Sentiment Analysis in R

Sentiment analysis encompasses a broad category of methods designed to measure positive versus negative sentiment from text, so that makes this a fairly difficult question to answer simply. But here is a simple answer: You can apply a dictionary to your document-term matrix and then combine the positive versus negative key categories of your dictionary to create a sentiment measure.

I suggest trying this in the text analysis package quanteda, which handles a variety of existing dictionary formats and allows you to create very flexible custom dictionaries.

For example:

require(quanteda)
mycorpus <- subset(inaugCorpus, Year>1980)
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          postive = c("good", "great", "super*", "excellent")))
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 2 keys
##    ... created a 9 x 2 sparse dfm
##    ... complete. 
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##               features
## docs           negative postive
##   1981-Reagan         0       6
##   1985-Reagan         0       6
##   1989-Bush           0      18
##   1993-Clinton        1       2
##   1997-Clinton        2       8
##   2001-Bush           1       6
##   2005-Bush           0       8
##   2009-Obama          2       3
##   2013-Obama          1       3

# use a LIWC dictionary - obviously you need this file
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 68 keys
##    ... created a 9 x 68 sparse dfm
##    ... complete. 
## Elapsed time: 1.844 seconds.
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##               features
## docs           Negate Posemo Posfeel Negemo
##   1981-Reagan      46     89       5     24
##   1985-Reagan      28    104       7     33
##   1989-Bush        40    102      10      8
##   1993-Clinton     25     51       3     23
##   1997-Clinton     27     64       5     22
##   2001-Bush        40     80       6     27
##   2005-Bush        25    117       5     31
##   2009-Obama       40     83       5     46
##   2013-Obama       42     80      13     22

For your corpus, assuming that you get it into a data.frame called data, you can create a quanteda corpus using:

mycorpus <- corpus(data$Content, docvars = data[, 1:2])

See also ?textfile for loading in content from files in one easy command. This works with .csv files for instance, although you would have problems with that file because the Content field contains text containing commas.

There are many other ways to measure sentiment of course, but if you are new to sentiment mining and R, that should get you started. You can read more on sentiment mining methods (and apologies if you already have encountered them) from:

Liu, Bing. 2010. "Sentiment Analysis and Subjectivity." Handbook of natural language processing 2: 627–66.
Liu, Bing. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press.

Recommended topics

Hot tags