Sentiment analysis encompasses a broad category of methods designed to measure positive versus negative sentiment in text, which makes this a difficult question to answer simply. But here is a simple answer: you can apply a dictionary to your document-term matrix and then combine the positive and negative key categories of your dictionary to create a sentiment measure.
I suggest trying this in the text analysis package quanteda, which handles a variety of existing dictionary formats and allows you to create very flexible custom dictionaries.
For example:
require(quanteda)
mycorpus <- subset(inaugCorpus, Year > 1980)
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          positive = c("good", "great", "super*", "excellent")))
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
## ... lowercasing
## ... tokenizing
## ... indexing documents: 9 documents
## ... indexing features: 3,113 feature types
## ... applying a dictionary consisting of 2 keys
## ... created a 9 x 2 sparse dfm
## ... complete.
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##              features
## docs         negative positive
## 1981-Reagan         0        6
## 1985-Reagan         0        6
## 1989-Bush           0       18
## 1993-Clinton        1        2
## 1997-Clinton        2        8
## 2001-Bush           1        6
## 2005-Bush           0        8
## 2009-Obama          2        3
## 2013-Obama          1        3
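If it helps to see what applying a dictionary amounts to, here is a rough base R sketch of the idea: turn each glob pattern into a regular expression and count the tokens that match any pattern under each key. This is only an illustration and not how quanteda implements it; the sentence in `txt` and the helper `count_key` are made up for the example.

```r
# Hypothetical toy text, not taken from the inaugural corpus
txt <- "A great nation does good work, but bad and terrible choices do harm."

# Lowercase, then split on anything that is not a letter
toks <- strsplit(tolower(txt), "[^a-z]+")[[1]]
toks <- toks[nzchar(toks)]

# Same keys and glob patterns as the custom dictionary in the answer
dict <- list(
  negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
  positive = c("good", "great", "super*", "excellent")
)

# Hypothetical helper: count tokens matching any glob pattern of one key.
# utils::glob2rx() converts a glob such as "bad*" into a regular expression.
count_key <- function(patterns, tokens) {
  rx <- paste(utils::glob2rx(patterns), collapse = "|")
  sum(grepl(rx, tokens))
}

key_counts <- sapply(dict, count_key, tokens = toks)
key_counts
```

Each document in a dictionary-based dfm gets one such count per key, which is why the matrix above has only two columns.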
# use a LIWC dictionary - obviously you need this file
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
## ... lowercasing
## ... tokenizing
## ... indexing documents: 9 documents
## ... indexing features: 3,113 feature types
## ... applying a dictionary consisting of 68 keys
## ... created a 9 x 68 sparse dfm
## ... complete.
## Elapsed time: 1.844 seconds.
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##              features
## docs         Negate Posemo Posfeel Negemo
## 1981-Reagan      46     89       5     24
## 1985-Reagan      28    104       7     33
## 1989-Bush        40    102      10      8
## 1993-Clinton     25     51       3     23
## 1997-Clinton     27     64       5     22
## 2001-Bush        40     80       6     27
## 2005-Bush        25    117       5     31
## 2009-Obama       40     83       5     46
## 2013-Obama       42     80      13     22
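To turn counts like these into a single sentiment measure, one simple option (an assumption on my part, there are many alternatives) is a net score: positive minus negative, divided by their total. The sketch below uses plain base R with the custom-dictionary counts copied from the output above; with a real dfm, `as.matrix(myDfm)` should give you an equivalent matrix to work from.

```r
# Counts copied from the custom-dictionary output above
counts <- matrix(
  c(0,  6,
    0,  6,
    0, 18,
    1,  2,
    2,  8,
    1,  6,
    0,  8,
    2,  3,
    1,  3),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("1981-Reagan", "1985-Reagan", "1989-Bush", "1993-Clinton",
      "1997-Clinton", "2001-Bush", "2005-Bush", "2009-Obama", "2013-Obama"),
    c("negative", "positive")
  )
)

# Net sentiment in [-1, 1]: +1 means only positive matches, -1 only negative.
# A document with no matches at all would give NaN (0 / 0).
net_sentiment <- (counts[, "positive"] - counts[, "negative"]) / rowSums(counts)
round(net_sentiment, 2)
```

With dictionaries this small the scores are driven by a handful of matches, so treat them as a starting point rather than a reliable measure.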
For your corpus, assuming that you get it into a data.frame called data, you can create a quanteda corpus using:
mycorpus <- corpus(data$Content, docvars = data[, 1:2])
See also ?textfile for loading in content from files in one easy command. This works with .csv files for instance, although you would have problems with that file because the Content field contains text containing commas.
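If the text in Content is wrapped in double quotes, base R's `read.csv` should cope with the embedded commas, which gives you the data.frame to pass to `corpus()`. Here is a small sketch with made-up data; the column names `id` and `date` are assumptions, since I don't know what your first two columns are.

```r
# Hypothetical CSV contents; only the Content column name comes from the question
csv_text <- 'id,date,Content
1,2015-01-01,"Good service, great food"
2,2015-01-02,"Awful, terrible experience"'

# Quoted fields keep their commas instead of being split into extra columns
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(data)

# Then, as in the answer:
# mycorpus <- corpus(data$Content, docvars = data[, 1:2])
```

If the commas are not quoted in your file, you would need to re-export it with quoting or a different delimiter before this works.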
There are many other ways to measure sentiment of course, but if you are new to sentiment mining and R, that should get you started. You can read more on sentiment mining methods (and apologies if you already have encountered them) from: