Sentiment Analysis of review comments using qdap is slow

I am using the qdap package to determine the sentiment of each review comment for a particular application. I read the review comments from a CSV file and pass them to qdap's polarity function. Everything works fine and I get the polarity for all the review comments, but the problem is that it takes 7-8 seconds to calculate the polarity of all the sentences (the CSV file contains 779 sentences in total). I am pasting my code below.

  temp_csv <- filePath()                       # read the reviews CSV
  text_data <- temp_csv[, c('Content')]        # column holding the review comments
  print(Sys.time())
  polterms <- list(neg = c('wtf'))             # extra negative terms
  POLKEY <- sentiment_frame(positives = positive.words,
                            negatives = c(polterms[[1]], negative.words))
  polarity <- polarity(text_data, polarity.frame = POLKEY)
  print(Sys.time())

Time taken is as follows:

[1] "2016-04-12 16:43:01 IST"

[1] "2016-04-12 16:43:09 IST"

Can somebody let me know if I am doing something wrong? How can I improve the performance?

Earmark answered 12/4, 2016 at 12:1 Comment(0)

I am the author of qdap. The polarity function was designed for much smaller data sets. As my role shifted, I began working with larger data sets and needed something both fast and accurate (two goals that tend to pull against each other), so I have since developed a break-away package, sentimentr. Its algorithm is optimized to be faster and more accurate than qdap's polarity.

As it stands now, you have five dictionary-based (or trained-algorithm-based) approaches to sentiment detection. Each has its drawbacks (-) and pluses (+) and is useful in certain circumstances.

  1. qdap +on CRAN; -slow
  2. syuzhet +on CRAN; +fast; +great plotting; -less accurate on non-literature text
  3. sentimentr +fast; +higher accuracy; -GitHub only
  4. stansent (Stanford port) +most accurate; -slower
  5. tm.plugin.sentiment -archived on CRAN; -I couldn't get it working easily

Below I show time tests on sample data for the first four of these options.

Install packages and make timing functions

I use pacman because it allows the reader to just run the code, though you can replace it with install.packages and library calls (a rough non-pacman equivalent is sketched after the setup block below).

if (!require("pacman")) install.packages("pacman")
pacman::p_load(qdap, syuzhet, dplyr)
pacman::p_load_current_gh(c("trinker/stansent", "trinker/sentimentr"))

pres_debates2012   # sample data shipped with qdap; nrow = 2912

## simple stopwatch helpers
tic <- function (pos = 1, envir = as.environment(pos)){
    assign(".tic", Sys.time(), pos = pos, envir = envir)
    Sys.time()
}

toc <- function (pos = 1, envir = as.environment(pos)) {
    difftime(Sys.time(), get(".tic", pos = pos, envir = envir))
}

id <- 1:2912   # one id per row of pres_debates2012
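
If you'd rather skip pacman, a rough equivalent using base install.packages/library calls (with the remotes package standing in for the GitHub installs) would look something like this:

## non-pacman equivalent of the setup above (assumes the 'remotes' package)
install.packages(c("qdap", "syuzhet", "dplyr", "remotes"))
remotes::install_github(c("trinker/stansent", "trinker/sentimentr"))
library(qdap); library(syuzhet); library(dplyr)
library(stansent); library(sentimentr)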

Timings

## qdap
tic()
qdap_sent <- pres_debates2012 %>%
    with(qdap::polarity(dialogue, id))
toc() # Time difference of 18.14443 secs


## sentimentr
tic()
sentimentr_sent <- pres_debates2012 %>%
    with(sentiment(dialogue, id))
toc() # Time difference of 1.705685 secs


## syuzhet
tic()
syuzhet_sent <- pres_debates2012 %>%
    with(get_sentiment(dialogue, method="bing"))
toc() # Time difference of 1.183647 secs


## stanford
tic()
stanford_sent <- pres_debates2012 %>%
    with(sentiment_stanford(dialogue))
toc() # Time difference of 6.724482 mins
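
Beyond raw speed, if the end goal is a polarity score per group (for example per review or per speaker), sentimentr also offers sentiment_by to average sentiment over a grouping variable; a quick sketch on the same data:

## average sentiment per speaker (sketch; uses sentimentr's sentiment_by)
by_person <- pres_debates2012 %>%
    with(sentiment_by(dialogue, list(person)))
by_person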

For more on timings and accuracy, see my sentimentr README.md, and please star the repo if you find it useful. The visualization below captures one of the tests from the README:

[Figure: accuracy comparison plot from the sentimentr README]

Dripdry answered 12/4, 2016 at 14:38 Comment(6)
I agree with Tyler; I have had good results with syuzhet on very large sets of snippet/review-sized text chunks. – Doubtless
Thanks @Tyler Rinker for the detailed analysis of the various sentiment packages. I just checked out the sentimentr package you created; this is exactly what I needed. But I need the list of positive and negative words from each sentence. I was getting this with qdap, whereas sentimentr returns only the word count. Is it possible to get them while using sentimentr? – Earmark
@Earmark No, it didn't make sense to me any more. Can I ask what you'd want to do with them? – Dripdry
@Tyler Rinker Apologies for the delay in responding. I was using those words (positive and negative words, separately) to form positive and negative word clouds. – Earmark
@Earmark The extract_sentiment_terms function in sentimentr will give you the terms, but it is run as a separate function; this keeps the main sentiment function speedy (a sketch follows these comments). – Dripdry
Unfortunately, I am unable to use the stansent (Stanford port) option: either it is not supported by R 3.4.1 or I am missing something. I hope someone can write a detailed description of using it from scratch on Windows. – Cowper
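
For the word-cloud use case discussed in the comments, a minimal sketch of sentimentr's extract_sentiment_terms (the example sentences are invented for illustration):

library(sentimentr)

## made-up review sentences, just for illustration
reviews <- c("This app is great and works well.",
             "It crashes constantly, which is terrible.")

terms <- extract_sentiment_terms(reviews)
terms$positive   # positive terms found in each sentence
terms$negative   # negative terms found in each sentence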
