TextBlob NaiveBayesAnalyzer extremely slow (compared to Pattern)
I'm using TextBlob for Python to do some sentiment analysis on tweets. The default analyzer in TextBlob is the PatternAnalyzer, which works reasonably well and is appreciably fast.

from textblob import TextBlob
sent = TextBlob(tweet.decode('utf-8')).sentiment

I have now tried to switch to the NaiveBayesAnalyzer and found the runtime to be impractical for my needs. (Approaching 5 seconds per tweet.)

from textblob.sentiments import NaiveBayesAnalyzer
sent = TextBlob(tweet.decode('utf-8'), analyzer=NaiveBayesAnalyzer()).sentiment

I have used the scikit-learn implementation of the Naive Bayes classifier before and did not find it to be this slow, so I'm wondering whether I'm using it correctly in this case.

I am assuming the analyzer is pretrained; at least, the documentation states "Naive Bayes analyzer that is trained on a dataset of movie reviews." But it also has a train() method, which is described as "Train the Naive Bayes classifier on the movie review corpus." Does it internally train the analyzer before each run? I hope not.

Does anyone know of a way to speed this up?

Cirrhosis answered 20/10, 2015 at 16:23 Comment(0)
M
17

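Yes, TextBlob trains the analyzer before each run: every new NaiveBayesAnalyzer() instance is trained again the first time it is used, and your code creates one per tweet. You can use the following code to avoid training the analyzer every time.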

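from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer

# Create the Blobber once; every blob it builds shares the same analyzer.
tb = Blobber(analyzer=NaiveBayesAnalyzer())

# .sentiment holds the classification and the positive/negative probabilities.
print(tb("sentence you want to test").sentiment)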
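
Why reuse helps can be shown without TextBlob at all. The following is a toy sketch, not TextBlob's actual implementation: `ToyAnalyzer` and its `train_calls` counter are hypothetical names, and the sketch assumes an analyzer that trains lazily on first use, once per instance.

```python
class ToyAnalyzer:
    """Hypothetical stand-in for an analyzer that trains lazily, once per instance."""

    train_calls = 0  # class-wide counter of how many times training ran

    def __init__(self):
        self._trained = False

    def train(self):
        # Stands in for the expensive step (e.g. fitting on a review corpus).
        ToyAnalyzer.train_calls += 1
        self._trained = True

    def analyze(self, text):
        if not self._trained:
            self.train()
        # Trivial placeholder "classification", only so the call returns something.
        return "pos" if "love" in text.lower() else "neg"


tweets = ["I love the book", "This is awful", "Love it"]

# Pattern from the question: a new analyzer per tweet, so training runs every time.
ToyAnalyzer.train_calls = 0
per_tweet = [ToyAnalyzer().analyze(t) for t in tweets]
calls_new_each_time = ToyAnalyzer.train_calls

# Pattern from this answer: one shared analyzer, so training runs only once.
ToyAnalyzer.train_calls = 0
shared = ToyAnalyzer()
reused = [shared.analyze(t) for t in tweets]
calls_shared = ToyAnalyzer.train_calls

print(calls_new_each_time, calls_shared)
```

Both patterns give the same classifications; only the number of training runs differs, and with a real corpus that is where the time goes.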
Mar answered 27/10, 2015 at 5:14 Comment(7)
This is great! The runtime for my test file is down from over 5 hours to just under 7 seconds! - Cirrhosis
@MattM. Which analyzer did you find to be better? The Pattern library or NaiveBayes from nltk? - Myocardiograph
@soham.m17 I did not compare them extensively, but I seem to remember that (at least for my purpose) they performed comparably. - Cirrhosis
I am using PatternAnalyzer but it takes a lot of time. Is there any way to speed it up? - Estep
@Alan, I am testing this exact thing out now with your code and it seems to be taking a while. Roughly how long should I expect it to take? It took me a while and all it output was a bunch of None values with no scores. - Chivalry
Is there any other analyzer besides NaiveBayesAnalyzer? - Hartwig
H
0

Adding to Alan's very useful answer: if you have tabular data in a DataFrame and want to use TextBlob's NaiveBayesAnalyzer, then this works. Just swap word_list for the name of your own column of strings.

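import pandas as pd
from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer

# df is your existing DataFrame; 'word_list' is the column holding the text.
# Build the Blobber once so the classifier is trained only once.
tb = Blobber(analyzer=NaiveBayesAnalyzer())
for index, row in df.iterrows():
    sent = tb(row['word_list']).sentiment  # (classification, p_pos, p_neg)
    df.loc[index, 'classification'] = sent[0]
    df.loc[index, 'p_pos'] = sent[1]
    df.loc[index, 'p_neg'] = sent[2]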

The loop above splits the tuple that sentiment returns into three separate columns.

This works if the series contains only strings. If it has mixed datatypes, which can be a problem in pandas with the object dtype, you might want to put a try/except block around the call to catch exceptions.
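
One way to guard against mixed datatypes is a small wrapper, sketched here in plain Python. `safe_sentiment` and `fake_analyze` are hypothetical names made up for illustration; `fake_analyze` is a placeholder scorer so the example runs without TextBlob, and catching every exception is a simplification you may want to narrow.

```python
def safe_sentiment(value, analyze):
    """Return analyze(value), or None if value is not a string or analysis fails."""
    if not isinstance(value, str):
        return None
    try:
        return analyze(value)
    except Exception:
        # Swallowing every exception is a simplification for this sketch.
        return None


def fake_analyze(text):
    # Placeholder scorer used only to make the example runnable.
    return ("pos", 0.9, 0.1) if "good" in text else ("neg", 0.2, 0.8)


mixed = ["good film", 42, None, "dull film", 3.14]
results = [safe_sentiment(v, fake_analyze) for v in mixed]
print(results)
```

With a Blobber named tb as above, the second argument could be something like `lambda s: tb(s).sentiment`, and rows that come back as None can be skipped or filled in afterwards.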

As for speed, it processes 1000 rows in around 4.7 seconds in my tests.
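
If you want to check the timing on your own data, a rough measurement can be taken with the standard library. This is only a sketch: `seconds_per_item` is a hypothetical helper name, and `str.upper` stands in for the real sentiment call.

```python
import time


def seconds_per_item(func, items):
    """Run func on every item and return (results, average seconds per item)."""
    start = time.perf_counter()
    results = [func(item) for item in items]
    elapsed = time.perf_counter() - start
    return results, (elapsed / len(items) if items else 0.0)


sample = ["first text", "second text", "third text"]
results, avg = seconds_per_item(str.upper, sample)
print(results, f"{avg:.6f} s per item")
```

Passing a function that calls the shared Blobber instead of `str.upper` should give a per-row figure you can compare against the numbers quoted in this thread.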

Hope this is helpful.

Homophonous answered 22/1, 2021 at 18:11 Comment(0)
J
0

In addition to the above solutions: I tried them and ran into errors. In case someone else finds this question, the code below is what resolved the errors for me, and it also tries PatternAnalyzer:

from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer, PatternAnalyzer
import nltk

# Download the NLTK data used here (tokenizer models and the movie review corpus).
nltk.download('punkt')
nltk.download('movie_reviews')

# One Blobber per analyzer, each created once and reused.
tb = Blobber(analyzer=NaiveBayesAnalyzer())
tb1 = Blobber(analyzer=PatternAnalyzer())

print(tb("sentence you want to test").sentiment)
print(tb1("sentence you want to test").sentiment)
print(tb("I love the book").sentiment)
print(tb1("I love the book").sentiment)
Josefajosefina answered 2/11, 2022 at 14:24 Comment(0)
