SentiWordNet scoring with Python
I have been working on research related to Twitter sentiment analysis. I have a little knowledge of how to code in Python. Since my research involves coding, I have done some research on how to analyze sentiment using Python, and here is how far I have come:

1. Tokenization of tweets
2. POS tagging of tokens

What remains is calculating the positive and negative scores of the sentiment, which is the issue I am facing now and where I need your help.

Below is my code example:

import nltk

sentence = "Iphone6 camera is awesome for low light"
token = nltk.word_tokenize(sentence)   # 1. tokenization
tagged = nltk.pos_tag(token)           # 2. POS tagging

Therefore, I want to ask if anybody can show me or guide me through an example of using Python with SentiWordNet to calculate the positive and negative scores of tweets that have already been POS tagged. Thanks in advance.

Tungstic asked 8/7, 2016 at 9:16 Comment(1)
Hi, I don't know how much this can be helpful, therefore adding it as a comment. Try these: nltk.org/howto/sentiment.html and nltk.org/api/nltk.sentiment.html – Cornice
It's a little unclear what exactly your question is. Do you need a guide to using SentiWordNet? If so, check out this link:

http://www.nltk.org/howto/sentiwordnet.html

Since you've already tokenized and POS tagged the words, all you need to do now is use this syntax (the import is needed first):

from nltk.corpus import sentiwordnet as swn

swn.senti_synset('breakdown.n.03')

Breaking down the argument:

  • 'breakdown' = the word you need scores for
  • 'n' = part of speech
  • '03' = usage ('01' is the most common usage; a higher number indicates a less common usage)

So for each tuple in your tagged array, create a string as above and pass it to the senti_synset function to get the positive, negative and objective scores for that word (see the sketch after the tag mapping below).

Caveat: The POS tagger gives you different tags from the ones senti_synset accepts. Use the following mapping to convert to synset notation.

  • n - NOUN
  • v - VERB
  • a - ADJECTIVE
  • s - ADJECTIVE SATELLITE
  • r - ADVERB

(Credits to Using Sentiwordnet 3.0 for the above notation)
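
Putting the pieces together, here is a minimal sketch of the approach described above. The penn_to_swn mapping dictionary and the try/except around unknown words are my own additions; note that nothing is lemmatized here, so inflected forms such as 'is' will simply be skipped:

import nltk
from nltk.corpus import sentiwordnet as swn

# Map the first letter of a Penn Treebank tag to a SentiWordNet POS letter
penn_to_swn = {'N': 'n', 'V': 'v', 'J': 'a', 'R': 'r'}

sentence = "Iphone6 camera is awesome for low light"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

for word, penn_tag in tagged:
    pos = penn_to_swn.get(penn_tag[0])
    if pos is None:
        continue  # no SentiWordNet equivalent (determiners, prepositions, ...)
    try:
        # '.01' picks the most common sense, as described above
        synset = swn.senti_synset('{}.{}.01'.format(word.lower(), pos))
    except Exception:
        continue  # this word form is not in SentiWordNet
    print(word, synset.pos_score(), synset.neg_score(), synset.obj_score())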

That being said, it is generally not a great idea to use SentiWordNet for Twitter sentiment analysis, and here's why:

Tweets are filled with typos and non-dictionary words which SentiWordNet often does not recognize. To counter this problem, either lemmatize/stem your tweets before you POS tag them, or use a machine learning classifier such as Naive Bayes, for which NLTK has built-in functions (see the sketch below). As for the training dataset for the classifier, either manually annotate a dataset or use a pre-labelled set such as the Sentiment140 corpus.
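
For illustration, a minimal sketch of the Naive Bayes route using NLTK's built-in classifier; the tiny hand-labelled training set is only a stand-in for a real corpus such as Sentiment140:

import nltk

def features(tweet):
    # Presence-of-token ("bag of words") features
    return {token: True for token in nltk.word_tokenize(tweet.lower())}

train = [
    (features("awesome camera, love it"), "pos"),
    (features("great battery life"), "pos"),
    (features("terrible screen, very slow"), "neg"),
    (features("worst phone ever"), "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("the camera is awesome")))  # 'pos' on this toy set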

If you are uninterested in actually performing the sentiment analysis but need a sentiment tag for a given tweet, you can always use the Sentiment140 API for this purpose.

Circus answered 8/7, 2016 at 9:40 Comment(2)
For some good tutorials on using a classifier for this purpose and for the Sentiment140 dataset, check out this link. For stemming and lemmatization, check out this link: #772418 – Circus
First of all, thanks indeed for your detailed answer, Saravana. I quite understand what you wrote, yet I generally don't know how to write code, and hence I was thinking to ask you a favor, if possible: could you help write the code for me and also run the process after the POS tagging? It would be very enlightening for me and help me make further progress in my research. import nltk; sentence = "Iphone6 camera is awesome for low light"; token = nltk.word_tokenize(sentence); tagged = nltk.pos_tag(token) – Tungstic
@Saravana Kumar has a wonderful answer.

To add detailed code to it, I am writing this. I have referred to https://nlpforhackers.io/sentiment-analysis-intro/

import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.stem import PorterStemmer

def penn_to_wn(tag):
    """
    Convert PennTreebank tags to simple WordNet tags
    """
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

def get_sentiment(word,tag):
    """ returns list of pos neg and objective score. But returns empty list if not present in senti wordnet. """

    wn_tag = penn_to_wn(tag)
    # Only nouns, adjectives and adverbs are scored here; other tags
    # (including verbs) return an empty list
    if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV):
        return []

    lemma = lemmatizer.lemmatize(word, pos=wn_tag)
    if not lemma:
        return []

    synsets = wn.synsets(lemma, pos=wn_tag)
    if not synsets:
        return []

    # Take the first sense, the most common
    synset = synsets[0]
    swn_synset = swn.senti_synset(synset.name())

    return [swn_synset.pos_score(),swn_synset.neg_score(),swn_synset.obj_score()]


ps = PorterStemmer()
words_data = ['this','movie','is','wonderful']
# words_data = [ps.stem(x) for x in words_data] # if you want to further stem the word

pos_val = nltk.pos_tag(words_data)
senti_val = [get_sentiment(x,y) for (x,y) in pos_val]

print(f"pos_val is {pos_val}")
print(f"senti_val is {senti_val}")

Output

pos_val is [('this', 'DT'), ('movie', 'NN'), ('is', 'VBZ'), ('wonderful', 'JJ')]
senti_val is [[], [0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]
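
One way to collapse these per-word triples into a single sentence score (my addition, not part of the answer above) is to sum positive minus negative over the words SentiWordNet recognised:

# 0.75 for the example above: 0.0 from 'movie' plus 0.75 from 'wonderful'
sentence_score = sum(s[0] - s[1] for s in senti_val if s)
print(sentence_score)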
Mammal answered 8/2, 2019 at 8:20 Comment(2)
If I use big data in a CSV, do I just need to read the CSV into words_data? – Cuisine
Hi, nice question. If you have good computing power then direct use is OK. If you face any issues, I recommend using big-data frameworks such as Python on Spark (PySpark), Scala, etc. – Mammal
Here is my solution:

import nltk
from nltk.corpus import sentiwordnet as swn
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

def get_wordnet_pos(word):
    """Map a word's Penn Treebank tag to a WordNet POS constant (noun by default)."""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

def get_sentiment_score_of_review(sentence):
    # 1. Tokenize
    tokens = nltk.word_tokenize(sentence)

    lemmatizer = WordNetLemmatizer()

    sentiment_score = 0.0
    for word in tokens:
        tag = get_wordnet_pos(word)
        item_res = lemmatizer.lemmatize(word, tag)
        if not item_res:
            continue
        
        synsets = wordnet.synsets(item_res, pos=tag)
        if len(synsets) == 0:
            print("Nope!", word)
            continue
        
        # Take the first, the most common
        synset = synsets[0]
        swn_synset = swn.senti_synset(synset.name())
        sentiment_score += swn_synset.pos_score() - swn_synset.neg_score()
        
    return sentiment_score
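
For completeness, a quick usage example (my addition), reusing the sample sentence from the question; out-of-vocabulary tokens such as 'Iphone6' are reported with "Nope!" and skipped:

print(get_sentiment_score_of_review("Iphone6 camera is awesome for low light"))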
Unasked answered 15/6, 2022 at 10:13 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Kylstra
For positive and negative sentiment, you first need training data and have to train a model. For training the model you can use an SVM; there's an open library called LibSVM you can use. A minimal sketch follows below.
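
Here is a minimal sketch of that route (my addition), using scikit-learn, whose SVC is built on LibSVM; the tiny labelled tweet set is illustrative only:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

tweets = ["awesome camera", "love this phone", "terrible battery", "worst screen ever"]
labels = ["pos", "pos", "neg", "neg"]

# Turn the raw tweets into bag-of-words count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets)

# Linear-kernel SVM; scikit-learn's SVC wraps LibSVM internally
clf = SVC(kernel="linear")
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["the camera is awesome"])))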

Insolence answered 8/7, 2016 at 9:42 Comment(0)
