Getting incorrect Score using SentiWordNet

I'm doing some sentiment analysis with SentiWordNet, following the post "How to use SentiWordNet". However, I get a score of 0.0 no matter what input I try. Is there anything I'm doing wrong here? Thanks!

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Set;
    import java.util.Vector;

    public class SWN3 {
        private String pathToSWN = "C:\\Users\\Malcolm\\Desktop\\SentiWordNet_3.0.0\\home\\swn\\www\\admin\\dump\\SentiWordNet_3.0.0.txt";
        private HashMap<String, Double> _dict;

        public SWN3(){

            _dict = new HashMap<String, Double>();
            HashMap<String, Vector<Double>> _temp = new HashMap<String, Vector<Double>>();
            try{
                BufferedReader csv =  new BufferedReader(new FileReader(pathToSWN));
                String line = "";           
                while((line = csv.readLine()) != null)
                {
                    String[] data = line.split("\t");
                    Double score = Double.parseDouble(data[2])-Double.parseDouble(data[3]);
                    String[] words = data[4].split(" ");
                    for(String w:words)
                    {
                        String[] w_n = w.split("#");
                        w_n[0] += "#"+data[0];
                        int index = Integer.parseInt(w_n[1])-1;
                        if(_temp.containsKey(w_n[0]))
                        {
                            Vector<Double> v = _temp.get(w_n[0]);
                            if(index>v.size())
                                for(int i = v.size();i<index; i++)
                                    v.add(0.0);
                            v.add(index, score);
                            _temp.put(w_n[0], v);
                        }
                        else
                        {
                            Vector<Double> v = new Vector<Double>();
                            for(int i = 0;i<index; i++)
                                v.add(0.0);
                            v.add(index, score);
                            _temp.put(w_n[0], v);
                        }
                    }
                }
                Set<String> temp = _temp.keySet();
                for (Iterator<String> iterator = temp.iterator(); iterator.hasNext();) {
                    String word = (String) iterator.next();
                    Vector<Double> v = _temp.get(word);
                    double score = 0.0;
                    double sum = 0.0;
                    for(int i = 0; i < v.size(); i++)
                        score += ((double)1/(double)(i+1))*v.get(i);
                    for(int i = 1; i<=v.size(); i++)
                        sum += (double)1/(double)i;
                    score /= sum;
                    String sent = "";               
                    if(score>=0.75)
                        sent = "strong_positive";
                    else
                    if(score > 0.25 && score<=0.5)
                        sent = "positive";
                    else
                    if(score > 0 && score<=0.25)
                        sent = "weak_positive";
                    else
                    if(score < 0 && score>=-0.25)
                        sent = "weak_negative";
                    else
                    if(score < -0.25 && score>=-0.5)
                        sent = "negative";
                    else
                    if(score<=-0.75)
                        sent = "strong_negative";
                    _dict.put(word, score);
                }
            }
            catch(Exception e){e.printStackTrace();}        
        }

        public Double extract(String word)
        {
            Double total = new Double(0);
            if(_dict.get(word+"#n") != null)
                total = _dict.get(word+"#n") + total;
            if(_dict.get(word+"#a") != null)
                total = _dict.get(word+"#a") + total;
            if(_dict.get(word+"#r") != null)
                total = _dict.get(word+"#r") + total;
            if(_dict.get(word+"#v") != null)
                total = _dict.get(word+"#v") + total;
            return total;
        }

        public static void main(String[] args) {
            SWN3 test = new SWN3();
            String sentence="Hello have a Super awesome great day";
            String[] words = sentence.split("\\s+");
            double totalScore = 0;
            for(String word : words) {
                word = word.replaceAll("([^a-zA-Z\\s])", "");
                if (test.extract(word) == null)
                    continue;
                totalScore += test.extract(word);
            }
            System.out.println(totalScore);
        }

    }

Here are the first 10 lines of SentiWordNet.txt:

a   00001740    0.125   0   able#1  (usually followed by `to') having the necessary means or skill or know-how or authority to do something; "able to swim"; "she was able to program her computer"; "we were at last able to buy a car"; "able to get a grant for the project"
a   00002098    0   0.75    unable#1    (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
a   00002312    0   0   dorsal#2 abaxial#1  facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"
a   00002527    0   0   ventral#2 adaxial#1 nearest to or facing toward the axis of an organ or organism; "the upper side of a leaf is known as the adaxial surface"
a   00002730    0   0   acroscopic#1    facing or on the side toward the apex
a   00002843    0   0   basiscopic#1    facing or on the side toward the base
a   00002956    0   0   abducting#1 abducent#1  especially of muscles; drawing away from the midline of the body or from an adjacent part
a   00003131    0   0   adductive#1 adducting#1 adducent#1  especially of muscles; bringing together or drawing toward the midline of the body or toward an adjacent part
a   00003356    0   0   nascent#1   being born or beginning; "the nascent chicks"; "a nascent insurgency"
a   00003553    0   0   emerging#2 emergent#2   coming into existence; "an emergent republic"
Labiodental answered 6/4, 2013 at 4:31 Comment(9)
I ran your program and I got 0.9087554089349004. Can you post the first 10 lines of your SentiWord.txt file? - Congeal
Just posted the first 10 lines. - Labiodental
Did you remove all the comments from the top of the file? - Congeal
Yes Sir, removed the top portion which contained the comments. The above is the top 10 lines of the SentiWordNet.3.0.0.txt. - Labiodental
You ran the same code I posted without editing it? - Labiodental
The same exact code. I only changed the path to my SentiWord.txt. - Congeal
I ran the code above in eclipse and I got the following: - Labiodental
java.lang.NumberFormatException: empty String 0.0 at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source) at java.lang.Double.parseDouble(Unknown Source) at SWN3.<init>(SWN3.java:23) at SWN3.main(SWN3.java:99) - Labiodental

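The SentiWordNet .txt file usually ships with extra lines that are not data rows.

You need to remove the first part of it (the comments and instructions) and the last two lines:

#
EMPTY LINE

The parser doesn't know how to handle these lines. When it hits one, it throws an exception that the constructor catches, so the dictionary is never filled and every lookup returns 0.0. If you delete these two extra lines, you'll be fine.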
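
As an alternative to editing the file by hand, the loader could skip such lines itself. The following is only a minimal sketch, assuming comment lines start with `#` and data rows have at least five tab-separated fields. The class and method names (`SwnLineFilter`, `readDataRows`) are hypothetical and not part of the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SwnLineFilter {

    // Hypothetical helper: returns only the lines that look like data rows,
    // split on tabs. Comment lines, blank lines and short rows are skipped.
    public static List<String[]> readDataRows(BufferedReader in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines (assumed to start with '#').
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] data = line.split("\t");
            // A usable row needs POS, ID, PosScore, NegScore and SynsetTerms.
            if (data.length < 5 || data[2].isEmpty() || data[3].isEmpty()) {
                continue;
            }
            rows.add(data);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory sample: header comment, two data rows,
        // then the trailing "#" line and an empty line.
        String sample = "# comment header\n"
                + "a\t00001740\t0.125\t0\table#1\tgloss\n"
                + "a\t00002098\t0\t0.75\tunable#1\tgloss\n"
                + "#\n"
                + "\n";
        List<String[]> rows = readDataRows(new BufferedReader(new StringReader(sample)));
        for (String[] r : rows) {
            double score = Double.parseDouble(r[2]) - Double.parseDouble(r[3]);
            System.out.println(r[4] + " " + score);
        }
    }
}
```

With a filter like this in front of the parsing loop, the original file could in principle be read without removing anything first.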

Congeal answered 6/4, 2013 at 7:14 Comment(2)
@MarounMaroun What is the meaning of #1, #2 in the sentiwordnet.txt file? Please help me too. For example: a 00002730 0 0 acroscopic#1 facing or on the side toward the apex .... Thanks in advance. - Will
Hi @Labiodental, I got java.lang.ArrayIndexOutOfBoundsException: 2. What went wrong? - Chronon
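
On the `#1`, `#2` question above: each term in the SynsetTerms column is written as `word#sense_number`, and the question's code uses that number as a rank, weighting sense `k` by `1/k` before averaging. Here is a small sketch of that weighting, with a made-up word and scores. The class and method names (`SenseWeighting`, `weightedScore`) are hypothetical.

```java
public class SenseWeighting {

    // scoresByRank[i] holds PosScore - NegScore for sense number i + 1.
    // Sense #1 gets weight 1, sense #2 weight 1/2, sense #3 weight 1/3, ...
    public static double weightedScore(double[] scoresByRank) {
        double score = 0.0;
        double sum = 0.0;
        for (int i = 0; i < scoresByRank.length; i++) {
            double weight = 1.0 / (i + 1);
            score += weight * scoresByRank[i];
            sum += weight;
        }
        // Guard against an empty array to avoid dividing by zero.
        return sum == 0.0 ? 0.0 : score / sum;
    }

    public static void main(String[] args) {
        // Made-up example: sense #1 scores 0.5, sense #2 scores -0.25.
        // (1 * 0.5 + 0.5 * -0.25) / (1 + 0.5) = 0.375 / 1.5 = 0.25
        System.out.println(weightedScore(new double[] {0.5, -0.25}));
    }
}
```

Lower sense numbers therefore dominate the final score for a word.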
