Seed data for sentiment analysis [closed]

I'm playing around with sentiment analysis, and I'm looking for some seed data. Is there a free dictionary around?

It can be really simple: 3 sets of texts/sentences, for "positive", "negative", "neutral". It doesn't have to be huge.

Eventually I'll probably generate my own seed data for my specific use case, but it would be great to have something to play with now while I'm building the thing.

Unifoliate answered 28/7, 2009 at 19:7 Comment(5)
I have the Bing Liu and Minqing Hu dataset (about 7000 reviews of about 9 products on amazon.com). I put them in an Excel sheet with the combined average score of each one. I also added the scores from 3 different free sentiment analysis APIs from the web (ViralHeat, AlchemyAPI, repustate API). If you want that Excel sheet, I can give it to you.Heckelphone
cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexiconMucin
@SherifMaherEaid: How do you create your own dictionary from articles?Misprize
@Misprize He probably categorizes the words and phrases used in different reviews as good, bad or neutral.Primrosa
+1 Thanks for asking the question :)Primrosa
P
4

Bing Liu and Minqing Hu from UIC have a number of datasets.

Bo Pang from Cornell has some more.

Pb answered 28/7, 2009 at 21:53 Comment(1)
The Cornell data looks like it'll do the trick. Thanks!Unifoliate
I
3

If you're interested in sentiment dictionaries, many authors have presented work based on manually built lists, and on semi-automated methods for obtaining lists of opinionated terms. One good approach is to derive the list from the WordNet database, by extending a core of positive/negative words using relationships such as synonymy.
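As a rough illustration of the seed-expansion idea, here is a minimal sketch. It does not use WordNet itself: `SYNONYMS` is a made-up toy synonym graph, and `expand_seeds` and `max_depth` are hypothetical names chosen for this example. With the real database you would replace the dictionary lookup with a WordNet synonym query.

```python
from collections import deque

# Hypothetical toy synonym graph standing in for WordNet relations.
SYNONYMS = {
    "good": ["great", "fine"],
    "great": ["excellent"],
    "bad": ["poor", "awful"],
    "poor": ["inferior"],
}


def expand_seeds(seeds, synonyms, max_depth=2):
    """Breadth-first expansion of seed words along synonym links."""
    expanded = set(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond max_depth hops
        for neighbor in synonyms.get(word, []):
            if neighbor not in expanded:
                expanded.add(neighbor)
                queue.append((neighbor, depth + 1))
    return expanded


positive = expand_seeds({"good"}, SYNONYMS)
negative = expand_seeds({"bad"}, SYNONYMS)
print(sorted(positive))
print(sorted(negative))
```

Limiting the depth matters in practice, because synonym chains drift in meaning the further they get from the seed words.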

A good example of a manually built list is the General Inquirer.

For a semi-automated method that derives lists, check out SentiWordNet from Esuli and Sebastiani.

I believe these are generally available for research, but you may need to get in touch with the authors regarding the use of these resources for non-research purposes.

B.

Introductory answered 20/9, 2009 at 11:23 Comment(0)
G
1

You can use the AFINN word list here:

http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010

AFINN is a list of English words rated for valence with an integer between minus five (negative) and plus five (positive). The words have been manually labeled by Finn Årup Nielsen in 2009-2011. The file is tab-separated. There are two versions:

AFINN-111: Newest version with 2477 words and phrases.

AFINN-96: 1468 unique words and phrases on 1480 lines. Note that there are 1480 lines, as some words are listed twice. The word list is not entirely in alphabetical order.
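Since the file is tab-separated, loading it and scoring a sentence takes only a few lines. The following is a minimal sketch, not an official loader: `load_afinn` and `score_text` are hypothetical helper names, and the sample rows are made up for illustration in the same `term<TAB>score` format rather than copied from the real file.

```python
import io


def load_afinn(lines):
    """Parse tab-separated 'term<TAB>score' lines into a dict."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last tab so multi-word phrases stay intact.
        term, score = line.rsplit("\t", 1)
        lexicon[term] = int(score)
    return lexicon


def score_text(text, lexicon):
    """Sum the valence of each whitespace-separated token in the lexicon."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())


# Illustrative rows only; with the real list you would pass an open file.
sample = io.StringIO("good\t3\nbad\t-3\nterrible\t-4\n")
lexicon = load_afinn(sample)
print(score_text("Good food but terrible service", lexicon))
```

This simple tokenizer only matches single words, so the multi-word phrases in the list would need extra handling, as would punctuation attached to words.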

Gilded answered 17/11, 2013 at 1:6 Comment(0)
B
1

I maintain a list of corpora and word lists for sentiment analysis (my AFINN is one of them):

http://neuro.compute.dtu.dk/wiki/Sentiment_analysis#Corpora

http://neuro.compute.dtu.dk/wiki/Sentiment_analysis#Affective_word_lists

Bharat answered 21/7, 2014 at 19:44 Comment(1)
+1 Thanks for the AFINN database of words with scores.Primrosa
