I'm experimenting with deriving sentiment from Twitter using Stanford's CoreNLP library, following the approach in https://www.openshift.com/blogs/day-20-stanford-corenlp-performing-sentiment-analysis-of-twitter-using-java - see that post for the code I'm implementing.
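For reference, here's roughly what I'm running - a minimal sketch of the approach from that blog post (the class and method names are mine, and note the sentiment-tree annotation key differs between CoreNLP versions):

```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class TweetSentiment {

    // Build the pipeline once; it is expensive to construct.
    private static final StanfordCoreNLP PIPELINE;
    static {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        PIPELINE = new StanfordCoreNLP(props);
    }

    // Returns the predicted class (0 = very negative .. 4 = very positive)
    // of the longest sentence in the tweet, as in the blog post.
    public static int findSentiment(String tweet) {
        int mainSentiment = 0;
        int longest = 0;
        Annotation annotation = PIPELINE.process(tweet);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            // In older CoreNLP releases this key is
            // SentimentCoreAnnotations.AnnotatedTree.class instead.
            Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
            int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
            String partText = sentence.toString();
            if (partText.length() > longest) {
                mainSentiment = sentiment;
                longest = partText.length();
            }
        }
        return mainSentiment;
    }
}
```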
I am getting results, but I've noticed a bias towards 'negative' classifications, both in my target dataset and in a ground-truth dataset I use for checking - the Sanders Analytics Twitter Sentiment Corpus (http://www.sananalytics.com/lab/twitter-sentiment/) - even though the ground-truth labels themselves are not skewed negative.
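This is how I'm measuring the skew - a rough sketch that tallies predicted classes against the corpus's gold labels, where `loadSandersCorpus()` is a hypothetical placeholder for parsing the corpus into (text, label) rows:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BiasCheck {

    // Hypothetical placeholder: parse the Sanders corpus into
    // {tweetText, goldLabel} rows. Loading code omitted here.
    static List<String[]> loadSandersCorpus() {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, Integer> predicted = new HashMap<>();
        Map<String, Integer> gold = new HashMap<>();
        for (String[] row : loadSandersCorpus()) {
            // Collapse CoreNLP's five classes onto coarse three-way labels
            // so they are comparable with the corpus annotations.
            int cls = TweetSentiment.findSentiment(row[0]);
            String label = cls < 2 ? "negative" : cls == 2 ? "neutral" : "positive";
            predicted.merge(label, 1, Integer::sum);
            gold.merge(row[1], 1, Integer::sum);
        }
        System.out.println("predicted: " + predicted);
        System.out.println("gold:      " + gold);
    }
}
```

The predicted distribution comes out noticeably more negative than the gold one.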
I'm posting this question on the off chance that someone else has experienced this and/or may know whether it's the result of something I've done or a bug in the CoreNLP code.
(Edit - sorry it took me so long to respond.) I'm posting links to plots that show what I mean. I don't have enough reputation to post the images themselves, and can only include two links in this post, so I'll add the links in the comments.