I have had a project in mind where I would download all the tweets sent to celebrities over the past year, run a sentiment analysis on them, and evaluate who had the most positive fans.
Then I discovered that you can retrieve Twitter mentions for at most the last 7 days using tweepy and the Twitter API. I searched the net but couldn't find any way to download tweets for the past year.
Anyway, I decided to do the project on the last 7 days of data only and wrote the following code:
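try:
    while 1:
        # Cursor pages through the search results for the mention query
        for results in tweepy.Cursor(twitter_api.search, q="@celebrity_handle").items(9999999):
            item = results.text.encode('utf-8').strip()
            wr.writerow([item, results.created_at])  # write to a csv (tweet, date)
except tweepy.TweepError as e:
    # stop on API errors such as hitting the rate limit
    print(e)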
I am using the Cursor with the search API because the other, more accurate way to get mentions is limited to retrieving only the last 800 tweets.
Anyway, after running the code overnight, I was able to download only 32K tweets. Around 90% of them were retweets.
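Since most of what came back was retweets, here is a minimal sketch of how they could be dropped before they reach the CSV. It is only a sketch under assumptions: `build_query`, `is_retweet`, `write_mentions` and the `Tweet` tuple are hypothetical names made up for illustration, the `-filter:retweets` search operator is assumed to be honoured by the search endpoint in use, and the `"RT @"` prefix check is just a client-side fallback for classic retweets. The sample data is invented so the block runs without network access.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for a tweepy status; only the two fields used above.
Tweet = namedtuple("Tweet", ["text", "created_at"])


def build_query(handle):
    """Build a mention query that asks the search API to skip retweets.

    Assumes the '-filter:retweets' operator is supported by the endpoint.
    """
    return "@{} -filter:retweets".format(handle.lstrip("@"))


def is_retweet(tweet):
    """Client-side fallback: classic retweets start with 'RT @'."""
    return tweet.text.startswith("RT @")


def write_mentions(tweets, out):
    """Write (text, date) rows for every non-retweet; return the row count."""
    wr = csv.writer(out)
    kept = 0
    for tweet in tweets:
        if is_retweet(tweet):
            continue
        wr.writerow([tweet.text.strip(), tweet.created_at])
        kept += 1
    return kept


# Made-up sample data, only to show the filtering offline.
sample = [
    Tweet("RT @fan: love you @celebrity_handle", "2016-01-01"),
    Tweet("@celebrity_handle great show!", "2016-01-02"),
]
buf = io.StringIO()
kept = write_mentions(sample, buf)
```

In the real loop, the query string would presumably be passed as `q=build_query("celebrity_handle")` to `tweepy.Cursor`, and `write_mentions` would receive the cursor's items instead of the sample list. Filtering in the query should also mean fewer requests are spent on tweets that get thrown away.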
Is there a better, more efficient way to get mentions data?
Do keep in mind that:
- I want to do this for multiple celebrities (famous ones with millions of followers).
- I don't care about retweets.
- They have thousands of tweets sent to them per day.
Any suggestions would be welcome; at the moment, I am out of ideas.