For a research project, I am collecting tweets using Python-Twitter. However, when running our program nonstop on a single computer for a week we manage to collect about only 20 MB of data per week. I am only running this program on one machine so that we do not collect the same tweets twice.
Our program runs a loop that calls getPublicTimeline() every 60 seconds. I tried to improve this by calling getUserTimeline() on some of the users that appeared in the public timeline. However, this consistently got me banned from collecting tweets at all for about half an hour each time. Even without the ban, it seemed that there was very little speed-up by adding this code.
I know about Twitter's "whitelisting" that allows a user to submit more requests per hour. I applied for this about three weeks ago, and have not hear back since, so I am looking for alternatives that will allow our program to collect tweets more efficiently without going over the standard rate limit. Does anyone know of a faster way to collect public tweets from Twitter? We'd like to get about 100 MB per week.
Thanks.