How to Collect Tweets More Quickly Using Twitter API in Python?

For a research project, I am collecting tweets using python-twitter. However, running the program nonstop on a single computer, we manage to collect only about 20 MB of data per week. I am only running this program on one machine so that we do not collect the same tweets twice.

Our program runs a loop that calls getPublicTimeline() every 60 seconds. I tried to improve this by calling getUserTimeline() on some of the users that appeared in the public timeline. However, this consistently got me banned from collecting tweets at all for about half an hour each time. Even without the ban, it seemed that there was very little speed-up by adding this code.
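For reference, the collection loop is roughly the following (a simplified sketch, not our exact code; credentials are omitted, and python-twitter actually spells these methods Api.GetPublicTimeline() and Api.GetUserTimeline()):

```python
# Simplified sketch of our polling loop; not the exact code.
# python-twitter exposes these calls as Api.GetPublicTimeline()/GetUserTimeline().
import time
import twitter  # python-twitter

api = twitter.Api()  # credentials omitted for the sketch

seen_ids = set()
with open("tweets.jsonl", "a") as out:
    while True:
        for status in api.GetPublicTimeline():
            if status.id not in seen_ids:        # skip tweets we already have
                seen_ids.add(status.id)
                out.write(status.AsJsonString() + "\n")
        time.sleep(60)  # one public-timeline request per minute
```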

I know about Twitter's "whitelisting" that allows a user to submit more requests per hour. I applied for this about three weeks ago and have not heard back since, so I am looking for alternatives that will allow our program to collect tweets more efficiently without going over the standard rate limit. Does anyone know of a faster way to collect public tweets from Twitter? We'd like to get about 100 MB per week.

Thanks.

Brassiere answered 22/11, 2010 at 20:2 Comment(1)
old post, but for people who stumble here: getPublicTimeline() isn't in the current Twitter API or python-twitter code, but it probably got the timeline of the authenticated user - my guess is that's the reason for the small number of tweets. I don't think you can do an open query with the REST API, but you can with the streaming API (which gives, I think, .1% of the firehose - plenty to get a few GB of data in a few days)Nautch

How about using the streaming API? This is exactly the use-case it was created to address. With the streaming API you will not have any problems gathering megabytes of tweets. You still won't be able to access all tweets or even a statistically significant sample without being granted access by Twitter though.
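As an updated sketch of what that looks like today (using tweepy's StreamingClient, which only exists in much newer tweepy versions than anything available when this answer was written; the bearer token and output file are placeholders):

```python
# Rough sketch of collecting the sampled stream with tweepy (v4+).
# BEARER_TOKEN and the output path are placeholders; adjust for your setup.
import tweepy

BEARER_TOKEN = "..."

class TweetCollector(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        # Append each tweet's text as one line; switch to JSON if you need metadata.
        with open("sample_stream.txt", "a", encoding="utf-8") as out:
            out.write(tweet.text.replace("\n", " ") + "\n")

    def on_errors(self, errors):
        print("stream error:", errors)

collector = TweetCollector(BEARER_TOKEN)
collector.sample()  # connects to the random sampled stream and runs until stopped
```

If you only care about particular topics, adding rules with add_rules(tweepy.StreamRule(...)) and calling filter() instead of sample() narrows the stream to matching tweets.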

Swen answered 22/11, 2010 at 20:17 Comment(2)
The given link for the streaming API (developer.twitter.com/pages/streaming_api) does not work.Demure
Updated. Still, the answer is over eight years old, so things may have changed drastically in the meantime.Swen

I did a similar project analyzing data from tweets. If you're approaching this from a pure data collection/analysis angle, you can scrape any of the better sites that aggregate these tweets for various reasons. Many sites allow you to search by hashtag, so throw in a popular enough hashtag and you've got thousands of results. I just scraped a few of these sites for popular hashtags, collected those into a large list, queried that list against the site, and scraped all of the usable information from the results. Some sites also allow you to export the data directly, making this task even easier. You'll get a lot of garbage results that you'll probably need to filter out (spam, foreign-language tweets, etc.), but this was the quickest way that worked for our project. Twitter will probably not grant you whitelisted status, so I definitely wouldn't count on that.
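Very roughly, the workflow looked like this (the aggregator URL, "q" parameter, and ".tweet-text" selector below are made-up placeholders - substitute whichever site you end up scraping, and check its terms of service first):

```python
# Hypothetical sketch of the scrape-by-hashtag workflow described above.
# The site URL, query parameter, and CSS selector are placeholders; a real
# aggregator needs its own selectors.
import requests
from bs4 import BeautifulSoup

hashtags = ["#worldcup", "#python"]   # seed list of popular hashtags
collected = []

for tag in hashtags:
    resp = requests.get("https://tweet-aggregator.example.com/search",
                        params={"q": tag}, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for node in soup.select(".tweet-text"):
        text = node.get_text(strip=True)
        if text:
            collected.append(text)

# Dedupe before filtering out spam and other garbage results.
collected = list(dict.fromkeys(collected))
print(len(collected), "tweets collected")
```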

Pretender answered 22/11, 2010 at 21:39 Comment(1)
Could you give some pointers to good examples of such websites?Barta

There is a pretty good tutorial from Ars Technica on using the streaming API in Python that might be helpful here.

Otherwise you could try doing it via cURL.
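If you'd rather stay in Python than drop to cURL, the same raw-HTTP idea looks roughly like this with requests (the v2 sampled-stream endpoint and bearer token reflect today's API rather than the one from when this was written, so treat them as assumptions):

```python
# Sketch of consuming the sampled stream over raw HTTP with requests -
# essentially what the cURL suggestion amounts to. The v2 endpoint and
# bearer token are assumptions about the current API.
import requests

BEARER_TOKEN = "..."
URL = "https://api.twitter.com/2/tweets/sample/stream"

with requests.get(URL, headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
                  stream=True, timeout=90) as resp:
    resp.raise_for_status()
    with open("sample_stream.jsonl", "a", encoding="utf-8") as out:
        for line in resp.iter_lines():
            if line:  # keep-alive heartbeats arrive as empty lines
                out.write(line.decode("utf-8") + "\n")
```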

Barta answered 3/11, 2011 at 14:30 Comment(0)
