How to extract tweet geocode in twitteR package in R
Recently Edwin Chen posted a great map of the regional usage of soda vs. pop vs. coke, created from geocoded tweets involving those words in the context of drinking: http://blog.echen.me/2012/07/06/soda-vs-pop-with-twitter/

He mentions that he used the twitteR package created by Jeff Gentry in R. Sure enough, it is easy to gather tweets that use a given word and put them in a dataframe:

require(twitteR)
require(plyr)
cat.tweets <- searchTwitter("cats", n = 1000)
tweets.df <- ldply(cat.tweets, function(t) t$toDataFrame())

The data frame (tweets.df) will contain the user ID, tweet text, etc. for each tweet, but it does not appear to contain the geocode. Any idea how to get it in R?
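For reference, the twitteR status reference class documents latitude and longitude fields on each status object (empty or NA when a tweet is not geotagged). A sketch of pulling them out alongside the text, assuming those fields are present in your twitteR version:

```r
require(twitteR)
require(plyr)

# Helper: status latitude/longitude are stored as character and may be
# length zero when the tweet carries no geotag.
coord <- function(x) if (length(x) == 0) NA_real_ else suppressWarnings(as.numeric(x))

cat.tweets <- searchTwitter("cats", n = 1000)
geo.df <- ldply(cat.tweets, function(t) {
  data.frame(text = t$text,
             lat  = coord(t$latitude),
             long = coord(t$longitude),
             stringsAsFactors = FALSE)
})
# Keep only the tweets that actually carry coordinates
geo.df <- geo.df[!is.na(geo.df$lat), ]
```

Most tweets are not geotagged, so expect the filtered data frame to be much smaller than the original 1000 rows.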

Mathildamathilde answered 26/7, 2012 at 17:35 Comment(6)
You need to provide a geocode for searchTwitter to use. See the library documentation ?searchTwitter.Nebulosity
I see that you can supply a geocode and radius into searchTwitter but that does not produce a geocode for each pulled tweet.Mathildamathilde
but you would have the geocode that you supplied, right? With a smaller radius, might that give you what you need?Nebulosity
Good idea, I see what you mean. I could iterate through essentially a grid of points across a given map. Thanks for the suggestion.Mathildamathilde
When you get it working you should answer your own question so others can see how you did it. I really like the post you linked to, but they didn't post any code. =(Nebulosity
I'll keep working on it and try to make a package, I'll certainly post the code as well.Mathildamathilde
Does geocode mean longitude and latitude coordinates? If yes, the following commands work for me:

cat.tweets <- searchTwitter("cats", n = 1000)
tweets.df <- do.call("rbind", lapply(cat.tweets, as.data.frame))
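Building on that, as.data.frame() on twitteR status objects produces latitude and longitude columns (NA when a tweet has no geotag), so a quick way to keep only the geocoded rows might be:

```r
# Assumes tweets.df was built as above and has `latitude`/`longitude`
# columns; both are NA for tweets without a geotag.
geo.tweets <- tweets.df[!is.na(tweets.df$latitude) &
                        !is.na(tweets.df$longitude), ]
nrow(geo.tweets)  # how many of the pulled tweets were geotagged
```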

Source: LINK

Stoddart answered 1/11, 2014 at 14:48 Comment(0)

I've been tinkering with an R function: you enter the search text, the number of search sites, and the radius around each site, for example twitterMap("#rstats", 10, "10mi"). Here's the code:

twitterMap <- function(searchtext, locations, radius){
  require(ggplot2)
  require(maps)
  require(twitteR)
  # Randomly chosen locations within the bounding box of the continental US
  lat  <- runif(n = locations, min = 24.446667,   max = 49.384472)
  long <- runif(n = locations, min = -124.733056, max = -66.949778)
  # Generate a data frame with the random latitude, longitude, and chosen radius
  coordinates <- data.frame(lat = lat, long = long, radius = radius)
  # Create a "lat,long,radius" string per site for searchTwitter()'s geocode argument
  for(i in 1:length(coordinates$lat)){
    coordinates$search.twitter.entry[i] <- toString(c(coordinates$lat[i],
                                                      coordinates$long[i], radius))
  }
  # Take the spaces out of the string
  coordinates$search.twitter.entry <- gsub(" ", "", coordinates$search.twitter.entry,
                                           fixed = TRUE)

  # Search Twitter at each location and record how many tweets came back
  for(i in 1:length(coordinates$lat)){
    coordinates$number.of.tweets[i] <-
      length(searchTwitter(searchString = searchtext, n = 1000,
                           geocode = coordinates$search.twitter.entry[i]))
  }
  # Make the US map
  all_states <- map_data("state")
  # Plot all points on the map
  p <- ggplot()
  p <- p + geom_polygon(data = all_states, aes(x = long, y = lat, group = group),
                        colour = "grey", fill = NA)
  p <- p + geom_point(data = coordinates,
                      aes(x = long, y = lat, colour = number.of.tweets)) +
       scale_colour_continuous(name = "# of tweets")
  p
}
# Example
twitterMap("dolphin", 15, "10mi")

example map

There are some big problems I've encountered that I'm not sure how to deal with. First, as written, the code searches 15 different randomly generated locations. These locations are drawn from a uniform distribution over the bounding box of the US (easternmost to westernmost longitude, northernmost to southernmost latitude), which includes locations outside the United States, say just east of Lake of the Woods, Minnesota, in Canada. I'd like the function to check whether each generated location is in the US and discard it if it isn't. More importantly, I'd like to search thousands of locations, but Twitter doesn't like that and returns a 420 "Enhance Your Calm" rate-limit error. So perhaps it's best to search every few hours, slowly build a database, and delete duplicate tweets. Finally, if one chooses a remotely popular topic, R gives an error like Error in function (type, msg, asError = TRUE) : transfer closed with 43756 bytes remaining to read. I'm a bit mystified about how to get around this problem.
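On the first problem, one possible approach (a sketch, relying on the maps package's map.where(), which returns the region name for a point or NA when it falls outside every polygon in the database) is to rejection-sample until enough points land inside a US state:

```r
require(maps)

# Sketch: keep drawing random points from the bounding box and retain only
# those that map.where() places inside a lower-48 state polygon.
# Note map.where() takes longitude first, then latitude.
sample_us_points <- function(locations){
  lat  <- numeric(0)
  long <- numeric(0)
  while(length(lat) < locations){
    cand.lat  <- runif(locations, min = 24.446667,   max = 49.384472)
    cand.long <- runif(locations, min = -124.733056, max = -66.949778)
    inside <- !is.na(map.where("state", cand.long, cand.lat))
    lat  <- c(lat,  cand.lat[inside])
    long <- c(long, cand.long[inside])
  }
  data.frame(lat = lat[1:locations], long = long[1:locations])
}
```

For the rate-limit problem, a Sys.sleep() pause between successive searchTwitter() calls, combined with the batch-every-few-hours approach suggested above, is the usual workaround.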

Mathildamathilde answered 30/7, 2012 at 0:46 Comment(3)
Please work on it... and post when it's figured out... even I need it.Doggo
Can you tell me how to extract the longitude and latitude from the tweets harvested by searchTwitter? Then maybe you can use this.Doggo
I'm getting an error message: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, : 15 tweets were requested but the API can only return 0Nephrotomy

Here is a toy example, given that you can extract only 100 tweets per call:

require(twitteR)
require(XML)

page <- 1  # the old Search API paginated results; rpp=100 is the per-page maximum
URL <- paste('http://search.twitter.com/search.atom?q=',
             '&geocode=39.724089,-104.820557,3mi', '&rpp=100&page=', page,
             sep = '') # Aurora, CO with a radius of 3mi
doc <- htmlTreeParse(URL, useInternal = TRUE)
entry <- getNodeSet(doc, "//entry")
tweets <- c()

for (i in seq_along(entry)){
    t <- unlist(xpathApply(entry[[i]], ".//title", xmlValue))
    tweets <- c(tweets, t)
}

This solution might not be too elegant, but I was able to get tweets for a particular geocode.

Haplosis answered 26/7, 2012 at 21:5 Comment(0)
