How to convert searchTwitter results (from library(twitteR)) into a data.frame?
I am working on saving Twitter search results into a database (SQL Server) and am getting an error when I pull the search results from twitteR.

If I execute:

library(twitteR)
puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100))

I get an error of:

Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class structure("status", package = "twitteR") into a data.frame

This is important because in order to use RODBC to add this to a table using sqlSave it needs to be a data.frame. At least that's the error message I got:

Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging",  : 
  should be a data frame

So does anyone have any suggestions on how to coerce the list to a data.frame or how I can load the list through RODBC?

My final goal is to have a table that mirrors the structure of values returned by searchTwitter. Here is an example of what I am trying to retrieve and load:

library(twitteR)
puppy <- searchTwitter("puppy", session=getCurlHandle(),num=2)
str(puppy)

List of 2
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "beautifull and  kc reg Beagle Mix for rehomes: This little puppy is looking for a new loving family wh... http://bit.ly/9stN7V "| __truncated__
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:03 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://twitterfeed.com&quot; rel=&quot;nofollow&quot;&gt;twitterfeed&lt;/a&gt;"
  .. ..@ screenName  : chr "puppy_ads"
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "the cutest puppy followed me on my walk, my grandma won't let me keep it. taking it to the pound sadface"
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:01 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://blackberry.com/twitter&quot; rel=&quot;nofollow&quot;&gt;Twitter for BlackBerry®&lt;/a&gt;"
  .. ..@ screenName  : chr "iamsweaters"

So I think the data.frame of puppy should have column names like:

- text
- favorited
- replyToSN
- created
- truncated
- replyToSID
- id
- replyToUID
- statusSource
- screenName
Lines answered 16/6, 2010 at 18:34 Comment(0)
3

Try this:

library(plyr)  # provides ldply
ldply(searchTwitter("#rstats", n=100), text)

searchTwitter returns a list of S4 objects (class status), so you need to either use one of the package's helper functions or deal directly with the slots. You can see the slots of a single element by using unclass(), for instance:

unclass(searchTwitter("#rstats", n=100)[[1]])

These slots can be accessed directly as I do above by using the related functions (from the twitteR help: ?statusSource):

 text Returns the text of the status
 favorited Returns the favorited information for the status
 replyToSN Returns the replyToSN slot for this status
 created Retrieves the creation time of this status
 truncated Returns the truncated information for this status
 replyToSID Returns the replyToSID slot for this status
 id Returns the id of this status
 replyToUID Returns the replyToUID slot for this status
 statusSource Returns the status source for this status

As I mentioned, it's my understanding that you will have to specify each of these fields yourself in the output. Here's an example using two of the fields:

> head(ldply(searchTwitter("#rstats", n=100), 
        function(x) data.frame(text=text(x), favorited=favorited(x))))
                                                                                                                                          text
1                                                     @statalgo how does that actually work? does it share mem between #rstats and postgresql?
2                                   @jaredlander Have you looked at PL/R? You can call #rstats from PostgreSQL: http://www.joeconway.com/plr/.
3   @CMastication I was hoping for a cool way to keep data in a DB and run the normal #rstats off that. Maybe a translator from R to SQL code.
4                     The distribution of online data usage: AT&amp;T has recently announced it will no longer http://goo.gl/fb/eTywd #rstat
5 @jaredlander not that I know of. Closest is sqldf package which allows #rstats and sqlite to share mem so transferring from DB to df is fast
6 @CMastication Can #rstats run on data in a DB?Not loading it in2 a dataframe or running SQL cmds but treating the DB as if it wr a dataframe
  favorited
1     FALSE
2     FALSE
3     FALSE
4     FALSE
5     FALSE
6     FALSE

You could turn this into a function if you intend to do it frequently.
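
As a rough sketch of such a function, the code below builds one row per status and substitutes NA for slots that come back empty, such as replyToSN (shown as chr(0) in the question's str output), which would otherwise make data.frame fail with "arguments imply differing number of rows". So that it runs without Twitter access, it uses a hypothetical mockStatus class with only three slots as a stand-in for twitteR's status class; the helper names first_or_na, status_to_row and statuses_to_df are likewise made up for illustration.

```r
library(methods)

# Hypothetical stand-in for twitteR's status class (assumption: only three
# of the ten slots are modelled, so the sketch runs offline).
setClass("mockStatus", representation(text      = "character",
                                      favorited = "logical",
                                      replyToSN = "character"))

# Indexing a zero-length vector with [1] yields an NA of the same type,
# so empty slots become NA instead of breaking data.frame().
first_or_na <- function(x) x[1]

# Build a one-row data.frame from a single status object.
status_to_row <- function(s) {
  data.frame(text      = first_or_na(s@text),
             favorited = first_or_na(s@favorited),
             replyToSN = first_or_na(s@replyToSN),
             stringsAsFactors = FALSE)
}

# Bind the per-status rows into one data.frame.
statuses_to_df <- function(statuses) {
  do.call(rbind, lapply(statuses, status_to_row))
}

# Two made-up statuses; the first has an empty replyToSN slot.
tweets <- list(
  new("mockStatus", text = "first puppy tweet", favorited = FALSE,
      replyToSN = character(0)),
  new("mockStatus", text = "second puppy tweet", favorited = TRUE,
      replyToSN = "puppy_ads")
)
puppy_df <- statuses_to_df(tweets)
str(puppy_df)
```

With real search results, the same pattern would presumably use the accessor functions listed above (text(x), favorited(x), replyToSN(x)) in place of the @ slot access.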

Sporangium answered 16/6, 2010 at 18:39 Comment(6)
Shane, what library do I need to load for that? Is it plyr? - Lines
I see that it is plyr. It did convert the list to a data.frame. Now the 10 columns that get returned from searchTwitter are in a single column in the data.frame. How can I split them out? - Lines
Can you update your question? I'm not sure what you want the final output to look like... - Sporangium
I updated my question, thank you for your suggestions. I'm going through them, trying to get this into the right structure. - Lines
@analyticsPierce: see my response. - Sporangium
Alright! I am able to create the data.frame. Thank you for your help and patience with me. I'm a database marketing guy just getting into R. One follow-up question: the above solution does not work for columns that may come back with no value, like replyToSN. Is there something I can add so that a value can be assigned? If I run: head(ldply(searchTwitter("puppy", n=100), function(x) data.frame(text=text(x), favorited=favorited(x), replyToSN=replyToSN(x)))) I get: Error in data.frame : arguments imply differing number of rows: 1, 0. Let me know if this should be a new question. - Lines
18

I use this code, which I found at http://blog.ouseful.info/2011/11/09/getting-started-with-twitter-analysis-in-r/ a while ago:

#get data
tws<-searchTwitter('#keyword',n=10)

#make data frame
df <- do.call("rbind", lapply(tws, as.data.frame))

#write to csv file (or your RODBC code)
write.csv(df,file="twitterList.csv")
Uthrop answered 8/12, 2011 at 15:7 Comment(0)
7

I know this is an old question, but still, here is what I think is a "modern" way to solve this. Just use the function twListToDF:

gvegayon <- getUser("gvegayon")
timeline <- userTimeline(gvegayon,n=400)
tl <- twListToDF(timeline)

Hope it helps

Bussey answered 16/6, 2010 at 18:34 Comment(4)
It seems to me that this solution only works if you search and work with the tweets of a specific user? - Improvvisatore
Yes, this is not really a solution... at least not to this problem or the general use case of twitteR - Hsiuhsu
I disagree. The problem was basically how to get a data.frame from a status object from twitteR. If you have a list of them, which is the case in the original question, then you just apply the function to each object in the list. HTH - Bussey
Looks good to me. Works fine for multiple users: twListToDF(lapply(c('@handle1','@handle2'), getUser)) - Examination
1

For those who run into the same problem I did, namely an error saying

Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' 

I simply changed the word text in

ldply(searchTwitter("#rstats", n=100), text) 

to statusText, like so:

ldply(searchTwitter("#rstats", n=100), statusText)

Just a friendly heads-up :P

Marx answered 4/12, 2012 at 4:38 Comment(0)
0

Here is a nice function to convert it into a DF.

TweetFrame<-function(searchTerm, maxTweets)
{
  tweetList<-searchTwitter(searchTerm,n=maxTweets)
  return(do.call("rbind",lapply(tweetList,as.data.frame)))
}

Use it as:

tweets <- TweetFrame("puppy", 100)  # search term, maximum number of tweets
Champollion answered 15/10, 2016 at 6:8 Comment(0)
0

The twitteR package now includes a function twListToDF that will do this for you.

puppy_table <- twListToDF(puppy)
Galba answered 13/7, 2018 at 19:12 Comment(0)
