Using read.csv.sql to select multiple values from a single column

Asked 11/11, 2014 at 9:50 Answered 11/11, 2014 at 15:14

I am using read.csv.sql from the package sqldf to try and read in a subset of rows, where the subset selects from multiple values - these values are stored in another vector.

I have hacked a way to a form that works but I would like to see the correct way to pass the sql statement.

Code below gives minimum example.

library(sqldf)

# some data
write.csv(mtcars, "mtcars.csv", quote = FALSE, row.names = FALSE)

# values to select from variable 'carb'
cc <- c(1, 2)

# This only selects last value from 'cc' vector
read.csv.sql("mtcars.csv", sql = paste("select * from file where carb = ", cc ))

# So try using the 'in' operator - this works
read.csv.sql("mtcars.csv", sql = "select * from file where carb in (1,2)" ) 

# but this doesn't
read.csv.sql("mtcars.csv", sql = paste("select * from file where carb in ", cc ))

# Finally this works
read.csv.sql("mtcars.csv", sql = paste("select * from file where carb in ", 
                                       paste("(", paste(cc, collapse=",") ,")")))

The final line above works, but is there are cleaner way to pass this statement, thanks.

Bollen answered 11/11, 2014 at 9:50 Comment(1)

Can't think of a cleaner way. That's what I routinely do when I face this problem. – Parse 11/11, 2014 at 10:37

1) fn$ Substitution can be done with fn$ of gsubfn (which is automatically pulled in by sqldf). See the fn$ examples on the sqldf home page. In this case we have:

fn$read.csv.sql("mtcars.csv", 
  sql = "select * from file where carb in ( `toString(cc)` )")

2) join Another approach would be to create a data.frame of the carb values desired and perform a join with it:

Carbs <- data.frame(carb = cc)
read.csv.sql("mtcars.csv", sql = "select * from Carbs join file using (carb)")

Brumaire answered 11/11, 2014 at 15:14 Comment(2)

That's great, thanks Gabor; I had gone through your home page but did not pick up on using the fn$ method - ill spend a bit of time to try an understand it. Your second method is more the sql magic i was expecting. Does this still result in only the selected rows being read in by R (I am working with a quite large dataset)? – Bollen 11/11, 2014 at 18:53

Yes, mtcars.csv and Carbs are read into sqlite, the join is done there and only the result is read back into R. – Brumaire 11/11, 2014 at 19:7

You could use deparse, but I'm not sure it's much cleaner than what you already have:

read.csv.sql("mtcars.csv",
             sql = paste("select * from file where carb in ", gsub("c","",deparse(cc)) ))

And note that this is not really a general solution, because deparse will not always give you the right character string. It just happens to work in this instance.

Angellaangelle answered 11/11, 2014 at 10:0 Comment(4)

Thanks for your answer Thomas. It is a bit cleaner although along the same lines. Your final sentence suggests that the paste method is safer? (although expecting another alternative / better method than paste for this) – Bollen 11/11, 2014 at 10:7

@Bollen Yes, this isn't going to generalize because deparse represents different vectors in different ways. It just happens to be that for the vector 1:2, it represents it in the way that you need. – Angellaangelle 11/11, 2014 at 10:10

I would never use deparse. See for instance the difference between deparse(1:2) and deparse(c(1,2)), even if the the argument is basically the same. Also, deparse can return a character vector longer than one (for example deparse(c(1:100,101,102:200))). – Parse 11/11, 2014 at 10:35

@nicola; thank you for the explanation - it shows some of the problems that i think Thomas was hinting at, but as I am not familiar with deparse, i appreciate the example. – Bollen 11/11, 2014 at 18:44

Recommended topics

Hot tags