Add a dynamic value into RMySQL getQuery [duplicate]
Asked Answered
H

2

5

Is it possible to pass a value into the query in dbGetQuery from the RMySQL package.

For example, if I have a set of values in a character vector:

df <- c('a','b','c')

And I want to loop through the values to pull out a specific value from a database for each.

library(RMySQL)    
res <- dbGetQuery(con, "SELECT max(ID) FROM table WHERE columna='df[2]'")

When I try to add the reference to the value I get an error. Wondering if it is possible to add a value from an R object in the query.

Hass answered 27/11, 2010 at 8:3 Comment(3)
Duplicate of RSQLite query with user specified variable in the WHERE fieldHilversum
Joshua, thanks for pointing that out. that is a great answer also. I had been searching for RMySQL solutions.Hass
the solution is the same whether you're using RMySQL or RSQLlite.Hilversum
T
4

One option is to manipulate the SQL string within the loop. At the moment you have a string literal, the 'df[2]' is not interpreted by R as anything other than characters. There are going to be some ambiguities in my answer, because df in your Q is patently not a data frame (it is a character vector!). Something like this will do what you want.

Store the output in a numeric vector:

require(RMySQL)
df <- c('a','b','c')
out <- numeric(length(df))
names(out) <- df

Now we can loop over the elements of df to execute your query three times. We can set the loop up two ways: i) with i as a number which we use to reference the elements of df and out, or ii) with i as each element of df in turn (i.e. a, then b, ...). I will show both versions below.

## Version i
for(i in seq_along(df)) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", df[i], "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)
    dbDisconnect(con)
}

OR:

## Version ii
for(i in df) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", i, "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)
    dbDisconnect(con)
}

Which you use will depend on personal taste. The second (ii) version requires you to set names on the output vector out that are the same as the data inside out.

Having said all that, assuming your actual SQL Query is similar to the one you post, can't you do this in a single SQL statement, using the GROUP BY clause, to group the data before computing max(ID)? Doing simple things in the data base like this will likely be much quicker. Unfortunately, I don't have a MySQL instance around to play with and my SQL-fu is weak currently, so I can't given an example of this.

Teak answered 27/11, 2010 at 8:59 Comment(7)
@Gavin, off the top of my head, I think the query you mention in your last paragraph would be: SELECT max(ID), columna FROM table WHERE columna in ('a','b','c') GROUP BY columnaHilversum
Remembering of course to make sure you don't have anything that can break SQL syntax in your variable. Obligatory XKCD reference: xkcd.com/327Savill
@Spacedman, Little Bobby Tables... classic!Hilversum
Thank you for the detail in your post. It is very helpful and, once again, I learned quite a bit from it. Regarding the need for a GROUP BY clause in my SQL Statement, because this ID is a numeric auto-incrementer and primary key there can only be distinct values in that column so I don't need a GROUP BY.Hass
@analyticsPierce: I think you miss the point of the GROUP BY. What I was thinking, and Joshua Ulrich put into SQL for us, is that you could return a data frame with the 3 max(ID) values in one column and the columna (a,b,c) in the second column, from a single SQL statement. So no need to do the three SQL statements in a loop. Do the one statement, get the DB to do the work, and then pull out the values you want on the R side. That is far less effort, typing, and compute time.Teak
ok, I understand your example now. My question is simplified from the full problem I am trying to solve. Here is a summary of the bigger issue: I have a db table with a set of keywords, each keyword has a set of rows associated to it. Each of these rows has a numeric ID. I am trying to query an API (has to be done one keyword at a time) and return a data.frame of results. Then I need to insert the rows into the db table where the ID is > than IDs already in the table. I understand your point but struggling with the right structure in R and I am not able to add a string into dbGetQuery.Hass
I did get the paste function figured out so I could create the queries properly. that's handy stuff.Hass
S
3

You could also use the sprintf command to solve the issue (it's what I use when building Shiny Apps).

df <- c('a','b','c')

res <- dbGetQuery(con, sprintf("SELECT max(ID) FROM table WHERE columna='%s'"),df())

Something along those lines should work.

Sabbatarian answered 12/12, 2013 at 22:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.