Can I gracefully include formatted SQL strings in an R script?
Asked Answered
G

6

9

I'm working in an R script that uses a long SQL string, and I would like to keep the query relatively free of other markup so as to allow copying and pasting between editors and applications. I'd also like the ability to split the query across lines for better readability.

In the RODBC documentation, the paste function is used to build the query out of separate chunks, but I'd prefer something less kludgy and with fewer quotes and commas. Thanks for your help.

Gerry answered 27/10, 2009 at 13:14 Comment(0)
M
8

you can override the %+% operator to have better string concatination syntax:

'%+%' <- function(x,y) paste(x,y,sep="")

y<-"y1"
x<-"somethingorother"
query<-
'SELECT DISTINCT x AS ' %+% x %+%',\n'    %+%
'                y AS ' %+% y %+% '\n'    %+%
' FROM tbl
 WHERE id=%s
 AND num=%d'

cat(query,"\n")

yields:

> cat(query,"\n")
SELECT DISTINCT x AS somethingorother,
                y AS y1
 FROM tbl
 WHERE id=%s
 AND num=%d 
Muns answered 27/10, 2009 at 15:25 Comment(1)
I find shQuote helpful when x or y are strings.Mcclurg
N
13

If you're an old C programmer from way back, as I am, you might enjoy just using sprintf().

Borrowing Ian's example:

y<-"y1"
x<-"somethingorother"
query <- sprintf(
'SELECT DISTINCT x AS %s,
                 y AS %s,
 FROM tbl
 WHERE id=%%s
 AND num=%%d', x, y)

yields:

> cat(query,"\n")
SELECT DISTINCT x AS somethingorother,
                 y AS y1,
 FROM tbl
 WHERE id=%s
 AND num=%d 
Neonatal answered 27/10, 2009 at 18:21 Comment(1)
I also like using sprintf for interpolating within multi-line strings. It is ever-so-slightly reminiscent of how you bind variables to statements in Perl's DBI. And readable.Leopold
M
8

you can override the %+% operator to have better string concatination syntax:

'%+%' <- function(x,y) paste(x,y,sep="")

y<-"y1"
x<-"somethingorother"
query<-
'SELECT DISTINCT x AS ' %+% x %+%',\n'    %+%
'                y AS ' %+% y %+% '\n'    %+%
' FROM tbl
 WHERE id=%s
 AND num=%d'

cat(query,"\n")

yields:

> cat(query,"\n")
SELECT DISTINCT x AS somethingorother,
                y AS y1
 FROM tbl
 WHERE id=%s
 AND num=%d 
Muns answered 27/10, 2009 at 15:25 Comment(1)
I find shQuote helpful when x or y are strings.Mcclurg
B
8

A graceful way of "including" a long SQL query is to keep it in a separate .sql file. Preferably somewhere it can be syntax highlighted, a text file in RStudio will do the job. You can then in your main R script read the file into a string and populate it with variables using one of the many "named" sprintf-type solutions, such as infuser.

.sql

select *
from mytable
where id = {{a}} 
and somevar = {{b}}

.R

library(readr)
library(infuser)

query <- read_file("query.sql") %>%
         infuse(a = 1, b = 2) 
Branny answered 10/10, 2016 at 19:57 Comment(2)
A prepared statement is the safer way to do this.Backsword
A prepared statement is also far more efficient in many cases. If your DB has a caching optimizer, it can use the same execution plan for the same statement with various bound parameters, but if you substitute the parameters directly into the SQL, it will probably have to recompute the plan every time.Teak
T
4

I'd recommend just using a plain string, and not embedding variable values into it. Use placeholders instead.

sql <- "SELECT foo FROM bar
    WHERE col1 = ?
    AND col2 = ?
    ORDER BY yomama"

I'm not sure if the double-quote is the best way to embed multi-line strings in R code (is there something like here-docs?), but it does work, unlike in Java.

Is there some reason you don't want to send "\n" or "\t" to your database? They should be fine in the SQL.

Teak answered 29/10, 2009 at 16:52 Comment(4)
How can I bind values to these placeholders? I couldn't find anything in the documentation of RODBC.Thorin
Some drivers seem to support it, some don't. See https://mcmap.net/q/1015788/-bind-variables-in-r-dbi .Teak
RODBC does not support parameterized queries. RODBCext adds this. It sounds like DBI is in the process of adding them.Ladybird
Passing a query built as describe here works fine in RStudio . At the beginning Rstudio says that the string may be an error, since you split it in 4 lines, but when running it, if no empty new lines between line and line, it works like a charm. I use ODBC and to pass arguments to this SQL queries, I use DBI too. Here the tutorial to pass named arguments to the SQL query: dbi.r-dbi.org/reference/sqlinterpolateRuano
G
1

I've ended up simply hitting the sql string with sql <- gsub("\n","",sql) and sql <- gsub("\t","",sql) before running it. The string itself can be as long as it needs to be, but stays free of any concatenation markup.

Gerry answered 28/10, 2009 at 16:36 Comment(0)
E
0

You can now do this easily using glue, allowing you to write code like

glue_sql("
  SELECT {`var`}
  FROM {`tbl`}
  WHERE {`tbl`}.sepal_length > {num}
    AND {`tbl`}.species = {val}
  ", .con = con)
Entebbe answered 25/5, 2022 at 21:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.