Writing Lengthy SQL queries in R
Asked Answered
K

4

7

I am researching how to read in data from a server directly to a data frame in R. In the past I have written SQL queries that were over 50 lines long (with all the selects and joins). Any advice on how to write long queries in R? Is there some way to write the query elsewhere in R, then paste it in to the "sqlQuery" part of the code?

Kabyle answered 7/4, 2015 at 17:42 Comment(2)
You could start here: https://mcmap.net/q/1093183/-parameterized-queries-with-rodbc/324364Lucknow
I'm new to this too-- I've found the sprintf function useful, especially for variable selectionBarracks
E
12

Keep long SQL queries in .sql files and read them in using readLines + paste with collapse='\n'

my_query <- paste(readLines('your_query.sql'), collapse='\n')
results <- sqlQuery(con, my_query)
Elvinelvina answered 7/4, 2015 at 18:3 Comment(0)
A
3

You can paste any SQL query into R as is and then simply replace the newlines + spaces with a single space. For instance:

## Connect ot DB
library(RODBC)
con <- odbcConnect('MY_DB')

## Paste the query as is (you can have how many spaces and lines you want)
query <- 
"
  SELECT [Serial Number]
        ,[Series]
        ,[Customer Name]
        ,[Zip_Code]

  FROM [dbo].[some_db]

  where [Is current] = 'Yes' and 
        [Serial Number] LIKE '5%' and
        [Series] = '20'

  order by [Serial Number]

"

## Simply replace the new lines + spaces with a space and you good to go
res <- sqlQuery(con, gsub("\\n\\s+", " ", query))
close(con)
Archaism answered 8/3, 2018 at 6:30 Comment(0)
R
1

I had this issue trying to run a 17 line SQL query through RODBC and tried @arvi1000's solution but no matter what I did it would produce an error message and not execute more than one line of the .sql file. Tried variations of the value for collapse and different ways for reading in the file. Spent 90 minutes trying to get it to work.. Suspect RODBC might behave differently with multi-line queries on different platforms or with different versions of MySQL or ODBC settings.

Anyway, the following loop arrangement may not be as elegant but it works and is possibly more robust:

channel <- odbcConnect("mysql_odbc", uid="username", pwd="password")
sqlString<-readLines("your_query.sql")           

for (i in 1:length(sqlString)) {
  print(noquote(sqlString[i]))
  sqlQuery(channel, as.name(sqlString[i]))
}

In my script, all except the last lines were doing joins, creating temporary tables etc, only the last line had a SELECT statment and produced an output. .sql file was tidy with only one query per line, no comments or newline characters within the query. It seems that this loop runs all the code, but the output is possibly lost in the scope somewhere, so the one SELECT statement needs to be repeated outside the loop.

Relax answered 17/10, 2016 at 13:34 Comment(0)
A
1

Approach with separate .sql (most sql or nosql engines) files can be trouble if one prefer to edit code in one file.
As far as someone using RStudio (or other tool where code folding can be customized), simplifying can be done using parenthesis. I prefer using {...} and fold the code.

For example:

query <- {'
 SELECT 
  user_id,
  format(date,"%Y-%m") month,
  product_group,
  product,
  sum(amount_net) income,
  count(*) number
FROM People 
WHERE 
  date > "2015-01-01" and
  country = "Canada"
  GROUP BY 1,2,3;'})

Fold 14-lines long query to one-line

Folding a query can be even done within function (folding long argument), or in other situations where our code extends to inconvenient sizes.

Folding whatever you like

Antonietta answered 18/9, 2017 at 8:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.