I am researching how to read in data from a server directly to a data frame in R. In the past I have written SQL queries that were over 50 lines long (with all the selects and joins). Any advice on how to write long queries in R? Is there some way to write the query elsewhere in R, then paste it in to the "sqlQuery" part of the code?
Keep long SQL queries in .sql files and read them in using readLines + paste with collapse='\n'
my_query <- paste(readLines('your_query.sql'), collapse='\n')
results <- sqlQuery(con, my_query)
You can paste any SQL query into R as is and then simply replace the newlines + spaces with a single space. For instance:
## Connect ot DB
library(RODBC)
con <- odbcConnect('MY_DB')
## Paste the query as is (you can have how many spaces and lines you want)
query <-
"
SELECT [Serial Number]
,[Series]
,[Customer Name]
,[Zip_Code]
FROM [dbo].[some_db]
where [Is current] = 'Yes' and
[Serial Number] LIKE '5%' and
[Series] = '20'
order by [Serial Number]
"
## Simply replace the new lines + spaces with a space and you good to go
res <- sqlQuery(con, gsub("\\n\\s+", " ", query))
close(con)
I had this issue trying to run a 17 line SQL query through RODBC and tried @arvi1000's solution but no matter what I did it would produce an error message and not execute more than one line of the .sql file. Tried variations of the value for collapse
and different ways for reading in the file. Spent 90 minutes trying to get it to work.. Suspect RODBC might behave differently with multi-line queries on different platforms or with different versions of MySQL or ODBC settings.
Anyway, the following loop arrangement may not be as elegant but it works and is possibly more robust:
channel <- odbcConnect("mysql_odbc", uid="username", pwd="password")
sqlString<-readLines("your_query.sql")
for (i in 1:length(sqlString)) {
print(noquote(sqlString[i]))
sqlQuery(channel, as.name(sqlString[i]))
}
In my script, all except the last lines were doing joins, creating temporary tables etc, only the last line had a SELECT statment and produced an output. .sql file was tidy with only one query per line, no comments or newline characters within the query. It seems that this loop runs all the code, but the output is possibly lost in the scope somewhere, so the one SELECT statement needs to be repeated outside the loop.
Approach with separate .sql (most sql or nosql engines) files can be trouble if one prefer to edit code in one file.
As far as someone using RStudio (or other tool where code folding can be customized), simplifying can be done using parenthesis. I prefer using {...}
and fold the code.
For example:
query <- {'
SELECT
user_id,
format(date,"%Y-%m") month,
product_group,
product,
sum(amount_net) income,
count(*) number
FROM People
WHERE
date > "2015-01-01" and
country = "Canada"
GROUP BY 1,2,3;'})
Folding a query can be even done within function (folding long argument), or in other situations where our code extends to inconvenient sizes.
© 2022 - 2024 — McMap. All rights reserved.