How to read the contents of an .sql file into an R script to run a query?
E

7

47

I have tried the readLines and the read.csv functions but they don't work.

Here is the contents of the my_script.sql file:

SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'

and it is saved on my Desktop.

Now I want to run this query from my R script. Here is what I have:

conn = connectDb()

fileName <- "C:\\Users\\me\\Desktop\\my_script.sql"
query <- readChar(fileName, file.info(fileName)$size)

query <- gsub("\r", " ", query)
query <- gsub("\n", " ", query)
query <- gsub("", " ", query)

recordSet <- dbSendQuery(conn, query)
rate <- fetch(recordSet, n = -1)

print(rate)
disconnectDb(conn)

And I am not getting anything back in this case. What can I try?

Esculent answered 30/6, 2017 at 19:7 Comment(8)
If you manually do dbSendQuery(conn, "SELECT Emp..."), are any rows returned there? Are you confident that this is a problem with the read-from-file part and not a data-is-not-present problem?Haldi
Yes, if I do that then data is returned. I am confident that I just don't know how to get it to read from a .sql file.Esculent
I've tested this process on both SQL Server and Postgres with no problems (I don't have mysql available). I've also used paste(readLines(...), collapse = " ") with the same success. Since a .sql file is merely a text file with a different file extension, there's nothing cosmic. Is it something to do with your use of query versus query2? You don't show how you got to that second variable.Haldi
I removed the query2 as it was a mistake, I changed it to just query. Can you suggest a way to make the file be read as UTF-8? I am getting some non UTF-8 symbols.Esculent
Have you tried readLines(..., encoding="UTF-8")? You may also want to look at iconv().Haldi
I just tried that and got the message: Error in postgresqlExecStatement(conn, statement, ...) :   RS-DBI driver: (could not Retrieve the result : ERROR:  syntax error at or near "" LINE 1:  UTF-8 --get priority in stack - APPS UTF-8  UTF-...Esculent
(postgresqlExecStatement? why is this tagged with mysql?) I suggest you redirect your research (and rephrase your question to be about) different encodings within a query, since this problem appears likely to not be about "reading from a file".Haldi
My mistake, thank youEsculent
I
58

I've had trouble reading sql files myself, and have found that the syntax often gets broken if there are any single-line comments in the sql. Since in R you store the sql statement as a single-line string, any double dash in the sql essentially comments out all of the code that follows it.

This is a function that I typically use whenever I am reading in a .sql file to be used in R.

getSQL <- function(filepath){
  con <- file(filepath, "r")
  sql.string <- ""

  while (TRUE){
    line <- readLines(con, n = 1)

    # Stop at end of file
    if (length(line) == 0){
      break
    }

    # Replace tabs with spaces
    line <- gsub("\\t", " ", line)

    # Turn single-line comments into block comments so they
    # don't comment out the rest of the collapsed query
    if (grepl("--", line)){
      line <- paste(sub("--", "/*", line), "*/")
    }

    sql.string <- paste(sql.string, line)
  }

  close(con)
  return(sql.string)
}
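
For example, assuming conn is an open DBI connection as in the question, a usage sketch might look like this:

# Read the query text and run it against the open connection
query <- getSQL("C:\\Users\\me\\Desktop\\my_script.sql")
rate <- dbGetQuery(conn, query)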
Inexpensive answered 3/7, 2017 at 12:56 Comment(7)
Thank you. This function works for me. I just want to ask a small caveat. Sometimes we need to specify the encoding we use to read a file, for example: con = file(filepath, "r", encoding = "UTF-16LE")Triptolemus
This is great, thanks. Another point, perhaps obvious (but tripped me up!): the Regex replacing line comments '--' with '/*...*/' will replace ALL occurrences of '--' in the query, regardless of whether it prefaces a comment or not. I had a query which contained multiple '--' within quotes which should not have been changed, so beware this kind of thing. I've (lazily) just removed this regex replace section from the function, and then will make sure I take out SQL comments from the file before importing.Mcalpin
Works perfectly - I just replaced "line <- paste(sub("--","/*",line),"*/")" with "next" and added an else case where I inserted "sql.string <- paste(sql.string, line)", as the comments somebody puts in do not matter for me when querying the db in question.Cauvery
Wouldn't it be useful to have a getSQL function which you don't need to define every time? So that e.g. you only have to type data <- getSQL('path/data.sql') in a new script without writing the function above?Isallobar
I agree. One option is to create your own custom package, which you can load like any other library. I do that for any small utility functions like this that I tend to use a lot.Inexpensive
this is great. worked perfectly for my needsDrawknife
Note that the SQL file must end with a line break with this functionUnderslung
R
46

I've found for queries with multiple lines, the read_file() function from the readr package works well. The only thing you have to be mindful of is to avoid single quotes (double quotes are fine). You can even add comments this way.

Example query, saved as query.sql

SELECT 
COUNT(1) as "my_count"
-- comment goes here
FROM -- tabs work too
  my_table

I can then store the results in a data frame with

df <- dbGetQuery(con, statement = read_file('query.sql'))
Rummel answered 7/9, 2017 at 21:19 Comment(3)
Thank you. Does this hold true for a postgres query as well?Pliant
'statement' in selecting a method for function 'dbGetQuery': Error: could not find function "read_file" >Pliant
@kRazzyR the answer says read_file() function from the readr package. You will need to install that package and load it with library(readr) to use it.Slavocracy
R
16

You can use the read_file() function from the readr package.

fileName = read_file("C:/Users/me/Desktop/my_script.sql")

You will get a string variable fileName with the desired text.

Note: Use / instead of \\ in the file path.

Ritz answered 7/3, 2019 at 6:14 Comment(0)
C
6

The answer by Matt Jewett is quite useful, but I wanted to add that I sometimes encounter the following warning when trying to read .sql files generated by SQL Server using that answer:

Warning message: In readLines(con, n = 1) : line 1 appears to contain an embedded nul

The first line returned by readLines is often "ÿþ" in these cases (i.e. the UTF-16 byte order mark) and subsequent lines are not read properly. I solved this by opening the sql file in Microsoft SQL Server Management Studio and selecting

File -> Save As ...

then on the small downarrow next to the save button selecting

Save with Encoding ...

and choosing

Unicode (UTF-8 without signature) - Codepage 65001

from the Encoding dropdown menu.

If you do not have Microsoft SQL Server Management Studio and are using a Windows machine, you could also try opening the file with the default text editor and then selecting

File -> Save As ...

Encoding: UTF-8

to save with a .txt file extension.

Interestingly, re-saving the file within Microsoft SQL Server Management Studio removes the BOM (byte order mark) altogether, whereas re-saving it within the text editor converts it to the UTF-8 BOM; either way, the query is then read properly using the referenced answer.
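
If you would rather handle this from R instead of re-saving the file, a minimal sketch along the lines of the encoding argument mentioned in the comments above (assuming the file really is UTF-16LE, as the "ÿþ" marker suggests; adjust the encoding to match your file) would be:

# Open the connection with an explicit encoding so the UTF-16 text is decoded correctly
con <- file("C:\\Users\\me\\Desktop\\my_script.sql", open = "r", encoding = "UTF-16LE")
query <- paste(readLines(con, warn = FALSE), collapse = "\n")
close(con)

# Drop the byte order mark if it survives decoding
query <- sub("^\ufeff", "", query)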

Coerce answered 14/11, 2018 at 20:9 Comment(0)
H
1

The combination of readr and textclean works well without having to create any new functions. read_file() reads the file into a character vector and replace_white() removes the escape sequence characters from your .sql file. Note: this does cause problems if you have comments in your SQL string!

library(readr)
library(textclean)

SQL <- replace_white(read_file("file_path"))
Hyperboloid answered 31/5, 2022 at 14:34 Comment(0)
S
0

Late to the game, but I've been playing around with this today and have written a slightly different approach to gathering SQL code to run within the DBI::dbGetQuery() function:

get_sql_code_from_file <- function(file_path){

   # Get contents of file
   file_lines <- readLines(file_path)

   # Remove single line comments
   file_lines <- file_lines[grepl(pattern = "^--", file_lines) == FALSE]

   # Note indices of lines with multi-line comments start/ends
   multi_line_start_indices <- grep(pattern = "^/\\*", file_lines)
   multi_line_end_indices <- grep(pattern = "\\*/", file_lines)

   # Check if any multi-line comments found
   if (length(multi_line_end_indices) != 0) {

      # Note lines defining and within multi-line comments
      # unlist() in case the comments span different numbers of lines,
      # which would otherwise make sapply() return a list
      multi_line_indices <- unlist(sapply(
         seq_along(multi_line_start_indices),
         FUN = function(index, multi_line_start_indices, multi_line_end_indices){
            return(multi_line_start_indices[index]:multi_line_end_indices[index])
         },
         multi_line_start_indices, multi_line_end_indices
      ))

      # Remove multi-line_comments
      file_lines <- file_lines[-multi_line_indices]
   }

   # Combine into single string
   file_string <- paste(file_lines, collapse = "\n")

   return(file_string)
}

This approach differs from the accepted answer as it removes both single and multi-line comments from the file.
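
As a hypothetical usage sketch, assuming conn is an open DBI connection:

# Strip comments from the file, then run the remaining SQL
query <- get_sql_code_from_file("C:\\Users\\me\\Desktop\\my_script.sql")
results <- DBI::dbGetQuery(conn, query)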

Secco answered 2/8, 2023 at 12:8 Comment(0)
O
0

This answer is a bit late for the OP but may be helpful for future readers.

The sqlhelper package provides functions for reading, preparing and executing files of SQL, similar to the discussion in the comments of the accepted answer.

Examples might look like:

# Use this if you need your queries as strings
my_sql_queries <- sqlhelper::read_sql("file.SQL")

# Use this if you want to execute the queries
results <- sqlhelper::run_files(c("file1.SQL", "file2.SQL"))

results[[1]] # result of first query in file1.SQL

The read_sql() function handles single and multi-line comments, including comment delimiters within quotes, so SELECT * FROM tab WHERE var = '--oops' will be executed as is.

(disclosure: I wrote sqlhelper)

Onym answered 9/2 at 13:44 Comment(0)