How to pass data.frame for UPDATE with R DBI
With RODBC, there were functions like sqlUpdate(channel, dat, ...) that allowed you to pass dat = data.frame(...) instead of having to construct your own SQL string.

However, with R's DBI, all I see are functions like dbSendQuery(conn, statement, ...) which only take a string statement and give no opportunity to specify a data.frame directly.

So how to UPDATE using a data.frame with DBI?

Middleman answered 12/12, 2013 at 14:42 Comment(1)
Thanks, that works for an INSERT, but how do I perform an UPDATE? – Middleman

A really late answer, but maybe still helpful...

There is no single function (that I know of) in the DBI/odbc packages, but you can replicate the update behavior using a prepared update statement (which should work faster than RODBC's sqlUpdate, since it sends the parameter values to the SQL server as a batch):

library(DBI)
library(odbc)

con <- dbConnect(odbc::odbc(),
                 driver             = "{SQL Server Native Client 11.0}",
                 server             = "dbserver.domain.com\\default,1234",
                 Trusted_Connection = "yes",
                 database           = "test")   # assumes Microsoft SQL Server

dbWriteTable(con, "iris", iris, row.names = TRUE)      # create and populate a table (adding the row names as a separate column used as a row ID)

update <- dbSendQuery(con, 'update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Length"=?, "Petal.Width"=?, "Species"=? WHERE row_names=?')

# create a modified version of `iris`
iris2 <- iris
iris2$Sepal.Length <- 5
iris2$Petal.Width[2] <- 1
iris2$row_names <- rownames(iris)  # use the row names as unique row ID

dbBind(update, iris2)  # send the updated data (columns bind positionally to the '?' placeholders)

dbClearResult(update)  # release the prepared statement

# now read the modified data - you will see the updates did work
data1 <- dbReadTable(con, "iris")

dbDisconnect(con)
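Side note: newer DBI versions provide dbSendStatement()/dbExecute() as the dedicated API for data-modifying statements; dbSendQuery still works with most drivers but may warn. A minimal sketch of the same prepared update in that style (run while the connection is still open):

update <- dbSendStatement(con, 'update iris set "Sepal.Length"=? where row_names=?')
dbBind(update, list(iris2$Sepal.Length, iris2$row_names))  # parameters bind by position
dbGetRowsAffected(update)  # number of rows changed
dbClearResult(update)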

This works only if you have a primary key. I created one in the above example by using the row names, which are a unique number increasing by one for each row...
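To have the database enforce that uniqueness, you can promote row_names to a real primary key after dbWriteTable. A sketch assuming SQL Server syntax (the column size is a guess):

dbExecute(con, "ALTER TABLE iris ALTER COLUMN row_names VARCHAR(10) NOT NULL")
dbExecute(con, "ALTER TABLE iris ADD CONSTRAINT pk_iris PRIMARY KEY (row_names)")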

For more information about the odbc package used in the dbConnect statement above, see: https://github.com/rstats-db/odbc

Attested answered 26/4, 2017 at 19:6 Comment(7)
Thank you. Is there a limit to the number of rows we can update at once? I have created a long SQL statement that performs a MERGE, and if the statement is too long dbGetQuery returns 0 (0 rows affected). – Heterocyclic
There is no simple answer because the DBI package just defines an interface and leaves the implementation to different DBI-compliant packages. In the end, all limitations depend on the DBI-compliant driver package (e.g. odbc) and the database, plus the binary DB driver used by the DBI driver package. Are you using the odbc package? Then the best way is to create a reproducible code example and database and open an issue (asking to fix, or at least document, the limitations) at github.com/r-dbi/odbc/issues – Attested
@Heterocyclic Prompted by problems like the one you describe, I have started a DBI compliance check "project" where I run the DBItest unit tests against different DBI configurations to estimate the maturity, open issues, and limitations of each configuration. The status is still pre-alpha and I have not published or drawn any conclusions from the results without talking to the developers first, but you can watch the progress at github.com/aryoda/R_DBI_compliance_reports (and of course try to run the tests against your own configuration). – Attested
Thank you, very impressive work! I will definitely look at it. From what I see so far using odbc and SQL Server, a large (>30k rows) MERGE specifying the source values inline fails, and I had to send the data to a temporary table and then dbExecute the merge between tables (see the sketch after these comments). I will check how that translates in your tests. – Heterocyclic
If you post a new question here with an example to reproduce the error, I will look into it (please add the link here then so that I can find your question). – Attested
I get the error Error: Failed to prepare query: ERROR: syntax error at or near "=?" LINE 1: update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Le... What could be causing this? – Erosive
@Erosive Please open a new question (incl. a minimal reproducible example) at SO to separate the issues. You can add a link to this answer, of course. THX :-) – Attested
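Regarding the temp-table workaround for large MERGEs mentioned in the comments above, a sketch assuming SQL Server, the odbc package, and the iris example from this answer (the staging table name is made up):

dbWriteTable(con, "#iris_stage", iris2, temporary = TRUE)  # bulk-load the new values into a temp table
dbExecute(con, '
  UPDATE t
  SET    t."Sepal.Length" = s."Sepal.Length"
  FROM   iris t
  JOIN   #iris_stage s ON t.row_names = s.row_names')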

Building on R Yoda's answer, I wrote the helper function below, which allows using a dataframe to specify the update conditions.

While I built this to run transaction updates (i.e. single rows), it can in theory update multiple rows that match a condition. However, that's not the same as updating multiple rows from an input dataframe. Maybe somebody else can build on this...


dbUpdateCustom = function(x, key_cols, con, schema_name, table_name) {
  
  if (nrow(x) != 1) stop("Input dataframe must be exactly 1 row")
  if (!all(key_cols %in% colnames(x))) stop("All columns specified in 'key_cols' must be present in 'x'")
  
  # Build the update string --------------------------------------------------

  df_key     <- dplyr::select(x,  dplyr::one_of(key_cols))   # key columns -> WHERE clause
  df_upt     <- dplyr::select(x, -dplyr::one_of(key_cols))   # remaining columns -> SET clause
  
  set_str    <- purrr::map_chr(colnames(df_upt), ~glue::glue_sql('{`.x`} = {x[[.x]]}', .con = con))
  set_str    <- paste(set_str, collapse = ", ")
  
  where_str  <- purrr::map_chr(colnames(df_key), ~glue::glue_sql("{`.x`} = {x[[.x]]}", .con = con))
  where_str  <- paste(where_str, collapse = " AND ")
  
  update_str <- glue::glue('UPDATE {schema_name}.{table_name} SET {set_str} WHERE {where_str}')
  
  # Execute ------------------------------------------------------------------
  
  DBI::dbExecute(con, update_str)  # dbExecute is the DBI idiom for statements that return no result set

  return(invisible(TRUE))
}

Where

  • x: 1-row dataframe that contains 1+ key columns and 1+ update columns.
  • key_cols: character vector of 1 or more column names that are the keys (i.e. used in the WHERE clause)
  • con, schema_name, table_name: the DBI connection and the schema-qualified table to update
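A hypothetical usage sketch (the table and column names are made up):

new_row <- data.frame(id = 42L, status = "shipped", qty = 3)
dbUpdateCustom(new_row, key_cols = "id", con = con,
               schema_name = "dbo", table_name = "orders")
# builds and runs: UPDATE dbo.orders SET "status" = 'shipped', "qty" = 3 WHERE "id" = 42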
Lithology answered 18/7, 2020 at 0:5 Comment(1)
This looks very interesting for my case. I'm seeing some errors when my dataframe has multiple rows that I want to update: Error in dbUpdateCustom: Input dataframe must be exactly 1 row. – Pearlinepearlman

Here is a little helper function I put together that uses REPLACE INTO (MySQL/MariaDB syntax, note the backtick identifier quoting) to update a table using DBI, replacing old duplicate entries with the new values. It's basic and built for my own needs, but it should be easy to modify. All you need to pass to the function is the connection, table name, and dataframe. Note that the table must have a PRIMARY KEY column. I've also included a simple working example.

library(DBI)  # the helpers below only need DBI (dbExecute, dbWriteTable, dbReadTable)

row_to_list <- function(Y)  suppressWarnings(split(Y, f = row(Y)))  # split a dataframe into a list of single-row pieces

sql_val <- function(y){
  if(is.na(y)){
    return("NULL")  # map missing values (numeric or not) to SQL NULL
  }
  if(!is.numeric(y)){
    return(paste0("'", gsub("'", "''", y), "'"))  # quote strings, escaping embedded quotes
  }
  return(as.character(y))
}

to_sql_row <- function(x) paste0("(",paste(do.call("c", lapply(x, FUN = sql_val)), collapse = ", "),")")  # one row -> "(v1, v2, ...)"

bracket <- function(x) paste0("`",x,"`")  # backtick-quote an identifier (MySQL style)

to_sql_string <- function(x) paste0("(",paste(sapply(x, FUN = bracket), collapse = ", "),")")  # names -> "(`c1`, `c2`, ...)"

replace_into_table <- function(con, table_name, new_data){
  cols  <- to_sql_string(names(new_data))
  vals  <- paste(lapply(row_to_list(new_data), FUN = to_sql_row), collapse = ", ")
  query <- paste("REPLACE INTO", table_name, cols, "VALUES", vals)
  rs <- dbExecute(con, query)  # returns the number of rows affected
  return(rs)
}

tb <- data.frame("id" = letters[1:20], "A" = 1:20, "B" = seq(.1,2,.1)) # sample data
dbWriteTable(con, "test_table", tb) # create table ('con' is an open connection to a MySQL/MariaDB database)
dbExecute(con, "ALTER TABLE test_table ADD PRIMARY KEY (id)") # set primary key

new_data <- data.frame("id" = letters[19:23], "A" = 1:5, "B" = seq(101,105)) # new data
new_data[4,2] <- NA # add some NA values
new_data[5,3] <- NA
table_name <- "test_table"
replace_into_table(con, table_name, new_data)

result <- dbReadTable(con, "test_table")
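One caveat (for MySQL/MariaDB, which the backtick quoting above implies): REPLACE INTO deletes the old row and inserts a new one, so any columns you don't supply are reset to their defaults. If you only want to overwrite the supplied columns, a sketch of an upsert variant of the same statement (using the hypothetical columns from the example) would be:

query <- paste("INSERT INTO", table_name, cols, "VALUES", vals,
               "ON DUPLICATE KEY UPDATE `A` = VALUES(`A`), `B` = VALUES(`B`)")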
Introversion answered 15/1, 2021 at 22:33 Comment(0)
