How to use RODBC to save dataframe to table with primary key generated at database
Asked Answered
H

1

6

I would like to enter a data frame into an existing table in a database using an R script, and I want the table in the database to have a sequential primary key. My problem is that RODBC doesn't seem to allow the primary key constraint.

Here's the SQL for creating the table I want:

CREATE TABLE [dbo].[results] (
    [ID]         INT            IDENTITY (1, 1) NOT NULL,
    [FirstName]  VARCHAR (255) NULL,
    [LastName]   VARCHAR (255) NULL,
    [Birthday]   DATETIME      NULL,
    [CreateDate] DATETIME      NULL,
    CONSTRAINT [PK_dbo.results] PRIMARY KEY CLUSTERED ([ID] ASC)
);

And a test with some R code:

ConnectionString1="Driver=ODBC Driver 11 for SQL Server;Server=myserver; Database=TestDb; trusted_connection=yes"
ConnectionString2="Driver=ODBC Driver 11 for SQL Server;Server=notmyserver; Database=TestDb; trusted_connection=yes"
db1=odbcDriverConnect(ConnectionString1)    
query="SELECT a.[firstname] as FirstName
  , a.[lastname] as LastName
  , Cast(a.[dob] as datetime) as Birthday
  , cast(a.createDate as datetime) as CreateDate
FROM [dbo].[People] a"
results=NULL
results=sqlQuery(db1,query,stringsAsFactors=FALSE)
close(db1)

db2=odbcDriverConnect(ConnectionString)
sqlSave(db2, 
    results, 
    append = TRUE, 
    varTypes=c(Birthday="datetime", CreateDate="datetime"),
    colnames = FALSE,  
    rownames = FALSE,fast=FALSE)
close(db2)

The first part of the R code is just getting some test data into a dataframe--it works fine and it's not part of my question here (I'm just including it here so you can see what format the test data is). When I run the sqlSave function I get an error message:

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

However, if I remove the primary key from the database, everything works fine with this table:

CREATE TABLE [dbo].[results] (
    [FirstName]  VARCHAR (255) NULL,
    [LastName]   VARCHAR (255) NULL,
    [Birthday]   DATETIME      NULL,
    [CreateDate] DATETIME      NULL
);

Clearly the primary key is the issue. Normally with entity framework or whatever (as I understand it), the primary key is created at the database when you enter data.

I'd like a way to append data to a table with a primary key using only an R script. Is that possible? There could already be data in the table I'm adding to, so I don't really see a way to create keys in R before trying to append to the table.

Horselaugh answered 12/5, 2018 at 15:56 Comment(3)
I can reproduce the error message using mysql with some syntactical modifications of your SQL code...Janssen
I've debugged RODBC. The problem is line 361 in github.com/cran/RODBC/blob/master/R/sql.R - the data.frame and the DB table must have exactly the same number of columns otherwise you get this error...Janssen
Bad luck, the data.frame ("results") and the target table must have exactly the same columns because insert statements are generated (sql.R#379). I am looking for a workaround...Janssen
J
4

The problem is line 361 in http://github.com/cran/RODBC/blob/master/R/sql.R - the data.frame and the DB table must have exactly the same number of columns otherwise you get this error with this stacktrace:

Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent 
3. `colnames<-`(`*tmp*`, value = c("ID", "FirstName", "LastName", 
   "Birthday", "CreateDate")) at sql.R#361
2. sqlwrite(channel, tablename, dat, verbose = verbose, fast = fast, 
   test = test, nastring = nastring) at sql.R#211
1. sqlSave(db2, results, append = TRUE, varTypes = c(Birthday = "datetime", 
    CreateDate = "datetime"), colnames = FALSE, rownames = FALSE, 
    fast = FALSE, verbose = TRUE) 

If you add the ID column to your data.frame you can no longer use the autoinc ID column so this is no solution (or workaround).

A "simple" workaround to the "same columns" limitation of RODBC::sqlSave is:

  1. Use sqlSave to save the new rows into another table name
  2. Send an insert into ... select from ... via RODBC::sqlQuery to append the new rows to your original table that includes the autoinc ID column
  3. Delete the table with the new rows again (drop table...)

A better option would be to use the new odbc package which also offers better performance through bulk-alike inserts instead of sending single insert statements like RODBC does:

https://github.com/r-dbi/odbc

Look for the function dbWriteTable (which is an implementation of the interface DBI::dbWriteTable).

Janssen answered 12/5, 2018 at 18:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.