RODBC: chars and numerics converted aggressively (with/without as.is)
Asked Answered
G

1

7

Related to https://mcmap.net/q/589929/-rodbc-pulling-float-as-character-in-sql-server, I'm finding inconsistent behavior with pulling data from SQL Server (2014).

library(RODBC)
sqlQuery(.conn, "CREATE TABLE r2test ( [mychar] [NVARCHAR](16), [mynum] [FLOAT])")
# character(0)
sqlQuery(.conn, "INSERT INTO r2test (mychar,mynum) VALUES ('1',3.141593),('2',6.283185)")
character(0)
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE))
# 'data.frame': 2 obs. of  2 variables:
#  $ mychar: int  1 2
#  $ mynum : num  3.14 6.28

In that example we see the undesired behavior: the characters of mychar are being internally converted to integers. Per the previously-mentioned SO answer, the as.is option defeats this, but has the unfortunate side-effect of also forcing decimal-represented floats to strings:

str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE, as.is = TRUE))
# 'data.frame': 2 obs. of  2 variables:
#  $ mychar: chr  "1" "2"
#  $ mynum : chr  "3.1415929999999999" "6.2831849999999996"

If at least one of mychar is actually not integer-izable, things are fine:

sqlQuery(.conn, "INSERT INTO r2test (mychar,mynum) VALUES ('a',9.424778)")
# character(0)
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE))
# 'data.frame': 3 obs. of  2 variables:
#  $ mychar: chr  "1" "2" "a"
#  $ mynum : num  3.14 6.28 9.42

Unfortunately, the data model does not support arbitrarily adding something to encourage this behavior (or I just haven't thought of a good-enough way of doing it). The data model is such that values of mychar include 01 and 1 which are character-wise distinct. The only workaround I've found is to use as.is = TRUE, which will require me to as.numeric all related columns, something that is both tedious and (theoretically) unnecessary work.

Since the docs suggest needing to set DBMSencoding, I checked the current encoding (helped by https://mcmap.net/q/246694/-sql-server-default-character-encoding):

sqlQuery(.conn, "SELECT SERVERPROPERTY('Collation')")
# 1 SQL_Latin1_General_CP1_CI_AS

I've tried using (for kicks): DBMSencoding="latin1", DBMSencoding="UTF-8", and explicitly though the default DBMSencoding="" with no change in behavior.

How can I encourage the behavior of not over-coercing the data types?

Currently using R-3.2.5 and RODBC-1.3.13 on ubuntu.

Guilty answered 18/7, 2016 at 17:58 Comment(0)
L
12

If I understand correctly, I think this if what you are looking for,

str(sqlQuery(
  .conn, 
  "SELECT * FROM r2test", 
  stringsAsFactors = FALSE,
  as.is = c(TRUE, FALSE)
))
#'data.frame':  2 obs. of  2 variables:
# $ mychar: chr  "1" "2"
# $ mynum : num  3.14 6.28

where as.is is specified as a logical vector (must have the same length as the number of columns in the result set). To be fair, this isn't really spelled out very well. The man page for sqlQuery just refers you to the as.is argument in read.table, which states:

Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped.

The downside of this approach is that you need to know in advance which columns you want to convert and which you don't. Personally I don't see why the default behavior isn't to just map SQL character types to R character types, SQL numeric types to R numeric types, etc, but perhaps there is a good reason for this on the backend. Automatically converting '1', '2', ... to integers does not seem like much of a "feature" to me.

Ly answered 18/7, 2016 at 18:34 Comment(4)
I hadn't considered using a logical vector, that does ameliorate things a bit, providing a more direct way out, thanks! Agreed on the "don't see why" but then again, I'm not (yet) offering to write code (and submit a PR) to handle things differently.Guilty
Yes it's a little confusing when function arguments have a default value of TRUE or FALSE but also accept a logical vector. And ditto, I have a long-standing bone to pick with this package about sqlUpdate not working as advertised, but I haven't yet gotten around to working on a patch so I can't complain too loudly...Ly
nrussell, besides R-sigs-db, do you know of any other RODBC-specific discussion forum? Source code for issues and PRs? I'd like to discuss things like parameterized queries, partial-table sqlSave, and other things that are unnecessarily constraining my use of RODBC.Guilty
I'm not aware of any unfortunately -- the only repository I've seen for this package is on CRAN's read-only GitHub mirror, but that is just something automatically exported by CRAN, and not a channel for development. However, you might consider contacting Brian Ripley at the email listed in the DESCRIPTION file. Development may take place through SVN or some other means, but regardless, as the current maintainer, I imagine he would be the best person to speak to about this. I have an interest in such features as well since I spend a significant amount of time working with SQL (Server) at work.Ly

© 2022 - 2024 — McMap. All rights reserved.