Related to https://mcmap.net/q/589929/-rodbc-pulling-float-as-character-in-sql-server, I'm finding inconsistent behavior with pulling data from SQL Server (2014).
library(RODBC)
sqlQuery(.conn, "CREATE TABLE r2test ( [mychar] [NVARCHAR](16), [mynum] [FLOAT])")
# character(0)
sqlQuery(.conn, "INSERT INTO r2test (mychar,mynum) VALUES ('1',3.141593),('2',6.283185)")
character(0)
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE))
# 'data.frame': 2 obs. of 2 variables:
# $ mychar: int 1 2
# $ mynum : num 3.14 6.28
In that example we see the undesired behavior: the characters of mychar
are being internally converted to integers. Per the previously-mentioned SO answer, the as.is
option defeats this, but has the unfortunate side-effect of also forcing decimal-represented floats to strings:
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE, as.is = TRUE))
# 'data.frame': 2 obs. of 2 variables:
# $ mychar: chr "1" "2"
# $ mynum : chr "3.1415929999999999" "6.2831849999999996"
If at least one of mychar
is actually not integer-izable, things are fine:
sqlQuery(.conn, "INSERT INTO r2test (mychar,mynum) VALUES ('a',9.424778)")
# character(0)
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE))
# 'data.frame': 3 obs. of 2 variables:
# $ mychar: chr "1" "2" "a"
# $ mynum : num 3.14 6.28 9.42
Unfortunately, the data model does not support arbitrarily adding something to encourage this behavior (or I just haven't thought of a good-enough way of doing it). The data model is such that values of mychar
include 01
and 1
which are character-wise distinct. The only workaround I've found is to use as.is = TRUE
, which will require me to as.numeric
all related columns, something that is both tedious and (theoretically) unnecessary work.
Since the docs suggest needing to set DBMSencoding
, I checked the current encoding (helped by https://mcmap.net/q/246694/-sql-server-default-character-encoding):
sqlQuery(.conn, "SELECT SERVERPROPERTY('Collation')")
# 1 SQL_Latin1_General_CP1_CI_AS
I've tried using (for kicks): DBMSencoding="latin1"
, DBMSencoding="UTF-8"
, and explicitly though the default DBMSencoding=""
with no change in behavior.
How can I encourage the behavior of not over-coercing the data types?
Currently using R-3.2.5 and RODBC-1.3.13 on ubuntu.