annoying "feature" (or bugs?) for RODBC - McMap

About

annoying "feature" (or bugs?) for RODBC

Asked 30/1, 2012 at 10:56 Answered 30/1, 2012 at 23:41

Solved r import odbc xls rodbc

I

1

5

RODBC is the main library in R to import data from a database into R. RODBC seems to have the ability of "guess" the datatype of the column which I find it particularly annoying.

I have uploaded a file test.xls here, or you may create a xls file yourself:

create 2 columns, first column named col_a and the second column named col_b.
type whatever you like in col_a, I typed letters on this column for 92 rows
at the 92th row of col_b, type a number there, I typed "1923" without changing the data type (i.e. not using ')
try to import the xls file into R using the following script:

library(RODBC)

setwd("C:/Users/hke775/Documents/Enoch/MISC/R_problems/RODBC")
channel <- odbcConnectExcel("test.xls",readOnly=TRUE)
dummy.df <- sqlFetch(channel,"Sheet1")
odbcClose(channel)

You will see that in dummy.df, col_b is all NA, the 1923 in this column is gone.

If you want to see the 1923 again, you can change the 1st row of col_b to a number, and it is back again.

This is very annoying as I don't prefer modifying data manually. I need to use other package to do the xls importing, but I can't find other packages do as smooth as RODBC (I tried gdata and xlsReadWrite).

Did I missing anything in the sqlFetch command, and cause the trouble? Thanks.

Immune answered 30/1, 2012 at 10:56 Comment(5)

Are you sure you tried gdata? When I tried your test file with the gdata package using DF <- read.xls("test.xls") it worked. The 1923 was present. – Crumple 30/1, 2012 at 12:33

Gdata needs to have perl, which I can't install it in my office – Immune 30/1, 2012 at 12:46

If the powers that be can install R then they should also install the Rtools distribution too since that is required for R development and that has perl in it: cran.r-project.org/bin/windows/Rtools – Crumple 30/1, 2012 at 13:6

so my issue is considered to be a bug for RODBC? – Immune 30/1, 2012 at 14:3

It might be a problem with the ODBC driver rather than RODBC itself. – Crumple 30/1, 2012 at 14:18

D

8

Please don't blame R or RODBC for Microsoft's bugs... ;)

However, due to a bug in the ODBC driver, specifying the Rows to Scan (MaxScanRows) setting currently has no effect. In other words, the Excel ODBC driver (MDAC 2.1 and later) always scans the first 8 rows in the specified data source in order to determine each column's datatype.

For additional information about the Rows to Scan bug, including a simple workaround, click the article number below to view the article in the Microsoft Knowledge Base:

189897 XL97: Data Truncated to 255 Characters with Excel ODBC Driver

I tried the fix in KB189897 by setting the TypeGuessRows value to 0 and look what happens!

> library(RODBC)
> channel <- odbcConnectExcel("test.xls",readOnly=TRUE)
> tail(dummy.df <- sqlFetch(channel,"Sheet1"))
   col_a col_b
87     c    NA
88     d    NA
89     e    NA
90     f    NA
91     g    NA
92     h  1923
> odbcClose(channel)

Please, no up-votes or check marks... just send cash. :)

Diagonal answered 30/1, 2012 at 23:41 Comment(3)

Great. But then should we conclude that a user without admin privileges has no solution ? – Droppings 16/10, 2013 at 9:31

better set TypeGuessRows to 24 or any deep enough setting. But 0 will do an unlimited scan and will will affect peformance. (I'm reading thousands of non structured .xls/.xlsx files) – Hairspring 21/12, 2014 at 20:29

@Langeleppel: Yes, setting to 0 will affect performance, but it will guarantee correctness. Setting to 24 (or any other non-zero value) will only be correct if it is "deep enough" for all future xls/xlsx files read on the affected system. – Diagonal 21/12, 2014 at 20:40

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.