RODBC Query Tuning

Asked 22/11, 2010 at 19:3 Answered 10/5, 2017 at 21:7

I am trying to pull data from our Oracle datamart into R using RODBC. I have been able to work through some of my issues, but I have one basic question.

How can I retain the formatting as it sits in the database?

Using the following command

uapp <- sqlQuery(ch, SQL, stringsAsFactors=F)

My data are retrieved as expected, but some fields are converted to numeric and logical once in R when they should be characters. I can get around this if I use:

uapp <- sqlQuery(ch, SQL, stringsAsFactors=F, as.is=T)

But I "lose" columns being formatted as dates when I would like them to be.

Evince answered 22/11, 2010 at 19:3 Comment(3)

What exactly do you mean by "lose"? Do the date columns get converted to integer or as the Oracle default date (i.e. "22-NOV-10")? – Kutzer 22/11, 2010 at 19:32

If I use the first query, date fields are returned as POSIXct in the data frame. The second query is character. Not the end of the world obviously, but I am trying to minimize correcting data types once inside R. – Evince 22/11, 2010 at 19:47

Related: #38443657 – Postscript 10/5, 2017 at 21:22

You're caught in a gap I've struggled with as well. I can't opine on what is "right" or "best" but only what I've done in the past.

I usually do what you did in the first example and just deal with type changes once they get into R. If you wanted to do the latter method, you could convert the date once it was in R. My Oracle systems seem to always be set up to return dates in the "22-NOV-10" format which is annoying as heck to parse. So I would use the Oracle to_char() function in my query to format my dates into something R can easily recognize.

So, for example, I might have this in my SELECT statement:

to_char(myDate, 'yyyy-mm-dd') as myDate

then I pull that into a data frame called myData and do this:

myData$properDate <- strptime(myData$myDate, "%Y-%m-%d")

Whether to deal with fixing dates or deal with fixing other fields really depends on how many date fields you have and how many non-date fields the first method messes up. But in my experience I either end up fixing one, or the other.

Something you might consider when using method 1: Try using cast() in your SQL in order to force a field into a particular type. The only times I've had trouble with RODBC molesting my data types is when the type is ambiguous on the server side. This is usually the result of CASE statements or somesuch on the SQL end.

Kutzer answered 22/11, 2010 at 20:8 Comment(1)

Thanks JD. I will give cast a shot, but at least it confirms that I will need to do some data prep no matter what. – Evince 22/11, 2010 at 21:51

The as.is argument can be a boolean vector.

So if your result set consists of, say, two date columns followed by one character column, you can do:

uapp <- sqlQuery(ch, SQL, stringsAsFactors = FALSE, as.is = c(FALSE, FALSE, TRUE))

EDIT: as suggested by Kalin you can also refer to the columns that should be "as is". For example

uapp <- sqlQuery(ch, SQL, stringsAsFactors = FALSE, as.is = c(2, 4))

will leave columns two and four "as is".

Postscript answered 10/5, 2017 at 21:7 Comment(1)

FYI, you may also put in the numbers of the columns that should be "as is" (and leaving the rest to default, I think). For example ... as.is = c( 2, 4 ) will make columns 2 and 4 be interpreted "as is". – Hilde 26/9, 2017 at 21:52

Recommended topics

Hot tags