Is there a specific way to handle timestamp columns in R when pulling data using RPostgreSQL?
I'm trying to pull data from a PostgreSQL database, and the results for a timestamp field are inconsistent. Either I'm not handling POSIXct results properly, or I've found a bug in the RPostgreSQL package. Here is how to replicate the issue:

Suppose there is a table in a postgres database with one field (run this in PostgreSQL):

CREATE DATABASE mydb;
CREATE TABLE test_table
(   
  "DateTime" timestamp without time zone NOT NULL,
  CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table
  OWNER TO postgres;

And let’s say there are a few hundred records. I will populate them in R. Here is the code:

library(RPostgreSQL)
library(chron)   # needed for as.chron() and the years()/months()/... accessors below

# Feed the table with a sequence of date/time values at hourly resolution
date_values <- as.chron(seq(10000, 10500, 1/24))

format.chron <- function(z)  {
  sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00", 
            as.numeric(as.character(years(z))), 
            months(z), 
            as.numeric(as.character(days(z))), 
            as.numeric(as.character(hours(z))), 
            as.numeric(as.character(minutes(z))))
}

.generateInsertQuery <- function(date_values, field_name, table_name) {
  # Use plain single quotes; sQuote() can emit curly quotes (useFancyQuotes),
  # which PostgreSQL rejects
  insert_val  <- paste(paste0("('", format(date_values), "')"), collapse = ",")
  qry         <- paste("INSERT INTO", dQuote(table_name), paste0("(", dQuote(field_name), ")"), "VALUES", insert_val)
  qry
}

drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
dbSendQuery(con, qry)

If I try to get the values, the time component is stripped from the resulting data:

res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]

The class of the result, however, is POSIXct

class(res[,1])

If the result is fetched one record at a time, the values whose hour:min component equals 00:00 lose the time component:

rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for (i in 1:100) res_list[[i]] <- fetch(rs, 1)
res_list

As a workaround, I'm fetching the results one record at a time, fixing each, and aggregating them into a data.frame. But this is very time-consuming, especially for large data sets. Any ideas why this is happening and how to deal with it?
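A faster workaround than fetching record by record (a sketch, assuming the table above and an open connection `con`) is to have PostgreSQL format the timestamp as text on the server side, so no driver-side conversion is involved, and then parse it back to POSIXct in R:

```r
# Sketch: cast the timestamp to text in SQL, parse it back in R.
qry <- "SELECT to_char(\"DateTime\", 'YYYY-MM-DD HH24:MI:SS') AS datetime_txt FROM test_table"
res <- dbGetQuery(con, qry)

# The column carries no time zone information, so the tz here is an
# explicit choice, not something recovered from the database.
res$DateTime <- as.POSIXct(res$datetime_txt, tz = "UTC",
                           format = "%Y-%m-%d %H:%M:%S")
```

This trades a little formatting work on the server for a single bulk fetch instead of a per-record loop.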

Thanks in advance!

Courtesy answered 19/7, 2013 at 20:7 Comment(1)
As mentioned above, using timestamp with timezone in the CREATE TABLE statement did the trick for me. Though it's not really part of the question, in my experience dbWriteTable does not like any capital letters when specifying the SQL table names to write to. – Tropophilous
First off, the RPostgreSQL project has a mailing list; I suggest you post there.

PostgreSQL has two datetime types: with and without timezone. As I recall, R only maps the latter. I did write some early regression tests for this (see the package source) but have not been that involved with the project of late. But I do recall that POSIXct maps back and forth to the PostgreSQL datetime type just fine.
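To illustrate the round trip this describes (a sketch, assuming database `mydb` and a hypothetical table `test_tz`), POSIXct maps cleanly when the column is `timestamp with time zone`:

```r
library(RPostgreSQL)

con <- dbConnect(dbDriver("PostgreSQL"), user = "postgres", dbname = "mydb")

# Hypothetical table using 'timestamp with time zone'
dbSendQuery(con, 'CREATE TABLE test_tz ("DateTime" timestamp with time zone)')

# Insert a POSIXct value, formatted with an explicit UTC offset
x <- as.POSIXct("2013-07-19 20:12:00", tz = "UTC")
dbSendQuery(con, paste0("INSERT INTO test_tz VALUES ('",
                        format(x, "%Y-%m-%d %H:%M:%S%z"), "')"))

# The value comes back as POSIXct, with the time component intact
res <- dbGetQuery(con, 'SELECT "DateTime" FROM test_tz')
class(res$DateTime)
```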

Cleptomania answered 19/7, 2013 at 20:12 Comment(4)
Thanks @Dirk for the suggestion. In my case, I'm using timestamp without timezone, and I'm still seeing the issue described above. I will try the mailing list. – Courtesy
Try timestamp with timezone and see if it works. You can always use UTC as a timezone. – Cleptomania
Thanks again for the tip @Dirk. I was able to push and pull a long vector of POSIXlt values from year 2000 to 2010 with no problem. So it seems like timestamp with no time zone is the unsupported type. This is even better, as my previous approach was ambiguous in terms of time zone, which could create confusion among data users. – Courtesy
That is precisely why I chose timestamp with timezone when all this started. ;-) – Cleptomania
RPostgreSQL's dbWriteTable will always create a field of type timestamp with time zone for a POSIXct column, and always with tz +00, regardless of the POSIXct value's timezone. It would arguably be more correct to create timestamp without time zone instead.

The simplest fix for both dbReadTable and dbWriteTable is Sys.setenv(TZ = "UTC"). However, this is a session-wide dependency, and too deep a one in my opinion: many other processes in the R session may require the proper timezone setting.

A more targeted approach, with fewer side effects, is to define your own dbReadTable and dbWriteTable that wrap the DBI versions with appropriate pre-/post-processing of POSIXct columns. Still, that is not an option if you are developing DBI-compliant code or a package that is not Postgres-specific.
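A minimal sketch of such wrappers (hypothetical names, assuming the convention that everything is stored and read back as UTC):

```r
library(RPostgreSQL)

# Hypothetical wrapper: convert POSIXct columns to UTC text before writing,
# so the driver cannot reinterpret the timezone.
write_table_utc <- function(con, name, df) {
  is_time <- vapply(df, inherits, logical(1), what = "POSIXct")
  df[is_time] <- lapply(df[is_time], format,
                        tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  dbWriteTable(con, name, df, row.names = FALSE)
}

# Hypothetical wrapper: parse named columns back to UTC POSIXct after reading.
read_table_utc <- function(con, name, time_cols) {
  df <- dbReadTable(con, name)
  df[time_cols] <- lapply(df[time_cols], as.POSIXct, tz = "UTC")
  df
}
```

This keeps the UTC assumption local to the two functions instead of the whole session.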

It would be great to have RPostgreSQL migrated to GitHub for easier contribution.

Claycomb answered 6/7, 2014 at 11:17 Comment(0)
