How to convert Date or Datetime field when some parts are blank; na.omit fails
Asked Answered
R

2

8

I have a data set that has dates and times for in and out. Each line is an in and out set, but some are blank. I can remove the blanks with na.omit and a nice read in (it was a csv, and na.strings=c("") works on the read.csv).

Of course, because the real world is never like the tutorial, some of the times are only dates, so my as.POSIXlt(Dataset$In,format="%m/%d/%Y %H:%M") returns NA on the "only date no time"s.

na.omit does not remove these lines. so the questions are 2

  1. Why doesn't na.omit work, or how can I get it to work?

  2. Better, How can I convert one column into both Dates and Times (in the posix format) without 2 calls or with some sort of optional parameter in the format string? (or is this even possible?).

This is a sample of the dates and times. I can't share the real file, 1 it's huge, 2 it's PII.

Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00
3,,
4,,
4,8/16/2015 6:00,8/16/2015 20:00
Rybinsk answered 19/11, 2015 at 19:48 Comment(4)
Regarding the second question: What times should these be? 00:00?Doily
Regarding the first question. A POSIXlt object is actually a list. Look at unclass(DF$In). The fix is easy. Use POSIXct instead. It's rare that POSIXlt is a better choice than POSIXct.Doily
Well, change as.POSIXlt to as.POSIXct and you are set.Doily
Regarding the second question, 00:00 or some default. would be fine, an NA that would work would also be fine. With the lt, can I use the parts of it like a normal POSIX object, i.e. InP$yday? (sorry, noob question, I think my use of LT came from the fact that strptime defaults to LT). Roland, POSIXct works to make the NAs omittable. Thanks.Rybinsk
D
4
DF <- read.table(text = "Id,In,Out
                 1,8/15/2015 8:00,8/15/2015 17:00
                 1,8/16/2015 8:04,8/16/2015
                 1,8/17/2015 8:50,8/17/2015 18:00
                 1,8/18/2015,8/18/2015 17:00
                 2,8/15/2015,8/15/2015 13:00
                 2,8/16/2015 8:00,8/16/2015 17:00
                 3,8/15/2015 4:00,8/15/2015 11:00
                 3,8/16/2015 9:00,8/16/2015 19:00
                 3,8/17/2015,8/17/2015 17:00", header = TRUE, sep = ",",
                 stringsAsFactors = FALSE) #set this option during import


DF$In[nchar(DF$In) < 13] <- paste(DF$In[nchar(DF$In) < 13], "0:00")
DF$Out[nchar(DF$Out) < 13] <- paste(DF$Out[nchar(DF$Out) < 13], "0:00")

DF$In <- as.POSIXct(DF$In, format = "%m/%d/%Y %H:%M", tz = "GMT")
DF$Out <- as.POSIXct(DF$Out, format = "%m/%d/%Y %H:%M", tz = "GMT")
#  Id                  In                 Out
#1  1 2015-08-15 08:00:00 2015-08-15 17:00:00
#2  1 2015-08-16 08:04:00 2015-08-16 00:00:00
#3  1 2015-08-17 08:50:00 2015-08-17 18:00:00
#4  1 2015-08-18 00:00:00 2015-08-18 17:00:00
#5  2 2015-08-15 00:00:00 2015-08-15 13:00:00
#6  2 2015-08-16 08:00:00 2015-08-16 17:00:00
#7  3 2015-08-15 04:00:00 2015-08-15 11:00:00
#8  3 2015-08-16 09:00:00 2015-08-16 19:00:00
#9  3 2015-08-17 00:00:00 2015-08-17 17:00:00

na.omit doesn't work with POSIXlt objects because it is documented to "handle vectors, matrices and data frames comprising vectors and matrices (only)." (see help("na.omit")). And in the strict sense, POSIXlt objects are not vectors:

unclass(as.POSIXlt(DF$In))
#$sec
#[1] 0 0 0 0 0 0 0 0 0
#
#$min
#[1]  0  4 50  0  0  0  0  0  0
#
#$hour
#[1] 8 8 8 0 0 8 4 9 0
#
#$mday
#[1] 15 16 17 18 15 16 15 16 17
#
#$mon
#[1] 7 7 7 7 7 7 7 7 7
#
#$year
#[1] 115 115 115 115 115 115 115 115 115
#
#$wday
#[1] 6 0 1 2 6 0 6 0 1
#
#$yday
#[1] 226 227 228 229 226 227 226 227 228
#
#$isdst
#[1] 0 0 0 0 0 0 0 0 0
#
#attr(,"tzone")
#[1] "GMT"

There is hardly any reason to prefer POSIXlt over POSIXct (which is an integer giving the number of seconds since the origin internally and thus needs less memory).

Doily answered 19/11, 2015 at 20:18 Comment(1)
Truly wish I could upvote 2 times... That was a great answer (your knowledge is vast.)Rybinsk
O
2

You've been given a couple of strategies that bring these character values in and process "in-place". I almost never use as.POSIXlt since there are so many pitfalls in dealing with the list-in-list structures that it returns, especially considering its effective incompatibility with dataframes. Here's a method that does the testing and coercion at the read.-level by defining an as-method:

setOldClass("inTime", prototype="POSIXct")
setAs("character", "inTime", 
      function(from) structure( ifelse( is.na(as.POSIXct(from, format="%m/%d/%Y %H:%M") ), 
                                          as.POSIXct(from, format="%m/%d/%Y") ,
                                          as.POSIXct(from, format="%m/%d/%Y %H:%M")  ), 
                                class="POSIXct" ) )

read.csv(text=txt, colClasses=c("numeric", 'inTime','inTime') )
  Id                  In                 Out
1  1 2015-08-15 08:00:00 2015-08-15 17:00:00
2  1 2015-08-16 08:04:00 2015-08-16 00:00:00
3  1 2015-08-17 08:50:00 2015-08-17 18:00:00
4  1 2015-08-18 00:00:00 2015-08-18 17:00:00
5  2 2015-08-15 00:00:00 2015-08-15 13:00:00
6  2 2015-08-16 08:00:00 2015-08-16 17:00:00
7  3 2015-08-15 04:00:00 2015-08-15 11:00:00
8  3 2015-08-16 09:00:00 2015-08-16 19:00:00
9  3 2015-08-17 00:00:00 2015-08-17 17:00:00

The structure "envelope" is needed because of the rather strange behavior of ifelse, which otherwise would return a numeric object rather than an object of class-'POSIXct'.

Olathe answered 19/11, 2015 at 20:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.