Why does read.zoo give the index as dates when times are available?

I'm trying to understand the difficulties I have had in the past with reading data into zoo objects. The following two uses of read.zoo give different results, even though the default for tz is supposedly "" and that is the only difference between the two read.zoo calls:

Lines <- "2013-11-25 12:41:21         2 
2013-11-25 12:41:22.25      2 
2013-11-25 12:41:22.75      75 
2013-11-25 12:41:24.22      3 
2013-11-25 12:41:25.22      1 
2013-11-25 12:41:26.22      1"

library(zoo)
z <- read.zoo(text = Lines, index = 1:2)

> dput(z)
structure(c(2L, 2L, 75L, 3L, 1L, 1L), index = structure(c(16034, 
16034, 16034, 16034, 16034, 16034), class = "Date"), class = "zoo")

z <- read.zoo(text = Lines, index = 1:2, tz="")
> dput(z)
structure(c(2L, 2L, 75L, 3L, 1L, 1L), index = structure(c(1385412081, 
1385412082.25, 1385412082.75, 1385412084.22, 1385412085.22, 1385412086.22
), class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo")
> 
Tiresias asked 28/8, 2014 at 20:06 Comment(2)
Don't you get a warning in the first example? - Monteria
I do get a warning. It seemed rather tangential to the matter at hand. - Tiresias

Effectively, the default index class is "Date" unless tz is used in which case the default is "POSIXct". Thus the first example in the question gives "Date" class since that is the default and the second "POSIXct" since tz was specified.

If you want to specify the class explicitly rather than rely on these defaults, use the FUN argument:

read.zoo(...whatever..., FUN = as.Date)
read.zoo(...whatever..., FUN = as.POSIXct) # might need FUN=paste,FUN2=as.POSIXct
read.zoo(...whatever..., FUN = as.yearmon)
# etc. 
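As a hedged, concrete version of the three templates above, the sketch below fills in small made-up inputs. It assumes the zoo package is installed; the data and the names LinesD, LinesP, zd, zp and zm are illustrative only. The POSIXct call uses the FUN = paste, FUN2 = as.POSIXct form that appears in the examples of this answer.

```r
library(zoo)

# Date index: one column in the standard "%Y-%m-%d" format
LinesD <- "2013-11-25 2
2013-12-25 3"
zd <- read.zoo(text = LinesD, FUN = as.Date)

# POSIXct index: date and time sit in two columns, so FUN = paste joins
# them and FUN2 = as.POSIXct converts the joined strings
LinesP <- "2013-11-25 12:41:21 2
2013-11-25 12:41:22.25 2"
zp <- read.zoo(text = LinesP, index = 1:2, FUN = paste, FUN2 = as.POSIXct)

# yearmon index: each date is reduced to its year and month
zm <- read.zoo(text = LinesD, FUN = as.yearmon)

class(index(zd))  # "Date"
class(index(zp))  # "POSIXct" "POSIXt"
class(index(zm))  # "yearmon"
```

In each case the class of the index is decided by the FUN you pass, not by any guessing on read.zoo's part.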

The FUN argument can also take a custom function as shown in the examples in the package.
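A custom FUN might look like the following minimal sketch. The compact yyyymmdd input and the helper name to_date are hypothetical, chosen because read.table reads such a column as integers, so it needs its own conversion; zoo is assumed to be installed.

```r
library(zoo)

# Hypothetical input: dates stored as yyyymmdd numbers, which read.table
# reads as integers rather than as "%Y-%m-%d" strings
Lines <- "20131125 2
20131225 3"

# Hypothetical custom FUN: turn the integers into character, then parse
# them with an explicit format
to_date <- function(x) as.Date(as.character(x), format = "%Y%m%d")

z <- read.zoo(text = Lines, FUN = to_date)
index(z)
```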

Note that it always assumes standard formats (e.g. "%Y-%m-%d" in the case of "Date" class) if no format is specified and never tries to automatically determine the format.
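Because the format is never guessed, a non-standard layout such as day/month/year has to be spelled out with format=. Here is a minimal sketch with made-up data. It assumes zoo is installed and relies on the "format & no tz implies Date" rule shown in this answer's examples.

```r
library(zoo)

# Day-first dates are not in the standard "%Y-%m-%d" layout,
# so the format is supplied explicitly
Lines <- "25/11/2013 2
25/12/2013 3"

z <- read.zoo(text = Lines, format = "%d/%m/%Y")
index(z)
```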

The way it works is explained in detail in ?read.zoo, and there are many examples in ?read.zoo (the examples section has 78 lines of code) as well as in an entire vignette (one of six vignettes) dedicated just to read.zoo: "Reading Data in zoo".

Added: I have expanded the above. Also, in the development version of zoo the heuristic has been improved, and with that improvement the first example in the question does recognize the date/times and chooses POSIXct. Some clarification of the simple heuristic has also been added to the read.zoo help file, so that the many examples provided do not have to be relied upon as much.

Here are some examples. Note that the heuristic referred to is a heuristic to determine the class of the time index only. It can only identify "numeric", "Date" and "POSIXct" classes. The heuristic cannot identify other classes (although you can specify them yourself using FUN=). Also the heuristic does not identify formats. If the format is not provided using format= or implicitly through FUN= then standard format is assumed, e.g. "%Y-%m-%d" in the case of "Date".

Lines <- "2013-11-25 12:41:21  2 
2013-12-25 12:41:22.25      3 
2013-12-26 12:41:22.75      8"

# explicit.  Uses POSIXct.
z <- read.zoo(text = Lines, index = 1:2, FUN = paste, FUN2 = as.POSIXct) 

# tz implies POSIXct
z <- read.zoo(text = Lines, index = 1:2, tz = "")

# heuristic: Date now; devel ver uses POSIXct
z <- read.zoo(text = Lines, index = 1:2) 


Lines <- "2013-11-25  2 
2013-12-25 3 
2013-12-26 8"

z <- read.zoo(text = Lines, FUN = as.Date) # explicit.  Uses Date.
z <- read.zoo(text = Lines, format = "%Y-%m-%d") # format & no tz implies Date
z <- read.zoo(text = Lines) # heuristic: Date
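Since the explicit FUN form and the tz form do not depend on the heuristic, one way to check that they agree on the date-time data from the first set of examples is to compare their results directly. This is a sketch that assumes zoo is installed; z_fun and z_tz are illustrative names, and the comparison uses the underlying seconds so that attribute differences cannot cause a spurious mismatch.

```r
library(zoo)

# Same date-time data as the first set of examples
Lines <- "2013-11-25 12:41:21  2
2013-12-25 12:41:22.25      3
2013-12-26 12:41:22.75      8"

z_fun <- read.zoo(text = Lines, index = 1:2, FUN = paste, FUN2 = as.POSIXct)
z_tz  <- read.zoo(text = Lines, index = 1:2, tz = "")

# Compare the underlying seconds and the data values, not the whole objects
all.equal(as.numeric(index(z_fun)), as.numeric(index(z_tz)))
all.equal(coredata(z_fun), coredata(z_tz))
```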

Note:

(1) In general, it's safer to be explicit by using FUN, or by using tz and/or format, than to rely on the heuristic. If you are explicit by using FUN, or semi-explicit by using tz and/or format, then there is no change between the current and the development versions of read.zoo.

(2) It's safer to rely on the documentation rather than the internals, as the internals can change without warning and in fact have changed in the development version. If you really want to look at the code despite this, the key statement that selects the class of the index when FUN is not explicitly defined is the if (is.null(FUN)) ... statement in the read.zoo source.

(3) I recommend read.zoo over workarounds such as read.table followed by zoo, as it is easier, more direct and more compact. I have been using read.zoo for years, as have many others, and it seems pretty solid to me, but if anyone finds specific problems with read.zoo or with its documentation (always possible since there is quite a bit of it) they can always be reported. Even though the package has been around for years, improvements are still being made.

Nosegay answered 28/8, 2014 at 20:39 Comment(4)
The help page says 1) POSIXct if tz is present, 2) Date if format is present, and 3) "heuristics" if neither is present. So why aren't the heuristics better, given that tz is presumed to be "" and a two-column spec was given? - Tiresias
The function to look at, then, is processFUN? - Tiresias
@G.Grothendieck Oh, you mean the exact same lines of code I quote in my answer. Guess you either got out of bed on the wrong side this morning or are really touchy about your documentation. Clearly my answer was so terribly bad for suggesting, oh, no, wait... - Corunna
Yes, and I'm now wondering why, in the case of a two-column index (with the default value for tz), there is no attempt to use toDefault. It seems to be designed for this purpose, and as far as I can tell it is not being called. - Tiresias

The answer (of course) is in the sources for read.zoo(), wherein there is:

....
ix <- if (missing(format) || is.null(format)) {
    if (missing(tz) || is.null(tz)) 
        processFUN(ix)
    else processFUN(ix, tz = tz)
}
else {
    if (missing(tz) || is.null(tz)) 
        processFUN(ix, format = format)
    else processFUN(ix, format = format, tz = tz)
}
....

Even though the default for tz is "", in your first case tz is considered missing (by missing()) and hence processFUN(ix) is used. When you set tz = "", it is no longer missing and hence you get processFUN(ix, tz = tz).
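The behaviour of missing() can be seen without zoo at all. The functions below are hypothetical stand-ins that mimic the branching in the excerpt; they are not read.zoo's code. They show that an argument with a default of "" still counts as missing when the caller omits it, even though its value inside the function is "".

```r
# Hypothetical stand-in: tz defaults to "", but the code branches on missing(tz)
which_branch <- function(tz = "") {
  if (missing(tz)) "processFUN(ix)" else "processFUN(ix, tz = tz)"
}

which_branch()         # "processFUN(ix)"
which_branch(tz = "")  # "processFUN(ix, tz = tz)"

# Inside the function, the value is "" either way; only missing() differs
peek <- function(tz = "") list(missing = missing(tz), value = tz)
str(peek())
str(peek(tz = ""))
```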

Without looking at the details of read.zoo() this could possibly be handled better by having tz = NULL or tz (no default) in the arguments and then in the code, if tz needs to be set to "" for some reason, do:

if (missing(tz) || is.null(tz)) {
    tz <- ""
}

or perhaps this is not even needed, if all that is required is to avoid the confusion about the two different calls?
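A minimal sketch of the tz = NULL signature idea, again using a hypothetical function rather than read.zoo itself: with NULL as the advertised default, omitting tz and passing tz = NULL behave identically, so the documented default matches what the code does. The returned strings are only labels for the two branches.

```r
# Hypothetical function: the signature advertises tz = NULL, so the
# documented default and the "not supplied" behaviour are the same
index_class <- function(tz = NULL) {
  if (is.null(tz)) "Date" else "POSIXct"
}

index_class()           # "Date"
index_class(tz = NULL)  # "Date" (same as omitting tz)
index_class(tz = "")    # "POSIXct"
```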

Corunna answered 28/8, 2014 at 20:44 Comment(9)
Oh come on Gabor, seriously. -1 for showing in the code why this happens!? Now you're just being petty. Can you explain what is factually wrong about my answer? If so I'll gladly delete it. I was just about to add pointers to the documentation but saw that you'd already done this, so didn't see the need. Just because it is extensively documented doesn't mean that a seasoned R user like David will spot the pointers, but I'm sure he could have worked out the issue from the code. - Corunna
Arrrgh. Why is tz="" considered missing? It should be a length-1 character vector. - Tiresias
@BondedDust From ?missing: "‘missing’ can be used to test whether a value was specified as an argument to a function." You didn't specify tz as an argument in the function call. - Corunna
Given the extensive documentation, which it seems no one is reading, I don't think it's appropriate to claim that you need to read the source. - Nosegay
Did we really need three answers and a dozen comments to get you to consider reading the docs? - Plowshare
I didn't say you needed to, @G.Grothendieck; all I said was that the cause, at the R level, of the behaviour noted in the question is easily discernible from the sources if you debug and step through. And to push back on your suggestion that one only needs to read the documentation: not all documentation is correct, up-to-date, or free of bugs. If you really want to know what is happening, reading the source will tell you exactly. I still don't know why you feel a -1 is appropriate given Stack Overflow norms!? What is factually wrong or unhelpful about my answer? - Corunna
@BondedDust Sorry, I didn't mean to be terse in that comment; I meant that tz is considered missing because those are the semantics of missing(). You did ask why, and I explained why and pointed you to the place to look for more. - Corunna
No cause for apology. You caused me to better understand missing(). - Tiresias
@G.Grothendieck Now you are just being obtuse. Go away. - Corunna

I suspect your use of read.zoo tripped you up. Here is what I did:

library(zoo)
tt <- read.table(text=Lines)
z <- zoo(as.integer(tt[,3]), order.by=as.POSIXct(paste(tt[,1], tt[,2])))

Now z is a proper zoo object:

R> z
2013-11-25 12:41:21.00 2013-11-25 12:41:22.25 2013-11-25 12:41:22.75 
                     2                      2                     75  
2013-11-25 12:41:24.22 2013-11-25 12:41:25.22 2013-11-25 12:41:26.22 
                     3                      1                      1 
R> class(z)
[1] "zoo"
R> class(index(z))
[1] "POSIXct" "POSIXt" 
R> 

And by making sure I used a POSIXct object for the index, I am in fact getting a POSIXct object back.

Plowshare answered 28/8, 2014 at 20:19 Comment(7)
Well, something tripped me up, but I was not asking for an alternate set of parameters; rather, I wanted an explanation of why an apparently identical set of parameters leads to different results. - Tiresias
Both your approaches use read.zoo(). I have used xts and zoo extensively for many years, yet I never quite liked read.zoo() for this very reason. I showed you how to avoid the issue by using the very standard, direct zoo() constructor, which is what I use (or rather, xts()) all the time. YMMV. - Plowshare
I suppose your code may be useful, but it doesn't answer the narrow question I posed. My question was prompted by seeing an answer from @G.Grothendieck that used this form. Then I read the help page of read.zoo, where the 'tz' default is listed as "". So why does leaving off the 'tz' parameter lead to Dates rather than date-times? - Tiresias
With that attitude I will certainly not become more likely to use or, gosh, recommend read.zoo(). But thanks anyway for providing 'Exhibit 1' of how not to behave on SO. - Plowshare
@BondedDust: To me, it appears to be a clear and simple bug, oh wait, let me call it a 'misfeature' before Gabor sends a drone my way. Which is why a) I avoid the function and b) I showed you a simple and straightforward way to avoid it. But as I have wasted enough time on this, I'll simply delete the answer now. - Plowshare
Of course it has been around for years. But I have also avoided this function for years while diligently recommending zoo (and xts) and its vignettes. Have a look at the brand new parsedate package. Dealing with dates is hard, so doing hidden fancy things like read.zoo() does is a bad idea from the start. At least in my book. - Plowshare
More attitude on your part, so I am done here. Rest assured that I did not suggest that you replace your code with parsedate; I merely mentioned it to say that dates are, well, complicated. Which is pretty much what I said one comment ago. - Plowshare

© 2022 - 2024 — McMap. All rights reserved.