Difference in outputs using cumsum

Why are these two operations different?

library(lubridate)
library(magrittr)

> seconds_to_period(1:1000) %>% cumsum %>% sum
[1] 14492440
> 1:1000 %>% cumsum %>% sum
[1] 167167000

I have found, however, that the issue lies in the fact that cumsum only adds the seconds component of the period and ignores the rest:

seconds_to_period(60) +  seconds_to_period(60)
[1] "2M 0S"

but

> cumsum(c(seconds_to_period(60), seconds_to_period(60)))
[1] 0 0

Why is this the default behavior? I find it rather unintuitive. Additionally, how can I get the same result as cumsum(1:1000) with lubridate's 'Period' class without doing something like:

c(seconds_to_period(60), seconds_to_period(60)) %>% as.numeric %>% cumsum

Glossolalia answered 23/3, 2019 at 15:16 Comment(0)

Because cumsum is a primitive, you can see what R is doing under the hood in https://github.com/Microsoft/microsoft-r-open/blob/master/source/src/main/cum.c (reading from line 215):

PROTECT(t = coerceVector(CAR(args), REALSXP));
    n = XLENGTH(t);
    PROTECT(s = allocVector(REALSXP, n));
    setAttrib(s, R_NamesSymbol, getAttrib(t, R_NamesSymbol));
    UNPROTECT(2); 

This performs the coercion from Period to numeric and, because of how a Period is structured, it keeps only .Data.

Compare

seconds_to_period(60)@.Data  # 0: the full minute is stored in the minute slot
seconds_to_period(59)@.Data  # 59: everything fits in the seconds slot

Therefore, at the C level, R is not calling as.numeric but performing a faster, more efficient coercion (though you might call it less subtle, because unlike as.numeric it does not take into account the slots other than .Data).
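To make the difference concrete, here is a minimal sketch, assuming lubridate is installed. It puts the raw .Data slot next to what lubridate's as.numeric method returns, and then accumulates on total seconds before converting back to a Period. Note that this is still the as.numeric route the question hoped to avoid, and the variable names are only illustrative.

```r
suppressPackageStartupMessages(library(lubridate))

p <- seconds_to_period(c(60, 61))   # "1M 0S" "1M 1S"

data_part <- p@.Data         # seconds slot only: 0 1
total_sec <- as.numeric(p)   # lubridate's method counts every unit: 60 61

# Accumulate on total seconds, then convert back to a Period
running <- seconds_to_period(cumsum(total_sec))
print(running)               # periods worth 60 and 121 seconds
```

The round trip through seconds_to_period is what restores the minute/hour/day breakdown after the numeric cumsum.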

Look at this:

 setClass("Foo", representation(.Data="numeric", number1 = "numeric", number2 = "numeric"))

 bar <- new("Foo",.Data=5, number1 = 12, number2 = 31)

 cumsum(bar) 

The result is 5, because only .Data is coerced to numeric; number1 and number2 are ignored.

Moreover:

 setClass("Foo2", representation(.Data="numeric", number1 = "numeric", number2 = "numeric"))

 bar2 <- new("Foo2", number1 = 12, number2 = 31)

 cumsum(bar2) 

This gives back numeric(0), because .Data was never supplied and is therefore empty.

And

 setClass("Foo3", representation( number1 = "numeric", number2 = "numeric"))

 bar3 <- new("Foo3", number1 = 12, number2 = 31)

 cumsum(bar3) 

This does not work at all: without a .Data slot, R does not know internally how to coerce the object to numeric when doing cumsum.
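For a class you control, one way around the internal coercion is to give the class its own S4 method for cumsum, which R dispatches to before falling back to the C code. The following is only a sketch: the class Span and its minute slot are hypothetical, made up for illustration, and this is not how lubridate defines Period.

```r
library(methods)

# Hypothetical class: the data part holds seconds, plus a 'minute' slot
setClass("Span", contains = "numeric", slots = c(minute = "numeric"))

s <- new("Span", c(0, 0), minute = c(1, 1))   # two spans of one minute each

before <- cumsum(s)   # internal code sees only the data part: 0 0

# S4 method: convert every slot to seconds before accumulating
setMethod("cumsum", "Span", function(x) {
  cumsum(x@minute * 60 + x@.Data)
})

after <- cumsum(s)    # 60 120
```

The method returns a plain numeric vector of seconds; whether it should rebuild an object of the class instead is a design choice left open here.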

So it comes down to how R works internally with complex S4 objects. You could always ask the lubridate maintainers to add a separate seconds slot and store the cumulative seconds of the whole S4 object in .Data. I guess cumsum would work that way. But right now, they are using .Data to store the seconds component only. See edit(seconds_to_period):

function (x) 
{
  span <- as.double(x)
  remainder <- abs(span)
  newper <- period(second = rep(0, length(x)))
  slot(newper, "day") <- remainder%/%(3600 * 24)
  remainder <- remainder%%(3600 * 24)
  slot(newper, "hour") <- remainder%/%(3600)
  remainder <- remainder%%(3600)
  slot(newper, "minute") <- remainder%/%(60)
  slot(newper, ".Data") <- remainder%%(60)
  newper * sign(span)
}
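The same arithmetic can be traced in base R without lubridate. This sketch repeats the integer-division steps from the function above on three sample inputs (the inputs and variable names are chosen here for illustration) and shows why a cumsum over the seconds component alone falls short of a cumsum over the original values.

```r
span <- c(59, 60, 3661)          # sample inputs in seconds

remainder <- abs(span)
day       <- remainder %/% (3600 * 24)   # 0 0 0
remainder <- remainder %%  (3600 * 24)
hour      <- remainder %/% 3600          # 0 0 1
remainder <- remainder %%  3600
minute    <- remainder %/% 60            # 0 1 1
second    <- remainder %%  60            # 59 0 1  (what ends up in .Data)

cumsum(second)   # 59 59 60     : seconds component only
cumsum(span)     # 59 119 3780  : total seconds
```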

Finally, just for fun, this is my mock version of how to make cumsum work here:

setClass("Period2",representation(.Data="numeric", period="Period"))


seconds_to_period_2 <- function(x){
   (lapply(x, function(y) new("Period2", .Data=y, period=seconds_to_period(y))))
}

a<-seconds_to_period_2(1:60)

cumsum(a)

Best!

Mcvey answered 23/3, 2019 at 16:40 Comment(1)
I'm really proud of this community and of answers like this one. I learned a lot from it, thank you! - Glossolalia
