Time difference in years with lubridate?

7

44

I would like to use lubridate to calculate a person's age in years from their date of birth and today's date. Right now I have this:

library(lubridate)
today<-mdy(08312015)
dob<-mdy(09071982)
today-dob

which gives me their age in days.

Kanarese answered 31/8, 2015 at 13:52 Comment(5)
Is dividing by 365.25 not accurate enough? – Psalms
Dividing by 365.25. Or maybe use year(today)-year(dob), but that just subtracts one calendar year from the other. – Lundgren
Yes, but (today-dob)/365.25 gives me "Time difference of 32.98015 days" instead of years. – Kanarese
What you're seeing is really just a label. I often find it easier to change the class of the result: as.numeric((today-dob)/365.25). And for a very minor increase in precision, divide by 365.2425. – Altigraph
Note that using today - dob is not really the lubridate way to go; it uses base R functionality (difftime). See my answer for a lubridate approach. – Nunhood
70

This is the lubridate approach I would take:

interval(dob, today) / years(1)

Yields an age of 32 years. Recent lubridate versions return the fractional value 32.98082; use %/% instead of / if you only want the whole years.

Note that, depending on the lubridate version, the function may complain that it cannot express the remainder as a fraction of a year. This is because a year does not have a fixed length: 366 days in leap years and 365 in non-leap years. You can get a more detailed answer that also gives the number of weeks and days:

interval_period = interval(dob, today)
full_year = interval_period %/% years(1)
remaining_weeks = interval_period %% years(1) %/% weeks(1)
remaining_days = interval_period %% years(1) %% weeks(1) %/% days(1)
sprintf('Your age is %d years, %d weeks and %d days', full_year, remaining_weeks, remaining_days)
# [1] "Your age is 32 years, 51 weeks and 1 days"

Note that I use %/% for integer division and %% for the modulo, which gives the remaining weeks/days after subtracting the full years/weeks.
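If months are more natural than weeks, a similar breakdown can be sketched with as.period() on the interval. This is an alternative to the %% chain, not part of the approach above, and the variable name age is only illustrative. The dates are written with as.Date() here to keep the sketch independent of how mdy() parses numbers.

```r
library(lubridate)

dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")

# as.period() on an interval walks the calendar between the two dates,
# so the result is expressed in whole years, months and days
age <- as.period(interval(dob, today))

# The components are stored in the year, month and day slots of the Period
sprintf("Your age is %d years, %d months and %d days",
        as.integer(age@year), as.integer(age@month), as.integer(age@day))
# [1] "Your age is 32 years, 11 months and 24 days"
```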

Nunhood answered 31/8, 2015 at 14:20 Comment(4)
Thanks, this does what I want. Note new_interval is deprecated, use interval instead. – Partee
Please also note 'new_interval' is deprecated; use 'interval' instead. Deprecated in version '1.5.0'. – Adna
What's the difference between new_interval(dob, today) and dob-today and as.period(today - dob, unit = "years")? – Atal
new_interval is deprecated since v1.5.0, now it's just interval. – Selenite
13

This is an old question, but I am still missing the following clean approach. (Tidyverse is only needed for the %>% operator.)

library(tidyverse)
library(lubridate)

today<-mdy(08312015)
dob<-mdy(09071982)

interval(dob, today) %>%
  as.numeric('years')

# 32.98015 - you have to decide how to deal with the fraction of a year
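As for how as.numeric(..., 'years') works on an interval: as far as I can tell, it takes the length of the interval in seconds and divides it by a fixed year of 365.25 days. A minimal sketch that checks this by hand, where secs_per_year and by_hand are illustrative names and the 365.25-day year is an assumption inferred from the 32.98015 result:

```r
library(lubridate)

dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")
iv    <- interval(dob, today)

# Assumed fixed year length: 365.25 days, expressed in seconds
secs_per_year <- 365.25 * 24 * 60 * 60

by_hand   <- int_length(iv) / secs_per_year  # int_length() gives seconds
converted <- as.numeric(iv, "years")

# Should be TRUE if the assumption about the fixed year length holds
all.equal(by_hand, converted)

# Calendar-aware alternative; compare with interval(dob, today) / years(1)
time_length(iv, "years")
```

So the fraction you have to deal with is a fraction of an average year, not of the actual calendar year the dates fall in.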
Disparagement answered 17/3, 2020 at 6:55 Comment(1)
This is really neat. Would you mind explaining how as.numeric("years") works on the output of interval()? – Norseman
6
as.duration(interval(dob,today)) %/% as.duration(years(1))

should do the job without errors.
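A short sketch of what this computes, assuming a recent lubridate version in which converting a one-year period to a duration gives 365.25 days (worth checking against your installed version; one_year is just an illustrative name):

```r
library(lubridate)

dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")

# A duration is a fixed number of seconds, so "one year" has to be
# pinned to a fixed length; here that is assumed to be 365.25 days
one_year <- as.duration(years(1))
one_year == ddays(365.25)

# Whole number of fixed-length years contained in the elapsed time
as.duration(interval(dob, today)) %/% one_year
```

Because both sides are plain durations, no calendar lookup is involved, which is why nothing is reported about remainders. The trade-off is that the year length is an average rather than the real calendar year.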

Siemens answered 4/7, 2017 at 7:41 Comment(1)
Thank you for this code snippet, which may provide some immediate help. A proper explanation would greatly improve its educational value by showing why this is a good solution to the problem, and would make it more useful to future readers with similar, but not identical, questions. Please edit your answer to add explanation, and give an indication of what limitations and assumptions apply. – Maronite
4
as.period(today - dob, unit = "years")

This will give a message that it's only an estimate because it doesn't take into account the exact starting date and end date.

Shawannashawl answered 31/8, 2015 at 14:31 Comment(0)
3

Another Tidyverse approach (with the shortest amount of code) would be

library(tidyverse)
library(lubridate)

today<-mdy(08312015)
dob<-mdy(09071982)

dob %--% today / ddays(365.25)

Newcastle answered 13/6, 2021 at 5:24 Comment(2)
dob %--% today / ddays(365)) is 33.00274, which is inaccurate (32 years, 52 weeks and 1 day). dob %--% today / years(1) is 32.98082, which is accurate (and also shorter code :-) ). Also, you have an extra parenthesis in your example. – Melodics
Thanks for spotting the parenthesis, I've removed it. It depends on how you define a year: a leap year has 366 days and a normal year has 365, so I added the .25 to average them, in which case we get the same answer. I do like the years approach though, and yes, I guess it is technically shorter :) – Newcastle
1

Another option, which is much faster (see the speed test below):

as.numeric(today - dob) / 365.25

Comparing all the answers

library(dplyr)
library(lubridate)

today<-mdy(08312015)
dob<-mdy(09071982)

interval(dob, today) / years(1)
# 32.98082

as.duration(interval(dob,today)) %/% as.duration(years(1))
# 32

interval(dob, today) %>% as.numeric('years')
# 32.98015

dob %--% today / ddays(365.25)
# 32.98015

as.numeric(today - dob) / 365.25
# 32.98015

I'm not sure whether 32.98082 or 32.98015 is more correct. See https://mcmap.net/q/370562/-time-difference-in-years-with-lubridate
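For what it's worth, both numbers can be reproduced by hand in base R, which suggests they differ only in how the last partial year is scaled: by a fixed 365.25-day year, or by the actual length of the year between the last and the next birthday. A sketch (that lubridate computes it this way internally is my assumption; only the arithmetic is certain, and the variable names are illustrative):

```r
dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")

# Total elapsed days divided by an average year
total_days <- as.numeric(today - dob)   # 12046
fixed_year <- total_days / 365.25       # 32.98015

# 32 full years, plus the elapsed share of the current birthday year
last_birthday <- as.Date("2014-09-07")
next_birthday <- as.Date("2015-09-07")
calendar_year <- 32 +
  as.numeric(today - last_birthday) / as.numeric(next_birthday - last_birthday)
# 32 + 358 / 365 = 32.98082

round(c(fixed_year, calendar_year), 5)
```

Which one is "more correct" depends on whether you want a fraction of an average year or of the calendar year actually in progress.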

Speed test

microbenchmark::microbenchmark(
  interval(dob, today) / years(1),
  as.duration(interval(dob,today)) %/% as.duration(years(1)),
  interval(dob, today) %>% as.numeric('years'),
  dob %--% today / ddays(365.25),
  as.numeric(today - dob) / 365.25
)

> Unit: microseconds
>                                                       expr      min        lq       mean    median       uq      max neval
>                              interval(dob, today)/years(1) 1913.601 1996.1510 2172.96001 2059.1005 2102.851 6037.201   100
>  as.duration(interval(dob, today))%/%as.duration(years(1))  749.700  799.1010  912.30394  823.1510  863.751 5078.601   100
>               interval(dob, today) %>% as.numeric("years")  439.701  464.0510  485.31708  480.3010  501.101  591.000   100
>                               dob %--% today/ddays(365.25)  394.501  427.5510  450.37502  443.7010  463.301  620.601   100
>                             as.numeric(today - dob)/365.25   17.400   25.9005   30.66293   32.7515   36.151   52.700   100
Guardhouse answered 21/6, 2022 at 8:32 Comment(0)
0

This gets the floored difference in years in base R:

> d1=as.Date("2021-01-23");d2=as.Date("2022-01-23")
> x=as.POSIXlt(d1);y=as.POSIXlt(d2)
> y$year-x$year-pmax(y$mon<x$mon,y$mon==x$mon&y$mday<x$mday)
[1] 1
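The same idea wrapped in a small vectorised helper, in case it is easier to read that way. The name age_years and its argument names are made up for this sketch:

```r
# Floored difference in years between two Date vectors (base R only)
age_years <- function(from, to) {
  x <- as.POSIXlt(from)
  y <- as.POSIXlt(to)
  # TRUE when the anniversary has not yet been reached in the final year
  before_anniversary <- (y$mon < x$mon) | (y$mon == x$mon & y$mday < x$mday)
  y$year - x$year - before_anniversary
}

dob <- as.Date("1982-09-07")
age_years(dob, as.Date(c("2015-08-31", "2015-09-06", "2015-09-07")))
# [1] 32 32 33
```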

This is similar, but it was roughly 8 times slower in the benchmark below with long vectors of dates:

> as.numeric(format(d2,"%Y"))-as.numeric(format(d1,"%Y"))-as.numeric(format(d2,"%m-%d")<format(d1,"%m-%d"))
[1] 1

And this was roughly 21 times slower:

> as.numeric(substr(d2,1,4))-as.numeric(substr(d1,1,4))-as.numeric(substr(d2,5,10)<substr(d1,5,10))
[1] 1

These two lubridate options were roughly 8 times slower than my first base R option:

> floor(interval(d1,d2)/years())
[1] 1
> d1%--%d2%/%years()
[1] 1

These two lubridate options are fast, but they treat every year as 365.25 days long, so they give the wrong result when both dates fall on the same month and day and the number of years between them is not divisible by 4:

> floor(lubridate::time_length(difftime(d2,d1),"years"))
[1] 0
> floor(as.numeric(lubridate::interval(d1,d2),"years"))
[1] 0

This is similar to the two previous options but it was slightly faster in my benchmark:

> as.numeric(d2-d1)%/%365.25
[1] 0
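To see which anniversaries the 365.25-day shortcut gets wrong, here is a small base R check over the first eight anniversaries of the start date (variable names are illustrative):

```r
d1 <- as.Date("2021-01-23")

# Same month and day, 1 to 8 years later
anniversaries <- seq(d1, by = "year", length.out = 9)[-1]

true_years  <- 1:8
fixed_years <- as.numeric(anniversaries - d1) %/% 365.25

data.frame(anniversaries, true_years, fixed_years)
# fixed_years comes out as 0 1 2 4 4 5 6 8,
# so only the 4- and 8-year anniversaries are right
```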

This is a very fast way to get the floored difference in years while accounting for leap years, but it incorrectly treats 1900 and 2100 as leap years, so the dates have to fall after 1900 and before 2100:

> n1=as.numeric(d1);n2=as.numeric(d2)
> l1=(n1-789)%/%1461+1;l2=(n2-789)%/%1461+1;(n2-n1-(l2-l1))%/%365
[1] 1
# the l1 and l2 variables are leap days since the epoch
# 789 is the first leap day after the epoch, and 1461 is 365*4+1
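The two constants can be checked directly in base R. The helper name leap_days is only for illustration, and like the code above it is only valid for dates after 1900 and before 2100:

```r
# Day number of the first leap day after the epoch (1970-01-01 is day 0)
as.numeric(as.Date("1972-02-29"))
# [1] 789

# Length of one four-year cycle including its leap day
365 * 4 + 1
# [1] 1461

# Number of leap days from the epoch up to and including a date
leap_days <- function(d) (as.numeric(d) - 789) %/% 1461 + 1

leap_days(as.Date(c("1972-02-28", "1972-02-29", "1976-02-29")))
# [1] 0 1 2
```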

Benchmark:

d1=as.Date(sample(-20000:40000),"1970-1-1")
d2=as.Date(sample(-20000:40000),"1970-1-1")
pick=d1<=d2;d1=d1[pick];d2=d2[pick]

b=microbenchmark::microbenchmark(times=10,
  {p1=as.POSIXlt(d1);p2=as.POSIXlt(d2);p2$year-p1$year-pmax(p1$mon>p2$mon,p1$mon==p2$mon&p1$mday>p2$mday)},
  floor(interval(d1,d2)/years()),
  d1%--%d2%/%years(),
  floor(time_length(difftime(d2,d1),"years")),
  as.numeric(d2-d1)%/%365.25,
  floor(as.numeric(interval(d1,d2),"years")),
  as.numeric(substr(d2,1,4))-as.numeric(substr(d1,1,4))-as.numeric(substr(d2,5,10)<substr(d1,5,10)),
  as.numeric(format(d2,"%Y"))-as.numeric(format(d1,"%Y"))-as.numeric(format(d2,"%m-%d")<format(d1,"%m-%d")),
  {n1=as.numeric(d1);n2=as.numeric(d2);l1=(n1-789)%/%1461+1;l2=(n2-789)%/%1461+1;(n2-n1-(l2-l1))%/%365}
)

o=sort(tapply(b$time,gsub("  +"," ",b$expr),median))
writeLines(sprintf("%.1f %s",o/min(o),names(o)))

The output shows the median time of ten runs relative to the fastest option:

1.0 as.numeric(d2 - d1)%/%365.25
1.5 { n1 = as.numeric(d1) n2 = as.numeric(d2) l1 = (n1 - 789)%/%1461 + 1 l2 = (n2 - 789)%/%1461 + 1 (n2 - n1 - (l2 - l1))%/%365 }
1.5 floor(time_length(difftime(d2, d1), "years"))
1.7 floor(as.numeric(interval(d1, d2), "years"))
9.2 { p1 = as.POSIXlt(d1) p2 = as.POSIXlt(d2) p2$year - p1$year - pmax(p1$mon > p2$mon, p1$mon == p2$mon & p1$mday > p2$mday) }
72.2 floor(interval(d1, d2)/years())
72.3 d1 %--% d2%/%years()
78.1 as.numeric(format(d2, "%Y")) - as.numeric(format(d1, "%Y")) - as.numeric(format(d2, "%m-%d") < format(d1, "%m-%d"))
195.0 as.numeric(substr(d2, 1, 4)) - as.numeric(substr(d1, 1, 4)) - as.numeric(substr(d2, 5, 10) < substr(d1, 5, 10))
Oneal answered 24/12, 2023 at 18:31 Comment(0)
