I would like to use lubridate to calculate someone's age in years, given their date of birth and today's date. Right now I have this:
library(lubridate)
today <- mdy("08312015")  # quoted: as a number, 08312015 loses its leading zero
dob <- mdy("09071982")
today - dob
which gives me their age in days.
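This is the lubridate approach I would take:
interval(dob, today) / years(1)
This yields 32 years. (Recent lubridate versions return the fractional value 32.98082 instead; wrap the expression in floor() if you only want completed years.)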
Note that older versions of the function complain that they cannot express the remainder as a fraction of a year. This is because a year does not have a fixed length: it is 366 days in leap years and 365 days in non-leap years. You can get a more detailed answer that includes the remaining weeks and days:
interval_period = interval(dob, today)
full_year = interval_period %/% years(1)
remaining_weeks = interval_period %% years(1) %/% weeks(1)
remaining_days = interval_period %% years(1) %% weeks(1) %/% days(1)
sprintf('Your age is %d years, %d weeks and %d days', full_year, remaining_weeks, remaining_days)
# [1] "Your age is 32 years, 51 weeks and 1 days"
Note that I use %/% for integer division and %% as modulo to get the remaining weeks/days after subtracting the full years/weeks.
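If lubridate is not available, the same years/weeks/days split can be sketched in base R. This is a swapped-in technique, not the answer's code: the helper age_parts() is hypothetical, whole years are found by comparing month and day through as.POSIXlt(), and the remainder is counted from the most recent birthday with %/% and %%. It assumes a birth date other than 29 February, which gets no special handling here.

```r
# Hypothetical helper for this sketch; base R only, no lubridate needed.
age_parts <- function(dob, today) {
  b_lt <- as.POSIXlt(dob)
  t_lt <- as.POSIXlt(today)
  # Whole years: calendar-year difference, minus 1 if this year's birthday is still ahead
  full_year <- t_lt$year - b_lt$year -
    (t_lt$mon < b_lt$mon | (t_lt$mon == b_lt$mon & t_lt$mday < b_lt$mday))
  # Most recent birthday: the birth date moved forward by full_year years
  last_bday <- b_lt
  last_bday$year <- b_lt$year + full_year
  rest <- as.numeric(as.Date(today) - as.Date(last_bday))  # days since that birthday
  c(years = full_year, weeks = rest %/% 7, days = rest %% 7)
}

dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")
p <- age_parts(dob, today)
sprintf("Your age is %d years, %d weeks and %d days", p[["years"]], p[["weeks"]], p[["days"]])
# [1] "Your age is 32 years, 51 weeks and 1 days"
```

The result agrees with the lubridate version because both count whole years by calendar date and only then switch to fixed-length weeks and days.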
new_interval is deprecated, use interval instead. – Partee
new_interval has been deprecated since v1.5.0; now it's just interval. – Selenite
This is an old question, but I was still missing the following clean approach. (The tidyverse is only needed for the %>% operator.)
library(tidyverse)
library(lubridate)
today <- mdy("08312015")
dob <- mdy("09071982")
interval(dob, today) %>%
  as.numeric("years")
# 32.98015 - you have to decide how to deal with the fraction of a year
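How to handle the fraction depends on what "age" means for your use. A minimal base-R sketch with the question's dates (written as ISO strings, an assumption of this sketch): floor() gives completed years in the everyday sense, round() gives the nearest whole year. Flooring an average-year estimate like this can still come out one year short on an exact birthday, so treat it as an approximation.

```r
# The question's dates, written as ISO strings
dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")

# Fractional age using an average year of 365.25 days (about 32.98 here)
age <- as.numeric(today - dob) / 365.25

floor(age)  # completed years: 32
round(age)  # nearest whole year: 33
```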
Does as.numeric("years") work on the output of interval()? – Norseman
as.duration(interval(dob, today)) %/% as.duration(years(1)) should do the job without errors.
as.period(today - dob, unit = "years")
This gives a message that the result is only an estimate, because it does not take the exact start and end dates into account.
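Why a plain day count can only give an estimate: the same number of days may or may not add up to a full year, depending on whether a 29 February falls inside the span. A base-R sketch with two illustrative spans (the dates and the whole_years() helper are hypothetical, chosen only for this demonstration):

```r
# Two spans of exactly 365 days each
a1 <- as.Date("2015-01-01"); a2 <- as.Date("2016-01-01")  # 2015 has no 29 February
b1 <- as.Date("2016-01-01"); b2 <- as.Date("2016-12-31")  # 2016 does

as.numeric(a2 - a1)  # 365
as.numeric(b2 - b1)  # 365

# Hypothetical helper: completed calendar years, using the actual dates
whole_years <- function(d1, d2) {
  x <- as.POSIXlt(d1); y <- as.POSIXlt(d2)
  y$year - x$year - (y$mon < x$mon | (y$mon == x$mon & y$mday < x$mday))
}
whole_years(a1, a2)  # 1
whole_years(b1, b2)  # 0
```

A difftime only carries the 365, so any conversion to years from it has to guess which of these two cases applies.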
Another tidyverse approach (with the least code) would be:
library(tidyverse)
library(lubridate)
today <- mdy("08312015")
dob <- mdy("09071982")
dob %--% today / ddays(365.25)
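The choice of divisor in this approach matters. A quick base-R check with the question's dates (12046 days apart; the dates are the only input assumed here) shows that a 365-day year pushes the result past 33 even though the 33rd birthday is still a week away, while the 365.25-day Julian average and the 365.2425-day Gregorian average both stay just below 33.

```r
dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")
days  <- as.numeric(today - dob)  # 12046

days / 365       # 33.00274: too high, the birthday is still a week away
days / 365.25    # 32.98015: Julian average year
days / 365.2425  # Gregorian average year, also just under 33
```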
dob %--% today / ddays(365)) is 33.00274, which is inaccurate (the true age is 32 years, 51 weeks and 1 day). dob %--% today / years(1) is 32.98082, which is accurate (and also shorter code :-) ). Also, you have an extra parenthesis in your example. – Melodics
Another answer; it's much faster. See the speed test below.
as.numeric(today - dob) / 365.25
Comparing all the answers
library(dplyr)
library(lubridate)
today <- mdy("08312015")
dob <- mdy("09071982")
interval(dob, today) / years(1)
# [1] 32.98082
as.duration(interval(dob, today)) %/% as.duration(years(1))
# [1] 32
interval(dob, today) %>% as.numeric("years")
# [1] 32.98015
dob %--% today / ddays(365.25)
# [1] 32.98015
as.numeric(today - dob) / 365.25
# [1] 32.98015
I'm not sure whether 32.98082 or 32.98015 is more correct. See https://mcmap.net/q/370562/-time-difference-in-years-with-lubridate
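The two numbers differ in how the fraction is measured. Here is a sketch of one plausible reading, worked out by hand in base R for these dates (the birthday dates are hard-coded for illustration, and this is not lifted from lubridate's source): 32.98082 matches 32 whole years plus 358 of the 365 days between the 32nd and 33rd birthdays, while 32.98015 matches 12046 total days divided by an average 365.25-day year.

```r
dob       <- as.Date("1982-09-07")
today     <- as.Date("2015-08-31")
last_bday <- as.Date("2014-09-07")  # 32nd birthday
next_bday <- as.Date("2015-09-07")  # 33rd birthday

# Calendar-based: whole years plus the elapsed share of the current birthday year
calendar <- 32 + as.numeric(today - last_bday) / as.numeric(next_bday - last_bday)
# Average-based: total days divided by a 365.25-day year
average <- as.numeric(today - dob) / 365.25

calendar  # 32.98082 (32 + 358/365)
average   # 32.98015 (12046/365.25)
```

If "age" means completed birthdays plus progress toward the next one, the calendar-based figure is the natural fit; the average-based one is a smoothed approximation.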
Speed test
microbenchmark::microbenchmark(
interval(dob, today) / years(1),
as.duration(interval(dob,today)) %/% as.duration(years(1)),
interval(dob, today) %>% as.numeric('years'),
dob %--% today / ddays(365.25),
as.numeric(today - dob) / 365.25
)
> Unit: microseconds
> expr min lq mean median uq max neval
> interval(dob, today)/years(1) 1913.601 1996.1510 2172.96001 2059.1005 2102.851 6037.201 100
> as.duration(interval(dob, today))%/%as.duration(years(1)) 749.700 799.1010 912.30394 823.1510 863.751 5078.601 100
> interval(dob, today) %>% as.numeric("years") 439.701 464.0510 485.31708 480.3010 501.101 591.000 100
> dob %--% today/ddays(365.25) 394.501 427.5510 450.37502 443.7010 463.301 620.601 100
> as.numeric(today - dob)/365.25 17.400 25.9005 30.66293 32.7515 36.151 52.700 100
This gets the floored difference in years in base R:
> d1=as.Date("2021-01-23");d2=as.Date("2022-01-23")
> x=as.POSIXlt(d1);y=as.POSIXlt(d2)
> y$year-x$year-pmax(y$mon<x$mon,y$mon==x$mon&y$mday<x$mday)
[1] 1
This is similar but it was about 5 times slower in my benchmark with long vectors of dates:
> as.numeric(format(d2,"%Y"))-as.numeric(format(d1,"%Y"))-as.numeric(format(d2,"%m-%d")<format(d1,"%m-%d"))
[1] 1
And this was about 12 times slower:
> as.numeric(substr(d2,1,4))-as.numeric(substr(d1,1,4))-as.numeric(substr(d2,5,10)<substr(d1,5,10))
[1] 1
These two Lubridate options were about 4 times slower than my first base R option:
> floor(interval(d1,d2)/years())
[1] 1
> d1%--%d2%/%years()
[1] 1
These two lubridate options are fast, but they treat the length of each year as 365.25 days, so on an exact anniversary they can return one year too few when the difference in years is not divisible by 4 (the span then may contain fewer leap days than the 365.25-day average assumes):
> floor(lubridate::time_length(difftime(d2,d1),"years"))
[1] 0
> floor(as.numeric(lubridate::interval(d1,d2),"years"))
[1] 0
This is similar to the two previous options but it was slightly faster in my benchmark:
> as.numeric(d2-d1)%/%365.25
[1] 0
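To see which anniversaries the 365.25-day shortcut gets wrong, a base-R sketch can step through exact anniversaries of the example start date (the start date is the only assumption). Whether a given gap fails depends on where the leap days fall: starting from 2019-03-01 instead, the one-year case comes out right, because that span includes 29 February 2020 and is 366 days long.

```r
d1 <- as.Date("2021-01-23")
# Exact anniversaries 1 to 8 years later
anniv <- seq(d1, by = "year", length.out = 9)[-1]
days  <- as.numeric(anniv - d1)
naive <- days %/% 365.25  # average-year estimate
exact <- 1:8              # true number of whole years on each anniversary
data.frame(anniv, days, naive, exact)
which(naive == exact)     # only the 4- and 8-year anniversaries come out right
```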
This is a very fast way to get the floored difference in years that accounts for leap years. However, it incorrectly treats 1900 and 2100 as leap years, so the dates have to be after 1900 and before 2100:
> n1=as.numeric(d1);n2=as.numeric(d2)
> l1=(n1-789)%/%1461+1;l2=(n2-789)%/%1461+1;(n2-n1-(l2-l1))%/%365
[1] 1
# the l1 and l2 variables count leap days since the epoch
# 789 is the day number of 1972-02-29, the first leap day after the epoch, and 1461 is 365*4+1
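The leap-day arithmetic is easy to get subtly wrong, so it may be worth cross-checking it against the slower POSIXlt method before relying on it. A minimal sketch, assuming random dates drawn from 1901 to 2099 (inside the range the trick supports); years_posixlt() and years_leapcount() are hypothetical wrappers around the two expressions shown in this answer. The check skips start dates on 29 February: there the leap-day count treats 28 February of a non-leap year as the anniversary, while the POSIXlt method waits until 1 March.

```r
# Both helper names are hypothetical, introduced only for this check.
years_posixlt <- function(d1, d2) {
  x <- as.POSIXlt(d1); y <- as.POSIXlt(d2)
  y$year - x$year - (y$mon < x$mon | (y$mon == x$mon & y$mday < x$mday))
}
years_leapcount <- function(d1, d2) {
  n1 <- as.numeric(d1); n2 <- as.numeric(d2)
  # Leap days up to each date, counting 1972-02-29 (day 789) as number 1
  l1 <- (n1 - 789) %/% 1461 + 1
  l2 <- (n2 - 789) %/% 1461 + 1
  # Drop the leap days in between, then count whole 365-day years
  (n2 - n1 - (l2 - l1)) %/% 365
}

set.seed(1)
lo <- as.numeric(as.Date("1901-01-01")); hi <- as.numeric(as.Date("2099-12-31"))
d1 <- as.Date(sample(lo:hi, 1e5, replace = TRUE), origin = "1970-01-01")
d2 <- as.Date(sample(lo:hi, 1e5, replace = TRUE), origin = "1970-01-01")
# Keep ordered pairs; skip 29 February start dates, where the two methods
# use different conventions for the birthday in non-leap years
keep <- d1 <= d2 & format(d1, "%m-%d") != "02-29"
d1 <- d1[keep]; d2 <- d2[keep]
all(years_posixlt(d1, d2) == years_leapcount(d1, d2))
```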
Benchmark:
library(lubridate)
d1=as.Date(sample(-20000:40000),"1970-1-1")
d2=as.Date(sample(-20000:40000),"1970-1-1")
pick=d1<=d2;d1=d1[pick];d2=d2[pick]
b=microbenchmark::microbenchmark(times=10,
{p1=as.POSIXlt(d1);p2=as.POSIXlt(d2);p2$year-p1$year-pmax(p1$mon>p2$mon,p1$mon==p2$mon&p1$mday>p2$mday)},
floor(interval(d1,d2)/years()),
d1%--%d2%/%years(),
floor(time_length(difftime(d2,d1),"years")),
as.numeric(d2-d1)%/%365.25,
floor(as.numeric(interval(d1,d2),"years")),
as.numeric(substr(d2,1,4))-as.numeric(substr(d1,1,4))-as.numeric(substr(d2,5,10)<substr(d1,5,10)),
as.numeric(format(d2,"%Y"))-as.numeric(format(d1,"%Y"))-as.numeric(format(d2,"%m-%d")<format(d1,"%m-%d")),
{n1=as.numeric(d1);n2=as.numeric(d2);l1=(n1-789)%/%1461+1;l2=(n2-789)%/%1461+1;(n2-n1-(l2-l1))%/%365}
)
o=sort(tapply(b$time,gsub(" +"," ",b$expr),median))
writeLines(sprintf("%.1f %s",o/min(o),names(o)))
The output shows the median time of ten runs relative to the fastest option:
1.0 as.numeric(d2 - d1)%/%365.25
1.5 { n1 = as.numeric(d1) n2 = as.numeric(d2) l1 = (n1 - 789)%/%1461 + 1 l2 = (n2 - 789)%/%1461 + 1 (n2 - n1 - (l2 - l1))%/%365 }
1.5 floor(time_length(difftime(d2, d1), "years"))
1.7 floor(as.numeric(interval(d1, d2), "years"))
9.2 { p1 = as.POSIXlt(d1) p2 = as.POSIXlt(d2) p2$year - p1$year - pmax(p1$mon > p2$mon, p1$mon == p2$mon & p1$mday > p2$mday) }
72.2 floor(interval(d1, d2)/years())
72.3 d1 %--% d2%/%years()
78.1 as.numeric(format(d2, "%Y")) - as.numeric(format(d1, "%Y")) - as.numeric(format(d2, "%m-%d") < format(d1, "%m-%d"))
195.0 as.numeric(substr(d2, 1, 4)) - as.numeric(substr(d1, 1, 4)) - as.numeric(substr(d2, 5, 10) < substr(d1, 5, 10))
year(today)-year(dob). But this just subtracts year 1 minus year 2. – Lundgren
(today-dob)/365.25 gives me "Time difference of 32.98015 days" instead of years. – Kanarese
as.numeric((today-dob)/365.25). And for a very minor increase in precision, divide by 365.2425. – Altigraph
today - dob is not really the lubridate way to go, but uses basic R functionality (difftime). See my answer for a lubridate approach. – Nunhood
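These comments come down to one detail of base R: subtracting two Dates gives a difftime, and dividing a difftime by a number keeps both the class and the "days" label. A base-R sketch with the question's dates (written as ISO strings, an assumption of this sketch) shows the difference and the as.numeric() fix.

```r
dob   <- as.Date("1982-09-07")
today <- as.Date("2015-08-31")

d <- today - dob
class(d)                # "difftime"
units(d)                # "days"
d / 365.25              # still a difftime, so it prints as a time difference in "days"
as.numeric(d) / 365.25  # a plain number of years
as.numeric(d, units = "days") / 365.2425  # Gregorian average year length
```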