R - How can I choose the earliest date column from date columns?

Asked 23/8, 2016 at 9:51 Answered 23/8, 2016 at 10:5

I would like to get a column that has the earliest date in each row from multiple date columns.

My dataset is like this.

df = data.frame(
  x_date = as.Date(c("2016-1-3", "2016-3-5", "2016-5-5")),
  y_date = as.Date(c("2016-2-2", "2016-3-1", "2016-4-4")),
  z_date = as.Date(c("2016-3-2", "2016-1-1", "2016-7-1"))
)

+---+------------+------------+------------+
|   | x_date     | y_date     | z_date     |
+---+------------+------------+------------+
| 1 | 2016-01-03 | 2016-02-02 | 2016-03-02 |
| 2 | 2016-03-05 | 2016-03-01 | 2016-01-01 |
| 3 | 2016-05-05 | 2016-04-04 | 2016-07-01 |
+---+------------+------------+------------+

I would like to get something like the following column.

+---+---------------+
|   | earliest_date |
+---+---------------+
| 1 | 2016-01-03    |
| 2 | 2016-01-01    |
| 3 | 2016-04-04    |
+---+---------------+

This is my code, but it returns the single earliest date across all rows and columns instead of one date per row.

library(dplyr)
df %>% dplyr::mutate(earliest_date = min(x_date, y_date, z_date))

Ichthyo answered 23/8, 2016 at 9:51 Comment(0)

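One option is pmin, which returns the element-wise (parallel) minimum of its arguments, so it yields one value per row.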

df %>% 
   mutate(earliest_date = pmin(x_date, y_date, z_date))
#    x_date     y_date     z_date   earliest_date
#1 2016-01-03 2016-02-02 2016-03-02    2016-01-03
#2 2016-03-05 2016-03-01 2016-01-01    2016-01-01
#3 2016-05-05 2016-04-04 2016-07-01    2016-04-04
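To make the difference between min() and pmin() concrete, here is a minimal base R sketch on the same data. It does not use dplyr, and the variable names overall and per_row are only illustrative.

```r
# Same data as in the question, with zero-padded dates
df <- data.frame(
  x_date = as.Date(c("2016-01-03", "2016-03-05", "2016-05-05")),
  y_date = as.Date(c("2016-02-02", "2016-03-01", "2016-04-04")),
  z_date = as.Date(c("2016-03-02", "2016-01-01", "2016-07-01"))
)

# min() pools all of its arguments and returns a single value
overall <- min(df$x_date, df$y_date, df$z_date)

# pmin() compares the vectors position by position: one result per row
per_row <- pmin(df$x_date, df$y_date, df$z_date)

print(overall)
print(per_row)
```

Inside mutate(), min() sees the whole columns at once, which is why the original attempt filled every row with the same date.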

If only the new column is needed, use transmute instead of mutate

df %>%
    transmute(earliest_date = pmin(x_date, y_date,z_date))

Floe answered 23/8, 2016 at 10:5 Comment(3)

This is what I wanted to do! pmin() is the function I need to use. Thank you very much. – Ichthyo 23/8, 2016 at 23:55

Additionally, when I used pmin() on rows with missing values (NAs), I needed ifelse() to handle the NAs. However, ifelse() converted the Date class to double (more precisely, it dropped the Date class information). To keep the Date class, I tried the safe.ifelse() proposed here, and it works fine. – Ichthyo 24/8, 2016 at 1:34

@Ichthyo pmin has an na.rm argument, which is FALSE by default. Set it to TRUE, i.e. pmin(x_date, y_date, z_date, na.rm = TRUE) – Floe 24/8, 2016 at 3:33
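As a small illustration of the na.rm suggestion, the sketch below uses two made-up date vectors with missing values (x, y, strict and lenient are hypothetical names, not from the question). It is meant to show that the result stays of class Date without any ifelse() workaround.

```r
# Hypothetical vectors with missing dates
x <- as.Date(c("2016-01-03", NA, "2016-05-05"))
y <- as.Date(c("2016-02-02", "2016-03-01", NA))

# Default: any NA in a position makes that result NA
strict <- pmin(x, y)

# na.rm = TRUE: NAs are ignored when another value is available
lenient <- pmin(x, y, na.rm = TRUE)

print(strict)
print(lenient)
class(lenient)
```

If every value in a position is NA, the result for that position is still NA even with na.rm = TRUE.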

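You can use apply row-wise to get the minimum date in each row. Note that apply first coerces the data frame to a character matrix, so the result is a character vector rather than a Date vector; it still gives the right answer here because dates formatted as YYYY-MM-DD sort correctly as text.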

apply(df, 1, min)

#[1] "2016-01-03" "2016-01-01" "2016-04-04"

Or you can also use pmin with do.call

do.call(pmin, df)

#[1] "2016-01-03" "2016-01-01" "2016-04-04"
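The two outputs above print the same way but are not the same type. The following sketch, which assumes the question's data frame and uses illustrative variable names, compares the classes and shows one way to pass na.rm through do.call.

```r
df <- data.frame(
  x_date = as.Date(c("2016-01-03", "2016-03-05", "2016-05-05")),
  y_date = as.Date(c("2016-02-02", "2016-03-01", "2016-04-04")),
  z_date = as.Date(c("2016-03-02", "2016-01-01", "2016-07-01"))
)

via_apply <- apply(df, 1, min)   # works on a character matrix
via_pmin  <- do.call(pmin, df)   # works on the Date columns directly

class(via_apply)
class(via_pmin)

# Convert the apply() result back to Date if that route is preferred
as_dates <- as.Date(via_apply)

# Extra arguments such as na.rm can be appended to the argument list
with_na_rm <- do.call(pmin, c(df, na.rm = TRUE))
```

The do.call(pmin, ...) form also has the advantage of not needing the column names spelled out, which helps when there are many date columns.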

Seine answered 23/8, 2016 at 9:55 Comment(0)

If you want the output as a data frame with one row per original column (that is, the earliest date within each column rather than within each row), reshape the data to long format first.

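library(reshape2)
library(dplyr)  # needed for %>%, group_by and summarize
# Reshape to long format, then take the earliest date within each original column
melt(df) %>% group_by(variable) %>% summarize(earliest_date = min(value))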

Supportable answered 23/8, 2016 at 9:57 Comment(1)

Thanks for the hint, I had forgotten about that. – Supportable 23/8, 2016 at 10:5
