Creating a sequence of dates for each group in R

I have a dataset that looks like this:

      ID    created_at
MUM-0001    2014-04-16
MUM-0002    2014-01-14
MUM-0003    2014-04-17
MUM-0004    2014-04-12
MUM-0005    2014-04-18
MUM-0006    2014-04-17

I am trying to add a new column containing all dates between each ID's start date (created_at) and a fixed end date (say, 12 July 2015). I used seq() inside a dplyr mutate(), but I am getting an error.

data1 <- data1 %>%
         arrange(ID) %>%
         group_by(ID) %>%
         mutate(date = seq(as.Date(created_at), as.Date('2015-07-12'), by= 1))

The error I am getting is:

Error: incompatible size (453), expecting 1 (the group size) or 1

Can you please suggest a better way to perform this task in R?

Hair answered 7/8, 2015 at 8:49 Comment(0)

You could use data.table to get the sequence of Dates from 'created_at' to '2015-07-12', grouped by the 'ID' column.

 library(data.table)
 setDT(df1)[, list(date=seq(created_at, as.Date('2015-07-12'), by='1 day')) , ID]
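As a side note on the error in the question: mutate() needs either one value per row of the group or a single value, whereas seq() returns the whole run of dates. A quick base R check, using the first created_at value from the question (the variable names here are only illustrative), reproduces the 453 reported in the error message:

```r
# Daily sequence for the first row of the example data (MUM-0001).
start_date <- as.Date("2014-04-16")
end_date   <- as.Date("2015-07-12")

dates <- seq(start_date, end_date, by = "1 day")

# The sequence is far longer than the 1-row group it would be assigned to.
length(dates)
# 453
```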

If you need an option with dplyr, use do():

 library(dplyr)
 df1 %>% 
   group_by(ID) %>% 
   do( data.frame(., Date= seq(.$created_at,
                            as.Date('2015-07-12'), by = '1 day')))

If there are duplicate IDs, group by row_number() instead, so that seq() receives a single start date per group:

df1 %>%
    group_by(rn=row_number()) %>%
     do(data.frame(ID= .$ID, Date= seq(.$created_at,
          as.Date('2015-07-12'), by = '1 day'), stringsAsFactors=FALSE))
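The same row-by-row idea can also be sketched in base R, for readers who would rather avoid a package dependency. This is only an illustration and not one of the approaches above; `end_date`, `pieces` and `expanded` are names made up here, and the data frame repeats the example data from the data section.

```r
# Example data, rebuilt from the values shown in the question.
df1 <- data.frame(
  ID = c("MUM-0001", "MUM-0002", "MUM-0003",
         "MUM-0004", "MUM-0005", "MUM-0006"),
  created_at = as.Date(c("2014-04-16", "2014-01-14", "2014-04-17",
                         "2014-04-12", "2014-04-18", "2014-04-17")),
  stringsAsFactors = FALSE
)
end_date <- as.Date("2015-07-12")

# Build one small data frame per input row, so duplicate IDs are harmless.
pieces <- lapply(seq_len(nrow(df1)), function(i) {
  data.frame(
    ID   = df1$ID[i],
    Date = seq(df1$created_at[i], end_date, by = "1 day"),
    stringsAsFactors = FALSE
  )
})

# Stack the per-row pieces into one long data frame.
expanded <- do.call(rbind, pieces)
head(expanded)
```

Note that seq() stops with an error if a created_at value falls after the end date, so such rows would need to be filtered out first.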

Update

Based on @Frank's comment, the newer tidyverse idiom is to build a list column and unnest it:

library(tidyverse)
df1 %>%
  group_by(ID) %>% 
  mutate(d = list(seq(created_at, as.Date('2015-07-12'), by='1 day')), created_at = NULL) %>%
  unnest(cols = d)  # recent tidyr versions warn when cols is not given
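On dplyr 1.1.0 or later, reframe() may be a shorter route, because it lets each group return any number of rows. A minimal sketch, assuming that dplyr version and unique IDs (with duplicate IDs, seq() would again receive more than one start date); only the first two rows of the example data are used, and `expanded` is an illustrative name:

```r
library(dplyr)

# First two rows of the example data from the question.
df1 <- data.frame(
  ID = c("MUM-0001", "MUM-0002"),
  created_at = as.Date(c("2014-04-16", "2014-01-14")),
  stringsAsFactors = FALSE
)

# reframe() keeps the grouping column and returns one row per generated date.
expanded <- df1 %>%
  group_by(ID) %>%
  reframe(date = seq(created_at, as.Date("2015-07-12"), by = "1 day"))

expanded
```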

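With data.table, grouping by row number likewise handles duplicate IDs (the grouping column in the result is then the row number, not ID):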

setDT(df1)[, list(date=seq(created_at, 
             as.Date('2015-07-12'), by = '1 day')), by = 1:nrow(df1)] 

data

df1 <- structure(list(ID = c("MUM-0001", "MUM-0002", "MUM-0003",
 "MUM-0004", 
 "MUM-0005", "MUM-0006"), created_at = structure(c(16176, 16084, 
16177, 16172, 16178, 16177), class = "Date")), .Names = c("ID", 
"created_at"), row.names = c(NA, -6L), class = "data.frame")
Dominickdominie answered 7/8, 2015 at 8:51 Comment(4)
I used the same code you shared: data1 <- data1 %>% group_by(NEW_FORM_ID) %>% do( data.frame(., Date= seq(.$created_at, as.Date('2015-07-12'), by = '1 day'))) but I am getting an error: Error in seq.Date(.$created_at, as.Date("2015-07-12"), by = "1 day") : 'from' must be of length 1 - Hair
@DheerajSingh I used the data you showed as an example and have updated the answer with the dput output. It does not give me an error. Do you have duplicated IDs? - Dominickdominie
There were a few duplicates in the IDs. Now it is working. Thanks, akrun. - Hair
I just used this as a dupe target. I'm told by tidyversers that the new idiom is to make a list column and unnest it, maybe like df1 %>% group_by(ID) %>% mutate(d = list(seq(created_at, as.Date('2015-07-12'), by='1 day')), created_at = NULL) %>% unnest() - Genovera
