Splitting single data frame row into multiple rows while performing calculation
Asked Answered
E

1

5

I have a df akin to df1 where I want to break out the rows so that the HOURS column is in intervals of 4, shown in df2. How would I approach this problem and what packages are recommended?

IDs can have more than one sequence on a given day. For example, an ID can be listed 2-3 times on a given day, being assigned more than one unit and and more than one CODE.

The following are required:

  • All categorical data must remain the same on child rows (e.g., CODE stays the same on every child row)
  • If there is a remainder that is less than four, the remainder amount should be listed on the last line (e.g., df2; row B)
  • If a child row starts or ends on the next date the date column should be updated accordingly (e.g., df2; row E)

df1 (current)

EMPLID TIME_RPTG_CD START_DATE_TIME     END_DATE_TIME       Hrs_Time_Worked
   <chr>  <chr>        <dttm>              <dttm>                        <dbl>
 1 X00007 REG          2014-07-03 16:00:00 2014-07-03 02:00:00            10.0

df2 (desired)

 EMPLID TIME_RPTG_CD START_DATE_TIME     END_DATE_TIME       Hrs_Time_Worked
   <chr>  <chr>        <dttm>              <dttm>                        <dbl>
1 X00007 REG          2014-07-03 16:00:00 2014-07-03 20:00:00            4.0
1 X00007 REG          2014-07-03 20:00:00 2014-07-04 24:00:00            4.0
1 X00007 REG          2014-07-04 24:00:00 2014-07-04 02:00:00            2.0
Earthy answered 4/6, 2018 at 23:48 Comment(0)
X
8
library(tidyverse)
library(lubridate)
df1%>%
 group_by(Row)%>%
 mutate(S=paste(START_DATE,START_TIME),
        HOURS=list((n<-c(rep(4,HOURS%/%4),HOURS%%4))[n!=0]))%>%
 unnest()%>%
 mutate(E=dmy_hm(S)+hours(cumsum(HOURS)),
        S=E-hours(unlist(HOURS)),
        START_DATE=format(S,"%d-%b-%y"),
        END_DATE=format(E,"%d-%b-%y"),
        START_TIME=format(S,"%H:%M"),
        END_TIME=format(E,"%H:%M"),S=NULL,E=NULL)
# A tibble: 6 x 9
# Groups:   Row [3]
  Row      ID UNIT  CODE  START_DATE END_DATE  START_TIME END_TIME HOURS
  <chr> <int> <chr> <chr> <chr>      <chr>     <chr>      <chr>    <dbl>
1 A         1 3ESD  REG   06-Aug-14  06-Aug-14 01:00      05:00       4.
2 A         1 3ESD  REG   06-Aug-14  06-Aug-14 05:00      07:00       2.
3 B         2 3E14E OE2   12-Aug-14  13-Aug-14 21:00      01:00       4.
4 C         3 3E5E  REG   19-Aug-14  20-Aug-14 21:00      01:00       4.
5 C         3 3E5E  REG   20-Aug-14  20-Aug-14 01:00      05:00       4.
6 C         3 3E5E  REG   20-Aug-14  20-Aug-14 05:00      07:00       2.
Xerophilous answered 5/6, 2018 at 0:38 Comment(5)
When I run the code I get an error : "Error in mutate_impl(.data, dots) : Evaluation error: invalid class <U+0093>Period<U+0094> object: periods must have integer values." Is this a formatting issue on my end?Earthy
are you using the data you provided? its been long since I answered this question. I even don't remember what was happeningXerophilous
I just updated the data in the posting to account for formats. Slightly different than the original. I appreciate the helpEarthy
You should be able to obtain an idea from this answer on how to solve the problem you have at hand. Changing the data structure 5 days after the question was asked wont help that much. Or rather, you can ask another question about it.In that case you might even get a better solution than this. A solution that solves all your problemsXerophilous
I was able to make the code work by using dhours() instead of hours() in the second mutate function. Thanks again!Earthy

© 2022 - 2024 — McMap. All rights reserved.