Time difference in every n/two rows
Asked Answered
B

5

5

I have this dataset

library(dplyr)

# creating a dataframe
data_frame <- data.frame(id = c(1,1,2,2,3,3),
                         col2 = c("start", "finish", "start", "finish","start", "finish"),
                         col3 =  c(as.POSIXct("2021-05-08 08:32:07"),
                                   as.POSIXct("2021-05-08 08:32:45"),
                                   as.POSIXct("2020-11-28 23:32:09"),
                                   as.POSIXct("2020-11-28 23:32:25"),
                                   as.POSIXct("2021-05-08 08:32:07"),
                                   as.POSIXct("2021-05-08 08:32:12")))

# computing difference from start to finish
df <- data_frame %>%
  arrange(id) %>%
  mutate(timetaken = col3 - lag(col3))

df

I want

  1. To calculate the time difference from start to finish.
  2. Average time difference

To calculate the time difference I can calculate the lag as show above. However, I want to have output in the following way and calculate the average time of that variable.

> df
  id   col2                col3      timetaken
1  1  start 2021-05-08 08:32:07        NA secs
2  1 finish 2021-05-08 08:32:45        38 secs
3  2  start 2020-11-28 23:32:09        NA secs
4  2 finish 2020-11-28 23:32:25        16 secs
5  3  start 2021-05-08 08:32:07        NA secs
6  3 finish 2021-05-08 08:32:12         5 secs

Any idea how to achieve this?

Bosomy answered 28/8 at 9:40 Comment(0)
S
3

If you want to use dplyr::lag():

library(dplyr)

df <- data_frame |>
  mutate(timetaken = col3 - lag(col3), .by = id)

df
#   id   col2                col3 timetaken
# 1  1  start 2021-05-08 08:32:07   NA secs
# 2  1 finish 2021-05-08 08:32:45   38 secs
# 3  2  start 2020-11-28 23:32:09   NA secs
# 4  2 finish 2020-11-28 23:32:25   16 secs
# 5  3  start 2021-05-08 08:32:07   NA secs
# 6  3 finish 2021-05-08 08:32:12    5 secs

and a dplyr() approach to get the mean (although, as in the previous answer, I find using base R cleaner in this instance):

summarise(df, mean(timetaken, na.rm = TRUE))
#   mean(timetaken, na.rm = TRUE)
# 1                 19.66667 secs
Satirist answered 28/8 at 10:8 Comment(0)
K
3

You can compute the difference between "start" and "finish" for each group like so:

data_frame %>%
  mutate(timetaken = ifelse(col2 == "start", NA, 
                            col3[col2 == "finish"] - col3[col2 == "start"]), 
         .by = id)

#   id   col2                col3 timetaken
# 1  1  start 2021-05-08 08:32:07        NA
# 2  1 finish 2021-05-08 08:32:45        38
# 3  2  start 2020-11-28 23:32:09        NA
# 4  2 finish 2020-11-28 23:32:25        16
# 5  3  start 2021-05-08 08:32:07        NA
# 6  3 finish 2021-05-08 08:32:12         5

Get the average time difference like so:

mean(data_frame$timetaken, na.rm = TRUE)
#[1] 19.66667
Kanpur answered 28/8 at 9:44 Comment(0)
S
3

If you want to use dplyr::lag():

library(dplyr)

df <- data_frame |>
  mutate(timetaken = col3 - lag(col3), .by = id)

df
#   id   col2                col3 timetaken
# 1  1  start 2021-05-08 08:32:07   NA secs
# 2  1 finish 2021-05-08 08:32:45   38 secs
# 3  2  start 2020-11-28 23:32:09   NA secs
# 4  2 finish 2020-11-28 23:32:25   16 secs
# 5  3  start 2021-05-08 08:32:07   NA secs
# 6  3 finish 2021-05-08 08:32:12    5 secs

and a dplyr() approach to get the mean (although, as in the previous answer, I find using base R cleaner in this instance):

summarise(df, mean(timetaken, na.rm = TRUE))
#   mean(timetaken, na.rm = TRUE)
# 1                 19.66667 secs
Satirist answered 28/8 at 10:8 Comment(0)
A
3

If you just want one row per id, you could reshape:

tidyr::pivot_wider(data_frame, names_from=col2, values_from=col3) |>
   mutate(timetaken=difftime(finish, start)) |>
   print() |>
   summarise(avg=mean(timetaken, na.rm=TRUE))

# A tibble: 3 x 4
     id start               finish              timetaken
  <dbl> <dttm>              <dttm>              <drtn>   
1     1 2021-05-08 08:32:07 2021-05-08 08:32:45 38 secs  
2     2 2020-11-28 23:32:09 2020-11-28 23:32:25 16 secs  
3     3 2021-05-08 08:32:07 2021-05-08 08:32:12  5 secs 

# A tibble: 1 x 1
  avg          
  <drtn>       
1 19.66667 secs
Allogamy answered 28/8 at 10:8 Comment(0)
O
3

Assuming it is always start/finish every row, then we can get diff for every row and subset alternating results:

diffSec <- diff(data_frame$col3)[ c(TRUE, FALSE) ]
diffSec
# Time differences in secs
# [1] 38 16  5
mean(diffSec)
# Time difference of 19.66667 secs
Orotund answered 28/8 at 10:23 Comment(0)
G
1

It appears to me that for the desired operation the data should be in wide format. If your real data is as well organised as the toy data, we do not need to account for grouping on id, see ave().

Base R

reshape(data_frame, direction="wide", idvar="id", timevar="col2") |>
  setNames(c("id", "start", "finish")) |> # cosmetics
  transform(diff = difftime(finish, start))

giving

  id               start              finish    diff
1  1 2021-05-08 08:32:07 2021-05-08 08:32:45 38 secs
3  2 2020-11-28 23:32:09 2020-11-28 23:32:25 16 secs
5  3 2021-05-08 08:32:07 2021-05-08 08:32:12  5 secs
Gamic answered 28/8 at 11:40 Comment(1)
How does your answer differ from mine?Allogamy

© 2022 - 2024 — McMap. All rights reserved.