I have a specific problem; I have data in the following format:
# USER_ID SUBMISSION_DATE CONTRACT_REF
1 1 20/6 1:00 W001
2 1 20/6 2:00 W002
3 1 20/6 3:30 W003
4 4 20/6 4:00 W004
5 5 20/6 5:00 W005
6 5 20/6 6:00 W006
7 7 20/6 7:00 W007
8 7 20/6 8:00 W008
9 7 20/6 9:00 W009
10 7 20/6 10:00 W0010
Now I need to somehow calculate the time difference between the different submissions (uniquely identifiable).
In other words: I have a table of submissions, in this table, there are all submissions for all users. I need to find a way how to calculate the time difference for each unique STUDENT-CONTRACT tuple between nth assignment and the (n-1)th assignment.
Also note that each new user has to has zero for the new assignment. So the output would look as follows:
# USER_ID SUBMISSION_DATE CONTRACT_REF TIME_DIFFRENCE
1 1 20/6 1:00 W001 0
2 1 20/6 2:00 W002 3600
3 1 20/6 3:30 W003 5400
4 4 20/6 4:00 W004 3600
5 5 20/6 5:00 W005 0
6 5 20/6 6:00 W006 3600
7 7 20/6 7:00 W007 0
8 7 20/6 8:00 W008 3600
9 7 20/6 9:00 W009 3600
10 7 20/6 10:00 W0010 3600
Note that the time may NOT be in seconds, but whatever is suitable.
My thoughts:
1) I presume this will require as.POSIXct somewhere so that R knows how to deal with the time
2) This may involve some package such as plyr
, but I am so utterly lost in the documentation and examples are hard to find.
Thank you very much for all responses!
Best, Jakub
unlist(tapply(...))
though - if the data.frame isn't sorted it won't be in the correct order. – Raddi