Conting events between sequential stages in a process using R

I've been trying to resolve an exercise from a textbook where I am faced with the challenge of count different events between sequential stages of an industrial process.

Information related to the process: A test Subject is made to go through a 3 stage process with stages A, B and C respectively the first one being A, second B and finally C; a test Subject may abandon the process at stages A or B and then start again from point A, each time the process takes place a dataset is created with the IDENTIFICATION of the test subject, the TIMESTAMP in which the stage took place and a unique VISIT_CODE During any stage, a test subject may trigger an "ALERT" and this will be recorded with the TIMESTAMP, ALERT_CODE and test subject IDENTIFICATION.

What is to be calculated: I have to create a code in R to count how many ALERTS where generated by the test subjects between stages A and B, between stages B and C and finally how many ALERTS where generated after C. Please note that a test subject may at some point abandon the process to later on start again from point A.

The textbook gives a HINT: "Take a closer look at the stage a test subject is currently to then determine if the ALERT is generated from stage A and before a stage B and also if the test subject triggers an ALERT after stage B and before stage C, but keep in mind that if a test subject abandons at stage A and triggers an ALERT if the TIMESTAMP of that ALERT is smaller than their next attempt at stage A then it should be credited as an ALERT after a stage A"

as another hint, the textbook reveals that the ALERTS after Stage C where only 1 and that it was trigger by test subject W-6 with ALTER_CODE AYUJ-3915716168. The datasets are:

Stage Process

TableA<-tribble(~STAGE, ~TEST_SUBJECT,~TIMESTAMP,~VISIT_CODE,
"A",    "XYU-1",    "10",   "BKO",
"A",    "XYU-1",    "15",   "JUJD",
"B",    "XYU-1",    "20",   "DUDH",
"A",    "FF-09",    "25",   "KSIWJD",
"B",    "FF-09",    "30",   "AJAKAM",
"C",    "FF-09",    "35",   "ZISKS",
"A",    "UU-89",    "40",   "NNXJD",
"B",    "UU-89",    "45",   "DDUWO",
"A",    "I-44", "50",   "JIWIW",
"A",    "W-6",  "55",   "SHDN",
"B",    "W-6",  "60",   "IWOLS",
"C",    "W-6",  "65",   "JDDD",
"A",    "U-90", "70",   "DJDKSMS",
"B",    "U-90", "75",   "NDJSM",
"A",    "T-87", "80",   "DNDJDK")

Alerts dataset

TableB<-tribble(~TEST_SUBJECT,~TIMESTAMP,~ALERT_CODE,
"XYU-1",    "11",   "AYUJ-151571406",
"XYU-1",    "12",   "AYUJ-487008829",
"XYU-1",    "28",   "AYUJ-211990388",
"FF-09",    "32",   "AYUJ-4177221842",
"W-6",  "56",   "AYUJ-1300211351",
"W-6",  "63",   "AYUJ-3014305494",
"I-44", "67",   "AYUJ-4454800551",
"U-90", "73",   "AYUJ-1079921935",
"U-90", "76",   "AYUJ-3348911727",
"U-90", "79",   "AYUJ-2381219626",
"T-87", "82",   "AYUJ-4778326278",
"W-6",  "89",   "AYUJ-3915716168")

SOLUTION:

The textbook states that the proper solution for this problem is:

Alerts between Stages A & B including alerts from test subjects that abandoned the process in the attempt nth at stage A	Alerts between Stages B & C including alerts from test subjects that abandoned the process in the attempt nth at stage B	Alerts after stage C
AYUJ-151571406	AYUJ-211990388	AYUJ-3915716168
AYUJ-487008829	AYUJ-3014305494
AYUJ-1300211351	AYUJ-3348911727
AYUJ-1079921935	AYUJ-4177221842
AYUJ-4778326278	AYUJ-2381219626
AYUJ-4454800551

What have I done? : I have tried grouping and "joining" the data by TEST_SUBJECT and stage and timestamp to make the counts but I and finding a really hard to set the conditions to make the code count events between stages because I am finding difficult to corelate which events took plave before and after each timespan by test subject and stage.

Thank you so much to all of you talented and cool people for your help or recomandations

Here is a data.table apprioach, resulting in a list of alerts after a-b-c..

library(data.table)
# Make tables data.table format
setDT(TableA)
setDT(TableB)
# set TiMESTAP to numeric
TableA[, TIMESTAMP := as.numeric(TIMESTAMP)]
TableB[, TIMESTAMP := as.numeric(TIMESTAMP)]
# Create data.table with Stage intervals by test subject
DT.interval <- TableA[, .(start = min(TIMESTAMP)), by = .(TEST_SUBJECT, STAGE)]
# Perform rolling join
TableB[, Stage := DT.interval[TableB, 
                              STAGE, 
                              on = .(TEST_SUBJECT, start = TIMESTAMP), 
                              roll = Inf]][]
# Split alerts by stage
split(TableB[,3:4], by = "Stage")
# $A
#         ALERT_CODE Stage
# 1:  AYUJ-151571406     A
# 2:  AYUJ-487008829     A
# 3: AYUJ-1300211351     A
# 4: AYUJ-4454800551     A
# 5: AYUJ-1079921935     A
# 6: AYUJ-4778326278     A
# 
# $B
#         ALERT_CODE Stage
# 1:  AYUJ-211990388     B
# 2: AYUJ-4177221842     B
# 3: AYUJ-3014305494     B
# 4: AYUJ-3348911727     B
# 5: AYUJ-2381219626     B
# 
# $C
#         ALERT_CODE Stage
# 1: AYUJ-3915716168     C

Recommended topics

Hot tags