I am trying to evaluate the impact of store visitors on the spread of COVID-19.
Here is a simple scenario:
- VisitorA walks into store and meets Employee1 @ Time = 0.
- VisitorA then meets Employee2 @ Time = 1.
- VisitorB walks into store and meets Employee1 @ Time = 1.
- VisitorB then meets Employee3 @ Time = 2.
- VisitorA leaves store.
When I collect all visitor data and who they met over time the data set looks something like this:
Table visitorByEmployee
:
| VisitorID | EmployeeID | Contact |
+-----------+------------+-------------------+
| 100 | X123 | 3/11/2020 1:00 |
| 100 | X124 | 3/11/2020 1:10 |
| 101 | X123 | 3/12/2020 1:11 |
| 101 | X125 | 3/11/2020 1:20 |
| 102 | X126 | 3/12/2020 10:00 |
| 102 | X124 | 3/12/2020 10:00 |
| 103 | X123 | 3/12/2020 11:00 |
| 104 | X124 | 3/12/2020 12:00 |
| 104 | X126 | 3/12/2020 12:00 |
| 105 | X126 | 3/12/2020 12:00 |
I want to build a hierarchy off of this data that can eventually be expressed as follows:
Each Tree represents the impact of the Visitors on the spread of the virus:
100
--> X123
--> 101
--> X125
--> 103
--> X124
--> 104
102
--> X126
--> 104
--> 105
--> X124
--> 104
--> X126
I attempted to do this by first finding the root nodes (root visitors who were no impacted by previous visitors and/or employees they saw). These were 100 and 102.
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY Contact) AS SeenOrder
INTO
#SeenOrder
FROM
visitorByEmployee
SELECT *
INTO #RootVisitors
FROM #SeenOrder
WHERE SeenOrder = 1
From #RootVisitors
and #SeenOrder
, I want to build a table that can tell me that hierarchy of impact and maybe result in something like this:
| InitVisitorID | HLevel | EmployeeID | VisitorID |
+---------------+------------+-------------------+-------------+
| 100 | 0 | X123 | 100 |
| 100 | 0 | X124 | 100 |
| 100 | 1 | X123 | 101 |
| 100 | 1 | X123 | 103 |
| 100 | 1 | X124 | 104 |
| 100 | 2 | X125 | 101 |
| 102 | 0 | X126 | 102 |
| 102 | 0 | X124 | 102 |
| 102 | 1 | X126 | 104 |
| 102 | 1 | X126 | 105 |
| 102 | 1 | X124 | 104 |
| 102 | 2 | X126 | 104 |
Is this something that can be done using a recursive CTEs? I attempted to do this but due to the shifting hierarchy from visitor to employee to visitor to employee, I am having a hard time creating that recursive CTE.
UPDATE Here is the recursive CTE I am working on. It doesn't work yet but the approach is what I am sharing:
; WITH exposure_tree AS (
/* == Anchor with the root visitors == */
/* == You can think of this: The Employees who were exposed by the Visior == */
SELECT re.VisitorID InitVisitor,
1 as Level,
CASE WHEN 1%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
re.VisitorID Exposer,
re.EmployeeID Exposee,
re.SeenOrder,
re.InitialContact
FROM #SeenOrder re
WHERE re.SeenOrder = 1
/* == Recursive Part #1 ==
Get the visitors who were exposed next by the exposed employees
*/
UNION ALL
SELECT et.VisitorID InitVisitor,
Level + 1,
CASE WHEN (Level+1)%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
re.EmployeeID,
re.VisitorID, -- These are switched from the anchor.
re.SeenOrder,
re.InitialContact
FROM #SeenOrder re
JOIN exposure_tree et ON et.Exposee = re.EmployeeID AND re.SeenOrder > 1 AND re.InitialContact > et.InitialContact
UNION ALL
/* == Recursive Part #2 ==
Get the next employees who were exposed the second level exposed visitors
*/
SELECT et.VisitorID InitVisitor,
Level + 2,
CASE WHEN (Level+2)%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
re.VisitorID,
re.EmployeeID,
re.SeenOrder,
re.InitialContact
FROM #ROOT_EXPOSURES re
JOIN exposure_tree et ON re.VisitorID = et.Exposer and re.SeenOrder > 1 AND re.InitialContact > et.InitialContact
)
select top 1000 * from exposure_tree ORDER BY InitVisitor, Level
101
's meeting withX125
was the first meeting for both that employee and that visitor. Or is it an error that the date suddenly changes to11
not12
. Conversely visitor 102 first met with Employee X124 - but X124 had already seen Visitor 100 so why is 102 a "root"? – Hughey3/12/2020 10:00
– Hughey