Complex recursive SQL to produce hierarchical data
Asked Answered
U

1

6

I am trying to evaluate the impact of store visitors on the spread of COVID-19.

Here is a simple scenario:

  1. VisitorA walks into store and meets Employee1 @ Time = 0.
  2. VisitorA then meets Employee2 @ Time = 1.
  3. VisitorB walks into store and meets Employee1 @ Time = 1.
  4. VisitorB then meets Employee3 @ Time = 2.
  5. VisitorA leaves store.

When I collect all visitor data and who they met over time the data set looks something like this:

Table visitorByEmployee:

| VisitorID | EmployeeID | Contact           |
+-----------+------------+-------------------+
| 100       |   X123     | 3/11/2020 1:00    |
| 100       |   X124     | 3/11/2020 1:10    |
| 101       |   X123     | 3/12/2020 1:11    |
| 101       |   X125     | 3/11/2020 1:20    |
| 102       |   X126     | 3/12/2020 10:00   |
| 102       |   X124     | 3/12/2020 10:00   |
| 103       |   X123     | 3/12/2020 11:00   |
| 104       |   X124     | 3/12/2020 12:00   |
| 104       |   X126     | 3/12/2020 12:00   |
| 105       |   X126     | 3/12/2020 12:00   |

I want to build a hierarchy off of this data that can eventually be expressed as follows:

Each Tree represents the impact of the Visitors on the spread of the virus:

100
  --> X123
    --> 101
      --> X125
    --> 103
  --> X124
    --> 104

102
  --> X126
    --> 104
    --> 105
  --> X124
    --> 104
      --> X126

I attempted to do this by first finding the root nodes (root visitors who were no impacted by previous visitors and/or employees they saw). These were 100 and 102.

SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY Contact) AS SeenOrder
INTO 
    #SeenOrder
FROM
    visitorByEmployee

SELECT *
INTO #RootVisitors
FROM #SeenOrder
WHERE SeenOrder = 1

From #RootVisitors and #SeenOrder, I want to build a table that can tell me that hierarchy of impact and maybe result in something like this:

| InitVisitorID | HLevel     | EmployeeID        |   VisitorID |
+---------------+------------+-------------------+-------------+
| 100           |   0        |  X123             |     100     |
| 100           |   0        |  X124             |     100     |
| 100           |   1        |  X123             |     101     |
| 100           |   1        |  X123             |     103     |
| 100           |   1        |  X124             |     104     |
| 100           |   2        |  X125             |     101     |
| 102           |   0        |  X126             |     102     |
| 102           |   0        |  X124             |     102     |
| 102           |   1        |  X126             |     104     |
| 102           |   1        |  X126             |     105     |
| 102           |   1        |  X124             |     104     |
| 102           |   2        |  X126             |     104     |

Is this something that can be done using a recursive CTEs? I attempted to do this but due to the shifting hierarchy from visitor to employee to visitor to employee, I am having a hard time creating that recursive CTE.

UPDATE Here is the recursive CTE I am working on. It doesn't work yet but the approach is what I am sharing:

; WITH exposure_tree AS (
/* == Anchor with the root visitors == */
/* == You can think of this: The Employees who were exposed by the Visior == */
SELECT re.VisitorID InitVisitor,
    1 as Level, 
    CASE WHEN 1%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
    re.VisitorID Exposer,
    re.EmployeeID Exposee,
    re.SeenOrder,
    re.InitialContact
FROM #SeenOrder re
WHERE re.SeenOrder = 1

/* == Recursive Part #1 ==
Get the visitors who were exposed next by the exposed employees
*/
UNION ALL

SELECT et.VisitorID InitVisitor,
    Level + 1,
    CASE WHEN (Level+1)%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
    re.EmployeeID,
    re.VisitorID, -- These are switched from the anchor.
    re.SeenOrder,
    re.InitialContact
FROM #SeenOrder re
JOIN exposure_tree et ON et.Exposee = re.EmployeeID AND re.SeenOrder > 1 AND re.InitialContact > et.InitialContact

UNION ALL

/* == Recursive Part #2 ==
Get the next employees who were exposed the second level exposed visitors
*/
SELECT et.VisitorID InitVisitor,
    Level + 2,
    CASE WHEN (Level+2)%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
    re.VisitorID,
    re.EmployeeID,
    re.SeenOrder,
    re.InitialContact
FROM #ROOT_EXPOSURES re
JOIN exposure_tree et ON re.VisitorID = et.Exposer and re.SeenOrder > 1 AND re.InitialContact > et.InitialContact
)
select top 1000 * from exposure_tree ORDER BY InitVisitor, Level
Unclog answered 31/3, 2020 at 19:40 Comment(4)
Good question. +1 Let me try a few things...Cistaceous
Why are the "root visitors" 100 and 102 in your example? 101's meeting with X125 was the first meeting for both that employee and that visitor. Or is it an error that the date suddenly changes to 11 not 12. Conversely visitor 102 first met with Employee X124 - but X124 had already seen Visitor 100 so why is 102 a "root"?Hughey
You are right Martin 101 is a root. 102 is root because their first encounter was with X126.Unclog
Their first encounter was with two employees simultaneously at 3/12/2020 10:00Hughey
C
2

You can still write a recursive CTE using those tables. The coding gets tricky, though.

Here's the CTE. You probably will need to tweak it to get exactly what you want. I changed the column names fo simplicity:

with
c as (
  select 'v' as type, vid as id, contact, 0 as lvl, cast(concat('/', vid, '/') as varchar(255)) as path
  from (select *, row_number() over(partition by vid order by contact) as rn from v) x where rn = 1
 union all
  select
   case when type = 'v' then 'e' else 'v' end, -- type
   case when type = 'v' then v.eid else v.vid end, -- id
   v.contact,
   c.lvl + 1,
   cast(concat(c.path, case when type = 'v' then v.eid else v.vid end, '/') as varchar(255))
  from c 
  join v on c.lvl <= 10 and v.contact >= c.contact and (c.type = 'v' and v.vid = c.id or c.type = 'e' and v.eid = c.id)
        and c.path not like concat('%', case when type = 'v' then v.eid else v.vid end, '%')
)
select * from c order by path

Result:

type  id    contact                lvl  path                   
----  ----  ---------------------  ---  -----------------------
v     100   2020-03-11 01:00:00.0    0  /100/                  
e     X123  2020-03-11 01:00:00.0    1  /100/X123/             
v     101   2020-03-12 01:11:00.0    2  /100/X123/101/         
v     103   2020-03-12 11:00:00.0    2  /100/X123/103/         
e     X124  2020-03-11 01:10:00.0    1  /100/X124/             
v     102   2020-03-12 10:00:00.0    2  /100/X124/102/         
e     X126  2020-03-12 10:00:00.0    3  /100/X124/102/X126/    
v     104   2020-03-12 12:00:00.0    4  /100/X124/102/X126/104/
v     105   2020-03-12 12:00:00.0    4  /100/X124/102/X126/105/
v     104   2020-03-12 12:00:00.0    2  /100/X124/104/         
e     X126  2020-03-12 12:00:00.0    3  /100/X124/104/X126/    
v     105   2020-03-12 12:00:00.0    4  /100/X124/104/X126/105/
v     101   2020-03-11 01:20:00.0    0  /101/                  
e     X123  2020-03-12 01:11:00.0    1  /101/X123/             
v     103   2020-03-12 11:00:00.0    2  /101/X123/103/         
e     X125  2020-03-11 01:20:00.0    1  /101/X125/             
v     102   2020-03-12 10:00:00.0    0  /102/                  
e     X124  2020-03-12 10:00:00.0    1  /102/X124/             
v     104   2020-03-12 12:00:00.0    2  /102/X124/104/         
e     X126  2020-03-12 12:00:00.0    3  /102/X124/104/X126/    
v     105   2020-03-12 12:00:00.0    4  /102/X124/104/X126/105/
e     X126  2020-03-12 10:00:00.0    1  /102/X126/             
v     104   2020-03-12 12:00:00.0    2  /102/X126/104/         
e     X124  2020-03-12 12:00:00.0    3  /102/X126/104/X124/    
v     105   2020-03-12 12:00:00.0    2  /102/X126/105/         
v     103   2020-03-12 11:00:00.0    0  /103/                  
e     X123  2020-03-12 11:00:00.0    1  /103/X123/             
v     104   2020-03-12 12:00:00.0    0  /104/                  
e     X124  2020-03-12 12:00:00.0    1  /104/X124/             
e     X126  2020-03-12 12:00:00.0    1  /104/X126/             
v     105   2020-03-12 12:00:00.0    2  /104/X126/105/         
v     105   2020-03-12 12:00:00.0    0  /105/                  
e     X126  2020-03-12 12:00:00.0    1  /105/X126/             
v     104   2020-03-12 12:00:00.0    2  /105/X126/104/         
e     X124  2020-03-12 12:00:00.0    3  /105/X126/104/X124/    

For reference, here's the data script I used to test, if you need to create a SQL Fiddle to run it:

create table v (
  vid varchar(6),
  eid varchar(6),
  contact datetime
);

insert into v (vid, eid, contact) values ('100', 'X123', '2020-03-11 01:00:00');
insert into v (vid, eid, contact) values ('100', 'X124', '2020-03-11 01:10:00');
insert into v (vid, eid, contact) values ('101', 'X123', '2020-03-12 01:11:00');
insert into v (vid, eid, contact) values ('101', 'X125', '2020-03-11 01:20:00');
insert into v (vid, eid, contact) values ('102', 'X126', '2020-03-12 10:00:00');
insert into v (vid, eid, contact) values ('102', 'X124', '2020-03-12 10:00:00');
insert into v (vid, eid, contact) values ('103', 'X123', '2020-03-12 11:00:00');
insert into v (vid, eid, contact) values ('104', 'X124', '2020-03-12 12:00:00');
insert into v (vid, eid, contact) values ('104', 'X126', '2020-03-12 12:00:00');
insert into v (vid, eid, contact) values ('105', 'X126', '2020-03-12 12:00:00');
Cistaceous answered 31/3, 2020 at 20:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.