Recursive query challenge - simple parent/child example
Asked Answered
T

3

11

Note: with help from RhodiumToad on #postgresql, I've arrived at a solution, which I posted as answer. If anyone can improve on this, please chime in!

I have not been able to adapt a previous recursive query solution to the following directed acyclic graph that includes multiple "root" (ancestor-less) nodes. I'm trying to write a query whose output is what is commonly known as a closure table: a many-to-many table that stores every path from each node to each of its descendants and to itself:

1  2  11  8  4  5  7
 \/    |  |   \ | /
  3    |   \    6
   \   |    \  /
    9  |     10
     \/     /
     12    13
       \  /
        14

CREATE TABLE node (
  id        SERIAL PRIMARY KEY,
  node_name VARCHAR(50) NOT NULL
);

CREATE TABLE node_relations (
  id                 SERIAL PRIMARY KEY,
  ancestor_node_id   INT REFERENCES node(id),
  descendant_node_id INT REFERENCES node(id)
);

INSERT into node (node_name)
SELECT 'node ' || g FROM generate_series(1,14) g;

INSERT INTO node_relations(ancestor_node_id, descendant_node_id) VALUES
(1,3),(2,3),(4,6),(5,6),(7,6),(3,9),(6,10),(8,10),(9,12),(11,12),(10,13),(12,14),(13,14);

It's been hard to pinpoint the issue(s) - am I'm missing node_relation rows? Is the query wrong?

WITH RECURSIVE node_graph AS (
   SELECT ancestor_node_id, ARRAY[descendant_node_id] AS path, 0 AS level
   FROM   node_relations

   UNION  ALL
   SELECT nr.ancestor_node_id,  ng.path || nr.descendant_node_id,ng.level + 1 AS level
   FROM   node_graph ng
   JOIN   node_relations nr ON nr.descendant_node_id = ng.ancestor_node_id 
)
SELECT path[array_upper(path,1)] AS ancestor,
       path[1] AS descendant,
       path, 
       level as depth
FROM   node_graph
ORDER  BY level, ancestor;

Expected Output:

ancestor | descendant | path
---------+------------+------------------
1        | 3          | "{1,3}"
1        | 9          | "{1,3,9}"
1        | 12         | "{1,3,9,12}"
1        | 14         | "{1,3,9,12,14}"
2        | 3          | "{2,3}"
2        | 9          | "{2,3,9}"
2        | 12         | "{2,3,9,12}"
2        | 14         | "{2,3,9,12,14}"
3        | 9          | "{3,9}"
3        | 12         | "{3,9,12}"
3        | 14         | "{3,9,12,14}"
4        | 6          | "{4,6}"
4        | 10         | "{4,6,10}"
4        | 13         | "{4,6,10,13}"
4        | 14         | "{4,6,10,13,14}"
5        | 6          | "{5,6}"
5        | 10         | "{5,6,10}"
5        | 13         | "{5,6,10,13}"
5        | 14         | "{5,6,10,13,14}"
6        | 10         | "{6,10}"
6        | 13         | "{6,10,13}"
6        | 14         | "{6,10,13,14}"
7        | 6          | "{7,6}"
7        | 10         | "{7,6,10}"
7        | 13         | "{7,6,10,13}"
7        | 14         | "{7,6,10,13,14}"
8        | 10         | "{8,10}"
8        | 13         | "{8,10,13}"
8        | 14         | "{8,10,13,14}"
9        | 12         | "{9,12}"
9        | 14         | "{9,12,14}"
10       | 13         | "{10,13}"
10       | 14         | "{10,13,14}"
11       | 12         | "{11,12}"
11       | 14         | "{11,12,14}"
12       | 14         | "{12,14}"
13       | 14         | "{13,14}"
Turfman answered 9/9, 2014 at 22:11 Comment(4)
And: what what was question? (brilliant exposure, though ...)Adigranth
the query that I provided above is incorrect-- what is the correct query? am I also missing node_relation records, such as cyclic records? not sure what is missingTurfman
It's not clear what's the actual behavior of the code you have. It would make sense to describe what's th difference between actual and expected output. If it just throws some error - error message would be hepful as well.Hertzog
In what way do you want to improve your working solution?Kaylenekayley
T
10

With help from RhodiumToad on #postgresql, I've arrived at this solution:

WITH RECURSIVE node_graph AS (
    SELECT ancestor_node_id as path_start, descendant_node_id as path_end,
           array[ancestor_node_id, descendant_node_id] as path 
    FROM node_relations

    UNION ALL 

    SELECT ng.path_start, nr.descendant_node_id as path_end,
           ng.path || nr.descendant_node_id as path
    FROM node_graph ng
    JOIN node_relations nr ON ng.path_end = nr.ancestor_node_id
) 
SELECT * from node_graph order by path_start, array_length(path,1);

The result is exactly as expected.

Turfman answered 10/9, 2014 at 19:19 Comment(0)
B
3

An alternative approach would be to traverse the graph in reversed order:

WITH RECURSIVE cte AS (
   SELECT array[r.ancestor_node_id, r.descendant_node_id] AS path
   FROM   node_relations r
   LEFT   JOIN node_relations r0 ON r0.ancestor_node_id = r.descendant_node_id
   WHERE  r0.ancestor_node_id IS NULL  -- start at the end

   UNION ALL 
   SELECT r.ancestor_node_id || c.path
   FROM   cte c
   JOIN   node_relations r ON r.descendant_node_id = c.path[1]
   ) 
SELECT path
FROM   cte
ORDER  BY path;

This produces a subset with every path from each root node to its ultimate descendant. For deep trees that also spread out a lot this would entail much fewer join operations. To additionally add every sub-path, you could append a LATERAL join to the outer SELECT:

WITH RECURSIVE cte AS (
   SELECT array[r.ancestor_node_id, r.descendant_node_id] AS path
   FROM   node_relations r
   LEFT   JOIN node_relations r0 ON r0.ancestor_node_id = r.descendant_node_id
   WHERE  r0.ancestor_node_id IS NULL  -- start at the end

   UNION ALL 
   SELECT r.ancestor_node_id || c.path
   FROM   cte c
   JOIN   node_relations r ON r.descendant_node_id = c.path[1]
   ) 
SELECT l.path
FROM   cte, LATERAL (
   SELECT path[1:g] AS path
   FROM   generate_series(2, array_length(path,1)) g
   ) l
ORDER  BY l.path;

I ran a quick test, but it didn't run faster than RhodiumToad's solution. It might still be faster for big or wide tables. Try with your data.

Blaylock answered 28/9, 2015 at 2:48 Comment(0)
G
1

I see two problems with the query:

  1. the non-recursive part does not specify the root node. You need to either explicitely select that using WHERE descendant_node_id = 14 or "dynamically" using:

    SELECT ..
    FROM   node_relations nr1
    WHERE  NOT EXISTS (SELECT 1
                       FROM node_relations nr2
                       WHERE nr2.ancestor_node_id = nr1.descendant_node_id)
    
  2. with the correct starting point, the path is not complete as it will miss the final node during the aggregation in the recursive part. So in the outer query you need to append ancestor_node_id to the generated path.

So the query would look like this:

WITH RECURSIVE node_graph AS (
   SELECT nr1.id, nr1.ancestor_node_id, nr1.descendant_node_id, ARRAY[nr1.descendant_node_id] AS path, 0 as level
   FROM   node_relations nr1
   WHERE  NOT EXISTS (SELECT 1
                      FROM node_relations nr2
                      WHERE nr2.ancestor_node_id = nr1.descendant_node_id)

   UNION  ALL

   SELECT nr.id, nr.ancestor_node_id, nr.descendant_node_id, nr.descendant_node_id || ng.path, ng.level + 1 as level
   FROM node_relations nr
     JOIN node_graph ng ON ng.ancestor_node_id = nr.descendant_node_id

)
SELECT ancestor_node_id||path as path, -- add the last element of the sub-tree to the path
       level as depth
FROM   node_graph
ORDER  BY level

Here is the SQLFiddle: http://sqlfiddle.com/#!15/e646b/3

Galateah answered 10/9, 2014 at 11:46 Comment(3)
This is close but the results are not quite as I specified at the bottom of my post. I added the results recently so you may have started your response before seeing them. Apologies if so..Turfman
@Dowwie: ah, so you want the intermediate levels as well. The order of the elements can easily be changed by reversing the concatenation.Galateah
Correct, and the bottom-most node wasn't included in your initial resultsTurfman

© 2022 - 2024 — McMap. All rights reserved.