I have the following problem: I am trying to discover all possible paths from source node (node_s) to target node (node_t).
The format of the original table with graph edges is simple: | node_x | node_y | strength | , where "node_x" -> "node_y" is a direct edge with strength of the edge being "weight".
The idea is if at any point during the exploration of the paths we discover that a node among its children has target node_t, we record this path and stop exploring paths from this node, otherwise continue exploration.
The simple solution was to use PostgreSQL's recursive CTE which constructs the transitive closure of the graph:
WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
(
--non-recurive term
SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
FROM best_path b
WHERE b.node_x = node_s --source_node
UNION
--recurive term
SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
)
SELECT *
FROM transitive_closure tc
WHERE tc.target = node_t --target_node
The code above will discover all possible paths from source node node_s. Only after transitive closure construction, I am able to select the needed rows of paths from source node to target node (see last SELECT statement).
Example:
best_path table has the following data:
node_x | node_y | strength
--------+--------+----------
1 | 2 | 1
1 | 3 | 1
2 | 4 | 1
2 | 5 | 1
3 | 6 | 1
3 | 7 | 1
4 | 8 | 1
4 | 9 | 1
5 | 10 | 1
5 | 11 | 1
query:
find the paths from source node = 1, to target node = 4
result:
source | target | strength | path_string
--------+--------+----------+------------
1 | 2 | 1 | 1.2.
1 | 3 | 1 | 1.3.
1 | 4 | 1 | 1.2.4.
1 | 5 | 1 | 1.2.5.
1 | 6 | 1 | 1.3.6.
1 | 7 | 1 | 1.3.7.
1 | 8 | 1 | 1.2.4.8.
1 | 9 | 1 | 1.2.4.9.
1 | 10 | 1 | 1.2.5.10.
1 | 11 | 1 | 1.2.5.11.
This is not what I need. Since there is a direct edge from node 2 to node 4 (target) already, I do not need paths 1.2.5., 1.2.4.8., 1.2.4.9.,1.2.5.10.,1.2.5.11., the path exploration for node 2 should stop at the point when the path from 2 to 4 was discovered.
To sum up, I do not want to discover the paths of the node if it already has a direct edge to the target node. This means that in a recursive term of CTE, I would like to have some condition that will say the following, pseudocode follows:
WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
(
--non-recurive term (as before)
SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
FROM best_path b
WHERE b.node_x = node_s --source_node
UNION
--recurive term
SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
AND b.node_y = node_t
--will select only rows with direct edge to target
UNION (second union is not allowed in CTE)
SELECT those rows which do not have direct edge to target
AND which parents did not contribute to constructing the query above.
i.e. b.node_x = tc.target where there is no direct edge between b.node_x to node_t
)
As a result of a query to find the paths from source node = 1, to target node = 4, I would like to have the following:
source | target | strength | path_string
--------+--------+----------+------------
1 | 2 | 1 | 1.2.
1 | 3 | 1 | 1.3.
1 | 4 | 1 | 1.2.4.
1 | 6 | 1 | 1.3.6.
1 | 7 | 1 | 1.3.7.
Thanks in advance for your help!
I have tried many ways already: e.g. conditions in FROM/WHERE clauses, trying to pass CTE into function, but no success.
Any suggestions will be appreciated.
I have my own recursive function that achieves what I want, however, it is very slow on enormous amount of data; and PostgreSQL's CTE apparently is well-optimized, so I would like to dig into it a bit more.