Postgres recursive query with row_to_json
Asked Answered
W

3

7

I've got a table in postgres 9.3.5 that looks like this:

CREATE TABLE customer_area_node
(
  id bigserial NOT NULL,
  customer_id integer NOT NULL,
  parent_id bigint,
  name text,
  description text,

  CONSTRAINT customer_area_node_pkey PRIMARY KEY (id)
)

I query with:

WITH RECURSIVE c AS (
       SELECT *, 0 as level, name as path FROM customer_area_node WHERE customer_id = 2 and parent_id is null
       UNION ALL
       SELECT customer_area_node.*, 
       c.level + 1 as level, 
       c.path || '/' || customer_area_node.name as path
  FROM customer_area_node 
  join c ON customer_area_node.parent_id = c.id
)
SELECT * FROM c ORDER BY path;

this seems to work to build paths like building1/floor1/room1, building1/floor1/room2, etc.

What I'd like to be able to do is easily turn that into either json that represents the tree structure which I've been told I can do with row_to_json.

As a reasonable alternative, any other way I can format the data to a more efficient mechanism such that I can actually easily turn it into an actual tree structure without having a ton of string.splits on /.

Is there a reasonably easy way to do this with row_to_json?

Whidah answered 5/9, 2014 at 4:13 Comment(1)
can you provide sample data?Subjective
B
13

Sorry for the very late answer but i think i found an elegant solution that could become an accepted answer for this question.

Based on the awesome "little hack" found by @pozs, i came up with a solution that:

  • solves the "rogue leaves" situation with very little code (leveraging the NOT EXISTS predicate)
  • avoids the whole level calculation/condition stuff
WITH RECURSIVE customer_area_tree("id", "customer_id", "parent_id", "name", "description", "children") AS (
  -- tree leaves (no matching children)
  SELECT c.*, json '[]'
  FROM customer_area_node c
  WHERE NOT EXISTS(SELECT * FROM customer_area_node AS hypothetic_child WHERE hypothetic_child.parent_id = c.id)

  UNION ALL

  -- pozs's awesome "little hack"
  SELECT (parent).*, json_agg(child) AS "children"
  FROM (
    SELECT parent, child
    FROM customer_area_tree AS child
    JOIN customer_area_node parent ON parent.id = child.parent_id
  ) branch
  GROUP BY branch.parent
)
SELECT json_agg(t)
FROM customer_area_tree t
LEFT JOIN customer_area_node AS hypothetic_parent ON(hypothetic_parent.id = t.parent_id)
WHERE hypothetic_parent.id IS NULL

Update:

Tested with very simple data, it does work, but as posz pointed out in a comment, with his sample data, some rogue leaf nodes are forgotten. But, i found out that with even more complex data, the previous answer is not working either, because only rogue leaf nodes having a common ancestor with "max level" leaf nodes are caught (when "1.2.5.8" is not there, "1.2.4" and "1.2.5" are absent because they have no common ancestor with any "max level" leaf node).

So here is a new proposition, mixing posz's work with mine by extracting the NOT EXISTS subrequest and making it an internal UNION, leveraging UNION de-duplication abilities (leveraging jsonb comparison abilities):

<!-- language: sql -->
WITH RECURSIVE
c_with_level AS (

    SELECT *, 0 as lvl
    FROM   customer_area_node
    WHERE  parent_id IS NULL

    UNION ALL

    SELECT child.*, parent.lvl + 1
    FROM   customer_area_node child
    JOIN   c_with_level parent ON parent.id = child.parent_id
),
maxlvl AS (
  SELECT max(lvl) maxlvl FROM c_with_level
),
c_tree AS (
    SELECT c_with_level.*, jsonb '[]' children
    FROM   c_with_level, maxlvl
    WHERE  lvl = maxlvl

    UNION 
    (
        SELECT (branch_parent).*, jsonb_agg(branch_child)
        FROM (
            SELECT branch_parent, branch_child
            FROM c_with_level branch_parent
            JOIN c_tree branch_child ON branch_child.parent_id = branch_parent.id
        ) branch
        GROUP BY branch.branch_parent

        UNION

        SELECT c.*, jsonb '[]' children
        FROM   c_with_level c
        WHERE  NOT EXISTS (SELECT 1 FROM c_with_level hypothetical_child WHERE hypothetical_child.parent_id = c.id)
    )
)
SELECT jsonb_pretty(row_to_json(c_tree)::jsonb)
FROM c_tree
WHERE lvl = 0;

Tested on http://rextester.com/SMM38494 ;)

Beowulf answered 11/5, 2017 at 13:36 Comment(4)
I'm afraid, without the special handling of levels you'll end up as many separate "branches" in the output, as many leaves are exist in different levels. Your example data has only 0 or 1 child for every node, that's why it didn't show up.Reynaldoreynard
Thanks for your observation! I've been doing some work around this, and i found out that your trick to handle rogue leaf nodes isn't perfectly efficient either because in your example, if you don't have node "1.2.5.8", the branch "1.2" is never caught so "1.2.4" and "1.2.5" are absent from the final result. That's because you only catch rogue leaf nodes when they have a common ancestor with "max level" leaf nodes. I found a solution to this, I'll edit my answer in a minute ;)Beowulf
For anyone interested, I modified David's second attempt to instead return nested JSON objects. In Python I'd like to deserialize the JSON as nested dictionaries - why do in code what can be done in SQL ;) I'll want similar access to the data structure in javascript, too. Thank you @DavidGuillot!Orva
This approach is good, but does not cover situations where there are multiple root items and one tree goes deeper than another. It produces duplicates on root level with different children: sqlfiddle.com/#!17/022f80/8Noelianoell
R
8

You cannot do that with a usual recursive CTE, because it is almost impossible to set a json value deep in its hierarchy. But you can do it reversed: build up the tree starting from its leaves, until its root:

-- calculate node levels
WITH RECURSIVE c AS (
    SELECT *, 0 as lvl
    FROM customer_area_node
    -- use parameters here, to select the root first
    WHERE customer_id = 2 AND parent_id IS NULL
  UNION ALL
    SELECT customer_area_node.*, c.lvl + 1 as lvl
    FROM customer_area_node 
    JOIN c ON customer_area_node.parent_id = c.id
),
-- select max level
maxlvl AS (
  SELECT max(lvl) maxlvl FROM c
),
-- accumulate children
j AS (
    SELECT c.*, json '[]' children -- at max level, there are only leaves
    FROM c, maxlvl
    WHERE lvl = maxlvl
  UNION ALL
    -- a little hack, because PostgreSQL doesn't like aggregated recursive terms
    SELECT (c).*, array_to_json(array_agg(j)) children
    FROM (
      SELECT c, j
      FROM j
      JOIN c ON j.parent_id = c.id
    ) v
    GROUP BY v.c
)
-- select only root
SELECT row_to_json(j) json_tree
FROM j
WHERE lvl = 0;

And this will work even with PostgreSQL 9.2+

SQLFiddle

Update: A variant, which should handle rogue leaf nodes too (which are located with a level between 1 and max-level):

WITH RECURSIVE c AS (
    SELECT *, 0 as lvl
    FROM   customer_area_node
    WHERE  customer_id = 1 AND parent_id IS NULL
  UNION ALL
    SELECT customer_area_node.*, c.lvl + 1
    FROM   customer_area_node 
    JOIN   c ON customer_area_node.parent_id = c.id
),
maxlvl AS (
  SELECT max(lvl) maxlvl FROM c
),
j AS (
    SELECT c.*, json '[]' children
    FROM   c, maxlvl
    WHERE  lvl = maxlvl
  UNION ALL
    SELECT   (c).*, array_to_json(array_agg(j) || array(SELECT r
                                                        FROM   (SELECT l.*, json '[]' children
                                                                FROM   c l, maxlvl
                                                                WHERE  l.parent_id = (c).id
                                                                AND    l.lvl < maxlvl
                                                                AND    NOT EXISTS (SELECT 1
                                                                                   FROM   c lp
                                                                                   WHERE  lp.parent_id = l.id)) r)) children
    FROM     (SELECT c, j
              FROM   c
              JOIN   j ON j.parent_id = c.id) v
    GROUP BY v.c
)
SELECT row_to_json(j) json_tree
FROM   j
WHERE  lvl = 0;

This should work too on PostgreSQL 9.2+, however, I cannot test that. (I can only test on 9.5+ right now).

These solutions can handle any column in any hierarchical table, but will always append an int typed lvl JSON property to their output.

http://rextester.com/YNU7932

Reynaldoreynard answered 5/9, 2014 at 9:52 Comment(4)
I found that for this to work all leaves have to have the same level.Sciatica
@Sciatica in the SQLFiddle example, there are leaves at multiple level (f.ex. 1.3.7 vs 1.3.6.9) & all nodes are collected.Reynaldoreynard
@Sciatica I'm having the same issue as reported by @pozs. I've created an example SQLFiddle showing this case with the 1.4.10, which only has 2 leaves instead of 3 leaves like all the rest. So with this SQL you have to have all branches with the exact same depth.Punctilio
Sorry for the late, but I could found a solution to those leaf nodes too.Reynaldoreynard
C
0

Developed the answer of pozs a little further to get recursive leaves with their subtrees. So this answer really returns the full tree.

CREATE OR REPLACE FUNCTION pg_temp.getTree(bigint) 
    RETURNS TABLE( 
            id bigint,
            customer_id integer,
            parent_id bigint,
            name text,
            description text,
            children json
        ) 
        AS $$   

        WITH RECURSIVE relations AS ( 
            SELECT 
                can.id,
                can.customer_id,
                can.parent_id,
                can.name, 
                can.description,
                0 AS depth 
                FROM customer_area_node can 
                WHERE can.id = $1 
            UNION ALL 
            SELECT 
                can.id,
                can.customer_id,
                can.parent_id,
                can.name, 
                can.description,
                relations.depth + 1 
                FROM customer_area_node can
                JOIN relations ON can.parent_id = relations.id AND can.id != can.parent_id
        ),     

        maxdepth AS ( 
            SELECT max(depth) maxdepth FROM relations 
        ), 

        rootTree as ( 
            SELECT r.* FROM 
                relations r, maxdepth 
                WHERE depth = maxdepth 
            UNION ALL 
            SELECT r.* FROM 
                relations r, rootTree 
                WHERE r.id = rootTree.parent_id AND rootTree.id != rootTree.parent_id 
        ), 

        mainTree AS ( 
            SELECT 
                c.id,
                c.customer_id,
                c.parent_id,
                c.name, 
                c.description,
                c.depth, 
                json_build_array() children 
                FROM relations c, maxdepth 
                WHERE c.depth = maxdepth 
            UNION ALL 
            SELECT 
                (relations).*, 
                array_to_json( 
                    array_agg(mainTree) 
                    || 
                    array( 
                        SELECT t 
                            FROM ( 
                                SELECT 
                                    l.*, 
                                    json_build_array() children 
                                FROM relations l, maxdepth 
                                    WHERE l.parent_id = (relations).id 
                                    AND l.depth < maxdepth 
                                    AND l.id  NOT IN ( 
                                        SELECT id FROM rootTree 
                                    ) 
                            ) r 
                           JOIN pg_temp.getTree(r.id) t 
                            ON r.id = t.id 
                        )) 
                children 
    FROM ( 
        SELECT relations, mainTree 
            FROM relations 
        JOIN mainTree 
            ON ( 
                mainTree.parent_id = relations.id 
                AND mainTree.parent_id != mainTree.id 
            ) 
    ) v 
    GROUP BY v.relations 
    ) 

        SELECT 
            id,
            customer_id,
            parent_id,
            name, 
            description,
            children 
        FROM mainTree WHERE id = $1 
    $$ 
    LANGUAGE SQL; 

    SELECT * 
    FROM 
        customer_area_node can 
        JOIN pg_temp.getTree(can.id) t ON t.id = can.id 
    WHERE can.parent_id IS NULL;
Castello answered 29/6, 2017 at 9:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.