Using row_to_json() with nested joins
I'm trying to map the results of a query to JSON using the row_to_json() function that was added in PostgreSQL 9.2.

I'm having trouble figuring out the best way to represent joined rows as nested objects (1:1 relations).

Here's what I've tried (setup code: tables, sample data, followed by query):

-- some test tables to start out with:
create table role_duties (
    id serial primary key,
    name varchar
);

create table user_roles (
    id serial primary key,
    name varchar,
    description varchar,
    duty_id int, foreign key (duty_id) references role_duties(id)
);

create table users (
    id serial primary key,
    name varchar,
    email varchar,
    user_role_id int, foreign key (user_role_id) references user_roles(id)
);

DO $$
DECLARE duty_id int;
DECLARE role_id int;
begin
insert into role_duties (name) values ('Script Execution') returning id into duty_id;
insert into user_roles (name, description, duty_id) values ('admin', 'Administrative duties in the system', duty_id) returning id into role_id;
insert into users (name, email, user_role_id) values ('Dan', '[email protected]', role_id);
END$$;

The query itself:

select row_to_json(row)
from (
    select u.*, ROW(ur.*::user_roles, ROW(d.*::role_duties)) as user_role 
    from users u
    inner join user_roles ur on ur.id = u.user_role_id
    inner join role_duties d on d.id = ur.duty_id
) row;

I found that with ROW() I can separate the resulting fields out into a child object, but this seems limited to a single level: I can't add further AS xxx aliases, which I think I would need in this case.

I get column names because I cast to the appropriate record type, for example ::user_roles for that table's results.

Here's what that query returns:

{
   "id":1,
   "name":"Dan",
   "email":"[email protected]",
   "user_role_id":1,
   "user_role":{
      "f1":{
         "id":1,
         "name":"admin",
         "description":"Administrative duties in the system",
         "duty_id":1
      },
      "f2":{
         "f1":{
            "id":1,
            "name":"Script Execution"
         }
      }
   }
}

What I want to do is generate JSON for joins (again 1:1 is fine) in a way where I can add joins, and have them represented as child objects of the parents they join to, i.e. like the following:

{
   "id":1,
   "name":"Dan",
   "email":"[email protected]",
   "user_role_id":1,
   "user_role":{
      "id":1,
      "name":"admin",
      "description":"Administrative duties in the system",
      "duty_id":1,
      "duty":{
         "id":1,
         "name":"Script Execution"
      }
   }
}
Cohune answered 5/11, 2012 at 6:41 Comment(1)
It's there in the setup code. The inserts. I went to the trouble of setting everything up so anyone could replicate my situation.Cohune

Update: In PostgreSQL 9.4 this improves a lot with the introduction of json_build_object, json_object and json_build_array (to_json had already arrived in 9.3), though it's verbose due to the need to name all the fields explicitly:

select json_build_object(
    'id', u.id,
    'name', u.name,
    'email', u.email,
    'user_role_id', u.user_role_id,
    'user_role', json_build_object(
        'id', ur.id,
        'name', ur.name,
        'description', ur.description,
        'duty_id', ur.duty_id,
        'duty', json_build_object(
            'id', d.id,
            'name', d.name
        )
    )
)
from users u
inner join user_roles ur on ur.id = u.user_role_id
inner join role_duties d on d.id = ur.duty_id;

For older versions, read on.


It isn't limited to a single level, it's just a bit painful. You can't alias composite rowtypes using AS, so you need to use an aliased subquery expression or CTE to achieve the effect:

select row_to_json(row)
from (
    select u.*, urd AS user_role
    from users u
    inner join (
        select ur.*, d
        from user_roles ur
        inner join role_duties d on d.id = ur.duty_id
    ) urd(id,name,description,duty_id,duty) on urd.id = u.user_role_id
) row;

produces, via http://jsonprettyprint.com/:

{
  "id": 1,
  "name": "Dan",
  "email": "[email protected]",
  "user_role_id": 1,
  "user_role": {
    "id": 1,
    "name": "admin",
    "description": "Administrative duties in the system",
    "duty_id": 1,
    "duty": {
      "id": 1,
      "name": "Script Execution"
    }
  }
}

You will want to use array_to_json(array_agg(...)) when you have a 1:many relationship, btw.

The above query should ideally be able to be written as:

select row_to_json(
    ROW(u.*, ROW(ur.*, d AS duty) AS user_role)
)
from users u
inner join user_roles ur on ur.id = u.user_role_id
inner join role_duties d on d.id = ur.duty_id;

... but PostgreSQL's ROW constructor doesn't accept AS column aliases. Sadly.

Thankfully, the original nested ROW version and the nested subquery version optimize out the same: their EXPLAIN plans are identical.

Because CTEs are optimisation fences (in PostgreSQL versions before 12), rephrasing the nested subquery version to use chained CTEs (WITH expressions) may not perform as well, and won't result in the same plan. In this case you're kind of stuck with ugly nested subqueries until we get some improvements to row_to_json or a way to override the column names in a ROW constructor more directly.


Anyway, in general, the principle is that where you want to create a json object with columns a, b, c, and you wish you could just write the illegal syntax:

ROW(a, b, c) AS outername(name1, name2, name3)

you can instead use scalar subqueries returning row-typed values:

(SELECT x FROM (SELECT a AS name1, b AS name2, c AS name3) x) AS outername

Or:

(SELECT x FROM (SELECT a, b, c) AS x(name1, name2, name3)) AS outername
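As a rough sketch of how this pattern nests, the query below applies it twice to build the user_role and duty objects from the question. The inline VALUES rows are hypothetical stand-ins for the users, user_roles and role_duties tables (with fewer columns), so the block runs without any setup; treat it as an illustration rather than the definitive form.

```sql
-- Hypothetical inline rows standing in for users, user_roles and role_duties.
WITH u(id, name, user_role_id) AS (VALUES (1, 'Dan', 1)),
     ur(id, name, duty_id)     AS (VALUES (1, 'admin', 1)),
     d(id, name)               AS (VALUES (1, 'Script Execution'))
SELECT row_to_json(
    (SELECT x FROM (
        SELECT u.id,
               u.name,
               -- scalar subquery returning a row: names the fields of user_role
               (SELECT y FROM (
                    SELECT ur.id,
                           ur.name,
                           -- one level deeper: names the fields of duty
                           (SELECT z FROM (SELECT d.id, d.name) z) AS duty
                ) y) AS user_role
    ) x)
)
FROM u
JOIN ur ON ur.id = u.user_role_id
JOIN d  ON d.id = ur.duty_id;
```

Each level is the same (SELECT alias FROM (SELECT ...) alias) wrapper, so adding another join means adding one more wrapper inside the object it belongs to.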

Additionally, keep in mind that you can compose json values without additional quoting. For example, if you put the output of a json_agg within a row_to_json, the inner json_agg result won't get quoted as a string; it'll be incorporated directly as json.

e.g. in the arbitrary example:

SELECT row_to_json(
        (SELECT x FROM (SELECT
                1 AS k1,
                2 AS k2,
                (SELECT json_agg( (SELECT x FROM (SELECT 1 AS a, 2 AS b) x) )
                 FROM generate_series(1,2) ) AS k3
        ) x),
        true
);

the output is:

{"k1":1,
 "k2":2,
 "k3":[{"a":1,"b":2}, 
 {"a":1,"b":2}]}

Note that the json_agg product, [{"a":1,"b":2}, {"a":1,"b":2}], hasn't been escaped again, as text would be.

This means you can compose json operations to construct rows; you don't always have to create hugely complex PostgreSQL composite types and then call row_to_json on the output.
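For the 1:many case mentioned earlier in this answer, a minimal sketch might look like the following. The roles and members rows are hypothetical inline data (not the question's tables), and the users key is just an illustrative name; the point is that the array produced by array_to_json(array_agg(...)) is embedded by row_to_json as a json array, not as a quoted string.

```sql
-- Hypothetical inline data: each role has many members.
WITH roles(id, name) AS (
    VALUES (1, 'admin'), (2, 'guest')
),
members(id, name, user_role_id) AS (
    VALUES (1, 'Dan', 1), (2, 'Ann', 1), (3, 'Bob', 2)
)
SELECT row_to_json(t)
FROM (
    SELECT r.id,
           r.name,
           -- one json array of member objects per role
           (SELECT array_to_json(array_agg(m))
            FROM (SELECT id, name
                  FROM members
                  WHERE user_role_id = r.id
                  ORDER BY id) m) AS users
    FROM roles r
) t
ORDER BY t.id;
```

This returns one json object per role, each carrying a users array with one object per member.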

Citronellal answered 5/11, 2012 at 7:3 Comment(14)
Thanks so much! Would that be roughly the same speed of query as the original, or do you think there is an additional cost to using a subquery?Cohune
@Cohune I would be surprised if the query plans didn't work out much the same, but strongly advise you to explain analyze on a more realistic sample of data to see.Citronellal
@Cohune Question edited with explain links showing the plans are, in fact, identical.Citronellal
If I could upvote your answer a couple more times, I would. I appreciate the detail, and the bit about 1:many relationships.Cohune
@Cohune Glad to help. Thanks for making the effort of writing a good question; I'd like to bump it up a few more times too. Sample data, Pg version, expected output, actual output/error; ticks all the boxes, and is clear and easy to understand. So thanks.Citronellal
Any downsides to creating a custom type with the desired names and casting to that type (i.e. row_to_json(row(c1, c2, ...)::type_with_good_names)) to get around the unpleasant "subquery to get useful property names" stuff? The query in question isn't a one-off but it would be the only thing using the custom type.Bismarck
@muistooshort Quite harmless. Just a pg_type entry and some more pg_attribute entries. I wouldn't want to do so for thousands of queries, mainly because it'd get annoying to maintain, but that's about it.Citronellal
@muistooshort: A temp table to provide the type serves, too, and is deleted automatically at the end of the session.Augean
Am I correct saying, that this commit by Tom Lane (branches from 9.2 up to HEAD) will solve the issue with column aliases?Pasahow
@Pasahow I don't think so; that's just a bugfix, albeit a useful one. It doesn't add support for aliasing records.Citronellal
Thank you so much for the 9.4 example. json_build_object is going to make my life much easier but somehow I didn't pick up on it when I saw the release notes. Sometimes you just need a concrete example to get you started.Geomancy
Super answer - agree that the documentation should highlight json_build_object a bit more - it's a real game changer.Lontson
@CraigRinger Would you happen to know where I can read more about scalar subqueries returning row-typed values? I know what a scalar subquery is, but I could not find documentation or info on the SELECT x FROM (...) x part of the query.Acerate
@Acerate If x is a table, then SELECT x FROM x LIMIT 1 returns a single row-typed value. It's scalar, in that it returns a single concrete value. But that value is a composite type, a row. You can also use the row constructor to get a record result like SELECT ROW(1,2,3)Citronellal
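The custom-type approach raised in the comments above could be sketched roughly as follows. The type names duty_json and role_json are hypothetical; they exist only to supply the JSON keys when an anonymous ROW(...) is cast to them.

```sql
-- Hypothetical composite types whose field names become the JSON keys.
CREATE TYPE duty_json AS (id int, name text);
CREATE TYPE role_json AS (id int, name text, duty duty_json);

-- Casting the anonymous rows replaces the default f1, f2, ... keys.
SELECT row_to_json(
    ROW(1, 'admin', ROW(1, 'Script Execution')::duty_json)::role_json
);
```

The types persist in the catalog until dropped (DROP TYPE role_json, then DROP TYPE duty_json), which is the maintenance cost mentioned in the comments.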

I am adding this solution because the accepted answer does not cover N:N relationships, i.e. collections of collections of objects.

If you have N:N relationships, the WITH clause is your friend. In my example, I would like to build a tree view of the following hierarchy.

A Requirement - Has - TestSuites
A Test Suite - Contains - TestCases.

The following query represents the joins.

-- the plain joins: one row per (requirement, suite, test case)
SELECT r.id AS reqId, r.description AS reqDesc,
       s.id AS suiteId, s."Name" AS suiteName,
       tc.id AS tcId, tc."Title" AS testCaseTitle
FROM "Requirement" r
INNER JOIN "Has" h ON r.id = h.requirementid
INNER JOIN "TestSuite" s ON s.id = h.testsuiteid
INNER JOIN "Contains" c ON c.testsuiteid = s.id
INNER JOIN "TestCase" tc ON tc.id = c.testcaseid;

Since aggregate function calls cannot be nested within a single query level, you need to use "WITH" to aggregate one level at a time.

with testcases as (
    select c.testsuiteid, ts."Name", tc.id, tc."Title"
    from "TestSuite" ts
    inner join "Contains" c on c.testsuiteid = ts.id
    inner join "TestCase" tc on tc.id = c.testcaseid
),
requirements as (
    -- not referenced by the CTEs below
    select r.id as reqId, r.description as reqDesc, s.id as suiteId
    from "Requirement" r
    inner join "Has" h on r.id = h.requirementid
    inner join "TestSuite" s on s.id = h.testsuiteid
),
suitesJson as (
    -- one JSON array of test cases per suite
    select testcases.testsuiteid,
           json_agg(
               json_build_object('tc_id', testcases.id, 'tc_title', testcases."Title")
           ) as suiteJson
    from testcases
    group by testcases.testsuiteid, testcases."Name"
),
allSuites as (
    -- one JSON array of suites (each with its test cases) per requirement
    select has.requirementid,
           json_agg(
               json_build_object('ts_id', suitesJson.testsuiteid, 'name', s."Name", 'test_cases', suitesJson.suiteJson)
           ) as suites
    from suitesJson
    inner join "TestSuite" s on s.id = suitesJson.testsuiteid
    inner join "Has" has on has.testsuiteid = s.id
    group by has.requirementid
),
allRequirements as (
    -- a single JSON array holding every requirement
    select json_agg(
               json_build_object('req_id', r.id, 'req_description', r.description, 'test_suites', allSuites.suites)
           ) as suites
    from allSuites
    inner join "Requirement" r on r.id = allSuites.requirementid
)
select * from allRequirements;

It builds the JSON in small collections of items, aggregating one level in each WITH clause and feeding the result to the next.
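A stripped-down sketch of the same idea, runnable without the tables above, might look like this. The suite_cases rows are hypothetical inline data, and json_build_object requires PostgreSQL 9.4 or later.

```sql
-- Hypothetical link rows: suite 1 holds cases 1 and 2, suite 2 holds case 2.
WITH suite_cases(testsuiteid, testcaseid) AS (
    VALUES (1, 1), (1, 2), (2, 2)
),
per_suite AS (
    -- first level: one JSON array of test cases per suite
    SELECT testsuiteid,
           json_agg(
               json_build_object('tc_id', testcaseid) ORDER BY testcaseid
           ) AS test_cases
    FROM suite_cases
    GROUP BY testsuiteid
)
-- second level: aggregate the per-suite objects into a single array
SELECT json_agg(
           json_build_object('ts_id', testsuiteid, 'test_cases', test_cases)
           ORDER BY testsuiteid
       ) AS suites
FROM per_suite;
```

Each CTE performs exactly one aggregation, so an additional level of nesting is one more CTE that groups the previous one.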

Result:

[
  {
    "req_id": 1,
    "req_description": "<character varying>",
    "test_suites": [
      {
        "ts_id": 1,
        "name": "TestSuite",
        "test_cases": [
          {
            "tc_id": 1,
            "tc_title": "TestCase"
          },
          {
            "tc_id": 2,
            "tc_title": "TestCase2"
          }
        ]
      },
      {
        "ts_id": 2,
        "name": "TestSuite",
        "test_cases": [
          {
            "tc_id": 2,
            "tc_title": "TestCase2"
          }
        ]
      }
    ]
  },
  {
    "req_id": 2,
    "req_description": "<character varying> 2 ",
    "test_suites": [
      {
        "ts_id": 2,
        "name": "TestSuite",
        "test_cases": [
          {
            "tc_id": 2,
            "tc_title": "TestCase2"
          }
        ]
      }
    ]
  }
]
Brythonic answered 18/4, 2020 at 7:55 Comment(0)

My suggestion for maintainability over the long term is to use a VIEW to build the coarse version of your query, and then use a function as below:

CREATE OR REPLACE FUNCTION fnc_query_prominence_users( )
RETURNS json AS $$
DECLARE
    d_result            json;
BEGIN
    SELECT      ARRAY_TO_JSON(
                    ARRAY_AGG(
                        ROW_TO_JSON(
                            CAST(ROW(users.*) AS prominence.users)
                        )
                    )
                )
        INTO    d_result
        FROM    prominence.users;
    RETURN d_result;
END; $$
LANGUAGE plpgsql
SECURITY INVOKER;

In this case, the object prominence.users is a view. Since I selected users.*, I will not have to update this function if I need to update the view to include more fields in a user record.

Acuff answered 10/2, 2016 at 23:32 Comment(1)
The CAST(ROW(users.*) AS prominence.users) step can be omitted if it isn't needed; the table name alone is enough!Surely
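A minimal sketch of the simplification suggested in the comment, assuming a hypothetical temporary view demo_users in place of prominence.users:

```sql
-- Hypothetical temp view standing in for prominence.users.
CREATE TEMP VIEW demo_users AS
    SELECT * FROM (VALUES (1, 'Dan'), (2, 'Ann')) AS v(id, name);

-- Passing the view's row reference directly keeps the column names as JSON keys,
-- so no CAST(ROW(...)) step is involved.
SELECT array_to_json(array_agg(row_to_json(demo_users)))
FROM demo_users;
```

As with users.* in the function above, columns added to the view later show up in the output without changing the query.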
