jsonb query with nested objects in an array
Asked Answered
B

3

13

I'm using PostgreSQL 9.4 with a table teams containing a jsonb column named json. I am looking for a query where I can get all teams which have the Players 3, 4 and 7 in their array of players.

The table contains two rows with the following json data:

First row:

{
    "id": 1,
    "name": "foobar",
    "members": {
        "coach": {
            "id": 1,
            "name": "A dude"
        },
        "players": [
            {
                "id": 2,
                "name": "B dude"
            },
            {
                "id": 3,
                "name": "C dude"
            },
            {
                "id": 4,
                "name": "D dude"
            },
            {
                "id": 6,
                "name": "F dude"
            },
            {
                "id": 7,
                "name": "G dude"
            }
        ]
    }
}

second row:

{
    "id": 2,
    "name": "bazbar",
    "members": {
        "coach": {
            "id": 11,
            "name": "A dude"
        },
        "players": [
            {
                "id": 3,
                "name": "C dude"
            },
            {
                "id": 5,
                "name": "E dude"
            },
            {
                "id": 6,
                "name": "F dude"
            },
            {
                "id": 7,
                "name": "G dude"
            },
            {
                "id": 8,
                "name": "H dude"
            }
        ]
    }
}

How does the query have to look like to get the desired list of teams? I've tried a query where I'd create an array from the member players jsonb_array_elements(json -> 'members' -> 'players')->'id' and compare them, but all I was able to accomplish is a result where any of the compared player ids was available in a team, not all of them.

Bibb answered 17/3, 2015 at 19:37 Comment(1)
You should add the query you had, even if it's not working - it may be part of a solution.Debera
D
16

You are facing two non-trivial tasks at once.

  • Process jsonb with a complex nested structure.
  • Run the equivalent of a relational division query on the document type.

First jsonb_populate_recordset() works with a registered row type. Can be the row type of any (temp) table or view, or a composite type explicitly created with CREATE TYPE. If there is none, register one. For ad-hoc use, a temporary type does the job (undocumented hack, dropped automatically at the end of the session):

CREATE TYPE pg_temp.foo AS (id int);  -- just "id"

We only need id, so don't include name. The manual:

JSON fields that do not appear in the target row type will be omitted from the output

Query with index support

If you need it fast, create a GIN index on the jsonb column. The more specialized operator class jsonb_path_ops is even faster than the default jsonb_ops:

CREATE INDEX teams_json_gin_idx ON teams USING GIN (json jsonb_path_ops);

Can be used by the "contains" operator @>:

SELECT t.json->>'id' AS team_id
     , ARRAY (SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')) AS players
FROM   teams t
WHERE  json @> '{"members":{"players":[{"id":3},{"id":4},{"id":7}]}}';

SQL/JSON path language in Postgres 12+ can use the same index:

SELECT t.json->>'id' AS team_id
     , ARRAY (SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')) AS players
FROM   teams t
WHERE  json @? '$.members ? (@.players.id == 3) ? (@.players.id == 4) ? (@.players.id == 7)';

db<>fiddle here

See:

Simple query

Without index support - unless you create a tailored expression index, see below.

SELECT t.json->>'id' AS team_id, p.players
FROM   teams t
JOIN   LATERAL (
   SELECT ARRAY (
      SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')
      )
   ) AS p(players) ON p.players @> '{3,4,7}';

db<>fiddle here
Old sqlfiddle

How?

Extracts the JSON array with player records:

t.json#>'{members,players}'

From these, I unnest rows with just the id with:

jsonb_populate_recordset(null::foo, t.json#>'{members,players}')

... and immediately aggregate those into a Postgres array, so we keep one row per row in the base table:

SELECT ARRAY ( ... )

All of this happens in a lateral join:

, JOIN LATERAL (SELECT ... ) AS p(players) ...
  • Immediately filter the resulting arrays in the join condition to keep only the ones we are looking for - with the "contains" array operator @>:

    ... ON p.players @> '{3,4,7}'

If you run this query a lot on a big table, you could create a fake IMMUTABLE function that extracts the array like above and create functional GIN index based on this function to make this super fast.
"Fake" because the function depends on the underlying row type, i.e. on a catalog lookup, and would change if that changes. (So make sure it does not change.) Similar to this one:

Aside:
Don't use type names like json as column names (even if that's allowed), that invites tricky syntax errors and confusing error messages.

Debera answered 18/3, 2015 at 5:7 Comment(3)
That looks very promising! Thank you for your in-depth answer. I'll check it out later tonight and let you know, but the SQL Fiddle (even though it's 9.3) looks very good.Bibb
@Erwin Brandstetter I have a very similar question here #44711699 If you can please answer the same. It will be much appreciated. Thank you!Hoplite
Figured it out and posted the answer.Hoplite
H
1

I wanted to do the same as above. Only other condition was I had to do substring matching and not an exact matching.

This is what I ended up doing(inspired from the answer above of course)

SELECT t.json->>'name' AS feature_name, f.features::text
FROM   teams t
 , LATERAL  (
     SELECT * FROM json_populate_recordset(null::foo, t.json#>'{members,features}')
   ) AS f(features)
 WHERE f.features LIKE '%dud%';

Posting it here if it is of any help.

Hoplite answered 23/6, 2017 at 6:57 Comment(2)
Thanks for sharing.Bibb
Consider the added simpler alternatives under your related question.Debera
S
1

https://www.postgresql.org/docs/release/14.0/

Subscripting can now be applied to any data type for which it is a useful notation, not only arrays. In this release, the jsonb and hstore types have gained subscripting operators. Let's use subscripting feature in postgresql 14.

with a as(
select data['id'] as teamid,
       (jsonb_array_elements( data['members']['players']))['id'] as playerid
from teams), b as( select teamid, array_agg(playerid) as playerids from a group by 1)
select b.* from b where b.playerids @> '{3,4,7}';

returns:

 teamid |  playerids
--------+-------------
 1      | {2,3,4,6,7}

 

DB fiddle

Sussman answered 22/3, 2022 at 7:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.