JMESPath expression to flatten array of objects, each with nested arrays of objects
Asked Answered
D

4

9

I have JSON containing an array of databases, each database has an array of users, e.g.

{"databases": [
  {"db": "db_a", "users": [{"name": "alice"}, {"name": "alex"}]},
  {"db": "db_b", "users": [{"name": "bob"}, {"name": "brienne"}]}
]}

I would like to produce a flat array of databases and users, i.e.

[
  {"db": "db_a", "name": "alice"},
  {"db": "db_a", "name": "alex"},
  {"db": "db_b", "name": "bob"},
  {"db": "db_b", "name": "brienne"}
]

In SQL terms this would be a cartesian join or cartesian product, but I'm not sure of the correct term in a tree structure. The closest I've got so far is

databases[].users[]

which produces

[{"name": "alice"}, {"name": "alex"}, {"name": "bob"}, {"name": "brienne"}]

and

databases[].{db: db, name: users[].name}

which produces

[
  {"db": "db_a", "name": ["alice", "alex"]},
  {"db": "db_b", "name": ["bob", "brienne"]}
]

Addendum: I'm happy to accept "You can't do that with JMESPath, here's why ..." as an answer. An HN Comment`` hints at this

can't reference parents when doing iteration. Why? All options for iteration, [* ] and map, all use the iterated item as the context for any expression. There's no opportunity to get any other values in

Demodulation answered 12/1, 2019 at 14:10 Comment(0)
L
1

As of JEP 18 you can use a let expression, available in jmespath-community on PyPI.

A let-expression introduces lexical scoping and lets you bind variables that are evaluated In the context of a given lexical scope. This enables queries that can refer to elements defined outside of their current element

databases[*].let $db = db in users[*].{"db": $db, "name": name}

This produces

[
  [
    {"db": "db_a", "name": "alice"},
    {"db": "db_a", "name": "alex"}
  ],
  [
    {"db": "db_b", "name": "bob"},
    {"db": "db_b", "name": "brienne"}
  ]
]

As pointed out by Alex Willmer, you can flatten this with

(databases[*].let $db = db in users[(].{"db": $db, "name": name})[]

producing

[
  { "db": "db_a", "name": "alice" },
  { "db": "db_a", "name": "alex" },
  { "db": "db_b", "name": "bob" },
  { "db": "db_b", "name": "brienne" }
]
Lease answered 14/2 at 17:29 Comment(0)
D
5

You can't do this with just JMESPath, because JMESPath expressions can only refer to a single scope. There's no way to reach the outer scope (database objects), when the current scope is a user object. JEP 11 would allow access to other scopes, but it hasn't been accepted after several years.

On Ansible it can be done with other filters (h/t Vladimir), and some ugliness

databases_users: "{{ 
    databases | subelements('users')
              | to_json | from_json
              | json_query('[*].{db: [0].db, name: [1].name}')
}}"

Explanation

As a reminder, our starting point is

[ {"db": "db_a", "users": [{"name": "alice"}, {"name": "alex"}]},
  ...]

the subelements filter transforms this into a list of Python tuple pairs

[ ({"db": "db_a", "users": [{"name": "alice"}, {"name": "alex"}]},
   {"name": "alice"}),
  ...]

to_json and from_json convert the tuple pairs to lists (JMESPath for Python ignores tuples)

[ [{"db": "db_a", "users": [{"name": "alice"}, {"name": "alex"}]},
   {"name": "alice"}],
  ...]

json_query selects the desired db and user values

[ {"db": "db_a", "name": "alice"},
  ...]
Demodulation answered 12/1, 2019 at 21:31 Comment(3)
A jq filter can do it: .databases | map({db, name: .users[].name}) (demo).Demodulation
This is no longer the case with the adoption of JEP 18Lease
Sweet, happy to accept a written up answerDemodulation
T
3

Not sure if it's an option but a custom function can do it like so:

import json
import jmespath


class CustomFunctions(jmespath.functions.Functions):
    @jmespath.functions.signature({'types': ['object']}, {'types': ['array']})
    def _func_map_merge(self, obj, arg):
        result = []
        for element in arg:
            merged_object = super()._func_merge(obj, element)
            result.append(merged_object)
        return result


options = jmespath.Options(custom_functions=CustomFunctions())


source = """
{"databases": [
  {"db": "db_a", "users": [{"name": "alice"}, {"name": "alex"}]},
  {"db": "db_b", "users": [{"name": "bob"}, {"name": "brienne"}]}
]}

"""

jmespath_expr = """
    databases[].map_merge({"db": db}, @.users[])[]
"""

result = jmespath.search(jmespath_expr, json.loads(source), options=options)
result

Producing

[{'db': 'db_a', 'name': 'alice'},
 {'db': 'db_a', 'name': 'alex'},
 {'db': 'db_b', 'name': 'bob'},
 {'db': 'db_b', 'name': 'brienne'}]
Thatcher answered 2/9, 2021 at 11:53 Comment(0)
O
2

An option would be to loop subelements

  tasks:
    - set_fact:
        my_db: "{{ my_db + [ item.0|combine(item.1) ] }}"
      loop: "{{ lookup('subelements',databases,'users') }}"
Oxidation answered 12/1, 2019 at 15:20 Comment(2)
Would you prefer I edit/expand your answer, then accept it? Or submit my own answer?Demodulation
Sure, go ahead.Oxidation
L
1

As of JEP 18 you can use a let expression, available in jmespath-community on PyPI.

A let-expression introduces lexical scoping and lets you bind variables that are evaluated In the context of a given lexical scope. This enables queries that can refer to elements defined outside of their current element

databases[*].let $db = db in users[*].{"db": $db, "name": name}

This produces

[
  [
    {"db": "db_a", "name": "alice"},
    {"db": "db_a", "name": "alex"}
  ],
  [
    {"db": "db_b", "name": "bob"},
    {"db": "db_b", "name": "brienne"}
  ]
]

As pointed out by Alex Willmer, you can flatten this with

(databases[*].let $db = db in users[(].{"db": $db, "name": name})[]

producing

[
  { "db": "db_a", "name": "alice" },
  { "db": "db_a", "name": "alex" },
  { "db": "db_b", "name": "bob" },
  { "db": "db_b", "name": "brienne" }
]
Lease answered 14/2 at 17:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.