XPath like query for nested python dictionaries
F

11

55

Is there a way to define an XPath-type query for nested Python dictionaries?

Something like this:

foo = {
  'spam':'eggs',
  'morefoo': {
               'bar':'soap',
               'morebar': {'bacon' : 'foobar'}
              }
   }

print( foo.select("/morefoo/morebar") )

>> {'bacon' : 'foobar'}

I also needed to select nested lists ;)

This can be done easily with @jellybean's solution:

def xpath_get(mydict, path):
    elem = mydict
    try:
        for x in path.strip("/").split("/"):
            try:
                x = int(x)
                elem = elem[x]
            except ValueError:
                elem = elem.get(x)
    except:
        pass

    return elem

foo = {
  'spam':'eggs',
  'morefoo': [{
               'bar':'soap',
               'morebar': {
                           'bacon' : {
                                       'bla':'balbla'
                                     }
                           }
              },
              'bla'
              ]
   }

print(xpath_get(foo, "/morefoo/0/morebar/bacon"))

[EDIT 2016] This question and the accepted answer are ancient. The newer answers may do the job better than the original answer. However I did not test them so I won't change the accepted answer.

Fleawort answered 6/9, 2011 at 13:1 Comment(5)
Why not using foo['morefoo']['morebar'] ?Deviation
because I want to do: def bla(query): data.select(query)Fleawort
@Deviation It would be more interesting with lists where the path microlanguage would return multiple items.Disentail
@PavelŠimerda Yes, way more interesting, especially with wildcard queries (find all values under a specific key), and then - also recurse down lists or [named]tuples...Khan
This question (in Python) essentially asks for a recommendation of a 3rd party library.Fuscous
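The wildcard idea raised in the comments above (find every value stored under a given key, recursing through dicts and lists) can be sketched with a small recursive generator. This is only an illustration: the name find_key and the path format it yields are made up here and do not come from any library.

```python
def find_key(node, key, path=""):
    """Yield (path, value) for every occurrence of `key` at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = "%s/%s" % (path, k)
            if k == key:
                yield child, v
            # Keep descending: the same key may occur again further down.
            yield from find_key(v, key, child)
    elif isinstance(node, (list, tuple)):
        for i, v in enumerate(node):
            yield from find_key(v, key, "%s/%d" % (path, i))

foo = {
    'spam': 'eggs',
    'morefoo': [{'bar': 'soap', 'morebar': {'bacon': {'bla': 'balbla'}}}, 'bla'],
}
print(list(find_key(foo, 'bla')))
# [('/morefoo/0/morebar/bacon/bla', 'balbla')]
```

Note that the string 'bla' inside the list is a value, not a key, so it is not reported.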
K
18

Not exactly beautiful, but you might use something like

def xpath_get(mydict, path):
    elem = mydict
    try:
        for x in path.strip("/").split("/"):
            elem = elem.get(x)
    except:
        pass

    return elem

This doesn't support xpath stuff like indices, of course ... not to mention the / key trap unutbu indicated.
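For what it's worth, here is a rough sketch of how the same idea could handle list indices and avoid the bare except. The name path_get, the default parameter, and the _MISSING sentinel are my own choices for illustration, not part of the answer above.

```python
_MISSING = object()  # sentinel: tells "not found" apart from a stored None

def path_get(data, path, default=None):
    """Walk `data` along a '/'-separated path; return `default` if a step fails."""
    elem = data
    for part in path.strip("/").split("/"):
        if part == "":
            continue  # tolerate "/" or an empty path
        if isinstance(elem, dict):
            elem = elem.get(part, _MISSING)
        elif isinstance(elem, (list, tuple)):
            try:
                elem = elem[int(part)]
            except (ValueError, IndexError):
                elem = _MISSING
        else:
            elem = _MISSING  # cannot descend into a scalar
        if elem is _MISSING:
            return default
    return elem

foo = {
    'spam': 'eggs',
    'morefoo': [{'bar': 'soap', 'morebar': {'bacon': {'bla': 'balbla'}}}, 'bla'],
}
print(path_get(foo, "/morefoo/0/morebar/bacon"))  # {'bla': 'balbla'}
print(path_get(foo, "/morefoo/5", default="n/a"))  # n/a
```

Because a segment is only converted to an int when the current node is a list or tuple, a dict with a key such as '1' still works. The '/' key trap remains.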

Keyhole answered 6/9, 2011 at 13:25 Comment(2)
In 2011 maybe there weren't as many options as there are today, but in 2014, I think, solving the problem this way is not elegant and should be avoided.Cornwell
@Cornwell is this just a guess or are there solutions that solve this more nicely?Roughshod
C
22

One of the best libraries I've been able to identify, which, in addition, is very actively developed, is an extracted project from boto: JMESPath. It has a very powerful syntax of doing things that would normally take pages of code to express.

Here are some examples:

search('foo | bar', {"foo": {"bar": "baz"}}) -> "baz"
search('foo[*].bar | [0]', {
    "foo": [{"bar": ["first1", "second1"]},
            {"bar": ["first2", "second2"]}]}) -> ["first1", "second1"]
search('foo | [0]', {"foo": [0, 1, 2]}) -> 0
Cornwell answered 26/9, 2014 at 1:34 Comment(2)
but this does not allow to modify the dict :(Alva
@Alva It is possible to modify a dict using JMESPath. The real problem with JMESPath, imo, is that null values are ignored. As far as JMESPath is concerned, the following dict: {'a': null} is indistinguishable from the empty dict. There is a better alternative: JSonata. However, it doesn't have as good a python port as JMESPath does. There is no Python binding for the latest major release of JSonata.Montparnasse
S
19

There is an easier way to do this now.

http://github.com/akesterson/dpath-python

$ pip install dpath
>>> import dpath.util
>>> dpath.util.search(YOUR_DICTIONARY, "morefoo/morebar")

... done. Or if you don't like getting your results back in a view (merged dictionary that retains the paths), yield them instead:

>>> for (path, value) in dpath.util.search(YOUR_DICTIONARY, "morefoo/morebar", yielded=True):
...     print(path, value)

... and done. 'value' will hold {'bacon': 'foobar'} in that case.

Stock answered 12/5, 2013 at 13:53 Comment(1)
The iterated statement doesn't run---there's no body to the for statement.Frisk
Y
14

There is the newer jsonpath-rw library supporting a JSONPATH syntax but for python dictionaries and arrays, as you wished.

So your 1st example becomes:

from jsonpath_rw import parse

print( parse('$.morefoo.morebar').find(foo) )

And the 2nd:

print( parse("$.morefoo[0].morebar.bacon").find(foo) )

PS: An alternative simpler library also supporting dictionaries is python-json-pointer with a more XPath-like syntax.

Yoruba answered 1/1, 2014 at 10:22 Comment(1)
Note that jsonpath uses eval and jsonpath-rw looks unmaintained (it also says some features are missing, but I haven't tried it).Backchat
T
11

dict > jmespath

You can use JMESPath which is a query language for JSON, and which has a python implementation.

import jmespath # pip install jmespath

data = {'root': {'section': {'item1': 'value1', 'item2': 'value2'}}}

jmespath.search('root.section.item2', data)
Out[42]: 'value2'

The jmespath query syntax and live examples: http://jmespath.org/tutorial.html

dict > xml > xpath

Another option would be converting your dictionaries to XML using something like dicttoxml and then use regular XPath expressions e.g. via lxml or whatever other library you prefer.

from dicttoxml import dicttoxml  # pip install dicttoxml
from lxml import etree  # pip install lxml

data = {'root': {'section': {'item1': 'value1', 'item2': 'value2'}}}
xml_data = dicttoxml(data, attr_type=False)
Out[43]: b'<?xml version="1.0" encoding="UTF-8" ?><root><root><section><item1>value1</item1><item2>value2</item2></section></root></root>'

tree = etree.fromstring(xml_data)
tree.xpath('//item2/text()')
Out[44]: ['value2']

Json Pointer

Yet another option is Json Pointer which is an IETF spec that has a python implementation:

From the jsonpointer-python tutorial:

from jsonpointer import resolve_pointer

obj = {"foo": {"anArray": [ {"prop": 44}], "another prop": {"baz": "A string" }}}

resolve_pointer(obj, '') == obj
# True

resolve_pointer(obj, '/foo/another%20prop/baz') == obj['foo']['another prop']['baz']
# True

resolve_pointer(obj, '/foo/anArray/0') == obj['foo']['anArray'][0]
# True

Trammel answered 8/7, 2018 at 20:55 Comment(6)
checking this, as I wouldn't want to change backend API, but to traverse the output jsonUndercoat
Converting from dict to XML and then using XPath doesn't seem like good practice to me.Archivist
There's a problem with JMESPath: null values are ignored. As far as JMESPath is concerned, the following dict: {'a': null} is indistinguishable from the empty dict. There is a better alternative: JSonata. However, it doesn't have as good a python port as JMESPath does. There is no Python binding for the latest major release of JSonata.Montparnasse
Correction: After studying JSONata in more depth, I have come to realize it is not better than JMESPath, but actually just as bad, if not worse, since JSONata ignores empty JSON arrays, and doesn't distinguish between a singleton JSON array and its unique element. Hence for JSONata the following two JSON objects are indistinguishable: {"a": [], "b": 1}, {"b": [1]}. Both JMESPath and JSONata claim to be supersets of JSON, and while they may indeed be supersets syntactically, neither of them preserves JSON's value semantics.Montparnasse
I now believe JSONiq is the best query language for JSON expressions. It is a true superset of JSON, both syntactically and semantically, its syntax is intuitive, imo, and its computational model is easy to grasp.Montparnasse
@EvanAad: it might well be the best for now.. but all of these solutions tend to be a solution to a problem someone had and tends to be backed by a single developer/company; they rarely become widely adopted and community-supported, most of them have a quick development spike and then become unmaintained and abandoned after a while, or at best do minimal version bumps and small fixesTrammel
O
5

If terseness is your fancy:

from functools import reduce  # a builtin on Python 2; must be imported on Python 3

def xpath(root, path, sch='/'):
    return reduce(lambda acc, nxt: acc[nxt],
                  [int(x) if x.isdigit() else x for x in path.split(sch)],
                  root)

Of course, if you only have dicts, then it's simpler:

def xpath(root, path, sch='/'):
    return reduce(lambda acc, nxt: acc[nxt],
                  path.split(sch),
                  root)

Good luck finding any errors in your path spec tho ;-)
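If the debugging problem bothers you, one possible variation is to carry the segments walked so far through the reduce, so that the error names the segment that failed. This is only a sketch: the name xpath_strict and the choice of LookupError are assumptions of mine. It also converts a segment to an int only when the current node is a list or tuple, which sidesteps the digit-string dict key issue.

```python
from functools import reduce

def xpath_strict(root, path, sep="/"):
    """Like the reduce() one-liner, but report which path segment failed."""
    def step(state, key):
        node, walked = state
        try:
            # Treat the segment as an index only when the node is a sequence.
            if isinstance(node, (list, tuple)):
                child = node[int(key)]
            else:
                child = node[key]
        except (KeyError, IndexError, TypeError, ValueError) as exc:
            raise LookupError(
                "cannot resolve %r after %r" % (key, sep.join(walked) or sep)
            ) from exc
        return child, walked + [key]

    node, _ = reduce(step, path.strip(sep).split(sep), (root, []))
    return node

d = {'a': {'1': {'c': [10, 20]}}}
print(xpath_strict(d, 'a/1/c/1'))  # 20
try:
    xpath_strict(d, 'a/x/c')
except LookupError as err:
    print(err)  # cannot resolve 'x' after 'a'
```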

Ontine answered 1/2, 2018 at 17:48 Comment(5)
This will avoid converting things to ints if a node is a dict: def xpath(root, path, sep='/'): return reduce(lambda node, key: node[key if hasattr(node, 'keys') else int(key)], path.split(sep), root)Xenocryst
Cool solution. For Python 3, need from functools import reduce though.Aggi
I like this terseness - the parser should give a key not found error when the path spec is wrong, so that should not be very painful to debug.Bermudez
great solution, but breaks when you have a dictionary key as an integer, e.g. in d1 = {'a': {'1': {'c': {'d': {'e': 2}}}}, 'c': {'e': {}}}Curvaceous
Of course, it is impossible to distinguish when to key into a list or key into a dictionary without introducing more syntax to the xquery logic.Curvaceous
R
2

More work would have to be put into how the XPath-like selector would work. '/' is a valid dictionary key, so how would

foo={'/':{'/':'eggs'},'//':'ham'}

be handled?

foo.select("///")

would be ambiguous.
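One way out of the ambiguity is to escape the separator inside keys. JSON Pointer (RFC 6901) does this by writing "~1" for "/" and "~0" for "~". A minimal sketch of that convention for plain nested dicts follows; the function names escape, unescape and select are illustrative only, and list indices are left out.

```python
def escape(key):
    # Escape "~" first so the "~1" produced for "/" is not escaped again.
    return key.replace("~", "~0").replace("/", "~1")

def unescape(token):
    # Reverse order when decoding: "~1" first, then "~0".
    return token.replace("~1", "/").replace("~0", "~")

def select(data, pointer):
    """Resolve a pointer such as '/a~1b/c' against nested dicts."""
    if pointer == "":
        return data  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("pointer must be empty or start with '/'")
    node = data
    for token in pointer[1:].split("/"):
        node = node[unescape(token)]
    return node

foo = {'/': {'/': 'eggs'}, '//': 'ham'}
print(select(foo, "/~1/~1"))  # eggs
print(select(foo, "/~1~1"))   # ham
```

With escaping, the two readings of "///" become two different pointers, so nothing is ambiguous any more.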

Revel answered 6/9, 2011 at 13:12 Comment(2)
Yes, you would need a parser for that. But what I am asking is for a xpath like method. "morefoo.morebar" is fine by me.Fleawort
@RickyA: '.' is also a valid dictionary key. The same problem would exist. foo.select('...') would be ambiguous.Revel
D
2

Another alternative (besides that suggested by jellybean) is this:

def querydict(d, q):
  keys = q.split('/')
  nd = d
  for k in keys:
    if k == '':
      continue
    if k in nd:
      nd = nd[k]
    else:
      return None
  return nd

foo = {
  'spam':'eggs',
  'morefoo': {
               'bar':'soap',
               'morebar': {'bacon' : 'foobar'}
              }
   }
print(querydict(foo, "/morefoo/morebar"))
Deviation answered 6/9, 2011 at 13:30 Comment(1)
this should be the solutionCockhorse
R
1

Is there any reason you need to query it in an XPath-like way? As the commenter on your question suggested, it is just a dictionary, so you can access the elements in a nested manner. Also, if the data is in JSON form, you can use the simplejson module to load it and access the elements too.

There is a project, JSONPATH, which tries to help people do the opposite of what you intend (given an XPath, make it easily accessible via Python objects), which seems more useful.

Rimbaud answered 6/9, 2011 at 13:17 Comment(4)
The reason is that I want to split the data and the query. I want to be flexible in the query part. If I access it the nested way the query is hardcoded in the program.Fleawort
@RickyA, in the other comment you say morefoo.morebar is fine. Did you check the JSONPATH project (Download and look at the source and tests).Rimbaud
I did take a look at JSONPATH, but my input is not text/json. It's nested dictionaries.Fleawort
@RickyA's question is super valuable when using mongodb, for example. If you want to iterate over nested keys in a BSON document, this is necessary.Frisk
N
0
def Dict(var, *arg, **kwarg):
  """ Return the value from a nested dictionary if every key in the chain exists; otherwise return "", or the value passed as the "default" keyword argument.
      Avoids "TypeError: argument of type 'NoneType' is not iterable".
      Ex: Dict(variable_dict, 'field1', 'field2', default = 0)
  """
  for key in arg:
    if isinstance(var, dict) and key and key in var:  var = var[key]
    else:  return kwarg['default'] if kwarg and 'default' in kwarg else ""   # Allow Dict(var, tvdbid).isdigit() for example
  return kwarg['default'] if var in (None, '', 'N/A', 'null') and kwarg and 'default' in kwarg else "" if var in (None, '', 'N/A', 'null') else var

foo = {
  'spam':'eggs',
  'morefoo': {
               'bar':'soap',
               'morebar': {'bacon' : 'foobar'}
              }
   }
print(Dict(foo, 'morefoo', 'morebar'))
print(Dict(foo, 'morefoo', 'morebar', default=None))

Have a SaveDict(value, var, *arg) function that can even append to lists in dict...

Nicoline answered 23/9, 2018 at 23:32 Comment(0)
F
0

I referenced this from this link.

The following code implements XPath-style lookup on JSON in Python:

import json
import xmltodict

# Parse the json string
class jsonprase(object):
    def __init__(self, json_value):
        try:
            self.json_value = json.loads(json_value)
        except Exception :
            raise ValueError('must be a json str value')


    def find_json_node_by_xpath(self, xpath):
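        elem = self.json_value
        # Drop empty segments so that "/" selects the whole document.
        nodes = [n for n in xpath.strip("/").split("/") if n]
        for node in nodes:
            try:
                elem = elem.get(node)
            except AttributeError:
                # elem is a list: apply the key to every item in it
                elem = [y.get(node) for y in elem]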
        return elem

    def datalength(self, xpath="/"):
        return len(self.find_json_node_by_xpath(xpath))

    @property
    def json_to_xml(self):
        # XML needs a single root element, so wrap the parsed JSON in one.
        root = {"root": self.json_value}
        return xmltodict.unparse(root, pretty=True)

Test Json :

{
    "responseHeader": {
        "zkConnected": true,
        "status": 0,
        "QTime": 2675,
        "params": {
            "q": "TxnInitTime:[2021-11-01T00:00:00Z TO 2021-11-30T23:59:59Z] AND Status:6",
            "stats": "on",
            "stats.facet": "CountryCode",
            "rows": "0",
            "wt": "json",
            "stats.field": "ItemPrice"
        }
    },
    "response": {
        "numFound": 15162439,
        "start": 0,
        "maxScore": 1.8660598,
        "docs": []
    }
}

Test Code to read the values from above input json.

numFound = jsonprase(ABOVE_INPUT_JSON).find_json_node_by_xpath('/response/numFound')
print(numFound)
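The lookup method above also fans out over lists: when the current node is a list, the next key is applied to every item. A standalone sketch of just that behaviour, using only the standard library, might look like this. The function name find_by_path and the small sample document are invented for illustration; the sample is not the test JSON above.

```python
import json

def find_by_path(data, path):
    """Follow a '/'-separated path; on a list, apply the key to each item."""
    elem = data
    for key in path.strip("/").split("/"):
        if isinstance(elem, list):
            elem = [item.get(key) for item in elem if isinstance(item, dict)]
        elif isinstance(elem, dict):
            elem = elem.get(key)
        else:
            return None  # cannot descend into a scalar
    return elem

# Hypothetical sample data, made up for this example.
doc = json.loads('{"response": {"numFound": 2, "docs": [{"id": "a"}, {"id": "b"}]}}')
print(find_by_path(doc, "/response/numFound"))  # 2
print(find_by_path(doc, "/response/docs/id"))   # ['a', 'b']
```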
Fawkes answered 9/11, 2021 at 12:2 Comment(0)
