Python: Accessing YAML values using "dot notation"
I'm using a YAML configuration file. So this is the code to load my config in Python:

import os
import yaml
with open('./config.yml') as file:
    config = yaml.safe_load(file)

This code actually creates a dictionary. Now the problem is that in order to access the values I need to use tons of brackets.

YAML:

mysql:
    user:
        pass: secret

Python:

import os
import yaml
with open('./config.yml') as file:
    config = yaml.safe_load(file)
print(config['mysql']['user']['pass']) # <--

I'd prefer something like this (dot notation):

config('mysql.user.pass')

So, my idea is to utilize the PyStache render() interface.

import os
import yaml
with open('./config.yml') as file:
    config = yaml.safe_load(file)

import pystache
def get_config_value( yml_path, config ):
    return pystache.render('{{' + yml_path + '}}', config)

get_config_value('mysql.user.pass', config)

Would that be a "good" solution? If not, what would be a better alternative?

Additional question [Solved]

I've decided to use Ilja Everilä's solution. But now I've got an additional question: How would you create a wrapper Config class around DotDict?

The following code doesn't work, but I hope you get the idea of what I'm trying to do:

class Config( DotDict ):
    def __init__( self ):
        with open('./config.yml') as file:
            DotDict.__init__(yaml.safe_load(file))

config = Config()
print(config.django.admin.user)

Error:

AttributeError: 'super' object has no attribute '__getattr__'

Solution

You just need to pass self to the constructor of the super class.

DotDict.__init__(self, yaml.safe_load(file))

Even better solution (Ilja Everilä)

super().__init__(yaml.safe_load(file))
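A sketch of the wrapper with the super() call in place, under a couple of assumptions: DotDict is a trimmed-down, read-only copy of the class from Ilja Everilä's answer (repeated so the snippet runs on its own), and the file loading is moved into a classmethod so Config can also be built from an already-loaded dict. The name from_yaml is hypothetical, and PyYAML is only imported when that method is called.

```python
from functools import reduce


class DotDict(dict):
    """Read-only dict with dotted string keys and attribute access (trimmed copy)."""

    def __getattr__(self, k):
        try:
            v = self[k]
        except KeyError:
            # dict defines no __getattr__ to defer to, so raise AttributeError directly
            raise AttributeError(k) from None
        return DotDict(v) if isinstance(v, dict) else v

    def __getitem__(self, k):
        if isinstance(k, str) and '.' in k:
            return reduce(lambda d, kk: d[kk], k.split('.'), self)
        return super().__getitem__(k)


class Config(DotDict):
    def __init__(self, data):
        # hand the loaded mapping to dict.__init__; super() supplies self for us
        super().__init__(data)

    @classmethod
    def from_yaml(cls, path='./config.yml'):
        # hypothetical convenience constructor; PyYAML is imported lazily
        import yaml
        with open(path) as file:
            return cls(yaml.safe_load(file))
```

Usage would then be config = Config.from_yaml() followed by config.django.admin.user.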
Glanders answered 13/9, 2016 at 6:57 Comment(2)
Using a template engine for this is a truly awful hack. Please don't do this in any real application! - Carrefour
stackoverflow.com/questions/11049117/… seems related, or even a duplicate. - Carrefour

The Simple

You could use reduce to extract the value from the config:

In [41]: config = {'asdf': {'asdf': {'qwer': 1}}}

In [42]: from functools import reduce
    ...: 
    ...: def get_config_value(key, cfg):
    ...:     return reduce(lambda c, k: c[k], key.split('.'), cfg)
    ...: 

In [43]: get_config_value('asdf.asdf.qwer', config)
Out[43]: 1

This solution is easy to maintain and has very few new edge cases, if your YAML uses a very limited subset of the language.
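The same reduce approach can be hardened a little and bound to one config object, which gives exactly the config('mysql.user.pass') call style asked for in the question. A minimal sketch, assuming a missing key should return a default rather than raise; the keyword-only default parameter and the config_value name are additions for illustration.

```python
from functools import partial, reduce


def get_config_value(key, cfg, *, default=None):
    """Walk cfg along a dot-separated key; return default if any step is missing."""
    try:
        return reduce(lambda c, k: c[k], key.split('.'), cfg)
    except (KeyError, TypeError):
        # KeyError: key not present; TypeError: tried to index into a scalar
        return default


config = {'mysql': {'user': {'pass': 'secret'}}}

# bind the loaded config once so lookups read like config_value('mysql.user.pass')
config_value = partial(get_config_value, cfg=config)
print(config_value('mysql.user.pass'))             # secret
print(config_value('mysql.port', default=3306))    # 3306
```

Making default keyword-only matters here: with partial binding cfg by keyword, a positional default would collide with cfg.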

The Correct

Use a proper YAML parser and tools, such as in this answer.


The Convoluted

On a lighter note (not to be taken too seriously), you could create a wrapper that allows using attribute access:

In [47]: class DotConfig:
    ...:     
    ...:     def __init__(self, cfg):
    ...:         self._cfg = cfg
    ...:     def __getattr__(self, k):
    ...:         v = self._cfg[k]
    ...:         if isinstance(v, dict):
    ...:             return DotConfig(v)
    ...:         return v
    ...:     

In [48]: DotConfig(config).asdf.asdf.qwer
Out[48]: 1

Do note that this fails for keywords, such as "as", "pass", "if" and the like.
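Such keys are still reachable through getattr(), which takes the attribute name as a string, so the parser never sees the keyword. A sketch assuming the DotConfig wrapper above, repeated so the snippet runs on its own, with one change: missing keys are turned into AttributeError instead of leaking a KeyError.

```python
class DotConfig:
    """Attribute access to nested dict keys (copy of the wrapper above)."""

    def __init__(self, cfg):
        self._cfg = cfg

    def __getattr__(self, k):
        try:
            v = self._cfg[k]
        except KeyError:
            raise AttributeError(k) from None
        return DotConfig(v) if isinstance(v, dict) else v


conf = DotConfig({'mysql': {'user': {'pass': 'secret'}}})

# conf.mysql.user.pass would be a SyntaxError because "pass" is a keyword;
# getattr() passes the name as a string, so it works
print(getattr(conf.mysql.user, 'pass'))  # secret
```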

Finally, you could get really crazy (read: probably not a good idea) and customize dict to handle dotted string and tuple keys as a special case, with attribute access to items thrown in the mix (with its limitations):

In [58]: class DotDict(dict):
    ...:     
    ...:     # update, __setitem__ etc. omitted, but required if
    ...:     # one tries to set items using dot notation. Essentially
    ...:     # this is a read-only view.
    ...:
    ...:     def __getattr__(self, k):
    ...:         try:
    ...:             v = self[k]
    ...:         except KeyError:
    ...:             raise AttributeError(k) from None  # dict has no __getattr__ to defer to
    ...:         if isinstance(v, dict):
    ...:             return DotDict(v)
    ...:         return v
    ...:
    ...:     def __getitem__(self, k):
    ...:         if isinstance(k, str) and '.' in k:
    ...:             k = k.split('.')
    ...:         if isinstance(k, (list, tuple)):
    ...:             return reduce(lambda d, kk: d[kk], k, self)
    ...:         return super().__getitem__(k)
    ...:
    ...:     def get(self, k, default=None):
    ...:         if isinstance(k, str) and '.' in k:
    ...:             try:
    ...:                 return self[k]
    ...:             except KeyError:
    ...:                 return default
    ...:         return super().get(k, default)  # dict.get() takes no keyword arguments
    ...:     

In [59]: dotconf = DotDict(config)

In [60]: dotconf['asdf.asdf.qwer']
Out[60]: 1

In [61]: dotconf['asdf', 'asdf', 'qwer']
Out[61]: 1

In [62]: dotconf.asdf.asdf.qwer
Out[62]: 1

In [63]: dotconf.get('asdf.asdf.qwer')
Out[63]: 1

In [64]: dotconf.get('asdf.asdf.asdf')

In [65]: dotconf.get('asdf.asdf.asdf', 'Nope')
Out[65]: 'Nope'
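The class comment above notes that setting items with dot notation is left out. A minimal sketch of that write side as a plain function, the counterpart of get_config_value from the first snippet; set_config_value is a hypothetical name, and it assumes every intermediate value is either missing or a dict.

```python
def set_config_value(key, cfg, value):
    """Assign value at a dot-separated key, creating intermediate dicts as needed."""
    *parents, last = key.split('.')
    for k in parents:
        # setdefault returns the existing sub-dict or inserts (and returns) an empty one
        cfg = cfg.setdefault(k, {})
    cfg[last] = value


config = {'mysql': {'user': {'pass': 'secret'}}}
set_config_value('mysql.user.pass', config, 'hunter2')
set_config_value('mysql.host.port', config, 3306)
print(config)  # {'mysql': {'user': {'pass': 'hunter2'}, 'host': {'port': 3306}}}
```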
Backside answered 13/9, 2016 at 7:6 Comment(9)
YMMV, I'd call having a template library as a dependency for config access bloat. - Decoration
This solution is much cleaner than abusing a template engine for this. - Carrefour
@Lugaxx: Remember that you can wrap your config object once using DotDict (config = DotDict(config)) and then it's simply config.asdf.asdf.qwer everywhere else in your code. It won't get any shorter than that. - Holloman
This final solution is really great. Thank you very much! - Glanders
I would like to ask another question which is related to your solution. I've updated my original question accordingly. (See "Additional question") - Glanders
You're using the "old" style of calling superclass methods. Replace DotDict.__init__(yaml.safe_load(file)) with super().__init__(yaml.safe_load(file)). In your original you were calling DotDict.__init__ with the loaded configuration as self. Explicitly calling a superclass method can be useful, but perhaps not in this situation. - Decoration
Could the downvoter please explain what is wrong with the answer, so that it could be corrected? - Decoration
@IljaEverilä I was the downvoter, and I thought I had left a warning comment, which must have gotten lost when I started my own answer at the same time. I had pointed out that your approach runs into trouble with keys that are not strings and/or with keys that are strings that are Python keywords (like pass in the OP's example). The latter is a problem with many extensions to Python dicts that allow dotted access (and has nothing specifically to do with YAML). I reversed the downvote because your edit introduced "proper YAML parser", thanks for that. - Lenrow
@Lenrow I did completely overlook keywords originally and only remembered them when reading your answer, so thank you for pointing it out. - Decoration

On the one hand, your example takes the right approach by using get_config_value('mysql.user.pass', config) instead of solving the dotted access with attributes. I am not sure whether it was on purpose that you did not try the more intuitive:

print(config.mysql.user.pass)

which you can't get to work, even when overloading __getattr__, because pass is a Python keyword: the line is a SyntaxError before any of your code runs.
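To find out in advance which keys of a loaded config cannot be written as obj.key, a small stdlib check can walk the nested mappings. unreachable_attribute_keys is a hypothetical helper, and this sketch only descends into dicts, not sequences.

```python
import keyword


def unreachable_attribute_keys(cfg, prefix=''):
    """Yield dotted paths of keys that cannot be spelled as obj.key in Python source."""
    for k, v in cfg.items():
        path = f'{prefix}{k}'
        # non-strings (42), non-identifiers ("my-key") and keywords ("pass")
        # are only reachable via getattr() or item access
        if not isinstance(k, str) or not k.isidentifier() or keyword.iskeyword(k):
            yield path
        if isinstance(v, dict):
            yield from unreachable_attribute_keys(v, prefix=path + '.')


cfg = {'mysql': {'user': {'pass': 'secret'}, 42: 'some answer', 'my-key': 1}}
print(list(unreachable_attribute_keys(cfg)))
# ['mysql.user.pass', 'mysql.42', 'mysql.my-key']
```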

However, your example covers only a very restricted subset of YAML files, as it doesn't involve any sequences or complex keys.

If you want to cover more than this tiny subset, you can e.g. extend the powerful round-trip capable objects of ruamel.yaml¹:

import ruamel.yaml

def mapping_string_access(self, s, delimiter=None, key_delim=None):
    def p(v):
        # convert the key to an int where possible; could be extended
        # for other primitives like float, datetime, booleans, etc.
        try:
            v = int(v)
        except ValueError:
            pass
        return v

    if delimiter is None:
        delimiter = '.'
    if key_delim is None:
        key_delim = ','
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    if key_delim in key:
        key = tuple(p(k) for k in key.split(key_delim))
    else:
        key = p(key)
    if rest is None:
        return self[key]
    return self[key].string_access(rest, delimiter, key_delim)

ruamel.yaml.comments.CommentedMap.string_access = mapping_string_access


def sequence_string_access(self, s, delimiter=None, key_delim=None):
    if delimiter is None:
        delimiter = '.'
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    key = int(key)
    if rest is None:
        return self[key]
    return self[key].string_access(rest, delimiter, key_delim)

ruamel.yaml.comments.CommentedSeq.string_access = sequence_string_access

Once that is set up, you can run the following:

yaml_str = """\
mysql:
    user:
        pass: secret
    list: [a: 1, b: 2, c: 3]
    [2016, 9, 14]: some date
    42: some answer
"""

yaml = ruamel.yaml.YAML()
config = yaml.load(yaml_str)

def get_config_value(path, data, **kw):
    return data.string_access(path, **kw)

print(get_config_value('mysql.user.pass', config))
print(get_config_value('mysql:user:pass', config, delimiter=":"))
print(get_config_value('mysql.list.1.b', config))
print(get_config_value('mysql.2016,9,14', config))
print(config.string_access('mysql.42'))

giving:

secret
secret
2
some date
some answer

This shows that with a bit more forethought and very little extra work you can have flexible dotted access to a vast range of YAML files, and not just those consisting of recursive mappings with string scalars as keys.

  1. As shown, you can directly call config.string_access('mysql.user.pass') instead of defining and using get_config_value().
  2. This works with strings and integers as mapping keys, but can easily be extended to support other key types (boolean, date, date-time).
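If you are on plain PyYAML (yaml.safe_load() returning ordinary dicts and lists), a stdlib-only sketch of the same delimiter idea could look like this. path_access is a hypothetical name, and unlike the ruamel.yaml version it does not support tuple keys such as [2016, 9, 14] (PyYAML's safe_load rejects a sequence used as a mapping key anyway).

```python
def path_access(data, path, delimiter='.'):
    """Follow a delimited path through nested dicts and lists."""
    for part in path.split(delimiter):
        if isinstance(data, list):
            # sequences: the path component is an index
            data = data[int(part)]
        elif isinstance(data, dict):
            if part in data:
                data = data[part]
            elif part.lstrip('-').isdigit():
                # fall back to an integer key, e.g. `42: some answer`
                data = data[int(part)]
            else:
                raise KeyError(part)
        else:
            raise TypeError(f'cannot descend into {type(data).__name__} at {part!r}')
    return data


data = {'mysql': {'user': {'pass': 'secret'},
                  'list': [{'a': 1}, {'b': 2}, {'c': 3}],
                  42: 'some answer'}}
print(path_access(data, 'mysql.list.1.b'))  # 2
print(path_access(data, 'mysql.42'))        # some answer
```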

¹ This was done using ruamel.yaml, a YAML 1.2 parser of which I am the author.

Lenrow answered 14/9, 2016 at 8:32 Comment(1)
A handy wrapper that can make use of ruamel.yaml is python-box, see this answer. - Rib

I ended up using python-box. This package provides multiple ways to read config files (yaml, csv, json, ...). It also lets you pass a dict or a string directly:

from box import Box
import yaml # Only required for different loaders

# Pass dict directly
movie_box = Box({ "Robin Hood: Men in Tights": { "imdb stars": 6.7, "length": 104 } })

# Load from yaml file
# Here it is also possible to use PyYAML arguments, 
# for example to specify different loaders e.g. SafeLoader or FullLoader
conf = Box.from_yaml(filename="./config.yaml", Loader=yaml.FullLoader) 

# "pass" is a keyword, so conf.mysql.user.pass would be a SyntaxError; use item access for it
conf.mysql.user["pass"]

Many more examples are available in the Wiki.

Rib answered 24/7, 2020 at 10:42 Comment(0)

It's quite an old question, but I came here hunting for an answer, looking for a simpler solution. I finally came up with my own solution using the easydict library, installed with pip install easydict:

def yaml_load(fileName):
    import yaml
    from easydict import EasyDict as edict
    with open(fileName, 'r') as f:
        # safe_load avoids constructing arbitrary Python objects; plain
        # yaml.load(f) without a Loader argument fails on PyYAML >= 6
        fc = edict(yaml.safe_load(f))

    return fc

Now, simply call yaml_load with the valid yaml filename:

config = yaml_load('./config.yml')

## assuming: config["mysql"]["user"]["pass"] is a valid key in config.yml
## "pass" is a keyword, so config.mysql.user.pass is a SyntaxError; use item access for it
print("{}".format(config.mysql.user["pass"]))
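If you would rather avoid the extra dependency, the standard library's types.SimpleNamespace gives similar attribute access. This swaps easydict for SimpleNamespace, so the result is not a dict anymore, and non-string keys (e.g. 42) are not supported; to_namespace is a hypothetical helper name.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively turn dicts (including dicts inside lists) into SimpleNamespace objects."""
    if isinstance(obj, dict):
        # keys must be strings here; SimpleNamespace(**...) rejects anything else
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(v) for v in obj]
    return obj


config = to_namespace({'mysql': {'user': {'pass': 'secret', 'name': 'root'}},
                       'hosts': [{'ip': '10.0.0.1'}]})
print(config.mysql.user.name)              # root
print(getattr(config.mysql.user, 'pass'))  # secret
print(config.hosts[0].ip)                  # 10.0.0.1
```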
Luci answered 12/1, 2019 at 13:45 Comment(0)

I had the same problem a while ago and built this getter:

class Config:  # hypothetical wrapper class holding the loaded YAML dict
    def __init__(self, config):
        # config: the dict returned by e.g. yaml.safe_load()
        self.config = config

    def get(self, key):
        """Tries to find the configuration value for a given key.
        :param str key: Key in dot-notation (e.g. 'foo.lol').
        :return: The configuration value. None if no value was found.
        """
        try:
            return self.__lookup(self.config, key)
        except KeyError:
            return None

    def __lookup(self, dct, key):
        """Checks dct recursively to find the value for key.
        Is used by get() internally.
        :param dict dct: The configuration dict.
        :param str key: The key we are looking for.
        :return: The configuration value.
        :raise KeyError: If the given key is not in the configuration dict.
        """
        if '.' in key:
            key, node = key.split('.', 1)
            return self.__lookup(dct[key], node)
        else:
            return dct[key]

The getter looks up a config value from self.config recursively (via __lookup). If you have trouble adjusting this for your case, feel free to ask for further help.

Adrenal answered 13/9, 2016 at 7:51 Comment(0)

Meta's (Facebook's) hydra library is probably too complicated here, but it is built on omegaconf, which probably meets your needs. It's battle-tested, ready to go, and stores mappings in DictConfig objects, which I guess are backed by hash maps and as such lightweight and fast.

from omegaconf import OmegaConf

conf = OmegaConf.load('./config.yml')

# "pass" is a keyword, so conf.mysql.user.pass would be a SyntaxError; use item access for it
print(conf.mysql.user['pass'])
Toothbrush answered 7/8, 2023 at 13:7 Comment(0)

I generally follow the practice of converting config (any kind, not just YAML) into an in-memory object.

This way the text-based config is unwrapped by one function and the text is thrown away, giving a clean object to work with instead of having every function deal with the internals of the config. All functions then only know about that one internal object interface. If any parameter is added, renamed or deleted in the config file, the only function that has to change is the loader that builds the in-memory object.

Below is an example I wrote for loading a FloydHub config YAML file into an in-memory object. I find it a very good design pattern.

First define a config representative class like below:

class FloydYamlConfig(object):
    class Input:
        def __init__(self, destination, source):
            self.destination = destination
            self.source = source

    def __init__(self, floyd_yaml_dict):
        self.machine = floyd_yaml_dict['machine']
        self.env = floyd_yaml_dict['env']
        self.description = floyd_yaml_dict['description']
        self.max_runtime = floyd_yaml_dict['max_runtime']
        self.command = floyd_yaml_dict['command']
        self.input = []
        for input_conf in floyd_yaml_dict['input']:
            input_obj = self.Input(destination=input_conf['destination'], source=input_conf['source'])
            self.input.append(input_obj)

    def __str__(self):
        input_str = ''
        for input_obj in self.input:
            input_str += '\ndestination: {}\n source: {}'.format(input_obj.destination, input_obj.source)

        print_str = ('machine: {}\n'
                     'env: {}\n'
                     'input: {}\n'
                     'description: {}\n'
                     'max_runtime: {}\n'
                     'command: {}\n').format(
            self.machine, self.env, input_str, self.description, self.max_runtime, self.command)
        return print_str

Then load the yaml into the object for further use:

import yaml

def read_floyd_yaml_config(floyd_yaml_path) -> FloydYamlConfig:
    with open(floyd_yaml_path) as f:
        yaml_conf_dict = yaml.safe_load(f)

    floyd_conf = FloydYamlConfig(yaml_conf_dict)
    # print(floyd_conf)
    return floyd_conf

# args comes from the script's command-line parsing (e.g. argparse)
floyd_conf = read_floyd_yaml_config(args.floyd_yaml_path)

Sample yaml

# see: https://docs.floydhub.com/floyd_config

machine: gpu2
env: tensorflow-1.0
input:
  - destination: data
    source: abc/datasets/my-data/6
  - destination: config
    source: abc/datasets/my-config/1
description: this is a test
max_runtime: 3600
command: >-
  echo 'hello world'
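For comparison, the same in-memory object pattern can be written with the standard library's dataclasses, which generate __init__ and __repr__ for you. A sketch against the sample file above, assuming the dict comes from yaml.safe_load(); from_dict is a hypothetical name. Input(**i) raises a TypeError on unexpected keys, which flags typos in the config early.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Input:
    destination: str
    source: str


@dataclass
class FloydYamlConfig:
    machine: str
    env: str
    description: str
    max_runtime: int
    command: str
    input: List[Input] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d):
        """Build the typed config from the dict returned by yaml.safe_load()."""
        return cls(
            machine=d['machine'],
            env=d['env'],
            description=d['description'],
            max_runtime=d['max_runtime'],
            command=d['command'],
            input=[Input(**i) for i in d['input']],
        )
```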
Uniat answered 15/12, 2019 at 16:8 Comment(0)
