How to auto-dump modified values in nested dictionaries using ruamel.yaml?

I am following the solution from "PyYAML - Saving data to .yaml files" and trying to modify values in nested dictionaries using ruamel.yaml:

cfg = Config("test.yaml")
cfg['setup']['a'] = 3 
print(cfg)  # I can see the change for the `dict` but it is not saved

The value of cfg['setup']['a'] is changed, but the change is not caught by __setitem__() and therefore not saved via the updated() function.

Would it be possible to auto-dump any modification of values in a nested dict?

ex:

  • dict[in_key][out_key] = value
  • cfg['setup']['a'][b]['c'] = 3

PyYAML - Saving data to .yaml files:


import os
from ruamel.yaml import YAML


class Config(dict):
    def __init__(self, filename, auto_dump=True):
        self.filename = filename
        self.auto_dump = auto_dump
        self.changed = False
        self.yaml = YAML()
        self.yaml.preserve_quotes = True
        if os.path.isfile(filename):
            with open(filename) as f:
                super(Config, self).update(self.yaml.load(f) or {})

    def dump(self, force=False):
        if not self.changed and not force:
            return
        with open(self.filename, "w") as f:
            self.yaml.dump(dict(self), f)
        self.changed = False

    def updated(self):
        if self.auto_dump:
            self.dump(force=True)
        else:
            self.changed = True

    def __setitem__(self, key, value):
        super(Config, self).__setitem__(key, value)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            super(Config, self).update(arg)
        super(Config, self).update(**kw)
        self.updated()

Cafard answered 7/8, 2021 at 10:26 Comment(0)

You will need to make a secondary class SubConfig that behaves similarly to Config. It is probably a good idea to get rid of the old-style super(Config, self) before that.

Change __setitem__ to check whether the value is a dict, and if so, instantiate a SubConfig and then set the individual items on it (SubConfig needs to do that as well, so you can have arbitrary nesting).

On __init__, SubConfig doesn't take a filename, but it takes a parent (of type Config or SubConfig). SubConfig itself shouldn't dump, and its updated() should call the parent's updated() (eventually bubbling up to Config, which then does a save).
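To see why the bubbling is needed, it may help to trace which special methods Python calls for a nested assignment. The following is a minimal sketch; Traced is a hypothetical helper class that is not part of the solution, it only records calls:

```python
class Traced(dict):
    """A dict that records which special methods are called on it."""

    calls = []  # shared log of (name, method, key) tuples

    def __init__(self, name, *args, **kw):
        super().__init__(*args, **kw)
        self.name = name

    def __getitem__(self, key):
        Traced.calls.append((self.name, "__getitem__", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        Traced.calls.append((self.name, "__setitem__", key))
        super().__setitem__(key, value)


inner = Traced("inner", a=1)
outer = Traced("outer", setup=inner)
Traced.calls.clear()

outer["setup"]["a"] = 3  # the nested assignment from the question

for call in Traced.calls:
    print(call)
# ('outer', '__getitem__', 'setup')
# ('inner', '__setitem__', 'a')
```

The outer object only sees a __getitem__ call. The assignment itself lands on whatever object that lookup returns, so that object has to be the one that reports the change.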

In order to support doing cfg['a'] = dict(c=1) you need to implement __getitem__, and similarly, to support del cfg['a'], you need to implement __delitem__ so that it writes the updated file.

I thought you could subclass one class from the other, as several methods are the same, but I couldn't get this to work properly with super().

If you ever assign lists to (nested) keys and want to auto-dump when an element in such a list is updated, you'll need to implement some SubConfigList and handle those in __setitem__.
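As a rough illustration of that last point, a list counterpart could look like the sketch below. SubConfigList is only a name suggested above, not an existing class, and FakeParent is a hypothetical stand-in for Config/SubConfig so the example runs on its own. Only a few mutating methods are covered here; insert, pop, remove, sort and the others would need the same treatment.

```python
class SubConfigList(list):
    """Hypothetical list counterpart of SubConfig: reports changes to its parent."""

    def __init__(self, parent, items=()):
        super().__init__(items)
        self.parent = parent

    def updated(self):
        # Bubble the notification up, as SubConfig does.
        self.parent.updated()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self.updated()

    def __delitem__(self, index):
        super().__delitem__(index)
        self.updated()

    def append(self, value):
        super().append(value)
        self.updated()

    def extend(self, values):
        super().extend(values)
        self.updated()


class FakeParent:
    """Stand-in for Config/SubConfig that only counts update notifications."""

    def __init__(self):
        self.count = 0

    def updated(self):
        self.count += 1


parent = FakeParent()
lst = SubConfigList(parent, [1, 2, 3])
lst[0] = 10
lst.append(4)
del lst[1]
print(list(lst), parent.count)  # [10, 3, 4] 3
```

To use something like this, __setitem__ in Config and SubConfig would need an additional isinstance(value, list) branch, and the class would presumably have to be registered with the representer so that it dumps like a plain list, in the same way SubConfig is registered to dump like a dict.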

import sys
import os
from pathlib import Path
import ruamel.yaml

class SubConfig(dict):
    def __init__(self, parent):
        self.parent = parent

    def updated(self):
        self.parent.updated()

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
            return super().__getitem__(key)
        return res

    def __delitem__(self, key):
        super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
        for k, v in kw.items():
            self[k] = v
        self.updated()
        return

_SR = ruamel.yaml.representer.SafeRepresenter
_SR.add_representer(SubConfig, _SR.represent_dict)

class Config(dict):
    def __init__(self, filename, auto_dump=True):
        self.filename = filename if hasattr(filename, 'open') else Path(filename)
        self.auto_dump = auto_dump
        self.changed = False
        self.yaml = ruamel.yaml.YAML(typ='safe')
        self.yaml.default_flow_style = False
        if self.filename.exists():
            with open(filename) as f:
                self.update(self.yaml.load(f) or {})

    def updated(self):
        if self.auto_dump:
            self.dump(force=True)
        else:
            self.changed = True

    def dump(self, force=False):
        if not self.changed and not force:
            return
        with open(self.filename, "w") as f:
            self.yaml.dump(dict(self), f)
        self.changed = False

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
        return super().__getitem__(key)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
        for k, v in kw.items():
            self[k] = v
        self.updated()

config_file = Path('config.yaml') 

cfg = Config(config_file)
cfg['a'] = 1
cfg['b']['x'] = 2
cfg['c']['y']['z'] = 42

print(f'{config_file} 1:')
print(config_file.read_text())

cfg['b']['x'] = 3
cfg['a'] = 4

print(f'{config_file} 2:')
print(config_file.read_text())

cfg.update(a=9, d=196)
cfg['c']['y'].update(k=11, l=12)

print(f'{config_file} 3:')
print(config_file.read_text())
        
# reread config from file
cfg = Config(config_file)
assert isinstance(cfg['c']['y'], SubConfig)
assert cfg['c']['y']['z'] == 42
del cfg['c']
print(f'{config_file} 4:')
print(config_file.read_text())


# start from scratch immediately use updating
config_file.unlink()
cfg = Config(config_file)
cfg.update(a=dict(b=4))
cfg.update(c=dict(b=dict(e=5)))
assert isinstance(cfg['a'], SubConfig)
assert isinstance(cfg['c']['b'], SubConfig)
cfg['c']['b']['f'] = 22

print(f'{config_file} 5:')
print(config_file.read_text())

which gives:

config.yaml 1:
a: 1
b:
  x: 2
c:
  y:
    z: 42

config.yaml 2:
a: 4
b:
  x: 3
c:
  y:
    z: 42

config.yaml 3:
a: 9
b:
  x: 3
c:
  y:
    k: 11
    l: 12
    z: 42
d: 196

config.yaml 4:
a: 9
b:
  x: 3
d: 196

config.yaml 5:
a:
  b: 4
c:
  b:
    e: 5
    f: 22

You should consider not making these classes subclasses of dict, but instead keeping the dict as an attribute ._d (and replacing super(). with self._d.). This would require a specific representer function/method.

The advantage of that is that you don't get some dict functionality unexpectedly. E.g. in the above subclassing implementation, if I hadn't implemented __delitem__, you could still do del cfg['c'] without an error, but the YAML file would not be written automatically. If the dict is an attribute, you'll get an error until you implement __delitem__.

Cornhusking answered 7/8, 2021 at 17:27 Comment(29)
What should the arguments for __setitem__(self, ...) be? - Cafard
__setitem__(self, key, value), both for Config and SubConfig. - Cornhusking
Sorry, I am confused: there will be multiple keys (*key), so it does not end up in the __setitem__(self, key, value) function. - Cafard
I don't see any *key in your code (or mine). - Cornhusking
For cfg['setup']['a'] = 3 there are two keys: the first one is setup and the second one is a. But __setitem__(self, key, value) requires a single key, hence it does not enter that function. It could be cfg['setup']['a'][b]['c'] = 3, which has 4 keys. - Cafard
I think the call should be something like: cfg['setup', 'a', b, 'c'] = 3 - Cafard
No. Config.__getitem__ is called with one key and returns a SubConfig, whose __setitem__ is then called with the next key. - Cornhusking
Should SubConfig be the base class, i.e. Config(SubConfig)? - Cafard
You could, since several methods are exactly the same. You could also do it the other way around (class SubConfig(Config):), but since you probably want to add a representer for SubConfig but not for Config, that might not work out. - Cornhusking
Thanks, should I also do the setting operations using the update() method? - Cafard
You need the update() to walk over the key/value pairs so it does the right thing when the value is a dict. I made the answer somewhat more extensive, so have a look at that. Some more testing might be needed. - Cornhusking
Thanks. Works like magic. If I see something go wrong, I will let you know. Why do we need _SR? How does it help? - Cafard
_SR is just to make the next line shorter. You need to register SubConfig to dump like a dict, so that YAML knows how to dump it (Config doesn't need to be registered, as I do dict(self)). - Cornhusking
If I accidentally terminate the process during the write operation, it may end up writing the file with all keys' values set to empty ({}). Would it be possible to prevent this? - Cafard
Why don't you write to a temporary file, and only when that completes successfully, unlink the real filename and rename the temporary to the real? - Cornhusking
Ah, that's a smart, simple solution: writing into config_temp.yaml and then moving that to the original file. By "unlink the real filename" you mean just doing mv config_temp.yaml config.yaml, right? - Cafard
If you do it in Python there is os.rename(), but it throws an error if the target name exists. So you should use os.unlink('config.yaml') or os.remove('config.yaml') first. - Cornhusking
Also, we don't need the assignment to res, right, since res is not used? res = super().__getitem__(key) could be just super().__getitem__(key). - Cafard
I think I wanted to use res to return if the exception was not thrown, to prevent a double lookup in that case. - Cornhusking
I just realized your solution does not save the comments. Is that normal? Other than that, it works solidly. - Cafard
Yes, that is normal: if you do YAML(typ='safe') you get the fast C-based loader, which doesn't preserve comments. - Cornhusking
Would it be OK if I use the non-safe version in order to keep the comments? Is there any risk in that? - Cafard
The default is the round-trip-loader, which is a subclass of the safeloader, and of the (unsafe) Loader. The round-trip-loader will not instantiate any, potentially harmful, classes based on tag (instead it makes commentedmap/seq instances and sets their tag). - Cornhusking
I have changed ruamel.yaml.YAML(typ='safe') to ruamel.yaml.YAML(), but now I am facing the following error: raise RepresenterError(_F('cannot represent an object: {data!s}', data=data)). As I understand it, for this approach I have to stick with safe. - Cafard
When multiple processes read/write the same YAML file, os.rename() may cause a problem where one of them renames the temporary file while another is still working on it. Do you advise any lock mechanism that I can use along with ruamel.yaml? If required, I can ask a new question related to this. @Cornhusking - Cafard
No preference, but you would need to consider lock/re-load/modify/write and make sure you catch any setting of something already set by another process. - Cornhusking
Yes that is possible. And additional questions are not what comments are for. - Cornhusking
@Cafard Not sure what you tried to edit, but it was unacceptable to change all of the code lines as indicated by SO. Please don't do that. Someone else had already rejected your edit, and so did I. - Cornhusking
@Cafard You don't have to comment, I get notified automatically if a question tagged [ruamel.yaml] is posted. - Cornhusking
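
The temporary-file approach discussed in the comments could be sketched as follows. This is only an illustration under some assumptions: atomic_write_text is a hypothetical helper name, and it uses os.replace() rather than os.rename(), since os.replace() overwrites an existing target and so needs no separate unlink step.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text to path via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        # Swap the finished file into place; the old content stays intact
        # until this call succeeds.
        os.replace(tmp_name, path)
    except BaseException:
        # Interrupted or failed before the swap: drop the partial temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


target = os.path.join(tempfile.mkdtemp(), "config.yaml")
atomic_write_text(target, "a: 1\n")
atomic_write_text(target, "a: 2\n")
with open(target) as f:
    print(f.read())
```

In Config.dump() the same pattern would presumably apply with self.yaml.dump(dict(self), f) writing into the temporary file instead of f.write(text). This does not address several processes writing at once; that still needs some form of locking.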
