Fields not initialized when __post_init__() called using ruamel.yaml
Asked Answered
G

2

5

I have two dataclasses: Msg and Field. Msg has a field fields of type list[Field]. I want to assign something to a field of each Field after they have all been initialized which is more or less their relative index in the fields list.

However, when I add a __post_init__(self) method to the Msg dataclass, the fields list is empty, so I can't update the indices.

from dataclasses import dataclass
from ruamel.yaml import YAML

@dataclass
class Msg:

    id: int
    desc: str
    fields: list[Field]

    def __post_init__(self) -> None:
        idx: int = 0
        for field in self.fields: # why is this empty??
            field.index = idx
            idx += field.size

@dataclass
class Field:
    id: int
    name: str
    units: str
    size: int
    index: int = -1

y = YAML()
y.register_class(Msg)
y.register_class(Field)

msg: Msg = y.load("""\
!Msg
id: 1
desc: status
fields:
- !Field
    id: 1
    name: Temp
    units: degC
    size: 2
""")

assert(msg.fields[0].index != -1) # fails :(

Why is this? How is the Msg being initialized without fields being initialized? Is there any way to do what I am trying to do using the class system? I am using Python 3.11 with ruamel.yaml 0.18.5 on MacOS.

Goforth answered 14/11, 2023 at 22:42 Comment(1)
My guess is that this is because the Fields get filled in later, which unfortunately kind of defeats the purpose of something like __post_init__().Goforth
N
2

By default, object serializers such as YAML and pickle have no idea what to do with the attribute mapping for a user-defined object other than to assign the mapping directly to the object's attribute dictionary as-is.

This is why you can define a __setstate__ method for your class, so that ruamel.yaml's object constructor knows in this case to call the __init__ method with the mapping unpacked as arguments, which in turn calls __post_init__ for post-initialization:

@dataclass
class Msg:
    id: int
    desc: str
    fields: list[Field]

    def __post_init__(self) -> None:
        idx: int = 0
        for field in self.fields:
            field.index = idx
            idx += field.size

    def __setstate__(self, state):
        self.__init__(**state)

Demo: https://replit.com/@blhsing1/PowerfulThoseJavadocs

Nitroglycerin answered 15/11, 2023 at 2:57 Comment(6)
Yeah, that'll do it. Annoyingly this means that I end up taking O(n^2) time to instantiate each Msg, so I might end up adding some caching here or something. It makes sense, since there's no way for YAML to know when it's at the "last field". Also, it'd be nice if there was more documentation of this kind of thing - I had an inkling that setstate would solve it, but had no idea of the syntax. Thanks for the answer!Goforth
Agreed regarding the lack of documentation for ruamel.yaml. I was originally writing a custom ruamel.yaml constructor as an answer to your question when I noticed in the source code of ruamel.yaml.RoundTripConstructor.construct_yaml_object that it supports pickle's __setstate__ protocol. Regarding your performance concern, I don't see why this solution would make the construction of a Msg object O(n ^ 2) in time complexity, as the time cost should be linear to the number of Field objects the Msg object has.Nitroglycerin
Here's a demo of an alternative solution using a custom ruamel.yaml constructor in case you're interested: replit.com/@blhsing1/MealyCourteousAgentsNitroglycerin
I suppose it depends on the internal implementation - my assumption was that it walks the tree and updates the list every time it hits a new field, which means calling setstate() and iterating for every field, so n^2. But if it's smart enough to construct the list itself and then setstate() with the whole list already constructed, then that would only be n. As to the custom constructor, I saw that I could head that direction but was squicked out by the complete lack of docs :) Lucky that it was just setstate() in this case.Goforth
Yes, it is only sensible for all child elements to be constructed first before __setstate__ is called, and that is exactly what ruamel.yaml's object constructor does. Luckily the source code of ruamel.yaml (and pyyaml that it is based on) is structured in logical and reasonable enough a way that the lack of sufficient docs isn't too big of a deal to me. :-)Nitroglycerin
I recommend strongly against providing a __setstate__ for dataclasses, as that wil interfere with ruamel.yaml calling _post_init__. (which does with appropriate InitVar parameters t if you have such fields). Up until 0.18.5 it "just" didn't initialise non-scalar (i.e. collection) values correctly because of a bug.Travertine
T
4

You discovered a bug in ruamel.yaml, in that, up to 0.18.5, the fields for the dataclass were not properly initialised when consisting of a collection, and only worked and were tested) with simple scalars.

This is going to be fixed in 0.18.6, but until then you can do:

from __future__ import annotations

from dataclasses import dataclass
import ruamel.yaml

from ruamel.yaml.constructor import SafeConstructor

class MyConstructor(ruamel.yaml.RoundTripConstructor):
    def construct_yaml_object(self, node, cls):
        from dataclasses import is_dataclass, InitVar, MISSING

        data = cls.__new__(cls)
        yield data
        if hasattr(data, '__setstate__'):
            state = SafeConstructor.construct_mapping(self, node, deep=True)
            data.__setstate__(state)
        elif is_dataclass(data):
            mapping = SafeConstructor.construct_mapping(self, node, deep=True)
            #                                                       ^^ missing in 0.18.5
            init_var_defaults = {}
            for field in data.__dataclass_fields__.values():
                if (
                    isinstance(field.type, InitVar) or field.type is  InitVar or
                    (isinstance(field.type, str) and field.type.startswith('InitVar'))
                ) and field.default is not MISSING:
                    init_var_defaults[field.name] = field.default
            for attr, value in mapping.items():
                if attr not in init_var_defaults:
                    setattr(data, attr, value)
            post_init = getattr(data, '__post_init__', None)
            if post_init is not None:
                kw = {}
                for name, default in init_var_defaults.items():
                    kw[name] = mapping.get(name, default)
                post_init(**kw)
        else:
            state = SafeConstructor.construct_mapping(self, node, deep=True)
            if hasattr(data, '__attrs_attrs__'):  # issue 394
                data.__init__(**state)
            else:
                data.__dict__.update(state)
        if node.anchor:
            from ruamel.yaml.serializer import templated_id
            from ruamel.yaml.anchor import Anchor

            if not templated_id(node.anchor):
                if not hasattr(data, Anchor.attrib):
                    a = Anchor()
                    setattr(data, Anchor.attrib, a)
                else:
                    a = getattr(data, Anchor.attrib)
                a.value = node.anchor

@dataclass
class Msg:

    id: int
    desc: str
    fields: list[Field]

    def __post_init__(self) -> None:
        idx: int = 0
        for field in self.fields:  # why is this empty??
            field.index = idx
            idx += field.size

@dataclass
class Field:
    id: int
    name: str
    units: str
    size: int
    index: int = -1

y = ruamel.yaml.YAML()
y.Constructor = MyConstructor
y.register_class(Msg)
y.register_class(Field)

msg: Msg = y.load("""\
!Msg
id: 1
desc: status
fields:
- !Field
    id: 1
    name: Temp
    units: degC
    size: 2
""")

assert msg.fields[0].index != -1  # no longer fails :-)
print('ok')

which prints:

ok

It also looks like your code was missing

from __future__ import annotations

at the top, so the Field typing was initialisation was not postponed, and you get a NameError

Travertine answered 15/11, 2023 at 7:0 Comment(1)
Gotcha. This code is still in testing, so hopefully by the time we start looking at putting it into production 0.18.6 will be released. Until then I can use the setstate bodge (this is not a complex codebase), and when it's fixed I'll mark this response as the best answer :)Goforth
N
2

By default, object serializers such as YAML and pickle have no idea what to do with the attribute mapping for a user-defined object other than to assign the mapping directly to the object's attribute dictionary as-is.

This is why you can define a __setstate__ method for your class, so that ruamel.yaml's object constructor knows in this case to call the __init__ method with the mapping unpacked as arguments, which in turn calls __post_init__ for post-initialization:

@dataclass
class Msg:
    id: int
    desc: str
    fields: list[Field]

    def __post_init__(self) -> None:
        idx: int = 0
        for field in self.fields:
            field.index = idx
            idx += field.size

    def __setstate__(self, state):
        self.__init__(**state)

Demo: https://replit.com/@blhsing1/PowerfulThoseJavadocs

Nitroglycerin answered 15/11, 2023 at 2:57 Comment(6)
Yeah, that'll do it. Annoyingly this means that I end up taking O(n^2) time to instantiate each Msg, so I might end up adding some caching here or something. It makes sense, since there's no way for YAML to know when it's at the "last field". Also, it'd be nice if there was more documentation of this kind of thing - I had an inkling that setstate would solve it, but had no idea of the syntax. Thanks for the answer!Goforth
Agreed regarding the lack of documentation for ruamel.yaml. I was originally writing a custom ruamel.yaml constructor as an answer to your question when I noticed in the source code of ruamel.yaml.RoundTripConstructor.construct_yaml_object that it supports pickle's __setstate__ protocol. Regarding your performance concern, I don't see why this solution would make the construction of a Msg object O(n ^ 2) in time complexity, as the time cost should be linear to the number of Field objects the Msg object has.Nitroglycerin
Here's a demo of an alternative solution using a custom ruamel.yaml constructor in case you're interested: replit.com/@blhsing1/MealyCourteousAgentsNitroglycerin
I suppose it depends on the internal implementation - my assumption was that it walks the tree and updates the list every time it hits a new field, which means calling setstate() and iterating for every field, so n^2. But if it's smart enough to construct the list itself and then setstate() with the whole list already constructed, then that would only be n. As to the custom constructor, I saw that I could head that direction but was squicked out by the complete lack of docs :) Lucky that it was just setstate() in this case.Goforth
Yes, it is only sensible for all child elements to be constructed first before __setstate__ is called, and that is exactly what ruamel.yaml's object constructor does. Luckily the source code of ruamel.yaml (and pyyaml that it is based on) is structured in logical and reasonable enough a way that the lack of sufficient docs isn't too big of a deal to me. :-)Nitroglycerin
I recommend strongly against providing a __setstate__ for dataclasses, as that wil interfere with ruamel.yaml calling _post_init__. (which does with appropriate InitVar parameters t if you have such fields). Up until 0.18.5 it "just" didn't initialise non-scalar (i.e. collection) values correctly because of a bug.Travertine

© 2022 - 2024 — McMap. All rights reserved.