I've been reading the PyYAML source code to try to understand how to define a proper constructor function that I can add with `add_constructor`. I have a pretty good understanding of how that code works now, but I still don't understand why the default YAML constructors in `SafeConstructor` are generators. For example, the method `construct_yaml_map` of `SafeConstructor`:
```python
def construct_yaml_map(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)
```
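To make the two halves of `construct_yaml_map` concrete, here is a minimal stdlib-only sketch (no PyYAML involved; `construct_yaml_map_sketch`, the zero-argument `build_mapping` callable standing in for `self.construct_mapping(node)`, and the `anchors` dict are all hypothetical). The caller receives a reference to the empty dict before its contents exist, so the contents can point back at it, much like an alias `*a` pointing at the node anchored `&a`.

```python
def construct_yaml_map_sketch(build_mapping):
    # Same shape as SafeConstructor.construct_yaml_map, but build_mapping is a
    # hypothetical zero-argument stand-in for self.construct_mapping(node).
    data = {}
    yield data                      # phase 1: hand out the empty shell
    data.update(build_mapping())    # phase 2: fill the same object in place


# Hypothetical anchor table: the shell is registered *before* it is filled,
# so the contents can refer back to it.
anchors = {}
gen = construct_yaml_map_sketch(lambda: {"name": "root", "me": anchors["a"]})

anchors["a"] = next(gen)
print(anchors["a"])                          # {}

for _ in gen:                                # resume the generator to run phase 2
    pass

print(anchors["a"]["name"])                  # root
print(anchors["a"]["me"] is anchors["a"])    # True
```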
I understand how the generator is used in `BaseConstructor.construct_object`, as follows, to stub out an object and defer populating it with data from the node when `deep=False` is passed to `construct_mapping`:
```python
if isinstance(data, types.GeneratorType):
    generator = data
    data = next(generator)
    if self.deep_construct:
        for dummy in generator:
            pass
    else:
        self.state_generators.append(generator)
```
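The branch above can be isolated in a small stdlib-only sketch. `MiniConstructor` and `make_list` are hypothetical, and only the generator handling of `construct_object` is copied (no nodes, no recursion checks): with `deep_construct=True` the generator is finished immediately, otherwise only the stub comes back and the rest is queued in `state_generators`.

```python
import types


class MiniConstructor:
    """Hypothetical toy copying only the generator branch of construct_object."""

    def __init__(self, deep_construct):
        self.deep_construct = deep_construct
        self.state_generators = []

    def construct_object(self, constructor):
        data = constructor()
        if isinstance(data, types.GeneratorType):
            generator = data
            data = next(generator)              # stub object from the first yield
            if self.deep_construct:
                for _ in generator:             # finish it right now
                    pass
            else:
                self.state_generators.append(generator)  # finish it later
        return data


def make_list():
    # Toy generator constructor: empty list first, contents afterwards.
    data = []
    yield data
    data.extend([1, 2, 3])


deep = MiniConstructor(deep_construct=True)
print(deep.construct_object(make_list))         # [1, 2, 3]

lazy = MiniConstructor(deep_construct=False)
stub = lazy.construct_object(make_list)
print(stub, len(lazy.state_generators))         # [] 1
for generator in lazy.state_generators:
    for _ in generator:
        pass
print(stub)                                     # [1, 2, 3]
```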
And I understand how the data is generated in `BaseConstructor.construct_document` in the case where `deep=False` for `construct_mapping`:
```python
def construct_document(self, node):
    data = self.construct_object(node)
    while self.state_generators:
        state_generators = self.state_generators
        self.state_generators = []
        for generator in state_generators:
            for dummy in generator:
                pass
```
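A sketch of why `construct_document` needs the outer `while` loop, assuming a hypothetical toy constructor in which every node is a nested Python list and every container is built by a generator. Finishing one batch of generators constructs the children, which queue a fresh batch, so the draining repeats until nothing is left.

```python
import types


class MiniConstructor:
    """Hypothetical toy: the deferred (deep=False) path of construct_object
    plus the draining loop of construct_document."""

    def __init__(self):
        self.state_generators = []
        self.rounds = 0

    def construct_object(self, node):
        data = self.construct_seq(node)         # every node is a nested list here
        if isinstance(data, types.GeneratorType):
            generator = data
            data = next(generator)
            self.state_generators.append(generator)  # always deferred
        return data

    def construct_seq(self, node):
        data = []
        yield data
        # Building the children calls construct_object again, which queues
        # *new* generators while the current batch is being drained.
        data.extend(self.construct_object(child) for child in node)

    def construct_document(self, node):
        data = self.construct_object(node)
        while self.state_generators:            # repeat until nothing is queued
            self.rounds += 1
            state_generators = self.state_generators
            self.state_generators = []
            for generator in state_generators:
                for _ in generator:
                    pass
        return data


# Plain nested Python lists stand in for YAML sequence nodes.
c = MiniConstructor()
doc = c.construct_document([[[]], []])
print(doc, c.rounds)                            # [[[]], []] 3
```

Here three rounds are needed for a document nested three levels deep; a single pass over the first batch would leave the innermost list as an empty stub, giving `[[], []]`.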
What I don't understand is the benefit of stubbing out the data objects and working down through them by iterating over the generators in `construct_document`. Does this have to be done to support something in the YAML spec, or does it provide a performance benefit?
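One way to probe the "YAML spec" hypothesis is a toy, stdlib-only model (not PyYAML's API; `Node`, `MiniConstructor`, `seq_as_generator` and `seq_as_function` are hypothetical). YAML anchors and aliases allow a node to contain an alias to itself, e.g. `&a [*a]`. The recursion check below mirrors the one in `BaseConstructor.construct_object`: a stub registered before its children are built lets the alias resolve, while a plain function reaches the same node while it is still in progress.

```python
import types


class ConstructorError(Exception):
    """Toy counterpart of yaml.constructor.ConstructorError."""


class Node:
    # Hypothetical stand-in for a YAML sequence node. An alias is modelled by
    # reusing the very same Node object, as PyYAML's composer does.
    def __init__(self):
        self.children = []


class MiniConstructor:
    def __init__(self, seq_constructor):
        self.seq_constructor = seq_constructor
        self.constructed_objects = {}
        self.recursive_objects = set()
        self.state_generators = []

    def construct_object(self, node):
        if node in self.constructed_objects:   # alias to a known (maybe stub) object
            return self.constructed_objects[node]
        if node in self.recursive_objects:     # alias to a node still being built
            raise ConstructorError("found unconstructable recursive node")
        self.recursive_objects.add(node)
        data = self.seq_constructor(self, node)
        if isinstance(data, types.GeneratorType):
            generator = data
            data = next(generator)
            self.state_generators.append(generator)  # always deferred here
        self.constructed_objects[node] = data  # from now on aliases get this object
        self.recursive_objects.remove(node)
        return data

    def construct_document(self, node):
        data = self.construct_object(node)
        while self.state_generators:
            state_generators, self.state_generators = self.state_generators, []
            for generator in state_generators:
                for _ in generator:
                    pass
        return data


def seq_as_generator(constructor, node):
    data = []
    yield data                                 # stub first
    data.extend(constructor.construct_object(c) for c in node.children)


def seq_as_function(constructor, node):
    return [constructor.construct_object(c) for c in node.children]


# The YAML document "&a [*a]": a sequence whose only item is itself.
node = Node()
node.children.append(node)

result = MiniConstructor(seq_as_generator).construct_document(node)
print(result[0] is result)                     # True

plain_failed = False
try:
    MiniConstructor(seq_as_function).construct_document(node)
except ConstructorError as exc:
    plain_failed = True
    print("plain function:", exc)              # plain function: found unconstructable recursive node
```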
This answer on another question was somewhat helpful, but I don't understand why that answer does this:
```python
def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)
```
instead of this:
```python
def foo_constructor(loader, node):
    state = loader.construct_mapping(node, deep=True)
    return Foo(**state)
```
I've tested that the latter form works for the examples posted on that other answer, but perhaps I am missing some edge case.
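The two `foo_constructor` forms should only diverge when an alias points back at the `Foo` being built. A sketch, assuming PyYAML is installed and using a hypothetical `Foo` with fields `x` and `me`; the hypothetical `GenLoader`/`PlainLoader` subclasses keep the global `SafeLoader` untouched.

```python
import yaml


class Foo:
    # Hypothetical class with a field that may refer back to the instance.
    def __init__(self, x=None, me=None):
        self.x = x
        self.me = me


def foo_constructor_gen(loader, node):
    # Generator form: stub first, fill in later.
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)


def foo_constructor_plain(loader, node):
    # Plain form: build everything, then return.
    state = loader.construct_mapping(node, deep=True)
    return Foo(**state)


class GenLoader(yaml.SafeLoader):
    pass


class PlainLoader(yaml.SafeLoader):
    pass


GenLoader.add_constructor("!foo", foo_constructor_gen)
PlainLoader.add_constructor("!foo", foo_constructor_plain)

# A Foo whose "me" field is an alias back to the Foo itself.
DOC = "&f !foo {x: 1, me: *f}"

foo = yaml.load(DOC, Loader=GenLoader)
print(foo.x, foo.me is foo)          # 1 True

plain_failed = False
try:
    yaml.load(DOC, Loader=PlainLoader)
except yaml.constructor.ConstructorError as exc:
    plain_failed = True
    print(exc.problem)               # found unconstructable recursive node
```

With the generator form, the stub `Foo` is stored in `constructed_objects` before `construct_mapping` runs, so `*f` resolves to it; the plain form asks for the same node while it is still being constructed.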
I am using version 3.10 of PyYAML, but it looks like the code in question is the same in the latest version (3.12) of PyYAML.
`foo_constructor` from your answer as described in my question, I seem to see the correct output? That answer has self-references in its examples. Can you include in your answer an example YAML document that would have problems if I edited `foo_constructor` to not be a generator as shown in my question? – Nicolas