fb-hydra: How to get inner configurations to inherit outer configuration fields?

I am trying to write a hierarchical configuration structure such that config files in the inner directories inherit from the config files in the outer directories. For example, in the following scenario

upper_config
|
|-middle_config
|   |
|   |-lower_config

I would like middle_config to be able to inherit & override the parameters of upper_config, and lower_config to be able to inherit & override the parameters of both middle_config and upper_config.

One solution would be to write a configuration parser such that outer modules are read first, and as inner modules are read they overwrite the fields in the outer modules.

However, I would like to use Hydra (or some other tool, open to suggestions) for all of the added conveniences. I've read the documentation front to back a few times, and though it feels like either config groups or packages directives should be able to handle this, I can't quite piece it together.

I believe this post asks a very similar question, but the answer hasn't enlightened me, and it seems that the person who asked the question decided to implement a version of the config parser I described above.

I am hoping that there is a way for an inner config file's package directive to be changed to point to a parent configuration and somehow inherit its default list that way.

Altercate answered 27/5, 2021 at 3:11 Comment(1)
As an alternative to my answer below (which uses Hydra), this might be achieved using plain OmegaConf as follows: 1) use Python's os.walk function to get a list of yaml files, 2) use some heuristic to order these files according to your preference for which should be merged on top of which others, 3) call OmegaConf.load(filename) on each file to produce a list of OmegaConf objects, and 4) call cfg = OmegaConf.merge(*list_of_config_objects) to merge the collected config objects. (Haemophilia)
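The plain-OmegaConf alternative from the comment above could be sketched roughly as follows. The function names ordered_yaml_files and merge_tree are hypothetical, and the ordering heuristic (shallower directories first) is an assumption that matches the "inner overrides outer" goal of the question; OmegaConf.load and OmegaConf.merge are the calls named in the comment.

```python
import os


def ordered_yaml_files(root):
    """Return all .yaml files under root, outermost directories first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".yaml"):
                found.append(os.path.join(dirpath, name))
    # Heuristic (an assumption): fewer path separators means "more outer",
    # so deeper files are merged later and win on conflicting keys.
    return sorted(found, key=lambda p: (p.count(os.sep), p))


def merge_tree(root):
    """Load every yaml file with OmegaConf and merge them in that order."""
    from omegaconf import OmegaConf  # imported lazily; requires omegaconf

    configs = [OmegaConf.load(path) for path in ordered_yaml_files(root)]
    return OmegaConf.merge(*configs)
```

With the file layout from the question, merge_tree("outer") would merge conf1.yaml, then conf2.yaml, then conf3.yaml. Files at the same depth are ordered alphabetically here, which may or may not be the preference you want.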

Suppose we have the following files:

my_app.py
outer/conf1.yaml
outer/middle/conf2.yaml
outer/middle/inner/conf3.yaml

To make things concrete, here are the contents of my_app.py:

import hydra, omegaconf

@hydra.main(config_path="outer", config_name="conf1")
def my_app(cfg) -> None:
    print(omegaconf.OmegaConf.to_yaml(cfg))

my_app()

TLDR

If your yaml files just contain plain data (i.e. no defaults lists or package directives), the most flexible approach to dynamically composing your config at the command line looks like this:

$ python my_app.py +middle@_global_=conf2 +middle/inner@_global_=conf3

This will merge outer/middle/conf2.yaml on top of outer/conf1.yaml, then merge outer/middle/inner/conf3.yaml on top of that. The @_global_ keyword means that the input configs should be merged at the top level instead of being nested according to the names of their containing directories.

Now for the details...

In answering this question, I might use some features from the recent release candidate for Hydra 1.1:

>>> import hydra
>>> hydra.__version__
'1.1.0.rc1'

There are a few approaches we could take to overriding our outer configuration with middle/inner configuration:

  • Use the defaults list to specify a package.
  • Use a package header to specify a package.
  • Use a command-line package override to specify a package (this is the method used in the TLDR section above)

Here are the details for each approach:

Use the defaults list to specify a package.

Suppose we have the following. In outer/conf1.yaml:

defaults:
  - _self_
  - middle@_here_: conf2
a: 1
b: 2

In outer/middle/conf2.yaml:

defaults:
  - _self_
  - inner@_here_: conf3
b: 3
c: 4

In outer/middle/inner/conf3.yaml:

c: 5
d: 6

With these yaml files, running my_app.py gives the following result:

$ python my_app.py
a: 1
b: 3
c: 5
d: 6

As you can see, conf1 is overridden by conf2, which is in turn overridden by conf3. So, how does this work? The defaults list specifies the order in which the input configs are composed. In conf1, the @_here_ package keyword specifies that conf2 should be merged into the current config's package instead of being placed in the middle package. This is documented in Default List package keywords. Also of interest is the @_global_ keyword. Note that one could just as well write - middle@foo: conf2 instead of - middle@_here_: conf2 in the defaults list, in which case a "foo" key would appear in the output config with the contents of conf2 nested under it.

Just as in conf1.yaml, conf2.yaml is using the defaults list to specify that conf3 should be merged into conf2 instead of being merged into a package named "inner" (which would have been the default behavior, as is documented here).

What is the - _self_ keyword doing? In a defaults list, this keyword allows for control of the order in which the current config is merged with other input configs specified in the defaults list. For example, in the conf2.yaml defaults list, writing - _self_ before - inner@_here_: conf3 ensures that conf3 will be merged into conf2, and not the other way around. This _self_ keyword is documented here. If - _self_ is not specified in the defaults list, then the order in which the defaults are merged with the current config is:

  • using Hydra 1.0: the current config is merged first, so input configs from the defaults list overwrite it
  • using Hydra 1.1: the current config is merged last, overwriting the other configs specified in the defaults list

For reference, see these migration instructions for moving from version 1.0 to 1.1.
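To see why the position of _self_ matters, here is a small plain-Python model of "later entries win" merging. This is not Hydra's implementation; merge and merge_in_order are hypothetical helpers, and the dicts simply mirror the data fields of the three yaml files above.

```python
def merge(base, override):
    """Return a new dict where values from override win; nested dicts recurse."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_in_order(*configs):
    """Merge configs left to right, so later entries override earlier ones."""
    merged = {}
    for cfg in configs:
        merged = merge(merged, cfg)
    return merged


conf1 = {"a": 1, "b": 2}
conf2 = {"b": 3, "c": 4}
conf3 = {"c": 5, "d": 6}

# _self_ listed first at every level: conf1, then conf2, then conf3.
self_first = merge_in_order(conf1, conf2, conf3)
# _self_ listed last at every level: conf3, then conf2, then conf1.
self_last = merge_in_order(conf3, conf2, conf1)

print(self_first)  # {'a': 1, 'b': 3, 'c': 5, 'd': 6}
print(self_last)   # {'c': 4, 'd': 6, 'b': 2, 'a': 1}
```

With _self_ first, the inner files override the outer ones, which is the behavior asked for in the question. With _self_ last, the outer files win instead.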

Use a package header to specify a package.

Using a package directive at the top of a yaml file can achieve a similar result:

In outer/conf1.yaml:

defaults:
  - _self_
  - middle: conf2
a: 1
b: 2

In outer/middle/conf2.yaml:

# @package _global_
defaults:
  - _self_
  - inner: conf3
b: 3
c: 4

In outer/middle/inner/conf3.yaml:

# @package _global_
c: 5
d: 6

The # @package <PACKAGE> directive specifies where the contents of the current input config should be placed. Running the app gives the same output as before:

$ python my_app.py
a: 1
b: 3
c: 5
d: 6

This works much the same way as using an @<PACKAGE> keyword in the defaults list (as detailed in the previous section), and the result at the command-line is identical. One difference between these two approaches is that a package header applies to all contents of the given input config, whereas using an @<PACKAGE> keyword in the defaults list gives more granular control over which input configs should be placed into which packages.
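As a rough mental model of what a package does, whether it comes from a header or from the defaults list, the sketch below wraps a config's contents under a dotted package path before merging. The place function is hypothetical and only models absolute packages and _global_; it does not model _here_ or any other Hydra keyword.

```python
def place(content, package):
    """Wrap content under a dotted package path; "_global_" means top level."""
    if package == "_global_":
        return dict(content)
    nested = dict(content)
    # Build the nesting from the innermost key outwards.
    for key in reversed(package.split(".")):
        nested = {key: nested}
    return nested


conf2 = {"b": 3, "c": 4}

print(place(conf2, "middle"))    # {'middle': {'b': 3, 'c': 4}}
print(place(conf2, "_global_"))  # {'b': 3, 'c': 4}
```

Without a package override, conf2 would land under a "middle" key and leave the top-level b untouched; placing it in _global_ is what lets it override conf1's fields.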

Using the - _self_ keyword in the defaults list is still necessary to ensure that the merge happens in the correct order (see the previous section for notes on _self_).

Hydra's treatment of package headers is different in Hydra 1.0 vs 1.1.

Use a command-line package override to specify a package

The most elegant and flexible way to achieve the desired result is using a command-line package override. Given outer/conf1.yaml as follows:

a: 1
b: 2

And outer/middle/conf2.yaml thus:

b: 3
c: 4

and outer/middle/inner/conf3.yaml:

c: 5
d: 6

We can use Hydra's powerful command-line override syntax to compose the output config:

$ python my_app.py +middle@_global_=conf2 +middle/inner@_global_=conf3
a: 1
b: 3
c: 5
d: 6

Using the _self_ keyword is not necessary with this approach because the +<group>@<package>=<option> syntax has the effect of appending to the defaults list (here is a reference) as opposed to prepending, so the appended configs are merged after the primary config.
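The same overrides can also be passed programmatically through Hydra's Compose API instead of the command line. The sketch below assumes Hydra 1.1 or later, where compose and initialize are importable from the top-level hydra package; global_override and compose_layers are hypothetical helper names, and newer Hydra versions may emit a warning unless a version_base argument is also passed to initialize.

```python
def global_override(group, option):
    """Build a '+group@_global_=option' override string."""
    return f"+{group}@_global_={option}"


def compose_layers(config_dir, base, layers):
    """Compose base, then append each (group, option) layer at the top level."""
    from hydra import compose, initialize  # imported lazily; requires hydra-core

    overrides = [global_override(group, option) for group, option in layers]
    # config_path is resolved relative to the file that calls initialize().
    with initialize(config_path=config_dir):
        return compose(config_name=base, overrides=overrides)
```

For the layout in this answer, a call such as compose_layers("outer", "conf1", [("middle", "conf2"), ("middle/inner", "conf3")]) would pass the same two overrides as the command line above.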

Haemophilia answered 8/6, 2021 at 8:34 Comment(2)
Thank you for this very detailed answer, it has helped me confirm and improve a lot of my understanding about Hydra that I've developed since asking this question 2 weeks ago. The solution I came up with follows along the lines of your last suggestion, having directives at the top of the inner yaml files and overriding them through the call to compose. However, after your explanation of how to use the @ package directive in the defaults list along with _self_ to guarantee overriding happens in the right order I think I will restructure my approach a bit. (Altercate)
Glad that I could help :) (Haemophilia)
