Suppose we have the following files:
my_app.py
outer/conf1.yaml
outer/middle/conf2.yaml
outer/middle/inner/conf3.yaml
To make things concrete, here are the contents of my_app.py:
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="outer", config_name="conf1")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # print the composed config as YAML

if __name__ == "__main__":
    my_app()
TLDR
If your yaml files just contain plain data (i.e. no defaults lists or package directives), the most flexible approach to dynamically composing your config at the command line looks like this:
$ python my_app.py +middle@_global_=conf2 +middle/inner@_global_=conf3
This will merge outer/middle/conf2.yaml on top of outer/conf1.yaml, then merge outer/middle/inner/conf3.yaml on top of that. The @_global_ keyword means that the input configs should be merged at the top level instead of being nested according to the names of their containing directories.
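A rough mental model may help show what "merge on top of" means here. The sketch below is not Hydra's implementation. In it, each config is deep-merged at the top level onto the result so far, in command-line order, and later values win. `deep_merge` is a hypothetical helper that only approximates `OmegaConf.merge` for plain nested dicts (it ignores lists, interpolations and structured configs). The dict contents assume the plain-data files shown in the command-line override section further down.

```python
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Hypothetical stand-in for OmegaConf.merge on plain nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into sub-configs
        else:
            merged[key] = value  # the later value wins
    return merged


conf1 = {"a": 1, "b": 2}  # outer/conf1.yaml
conf2 = {"b": 3, "c": 4}  # outer/middle/conf2.yaml
conf3 = {"c": 5, "d": 6}  # outer/middle/inner/conf3.yaml

# @_global_ puts conf2 and conf3 at the top level, so they merge straight
# into conf1 in the order they appear on the command line.
cfg = reduce(deep_merge, [conf1, conf2, conf3])
print(cfg)  # {'a': 1, 'b': 3, 'c': 5, 'd': 6}
```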
Now for the details...
In answering this question, I might use some features from the recent release candidate for Hydra 1.1:
>>> import hydra
>>> hydra.__version__
'1.1.0.rc1'
There are a few approaches we could take to overriding our outer configuration with middle/inner configuration:
- Use the defaults list to specify a package.
- Use a package header to specify a package.
- Use a command-line package override to specify a package (this is the method used in the TLDR section above).
Here are the details for each approach:
Use the defaults list to specify a package.
Suppose we have the following:
In outer/conf1.yaml:
defaults:
- _self_
- middle@_here_: conf2
a: 1
b: 2
In outer/middle/conf2.yaml:
defaults:
- _self_
- inner@_here_: conf3
b: 3
c: 4
In outer/middle/inner/conf3.yaml:
c: 5
d: 6
With these yaml files, running my_app.py gives the following result:
$ python my_app.py
a: 1
b: 3
c: 5
d: 6
As you can see, conf1 is being overridden by conf2, which is in turn being overridden by conf3. So, how does this work? The defaults list specifies the order in which the input configs are composed. In conf1, the @_here_ package keyword specifies that conf2 should be merged into the current config's package instead of being placed in the middle package. This is documented in Default List package keywords.
Also of interest is the @_global_ keyword, which places a config at the top level of the output config. Note that one could just as well write - middle@foo: conf2 instead of - middle@_here_: conf2 in the defaults list, in which case a "foo" key would appear in the output config with the contents of conf2 nested under it.
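The plain-Python simulation below makes the difference between `@_here_` and `@foo` concrete by composing conf1 both ways. It is only a sketch under simplifying assumptions, and all names in it are hypothetical:

- `deep_merge` stands in for `OmegaConf.merge`.
- `place` nests a config under a dotted package name.
- Because conf1 is the primary config, I assume its `_here_` corresponds to the top level.
- conf3 comes along under `foo` as well, since conf2's own defaults list pulls it into conf2's package.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Hypothetical stand-in for OmegaConf.merge on plain nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def place(cfg: dict, package: str) -> dict:
    """Hypothetical helper: nest cfg under a dotted package; "" means top level."""
    if package:
        for key in reversed(package.split(".")):
            cfg = {key: cfg}
    return cfg


conf1_self = {"a": 1, "b": 2}
conf2_self = {"b": 3, "c": 4}
conf3 = {"c": 5, "d": 6}


def compose_conf1(middle_package: str) -> dict:
    # conf2's defaults list is [_self_, inner@_here_: conf3], so conf3 lands
    # in conf2's own package and is merged on top of conf2.
    conf2 = deep_merge(conf2_self, conf3)
    # conf1's defaults list is [_self_, middle@<package>: conf2].
    return deep_merge(conf1_self, place(conf2, middle_package))


print(compose_conf1(""))     # middle@_here_: {'a': 1, 'b': 3, 'c': 5, 'd': 6}
print(compose_conf1("foo"))  # middle@foo: {'a': 1, 'b': 2, 'foo': {'b': 3, 'c': 5, 'd': 6}}
```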
Just as in conf1.yaml, conf2.yaml is using the defaults list to specify that conf3 should be merged into conf2 instead of being merged into a package named "inner" (which would have been the default behavior, as documented here).
What is the - _self_ keyword doing?
In a defaults list, this keyword controls the order in which the current config is merged with the other input configs specified in the defaults list. For example, in the conf2.yaml defaults list, writing - _self_ before - inner@_here_: conf3 ensures that conf3 will be merged into conf2, and not the other way around. This _self_ keyword is documented here. If - _self_ is not specified in the defaults list, then the order in which the defaults are merged with the current config is:
- using Hydra 1.0: input configs from the defaults list are merged into the current config
- using Hydra 1.1: the current config is merged last, overwriting the other configs specified in the defaults list
For reference, see these migration instructions for moving from version 1.0 to 1.1.
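The effect of where _self_ sits can be simulated in a few lines of plain Python. This is a sketch, not Hydra's code. `deep_merge` is a hypothetical stand-in for `OmegaConf.merge`, and `compose_group` is a hypothetical helper that merges conf2 and its defaults-list entries in list order, with later entries winning.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Hypothetical stand-in for OmegaConf.merge on plain nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def compose_group(self_body: dict, defaults: list, self_first: bool) -> dict:
    """Merge a config with its defaults list; entries later in the list win."""
    order = [self_body, *defaults] if self_first else [*defaults, self_body]
    result: dict = {}
    for cfg in order:
        result = deep_merge(result, cfg)
    return result


conf2_self = {"b": 3, "c": 4}
conf3 = {"c": 5, "d": 6}

# [_self_, inner@_here_: conf3]: conf3 is merged on top, so c comes from conf3
# (this matches the Hydra 1.0 default order described above).
print(compose_group(conf2_self, [conf3], self_first=True))   # {'b': 3, 'c': 5, 'd': 6}
# No _self_ under Hydra 1.1: the current config is merged last, so c comes from conf2.
print(compose_group(conf2_self, [conf3], self_first=False))  # {'c': 4, 'd': 6, 'b': 3}
```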
Use a package header to specify a package.
Using a package directive at the top of a yaml file can achieve a similar result:
In outer/conf1.yaml:
defaults:
- _self_
- middle: conf2
a: 1
b: 2
In outer/middle/conf2.yaml:
# @package _global_
defaults:
- _self_
- inner: conf3
b: 3
c: 4
In outer/middle/inner/conf3.yaml:
# @package _global_
c: 5
d: 6
The # @package <PACKAGE> directive specifies where the contents of the current input config should be placed.
$ python my_app.py
a: 1
b: 3
c: 5
d: 6
This works much the same way as using an @<PACKAGE> keyword in the defaults list (as detailed in the previous section), and the result at the command line is identical. One difference between these two approaches is that a package header applies to all contents of the given input config, whereas using an @<PACKAGE> keyword in the defaults list gives more granular control over which input configs should be placed into which packages.
Using the - _self_ keyword in the defaults list is still necessary to ensure that the merge happens in the correct order (see the previous section for notes on _self_).
Hydra's treatment of package headers differs between Hydra 1.0 and 1.1.
Use a command-line package override to specify a package.
The most elegant and flexible way to achieve the desired result is to use a command-line package override:
Given outer/conf1.yaml as follows:
a: 1
b: 2
And outer/middle/conf2.yaml thus:
b: 3
c: 4
and outer/middle/inner/conf3.yaml:
c: 5
d: 6
We can use Hydra's powerful command-line override syntax to compose the output config:
$ python my_app.py +middle@_global_=conf2 +middle/inner@_global_=conf3
a: 1
b: 3
c: 5
d: 6
Using the _self_ keyword is not necessary with this approach because the +<group>@<package>=<option> syntax has the effect of appending to the defaults list (here is a reference) as opposed to prepending.
os.walk function to get a list of yaml files, 2) Use some heuristic to order these files according to your preference for which should be merged on top of which others, 3) Call OmegaConf.load(filename) on each file to produce a list of OmegaConf objects, and 4) call cfg = OmegaConf.merge(*list_of_config_objects) to merge the collected config objects. – Haemophilia