How to do file overrides in hydra?
I have a main config file, let's say config.yaml:

num_layers: 4
embedding_size: 512
learning_rate: 0.2
max_steps: 200000

I'd like to be able to override this on the command line with another file, say big_model.yaml, which I'd use conceptually like:

python my_script.py --override big_model.yaml

and big_model.yaml might look like:

num_layers: 8
embedding_size: 1024

I'd like to be able to override with an arbitrary number of such files, each one taking priority over the last. Let's say I also have fast_learn.yaml:

learning_rate: 2.0

And so I'd then want to conceptually do something like:

python my_script.py --override big_model.yaml --override fast_learn.yaml

What is the easiest/most standard way to do this in hydra? (Or perhaps directly in omegaconf?)

(Note that I'd ideally like these override files to just be standard yaml files that override the earlier ones; though if I have to use the override DSL instead, I can do that, if that's the easiest/most standard way.)

Heti answered 29/10, 2020 at 15:9 Comment(1)
You may be interested in my answer to this question. – Allsopp

Refer to the basic tutorial and read about config groups.

You can create arbitrary config groups and select one option from each (as of Hydra 1.0, config group options are mutually exclusive). You will need two config groups here: one could be model, with normal, small, and big options, and another could be trainer, with normal and fast options.

Config groups can also override things in other config groups. You can also always append to the defaults list from the command line, so you can add config groups that are only used from the command line. An example of this is an 'experiment' config group. You can use it as:

$ python train.py +experiment=exp1

In config groups like this, which override things across the entire config, you should use the global package (read more about packages in the docs).

# @package _global_
num_layers: 8
embedding_size: 1024
learning_rate: 2.0
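
For concreteness, here is a minimal sketch of the layout this implies (the conf directory name and base config contents are assumptions, carried over from the question):

conf/
  config.yaml        # num_layers, embedding_size, learning_rate, max_steps
  experiment/
    exp1.yaml        # the @package _global_ file shown above

Because experiment does not appear in config.yaml's defaults list, the leading + appends it to the defaults, so exp1's values are merged last and override the base config.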
Wing answered 29/10, 2020 at 16:19 Comment(7)
Thanks! Will have a try :) – Heti
I'd like these files to be fairly arbitrary files on the whole. Like, imagine I have some default configuration for training a model. Then I have a bunch of experiments where I change just certain values. I don't want to create a config group for each experiment. Nor do I want to copy and paste the entire giant config. I'd like to just be able to point to experiment-specific yaml files on the command line, which will override the main config file. – Heti
maybe I should use omegaconf directly for this? – Heti
I suggested that you create ONE config group for all experiments. Files in it can override specific values in the base config. – Wing
Ah, makes sense. For now I've ended up using argparse to read in a list of default configs, additional configs, and manual overrides, then using omegaconf to load the config files, load the manual overrides, and merge them all (roughly the sketch after this thread). This works OK. It also avoids issues with the working directory changing and hydra.yaml being saved into the .hydra directory. I'm now using hydra solely for instantiation. – Heti
@OmryYadan Is the experiment config group declared through the directory structure? If so, why is there a + sign; isn't that only for introducing new arguments? If it is not declared and is only added via the command line, how can hydra know that exp1 is an option of a config group and not just a string? Is there maybe a complete example of this approach I could take a look at? – Marchellemarcher
+ is needed if experiment is not mentioned in the defaults list (which is typical for this use case). hydra.cc/docs/patterns/configuring_experiments – Wing
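
A minimal sketch of the pure-omegaconf merging approach described in this thread (the flag names and file layout are assumptions, not Heti's exact code):

import argparse
from omegaconf import OmegaConf

parser = argparse.ArgumentParser()
parser.add_argument("--override", action="append", default=[],
                    help="extra yaml files, merged in order; later files win")
parser.add_argument("--set", action="append", default=[],
                    help="manual dotlist overrides, e.g. learning_rate=2.0")
args = parser.parse_args()

# Start from the base config, merge each override file on top of it,
# then apply manual overrides last; OmegaConf.merge gives priority
# to the rightmost arguments.
cfg = OmegaConf.merge(
    OmegaConf.load("config.yaml"),
    *[OmegaConf.load(path) for path in args.override],
    OmegaConf.from_dotlist(args.set),
)
print(OmegaConf.to_yaml(cfg))

This supports exactly the usage from the question, e.g. python my_script.py --override big_model.yaml --override fast_learn.yaml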

It sounds like a package override might be a good solution for you.

The documentation can be found here: https://hydra.cc/docs/next/advanced/overriding_packages

An example application can be found here: https://github.com/facebookresearch/hydra/tree/master/examples/advanced/package_overrides

Using that example application, you can achieve the override by doing something like:

$ python simple.py db=postgresql db.pass=helloworld

which outputs:

db:
  driver: postgresql
  user: postgre_user
  pass: helloworld
  timeout: 10
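
For context, a plausible sketch of the configs behind that command (the file contents are assumptions, not the repo's exact files): the primary config selects a default option from the db config group, and each option file lands under the db package, so db=postgresql swaps the whole subtree before db.pass overrides a single value within it.

conf/simple.yaml:

defaults:
  - db: mysql

conf/db/postgresql.yaml:

driver: postgresql
user: postgre_user
pass: drowssap   # replaced by helloworld via db.pass on the command line
timeout: 10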

Mesquite answered 29/10, 2020 at 16:1 Comment(7)
Ok. I've seen the concept of 'config groups', where one can choose e.g. a specific dataset yaml file or a specific database yaml file. Is there a way of mixing and matching arbitrary yaml files, without e.g. creating a folder/group for each of those files? – Heti
You can use this: hydra.cc/docs/tutorials/basic/your_first_app/… But this can only be specified in a defaults list (in a file). You can also override the config name via the command line with --config-name, which will allow you to select different defaults lists (see the sketch after this thread). – Wing
@OmryYadan Ok, can the values specified in the defaults lists override values in earlier config files? Or can those only form new child nodes in the config hierarchy, and thus won't override earlier values? – Heti
Read about the defaults list in the docs. The defaults list does not override config values. – Wing
@HughPerkins, re-reading your question and my answer - I think this is best addressed in a chat. The content of elements composed via the defaults list can definitely be used to override config values. – Wing
@HughPerkins Did you ever resolve whether your original question could be addressed using package overriding? I believe what I'm asking here: #67715671 is very similar to your problem, and it seems to me that package overriding is the only possibility; I think groups don't really address this. – Iceberg
In the end, I switched to using omegaconf (which hydra is built on top of) directly instead. – Heti
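
A minimal sketch of the --config-name route mentioned in this thread (the alternate config name and its contents are assumptions): keep a second primary config next to config.yaml that composes the base config via its defaults list and then overrides a few values.

conf/
  config.yaml           # the base config from the question
  big_experiment.yaml

conf/big_experiment.yaml:

defaults:
  - config

num_layers: 8
embedding_size: 1024

$ python my_app.py --config-name big_experiment

Assuming the primary config's own values are merged after its defaults list (the default ordering in Hydra 1.0, controlled by _self_ in Hydra 1.1+), num_layers and embedding_size override the base values.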

The answer from Omry (the library author) is correct and very concise. This answer expands on it by directly addressing your scenario.

Direct answer

First, we have the following file structure:

my_app.py
conf/
  config.yaml
  variants/
    size/
      big_model.yaml
    train/
      fast_learn.yaml

my_app.py:

from omegaconf import DictConfig, OmegaConf
import hydra

@hydra.main(version_base=None, config_path="conf", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()

conf/config.yaml:

num_layers: 4
embedding_size: 512
learning_rate: 0.2
max_steps: 200000

conf/variants/size/big_model.yaml:

# @package _global_
num_layers: 8
embedding_size: 1024

conf/variants/train/fast_learn.yaml:

# @package _global_
learning_rate: 2.0

You now need to run the following (as Omry explained earlier):

python my_app.py +variants/size=big_model +variants/train=fast_learn

The following is the output:

num_layers: 8
embedding_size: 1024
learning_rate: 2.0
max_steps: 200000

The difference from Omry's answer is that we are using nested yaml files instead of putting all the changes in the same experiment/exp1.yaml.


Concise writing

Instead, you could be more concise:

conf/
  config.yaml
  big_model/
    c.yaml
  fast_learn/
    c.yaml

Then, you would only need to write:

python my_app.py +big_model=c +fast_learn=c

Although it is more concise, I think the previous layout should be preferred, as it is more explicit.


Order of execution

The yaml files are applied in the order they are specified, first to last, so later files take priority over earlier ones.

Let's say we modify fast_learn.yaml to the following:

# @package _global_
learning_rate: 2.0
num_layers: 20000

num_layers is used in both fast_learn.yaml and big_model.yaml. Thus, one will override the other. If you run:

python my_app.py +variants/size=big_model +variants/train=fast_learn

The output will be:

num_layers: 20000
embedding_size: 1024
learning_rate: 2.0
max_steps: 200000

If you run:

python my_app.py +variants/train=fast_learn +variants/size=big_model

You will get:

num_layers: 8
embedding_size: 1024
learning_rate: 2.0
max_steps: 200000
Preciosity answered 21/9, 2023 at 3:29 Comment(0)
