What are the advantages of Hydra over using a regular configuration file?

What are the advantages of using Hydra to manage my configuration files, versus loading a .yaml configuration file directly (using import yaml)?

Guilford answered 6/10, 2022 at 17:27 Comment(0)

TL;DR

If you're working on a project that has many configurable parameters, then using Hydra does make sense. If not, it will probably do more harm than good: it's an extra dependency to ship with your project, other developers have to learn how to use it, and instantiating objects from the configuration files is sometimes a headache. For smaller projects, using .py, "pure" .yaml, or even .ini files often makes more sense.

Hydra Main Features

Aside from the points mentioned in Jasha's Answer, there are two additional features that I personally use a lot from Hydra.

Object Instantiation

The first feature is the ability to instantiate objects (that is, to call classes or functions) by specifying the import path of the callable under a key named _target_, alongside the values for the parameters that the callable requires. For example, consider the following .yaml configuration file:

# conf/config.yaml
defaults:
  - db:
    - base
    - sqlite
  - /hydra/callbacks:
    - helper_callback
  - override hydra/help: opt_help
  - override hydra/job_logging: custom
  - _self_

# Same as using:
# from dateutil.relativedelta import relativedelta, FR
# relative_date = relativedelta(weeks=3, weekday=FR(1))

relative_date:
  _target_: dateutil.relativedelta.relativedelta
  weeks: 3
  weekday:
    _target_: dateutil.relativedelta.FR
    n: 1

Then you could instantiate relative_date using something like:

from hydra import compose, initialize
from hydra.utils import instantiate


initialize(config_path='./conf')
cfg = compose(config_name="config")

# Same as: relative_date = relativedelta(weeks=3, weekday=FR(1))
relative_date = instantiate(cfg['relative_date'])

Or:

# foo.py
import hydra
from hydra.utils import instantiate


@hydra.main(config_path="./conf", config_name="config", version_base=hydra.__version__)
def main(cfg):
    print(instantiate(cfg['relative_date']))


if __name__ == '__main__':
    main()

And executing:

$ python foo.py
relativedelta(days=+21, weekday=FR(+1))

Note: the first option works in interactive Python environments, like Jupyter, whereas the second one doesn't.
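To make the _target_ mechanism a bit less magical, here is a rough, standard-library-only sketch of the idea: import the callable from its dotted path, build any nested targets first, then call it with the remaining keys as keyword arguments. The helper names (locate, simple_instantiate) and the example config are made up for illustration. This is not how hydra.utils.instantiate is implemented, and the real function supports far more (positional arguments, partials, and so on).

```python
import importlib
from typing import Any


def locate(path: str) -> Any:
    """Import an object from a dotted path such as 'datetime.timedelta'."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def simple_instantiate(node: Any) -> Any:
    """Recursively build objects from dicts that carry a '_target_' key.

    Simplified stand-in for illustration only, not Hydra's implementation.
    """
    if isinstance(node, dict):
        # Build children first so nested targets become real objects.
        kwargs = {
            key: simple_instantiate(value)
            for key, value in node.items()
            if key != "_target_"
        }
        if "_target_" in node:
            return locate(node["_target_"])(**kwargs)
        return kwargs
    if isinstance(node, list):
        return [simple_instantiate(item) for item in node]
    return node


# Hypothetical config that only targets standard-library classes.
# Same as: datetime.timezone(offset=datetime.timedelta(hours=2))
config = {
    "tz": {
        "_target_": "datetime.timezone",
        "offset": {"_target_": "datetime.timedelta", "hours": 2},
    }
}

result = simple_instantiate(config)
print(result["tz"])
```

The nested dictionary plays the same role as the nested weekday entry in the YAML above: the inner object is created first and then passed to the outer callable.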

Retrieve Environment Variables

Some projects make use of environment variables. These variables are part of the environment in which a process runs (i.e. your computer). Projects often also define them at the project level, in a file named .env, which has to be loaded into the process environment before it can be read. Hydra enables you to use such variables, like so:

main:
  source: file
  debug: True
  testing: True
  user: ${oc.env:USER}          # <-- Access an environment variable named "USER"
  src_dir: ${oc.env:SRC_DIR}/   # <-- Access an environment variable named "SRC_DIR"

Note: to be fair, this is a feature from OmegaConf, which is the package that Hydra uses under the hood.

Real Project Example

The tree view below shows the configuration directory of a project I developed that had a huge number of configurable parameters and made use of Hydra:

conf
├── config.yaml
├── optimization.yaml
├── maintenance.yaml
├── sentry_config.yaml
├── alignment_conf
│   ├── extras.yaml
│   └── alignment.yaml
├── constraints
│   ├── air_capacity.yaml
│   ├── delivery.yaml
│   └── handling.yaml
├── db
│   ├── base.yaml
│   ├── hana_dev.yaml
│   ├── hana_prod.yaml
│   └── sqlite.yaml
├── hydra
│   ├── callbacks
│   │   └── helper_callback.yaml
│   ├── help
│   │   └── opt_help.yaml
│   └── job_logging
│       └── custom.yaml
└── solvers
    ├── cbc_cmd.yaml
    ├── choco_cmd.yaml
    ├── cplex.yaml
    ├── glpk_cmd.yaml
    ├── gurobi.yaml
    ├── mosek.yaml
    └── scip.yaml
Fraga answered 20/10, 2022 at 1:59 Comment(1)
Is your real project example open sourced by any chance? I'd love to look at it as a reference. - Quarterdeck

Hydra provides a framework for config composition and instantiation.

The "config composition" part means that the data from yaml files can be combined and modified in a flexible way. You can use directives and "defaults lists" in your yaml files to include yaml files into each other, and you can use Hydra's command-line grammar to modify how your yaml data are composed when you invoke the app from your terminal. This allows for e.g. changing hyperparameter settings or swapping out different implementations of a class from the command line in a way that is more flexible and fluent than traditional solutions such as Python's argparse. I recommend following Hydra's "Your first Hydra app" tutorial to get a feel for config composition.
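As a rough mental model of composition, here is a standard-library-only sketch: pick one option from a config group, deep-merge it into the primary config, then apply dotted key=value overrides. Everything in it (the function names, the db options, the keys) is hypothetical, and it is a simplified stand-in rather than Hydra's actual machinery, whose override grammar is much richer (typed values, adding keys with +, deleting keys with ~, and so on).

```python
import copy


def deep_merge(base: dict, extra: dict) -> dict:
    """Return a new dict where values from 'extra' are merged into 'base'."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override(cfg: dict, override: str) -> None:
    """Apply a 'dotted.key=value' override in place (values stay strings)."""
    dotted_key, _, value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


# Hypothetical config group, standing in for files such as conf/db/*.yaml.
db_groups = {
    "sqlite": {"driver": "sqlite", "path": "app.db"},
    "postgres": {"driver": "postgres", "host": "localhost", "port": "5432"},
}
# Hypothetical primary config, standing in for conf/config.yaml.
primary = {"app": {"name": "demo"}, "db": {"timeout": "30"}}


def compose_config(db_choice: str, overrides: list) -> dict:
    """Select one db option, merge it in, then apply the overrides."""
    cfg = deep_merge(primary, {"db": db_groups[db_choice]})
    for override in overrides:
        apply_override(cfg, override)
    return cfg


cfg = compose_config("postgres", ["db.port=6543", "app.debug=true"])
print(cfg)
```

Swapping "postgres" for "sqlite" changes the whole db block in one move, which is the kind of switch that gets tedious with argparse flags.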

The "instantiation" part means that you can turn a composed config into instances of your application's classes. The creation of objects that would traditionally be done in a program's "main" routine can instead be represented as yaml and later animated using Hydra's instantiate API. This extra layer of abstraction on top of your "main" routine opens up new possibilities for flexible object creation and composition.

There are several built-in convenience features such as logging support, command-line tab completion that makes it easy to discover how to modify your app's configuration at the command line, and automatic saving of a snapshot of the app's configuration in the logging directory.

Hydra has a plugin framework. There are several "sweeper" plugins that provide support for hyperparameter optimization, as well as "launcher" plugins that provide support for e.g. launching jobs remotely.

The fact that Hydra uses OmegaConf as a backend comes with several benefits:

  • OmegaConf supports variable interpolations, which are like "pointers" in your config object. For example, in a yaml file you could write something like this:
foo: 123
bar: ${foo}

and then later in your python code you could assert cfg.bar == 123.

  • OmegaConf's "custom resolver" feature allows you to register Python functions that can be invoked inline in your yaml file, essentially allowing users to define a domain-specific language for manipulating configuration data. For example, you could register a Python function add_one that adds 1 to a given number, and then use this function in a yaml file like so:
baz: ${add_one: 123}
qux: ${add_one: ${foo}}  # nested interpolations work too

This would result in cfg.baz == 124 and cfg.qux == 124.

Appellant answered 20/10, 2022 at 0:48 Comment(0)
