Should the Conda (base) environment be kept up to date?

I'm happily using Conda via the miniconda install to manage python environments.

After install, I leave the base environment alone and create new environments for new projects. Then I conda env update these environments as needed. However, I'm not sure this is the right approach.
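Roughly, my workflow looks like this (the env name and environment.yml are just placeholders):

    # new project: create an env from a spec file
    conda env create -n myproject -f environment.yml

    # later: pull updated or added dependencies from the same file
    conda env update -n myproject -f environment.yml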

Should the base environment be updated with conda env update before creating new environments?

I think this would keep disk usage lower: my (possibly incorrect) understanding is that Conda links packages to the base environment when creating new environments, if the package and its dependencies match exactly.

Although... that doesn't make much sense, as they could easily get out of sync. Maybe it just saves bandwidth, since matching packages can be copied instead of downloaded?

If every project has its own environment, does it matter whether the base environment is kept up to date?

Gilligan answered 21/5, 2019 at 6:45

Conda links all packages to the pkgs folder, which is shared by all envs and is not associated with base in any special way. Whenever any env installs or upgrades packages, they'll go there, and there isn't any explicit effort to source from existing packages: if the dependency solver happens to resolve to a cached package, it will use it. Currently, there is no mechanism for maintaining synchronization of packages across envs, so one would have to design a workflow to achieve it.
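As a quick way to see this sharing in action (the path and env name below are only examples, assuming a Linux-style Miniconda install under ~/miniconda3):

    # shows the location(s) of the shared package cache
    conda info

    # on Linux, a hard-link count greater than 1 (second column of ls -l)
    # indicates the file inside the env is shared with the pkgs/ cache
    ls -l ~/miniconda3/envs/myproject/lib/python3.7/site-packages/numpy/__init__.py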

Potential Workflow

One could, in theory, use Conda's env cloning to maximize package version synchronization. To this end, you could conceptually organize your envs into three categories:

  • base env: only used for core infrastructure, e.g., conda, jupyter, git, etc. This you would freely update whenever you wanted new command-line software or needed to conda update conda. It should have little to no overlap with other envs.
  • template env: centralizes common sets of packages, typically grouped by version restrictions. For example, one might have a py27-tmpl, py36-tmpl, and py37-tmpl for different versions of Python that you might require for different projects. Here you would install the greatest common subset of packages you require across projects. The main purpose of a template env is to serve as a base from which project envs are cloned (see the sketch after this list).
  • project env: associated with a specific development project, and derived initially as a clone of a template env. Most of the core software in these would come from the template, and then additional software should be installed here. Once you start one for a project, you keep it relatively fixed, in order to maintain development stability.
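A rough sketch of that layout with the Conda CLI (env and package names here are only illustrative):

    # template env: greatest common subset of packages needed across projects
    conda create -n py37-tmpl python=3.7 numpy pandas jupyter

    # project env: starts life as a clone of the template...
    conda create -n projA --clone py37-tmpl

    # ...then gets its project-specific additions
    conda install -n projA scikit-learn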

Such a structure would maximize the reuse of existing package versions. Starting with Conda v4.7, the dependency solver defaults to a first-stage solve with an implicit --freeze-installed|--no-update-deps flag, which attempts to install the requested packages without having to change existing packages. If keeping synchronized with the template env is your goal, then you may want to always use the --freeze-installed flag when installing. One could also use package pinning, which explicitly prevents specified packages from upgrading away from the template. However, this could prevent other packages from being installed at their latest versions.
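For example (again with illustrative names, and assuming envs live under ~/miniconda3/envs):

    # try to satisfy the request without changing already-installed packages
    conda install -n projA --freeze-installed scikit-learn

    # pin packages by adding specs to the env's conda-meta/pinned file,
    # e.g., to keep numpy on the series that came from the template
    echo "numpy 1.16.*" >> ~/miniconda3/envs/projA/conda-meta/pinned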

Unfortunately, you'd still run into a similar synchronization problem as you intuited: while you could update these template envs before making new clones, that won't update the ones previously derived from them. But for project envs, I think best practice would be not to manipulate them once you start working. If you're concerned with space, there's no substitute for getting your modular projects completed and then archiving and deleting project envs after use. That, and occasionally running conda clean.
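For instance, one way to archive and retire a finished project env (names are placeholders):

    # snapshot the env spec so it can be recreated later if needed
    conda env export -n projA > projA.yml

    # remove the env, then clear unused packages and tarballs from the cache
    conda env remove -n projA
    conda clean --all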

Hayley answered 21/5, 2019 at 14:24
