Does it make sense to use Conda + Poetry?

Does it make sense to use Conda + Poetry for a Machine Learning project? Allow me to share my (novice) understanding and please correct or enlighten me:

As far as I understand, Conda and Poetry have different purposes but are largely redundant:

  • Conda is primarily an environment manager (and not only for Python), but it can also manage packages and dependencies.
  • Poetry is primarily a Python package manager (say, an upgrade of pip), but it can also create and manage Python environments (say, an upgrade of Pyenv).

My idea is to use both and compartmentalize their roles: let Conda be the environment manager and Poetry the package manager. My reasoning is that (it sounds like) Conda is best for managing environments and can be used for compiling and installing non-Python packages, especially CUDA drivers (for GPU capability), while Poetry is more powerful than Conda as a Python package manager.

I've managed to make this work fairly easily by using Poetry within a Conda environment. The trick is not to use Poetry to manage the Python environment: I'm not using commands like poetry shell or poetry run, only poetry init, poetry install, etc. (after activating the Conda environment).
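
Concretely, the workflow I use looks roughly like this (a minimal sketch; poetry add numpy is just an example):

# Create and activate the Conda environment (Python, CUDA, other non-Python deps)
conda env create -f environment.yml
conda activate N  # the env name from the environment.yml below

# Use Poetry only for Python package management inside that env;
# Poetry installs into the activated Conda env instead of creating
# its own virtualenv (see the comments on the answer below)
poetry init       # once, to create pyproject.toml
poetry add numpy  # example of adding a dependency
poetry install    # install and lock the Python dependencies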

For full disclosure, my environment.yml file (for Conda) looks like this:

name: N

channels:
  - defaults
  - conda-forge

dependencies:
  - python=3.9
  - cudatoolkit
  - cudnn

and my pyproject.toml file looks like this:

[tool.poetry]
name = "N"
authors = ["B"]

[tool.poetry.dependencies]
python = "3.9"
torch = "^1.10.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

To be honest, one of the reasons I proceeded this way is that I was struggling to install CUDA (for GPU support) without Conda.

Does this project design look reasonable to you?

Moa answered 25/1, 2022 at 15:9 Comment(8)
From your description alone, it sounds overly complicated. Is there anything that you need from poetry that you feel like conda and pip are not able to provide for you?Terrill
Seems a bit opinion-prone as a question (maybe better for reddit?), but generally appears fine. Hopefully some heavy Poetry users can weigh in, but on the Conda side I don't see any red flags.Brothers
@Terrill You might be right. In my situation I just think of Poetry as an upgrade of pip: it's more powerful, makes it easier to keep track of dependencies and save a configuration. Conda can do that too, but not as well as Poetry (maybe). But yeah, the downside is that I have to juggle Conda + Poetry. Although I can write a script to automate that.Moa
@Moa I've been using a very similar Conda + Poetry setup for the last year, and it's been working fine.Aq
I'm in pretty much the exact same boat. Prefer poetry for package management, but installing CUDA on an HPC cluster with no sudo access is not good for my health.Extract
For your second bullet point, I'd say Poetry is an upgrade of pipenv, not pyenv. For example, it does dependency resolution (figuring out the latest versions of all dependencies that are compatible with each other).Ghent
I would check if using mamba alone is the better solution for your use case. Some of conda's weaknesses as a package manager are solved with mamba, especially package resolution speed. That said, I've successfully used conda + poetry for a major ML project.Resistant
I'm struggling with GDAL installation using Poetry, so I'm considering the same approach. But I do get a little confused about when to use conda install, pip install, poetry add, etc. I'm also uncertain about using poetry run, poetry shell, or neither of those commands in favor of conda commands. A full-blown article about this topic would be nice to have, particularly because each environment and/or package manager has its own set of functionalities. I like Poetry with Pyenv more, but GDAL forced me to try conda.Elicia

2024-04-05 update:

It looks like my tips proved to be useful to many people, but they are not needed anymore. Just use Pixi. It's still alpha, but it works great, and provides the features of the Conda + Poetry setup in a simpler and more unified way. In particular, Pixi supports:

  • installing packages both from Conda channels and from PyPI (see the short sketch after this list),
  • lockfiles,
  • creating multiple features and environments (prod, dev, etc.),
  • very efficient package version resolution, not just faster than Conda (which is very slow), but in my experience also faster than Mamba, Poetry and pip.
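
A minimal sketch of what this looks like with the Pixi CLI (command syntax may differ slightly between Pixi versions; the package names are just examples):

# Initialize a Pixi project (creates pixi.toml)
pixi init my_project
cd my_project

# Add a package from the configured Conda channels
pixi add pytorch

# Add a package from PyPI
pixi add --pypi requests

# Run a command inside the project's environment
pixi run python -c "import requests"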

Making a Pixi env look like a Conda env

One non-obvious tip about Pixi is that you can easily make your project's Pixi environment visible as a Conda environment, which may be useful e.g. in VS Code, which allows choosing Python interpreters and Jupyter kernels from detected Conda environments. All you need to do is something like:

ln -s /path/to/my/project/.pixi/envs/default /path/to/conda/base/envs/conda-name-of-my-env

The first path is the path to your Pixi environment, which resides in your project directory, under .pixi/envs, and the second path needs to be within one of Conda's environment directories, which can be found with conda config --show envs_dirs.
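
Putting the two steps together (a sketch; the paths are placeholders for your own project and Conda installation):

# Find Conda's environment directories
conda config --show envs_dirs

# Link the Pixi env into one of them, under a name of your choice
ln -s "$HOME/projects/my_project/.pixi/envs/default" "$HOME/miniconda3/envs/my_project_env"

# The env should now show up alongside the regular Conda envs
conda env list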

Original answer:

I have experience with a Conda + Poetry setup, and it's been working fine. The great majority of my dependencies are specified in pyproject.toml, but when there's something that's unavailable in PyPI, or installing it with Conda is easier, I add it to environment.yml. Moreover, Conda is used as a virtual environment manager, which works well with Poetry: there is no need to use poetry run or poetry shell, it is enough to activate the right Conda environment.

Tips for creating a reproducible environment

  1. Add Poetry, possibly with a version number (if needed), as a dependency in environment.yml, so that you get Poetry installed when you run conda create, along with Python and other non-PyPI dependencies.
  2. Add conda-lock, which gives you lock files for Conda dependencies, just like you have poetry.lock for Poetry dependencies.
  3. Consider using mamba, which is generally compatible with conda but is better at resolving conflicts and is also much faster. An additional benefit is that all users of your setup will use the same package resolver, independent of the locally-installed version of Conda.
  4. By default, use Poetry for adding Python dependencies. Install packages via Conda if there's a reason to do so (e.g. in order to get a CUDA-enabled version). In such a case, it is best to specify the package's exact version in environment.yml, and after it's installed, to add an entry with the same version specification to Poetry's pyproject.toml (without ^ or ~ before the version number). This will let Poetry know that the package is there and should not be upgraded.
  5. If you use different channels that provide the same packages, it might not be obvious which channel a particular package will be downloaded from. One solution is to specify the channel for the package using the :: notation (see the pytorch entry below); another is to enable strict channel priority, as shown in the sketch after this list. Unfortunately, in Conda 4.x there is no way to enable this option through environment.yml.
  6. Note that Python adds user site-packages to sys.path, which may hurt reproducibility if the user has installed Python packages outside Conda environments. One possible solution is to make sure that the PYTHONNOUSERSITE environment variable is set to True (or to any other non-empty value); this is also shown in the sketch below.
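
Tips 5 and 6 translate to something like this (a sketch; note that channel priority is a global Conda setting, not a per-environment one):

# Tip 5: enforce strict channel priority
# (global setting; in Conda 4.x it cannot be set through environment.yml)
conda config --set channel_priority strict

# Tip 6: keep user site-packages out of sys.path, so that packages
# installed with `pip install --user` cannot leak into the environment
export PYTHONNOUSERSITE=True
python -c "import sys; print(sys.flags.no_user_site)"  # prints 1 when user site is disabled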

Example

environment.yml:

name: my_project_env
channels:
  - pytorch
  - conda-forge
  # We want to have a reproducible setup, so we don't want default channels,
  # which may be different for different users. All required channels should
  # be listed explicitly here.
  - nodefaults
dependencies:
  - python=3.10.*  # or don't specify the version and use the latest stable Python
  - mamba
  - pip  # pip must be mentioned explicitly, or conda-lock will fail
  - poetry=1.*  # or 1.1.*, or no version at all -- as you want
  - tensorflow=2.8.0
  - pytorch::pytorch=1.11.0
  - pytorch::torchaudio=0.11.0
  - pytorch::torchvision=0.12.0

# Non-standard section listing target platforms for conda-lock:
platforms:
  - linux-64

virtual-packages.yml (may be used e.g. when we want conda-lock to generate CUDA-enabled lock files even on platforms without CUDA):

subdirs:
  linux-64:
    packages:
      __cuda: 11.5

First-time setup

You can avoid playing with the bootstrap env and simplify the example below if you have conda-lock, mamba and poetry already installed outside your target environment.

# Create a bootstrap env
conda create -p /tmp/bootstrap -c conda-forge mamba conda-lock poetry='1.*'
conda activate /tmp/bootstrap

# Create Conda lock file(s) from environment.yml
conda-lock -k explicit --conda mamba
# Set up Poetry
poetry init --python=~3.10  # version spec should match the one from environment.yml
# Fix package versions installed by Conda to prevent upgrades
poetry add --lock tensorflow=2.8.0 torch=1.11.0 torchaudio=0.11.0 torchvision=0.12.0
# Add conda-lock (and other packages, as needed) to pyproject.toml and poetry.lock
poetry add --lock conda-lock

# Remove the bootstrap env
conda deactivate
rm -rf /tmp/bootstrap

# Add Conda spec and lock files
git add environment.yml virtual-packages.yml conda-linux-64.lock
# Add Poetry spec and lock files
git add pyproject.toml poetry.lock
git commit

Usage

The above setup may seem complex, but it can be used in a fairly simple way.

Creating the environment

conda create --name my_project_env --file conda-linux-64.lock
conda activate my_project_env
poetry install

Activating the environment

conda activate my_project_env

Updating the environment

# Re-generate Conda lock file(s) based on environment.yml
conda-lock -k explicit --conda mamba
# Update Conda packages based on re-generated lock file
mamba update --file conda-linux-64.lock
# Update Poetry packages and re-generate poetry.lock
poetry update
Aq answered 14/2, 2022 at 10:4 Comment(18)
Is poetry installing into the conda environment, or its own virtual environment? I see you are not altering poetry config, e.g. setting poetry config virtualenvs.create false --local, so I'm presuming poetry will create its own venv? If not, what is stopping poetry making a venv? And would this behaviour be the same if poetry were installed globally (rather than in the conda env)?Borlase
@JamesOwers Poetry is installing into the Conda environment. Poetry detects when a Conda env is activated, and doesn't create a venv then. That would probably work with Poetry installed globally, too, but I think it's better to specify Poetry in environment.yml and install it within the Conda env, so that all project dependencies are listed explicitly and their versions can be tracked.Aq
Thank you so much for that link. Just to confirm, when writing in your environment.yml poetry=1.*, when you run your install via the conda.lock file, it's equivalent to running conda install poetry -c conda-forge?Borlase
Ah...I've just had a thought - installing in this manner, is poetry isolated from the project dependencies, or will poetry's python dependencies interact with the project's python dependencies a la this warning about installing with pipBorlase
@JamesOwers It is an equivalent to conda install poetry='1.*' -c conda-forge. As for the possible dependency conflict, well, is it a problem that happens in practice? I haven't come across such issues. The disadvantage of having poetry installed outside the project is that you don't have a way to enforce a specific version, which means that different project contributors might use different versions of Poetry with slightly different resolvers, ways of parsing/updating pyproject.toml/poetry.conf, different bugs, etc. And that is something I have encountered in practice.Aq
This is very true - I haven’t had issues with the former, but have with the latter.Borlase
@michau did you experience this: I added a dependency in my environment.yml. Then I followed the steps in Updating the environment. However, when I check the conda-osx-64.lock file, for example, I see that my dependency was commented out, and therefore poetry update doesn't do anything. How come?Truckle
@michau any tips on doing this in a Dockerfile? I'm having issues due to the fact that you need the conda environment activated to get the poetry dependencies installed... :(Outnumber
@KevinPauli I think this should helpAq
Yes indeed SHELL ["conda", "run", "-n", "myenv", "/bin/bash", "-c"] did the trick in the Dockerfile! Thx!!!Outnumber
I named the package: some_package_name.a and as a result Poetry generated a pyproject.toml with packages=[{include=some_package_name}] which doesn't exist. I had to remove it via: sed -i '/packages =/ s/\[.*\]/\[\]/g' pyproject.toml.Birdt
Also, to keep the conda-lock file consistent during setup of the environments I had to do: poetry add --lock conda-lock=$(python -c 'import conda_lock; print(conda_lock.__version__)')Birdt
@michau Is there any way to have just a single lock file? AFAIU your solution produces one conda lock file per platform and one poetry.lock file for the pip dependencies, right? Can I have conda-lock incorporate the pip dependencies in the platform lock files?Notation
@tahesse Conda lock files might be unnecessary if you specify exact package versions in environment.yml, or if you're fine with their versions being untracked. Otherwise, you need both Conda and Poetry lock files.Aq
FWIW, Tensorflow install instructions now suggest using pip (e.g., poetry) to install Tensorflow. I tested this with a conda + poetry setup as described in this answer (except using poetry instead of conda to install tensorflow) and it worked great w/ Nvidia GPU acceleration! I only needed conda to install cudatoolkit, otherwise I could get rid of conda altogether.Finespun
conda-lock has been updated, replacing conda-lock -k explicit --conda mamba with the following: conda-lock -f environment.yml -p linux-64 -p <additional platforms, if any> --conda mambaWallow
As @JamesOwers pointed out, it is generally a very bad idea to install Poetry in the same virtual environment as your project, since it will eventually cause dependency issues. This information is displayed in a big red warning right at the beginning of the Poetry installation guide.Compelling
I am finding that for certain dependencies that were installed with conda, such as numpy and pillow, even though I specify the same version in pyproject.toml, poetry does NOT detect that the package is already installed and "updates" it by pulling down the exact same version. And thus, sadly, this conda environment is now inconsistent and can no longer be compressed with conda-pack. At my wit's end!Outnumber

To anyone using @michau's answer but having issues including Poetry in environment.yml: at the time of writing, Poetry versions 1.2 or greater aren't available from conda-forge. You can still include Poetry >=1.2 in the .yml by installing it through pip instead:

dependencies:
  - python=3.9.*
  - mamba
  - pip 
  - pip:
    - "poetry>=1.2"
Redoubtable answered 3/11, 2022 at 1:18 Comment(1)
conda create -n <your_name> -c conda-forge python poetry=1.3 does work for me.Spoils

You should avoid using multiple package or environment managers together, as it can cause incompatibilities or errors.

However, I think this part of the question is incorrect:

My idea is to use both and compartmentalize their roles--let Conda be the environment manager and Poetry the package manager.

Doing this is basically pointless; you're just installing two different computer programs to do the same thing.

The real difference between the two is that Poetry is a Python-only package manager, while Conda is mostly Python-focused (with some support for other languages) but has fewer packages available. Nearly all the big packages will be available from Conda, but if you need a niche package, you might not find it there.

There's also a lot less quality control on PyPI than on Conda channels, so Poetry may install incorrect/conflicting versions of packages, although it does a much better job than pip.

Annunciate answered 7/11, 2023 at 18:11 Comment(1)
OP mentioned he needed Conda to install CUDA drivers. Presumably, OP otherwise prefers Poetry.Evalynevan

I don't know the best answer to this, but when I worked at a company with around 1,000 employees, we used conda + poetry and did not have big issues.

One thing I did run into: I installed Conda after already working on a project whose virtual environment was created with Python's built-in venv module and whose packages were managed with Poetry, and Conda overrode the path. (Poetry then installed packages into a new path instead of into the project's environment, even when that environment was activated.)
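
If you run into something similar, a quick way to check which interpreter and environment are actually in use is (a sketch; the output depends on your setup):

which python            # which interpreter is first on PATH right now
echo "$CONDA_PREFIX"    # set (non-empty) if a Conda env is active
poetry env info --path  # the environment Poetry will install into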

Bandbox answered 27/3 at 19:34 Comment(0)
