mlflow R installation MLFLOW_PYTHON_BIN

I am trying to install mlflow in R and I'm getting this error message:

mlflow::install_mlflow()
Error in mlflow_conda_bin() : Unable to find conda binary. Is Anaconda installed? If you are not using conda, you can set the environment variable MLFLOW_PYTHON_BIN to the path of yourpython executable.

I have tried the following in the shell:

export MLFLOW_PYTHON_BIN="/usr/bin/python" 
source ~/.bashrc
echo $MLFLOW_PYTHON_BIN   # prints /usr/bin/python

or in R,

Sys.setenv(MLFLOW_PYTHON_BIN="/usr/bin/python")
Sys.getenv()   # shows MLFLOW_PYTHON_BIN set to /usr/bin/python

However, it still does not work.

I do not want to use a conda environment.

How do I get past this error?

Duct answered 11/3, 2020 at 17:17 Comment(1)
Did you solve it? - Shin

The install_mlflow command only works with conda right now. Sorry about the confusing message. You can either:

  • install conda - this is the recommended way of installing and using mlflow

or

  • install the mlflow Python package yourself via pip

To install mlflow yourself, pip install the Python version of mlflow that matches the R package, then set both the MLFLOW_PYTHON_BIN and MLFLOW_BIN environment variables, e.g.:

library(mlflow)
system(paste("pip install -U mlflow==", mlflow:::mlflow_version(), sep=""))
Sys.setenv(MLFLOW_BIN=system("which mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python"))
Phytogeography answered 18/3, 2020 at 18:3 Comment(2)
I tried the second method but the same message keeps appearing... Any solution? Thank you - Shutz
You need to set intern=TRUE in the system call in order to properly set the environment variable:
Sys.setenv(MLFLOW_BIN=system("which mlflow", intern=TRUE))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python", intern=TRUE))
- Vespucci

Just ran across this, and the accepted answer by @Tomas was very helpful. I added a comment above but, for some additional context, I wanted to create a more thorough response if any other Enterprise Databricks R users run across this post trying to use the MLflow package for R on Databricks.

The Databricks MLflow quickstart guide will tell you that you need to run the following:

library(mlflow)
install_mlflow()

However, for Enterprise Databricks users, the install_mlflow() function will fail if your cluster doesn't have outside connectivity privileges (as most probably don't) and can't connect to the Anaconda repo to download the necessary packages. You'll likely get an error like this:

CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/conda-forge/linux-64/current_repodata.js

The good news is that MLflow should already be installed on your Databricks runtime. So you can reference that install instead, and then as @Tomas mentioned, use it to set your R environment variables for MLFLOW_BIN and MLFLOW_PYTHON_BIN. From there, the R MLflow API works as specified (in my experience, but ymmv).

The only catch with the above solution is that when you use the system() function in R, you need to set intern=TRUE in order to capture the output of the command. The default is intern=FALSE, in which case system() returns the command's exit code rather than its output. So if you do not explicitly set intern=TRUE, your system() call returns 0 (or another exit code on error) and Sys.setenv() will set the environment variable to 0!
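The difference can be seen in isolation with a harmless command. This is a minimal sketch, assuming a Unix-like system where `echo` is available to the shell; `DEMO_VAR` is a made-up variable name used only for illustration.

```r
# Without intern = TRUE, system() returns the exit status (invisibly),
# not the text the command printed.
status <- system("echo hello")                  # "hello" goes to the console
output <- system("echo hello", intern = TRUE)   # "hello" is captured instead

print(status)   # the exit status of the command
print(output)   # a character vector holding the command's output

# Sys.setenv() converts its values to character, so the exit status
# is stored as a string rather than a path.
Sys.setenv(DEMO_VAR = status)   # DEMO_VAR is a hypothetical name
print(Sys.getenv("DEMO_VAR"))
```

The same thing happens when the command is `which mlflow`: the path is printed to the console, but the value handed to Sys.setenv() is the exit status.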

### intern=TRUE missing ###
Sys.setenv(MLFLOW_BIN=system("which mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python"))

Example output (you can see that the environment variables did not get set correctly):

s <- Sys.getenv()  
s[grep("MLFLOW", names(s))]
  
MLFLOW_BIN              0
MLFLOW_CONDA_HOME       /databricks/conda
MLFLOW_PYTHON_BIN       0
MLFLOW_PYTHON_EXECUTABLE
                        /databricks/python/bin/python
MLFLOW_TRACKING_URI     databricks

However, when intern=TRUE, you'll get the correct environment variables:

### intern=TRUE set ###
Sys.setenv(MLFLOW_BIN=system("which mlflow", intern=TRUE))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python", intern=TRUE))

Example output:

s <- Sys.getenv()
s[grep("MLFLOW", names(s))]

MLFLOW_BIN              /databricks/python3/bin/mlflow
MLFLOW_CONDA_HOME       /databricks/conda
MLFLOW_PYTHON_BIN       /databricks/python3/bin/python
MLFLOW_PYTHON_EXECUTABLE
                        /databricks/python/bin/python
MLFLOW_TRACKING_URI     databricks

Note: This was using Databricks runtime 9.1 LTS ML. This may or may not work on other Databricks runtime configurations.
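As a possible alternative to shelling out with system("which ..."), base R's Sys.which() returns the path of an executable directly, or an empty string when it is not found, which makes it easy to check the result before setting anything. The sketch below is one way to do that; `set_env_from_path` is a hypothetical helper name, not part of the mlflow package, and it assumes mlflow and python are already on the PATH of the R process.

```r
# Hypothetical helper: point an environment variable at an executable found
# on the PATH. If the executable is missing, warn and leave the variable alone.
set_env_from_path <- function(var, exe) {
  path <- unname(Sys.which(exe))   # "" when exe is not on the PATH
  if (!nzchar(path)) {
    warning("'", exe, "' not found on the PATH; leaving ", var, " unchanged.")
    return(invisible(NA_character_))
  }
  # Build list(<var> = path) so the variable name can be passed as a string.
  do.call(Sys.setenv, stats::setNames(list(path), var))
  invisible(path)
}

set_env_from_path("MLFLOW_BIN", "mlflow")
set_env_from_path("MLFLOW_PYTHON_BIN", "python")

# Inspect what was actually set.
s <- Sys.getenv()
print(s[grep("MLFLOW", names(s))])
```

Checking for an empty path up front avoids silently storing a meaningless value, which is the failure mode described above.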

Vespucci answered 11/10, 2021 at 20:29 Comment(0)
