Just ran across this, and the accepted answer by @Tomas was very helpful. I added a comment above but, for some additional context, I wanted to create a more thorough response if any other Enterprise Databricks R users run across this post trying to use the MLflow package for R on Databricks.
The Databricks MLflow quickstart guide will tell you that you need to run the following:
library(mlflow)
install_mlflow()
However, for Enterprise Databricks users, the install_mlflow()
function will fail if your cluster doesn't have outside connectivity privileges (as most probably don't) and can't connect to the Anaconda repo to download the necessary packages. You'll likely get an error like this:
CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/conda-forge/linux-64/current_repodata.js
The good news is that MLflow should already be installed on your Databricks runtime. So you can reference that install instead, and then as @Tomas mentioned, use it to set your R environment variables for MLFLOW_BIN
and MLFLOW_PYTHON_BIN
. From there, the R MLflow API works as specified (in my experience, but ymmv).
The only catch from the above solution is that when you use the system()
function in R, you need to set intern=TRUE
in order capture the output of the command. The default behavior of the system()
function is intern=FALSE
. Thus if you do not explicitly set intern=TRUE
, then the exit code 0 will be returned from your system()
call (or perhaps another exit code upon an error) and Sys.setenv()
will set the environment variable to 0!
### intern=True missing ###
Sys.setenv(MLFLOW_BIN=system("which mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python"))
Example output (you can see the the environment variables did not get set correctly):
s <- Sys.getenv()
s[grep("MLFLOW", names(s))]
MLFLOW_BIN 0
MLFLOW_CONDA_HOME /databricks/conda
MLFLOW_PYTHON_BIN 0
MLFLOW_PYTHON_EXECUTABLE
/databricks/python/bin/python
MLFLOW_TRACKING_URI databricks
However, when intern=TRUE
, you'll get the correct environment variables:
### intern=True set ###
Sys.setenv(MLFLOW_BIN=system("which mlflow", intern=TRUE))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python", intern=TRUE))
Example output:
s <- Sys.getenv()
s[grep("MLFLOW", names(s))]
MLFLOW_BIN /databricks/python3/bin/mlflow
MLFLOW_CONDA_HOME /databricks/conda
MLFLOW_PYTHON_BIN /databricks/python3/bin/python
MLFLOW_PYTHON_EXECUTABLE
/databricks/python/bin/python
MLFLOW_TRACKING_URI databricks
Note: This was using Databricks runtime 9.1 LTS ML. This may or may not work on other Databricks runtime configurations.