I am trying to use mlflow to save a scikit-learn model and load it in another project. The model is a pipeline containing a custom transformer that I defined myself; the transformer inherits from BaseEstimator and TransformerMixin.
Let's say I have 2 projects:
- train_project: it has the custom transformers in src.ml.transformers.py
- use_project: it has other things in src, or has no src directory at all
In train_project I log the pipeline like this:
mlflow.sklearn.log_model(preprocess_pipe, 'model/preprocess_pipe')
Then, when I try to load it in use_project:
preprocess_pipe = mlflow.sklearn.load_model(f'{ref_model_path}/preprocess_pipe')
I get this error:
[...]
File "/home/quentin/anaconda3/envs/api_env/lib/python3.7/site-packages/mlflow/sklearn.py", line 210, in _load_model_from_local_file
return pickle.load(f)
ModuleNotFoundError: No module named 'train_project'
I tried the cloudpickle serialization format, mlflow.sklearn.SERIALIZATION_FORMAT_CLOUDPICKLE:
mlflow.sklearn.log_model(preprocess_pipe, 'model/preprocess_pipe', serialization_format=mlflow.sklearn.SERIALIZATION_FORMAT_CLOUDPICKLE)
but I get the same error during load.
I saw the code_path option of mlflow.pyfunc.log_model, but its use and purpose are not clear to me.
I thought mlflow provided an easy way to save models and serialize them so they can be used anywhere. Is that true only for native sklearn (or keras, ...) models?
It seems that this issue is really down to how pickle works: mlflow uses pickle, and pickle needs every module that the saved object refers to to be importable at load time.
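To illustrate the pickle behaviour I mean, here is a minimal sketch that uses only the standard library. The module name fake_train_project and the class MyTransformer are hypothetical stand-ins for my real code, and removing the module from sys.modules is just a way to simulate being in use_project:

```python
import pickle
import sys
import types

# Build a throwaway module standing in for train_project's transformers
# module (hypothetical name, for illustration only).
module = types.ModuleType("fake_train_project")
exec(
    "class MyTransformer:\n"
    "    def __init__(self, factor):\n"
    "        self.factor = factor\n",
    module.__dict__,
)
sys.modules["fake_train_project"] = module

# Pickle an instance while the module is importable (the train_project side).
payload = pickle.dumps(module.MyTransformer(factor=2))

# The payload stores the module and class names, not the class source code.
assert b"fake_train_project" in payload
assert b"MyTransformer" in payload

# Simulate use_project, where the module cannot be imported.
del sys.modules["fake_train_project"]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

print(type(load_error).__name__, load_error)
```

As far as I understand, cloudpickle also stores classes from importable modules by reference, which would explain why switching the serialization format changed nothing.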
The only solution I have found so far is to turn my transformers into a package and import it in both projects. I would save the version of the transformer library with the conda_env argument of log_model, and check that the same version is installed when I load the model in use_project. But that is painful whenever I have to change a transformer or debug it...
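The version check I have in mind would look roughly like this. It is only a sketch: the package name my_transformers, its __version__ attribute and the helper check_transformer_version are all hypothetical, and a stand-in module is used here so the snippet runs on its own:

```python
import types


def check_transformer_version(module, expected_version):
    """Raise if the imported package differs from the version used at training time."""
    actual = getattr(module, "__version__", None)
    if actual != expected_version:
        raise RuntimeError(
            f"transformer package version mismatch: model was trained with "
            f"{expected_version!r}, but {actual!r} is installed"
        )
    return actual


# Stand-in for the shared transformer package (hypothetical name and version).
my_transformers = types.ModuleType("my_transformers")
my_transformers.__version__ = "0.2.0"

# Version recorded at training time, e.g. read back from the logged conda_env.
trained_with = "0.2.0"
checked = check_transformer_version(my_transformers, trained_with)

# A mismatch is reported before the model is ever unpickled.
try:
    check_transformer_version(my_transformers, "0.1.0")
    mismatch = None
except RuntimeError as exc:
    mismatch = str(exc)
```

In use_project this check would run just before mlflow.sklearn.load_model, so a stale package fails loudly instead of loading a pipeline with different transformer code.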
Does anybody have a better, more elegant solution? Maybe there is some mlflow functionality I have missed?
Other information:
working on linux (ubuntu)
mlflow=1.5.0
python=3.7.3
I saw that the tests for the mlflow.sklearn API include a test with a custom transformer, but they load the model in the same file where the transformer is defined, so it does not seem to resolve my issue. Maybe it can help other people:
https://github.com/mlflow/mlflow/blob/master/tests/sklearn/test_sklearn_model_export.py