Import sklearn2pmml generated .pmml back into ScikitLearn or Python

4

7

Apologies if this may have been answered somewhere but I've been looking for about an hour and can't find a good answer.

I have a simple Logistic Regression model trained in Scikit-Learn that I'm exporting to a .pmml file.

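  from sklearn.linear_model import LogisticRegression
  from sklearn2pmml import PMMLPipeline, sklearn2pmml
  # PMMLPipeline takes a list of (name, estimator) steps
  my_pipeline = PMMLPipeline([
      ("classifier", LogisticRegression())
  ])
  my_pipeline.fit(X, y)  # X, y: training features and labels
  sklearn2pmml(my_pipeline, "filename.pmml")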

etc....

So what I'm wondering is if/how I can import this file back into Python (2.7 preferably) or Scikit-Learn to use as I would in Java/Scala. Something along the lines of

  import (filename.pmml) as pm
  pm.predict(data)

Thanks for any help!

Shaia answered 16/9, 2017 at 14:59 Comment(4)
Were you going to export it, change it, and then reload it back into Python? Or do you just want to reopen the original at some point? - Unthinkable
Hi Tony. No changes, just reload it back into Python and then perform a simple prediction. So if somebody built a simple regression and emailed me a .pmml file, I could open that .pmml file in my own Jupyter notebook or Python REPL, hand it some data, and make a prediction. You can do it in something like Spark, but I haven't seen it done in Python (yet). - Shaia
I'm not familiar with PMML, but have you tried pickle, or another example of sklearn+pickle? - Unthinkable
Thanks Tony. We were just trying to do it in PMML for a proof of concept. - Shaia
7

Scikit-learn does not offer support for importing PMML files, so what you're trying to achieve cannot be done with scikit-learn itself, I'm afraid.

Libraries such as sklearn2pmml exist to add something scikit-learn lacks: exporting a model to the PMML format.

Typically, those who use sklearn2pmml want to re-use the PMML models on other platforms (e.g. IBM's SPSS, Apache Spark ML, Weka or any other consumer listed on the Data Mining Group's website).

If you're looking to save a model created with scikit-learn and re-use it afterwards with scikit-learn as well, then you should explore Python's pickle module, the persistence mechanism scikit-learn supports natively, which uses a binary data format.

You can read more about how to save/load models in Pickle format (together with its known issues) in the scikit-learn documentation on model persistence.
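For reference, a minimal sketch of that pickle round trip. The toy data, the file name model.pkl and the temporary directory are illustrative assumptions, not part of the question; any fitted estimator should work the same way.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (illustrative only): one feature, two well-separated classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")  # hypothetical file name
    # Save the fitted estimator in pickle's binary format.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Load it back, as a separate session or another user would.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored estimator should predict exactly like the original.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Only unpickle files from sources you trust, and load them with the same scikit-learn version that wrote them; those are the main known issues with this approach.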

William answered 16/9, 2017 at 17:24 Comment(2)
Thank you very much. I'm aware of Pickle and we have been using PMML for Apache Spark, and I was curious if this could be achieved in Python. Thanks again! - Shaia
I don't think this is the correct answer. You can import PMML into Python. - Goalie
2

I created a simple solution that generates scikit-learn KMeans models from PMML files, which I exported from the KNIME Analytics Platform. You can check it out: pmml2sklearn.

Margarettamargarette answered 20/5, 2019 at 13:47 Comment(0)
2

You could use PyPMML to make predictions on a new dataset using PMML in Python, for example:

from pypmml import Model

model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)

The data can be a dict, a JSON string, or a pandas Series or DataFrame.
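For a plain logistic regression, the PMML file is simple enough to evaluate by hand, which gives a feel for what a PMML consumer has to do. Below is a rough sketch using only the standard library. The XML is a hand-written, simplified fragment (an assumption, not actual sklearn2pmml output): real files also carry an XML namespace, a data dictionary and more. The function name predict_proba and the field names x1 and x2 are made up for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML-like fragment (assumption: real exported
# files add a namespace, DataDictionary, MiningSchema and other elements).
PMML = """
<PMML version="4.4">
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <RegressionTable intercept="-1.5" targetCategory="1">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>
""".strip()


def predict_proba(pmml_text, row):
    """Probability of the first table's target category for one row (a dict)."""
    root = ET.fromstring(pmml_text)
    table = root.find("./RegressionModel/RegressionTable")
    # Linear part: intercept plus coefficient * value for each predictor.
    z = float(table.get("intercept"))
    for pred in table.findall("NumericPredictor"):
        z += float(pred.get("coefficient")) * row[pred.get("name")]
    # The logit normalization maps the linear score to a probability.
    return 1.0 / (1.0 + math.exp(-z))


p = predict_proba(PMML, {"x1": 1.0, "x2": 1.0})
print(p)
```

This only covers a two-class model with numeric predictors; for anything beyond that, a full evaluator such as PyPMML is the safer route.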

Farrow answered 25/7, 2019 at 4:29 Comment(0)
0

I believe you can import/export a PMML file with Python. After you load your model back, you can predict again without any problem. However, the output format can differ, for example a 1-D array or an n×n pandas table.

from sklearn2pmml import make_pmml_pipeline, sklearn2pmml
from pypmml import Model

#Extract as pmml
yourModelPipeline = make_pmml_pipeline(yourModelObjectGoesHere)
sklearn2pmml(yourModelPipeline, "yourModel.pmml")

#Load from pmml
yourModelLoaded = Model.fromFile('yourModel.pmml')
prediction = yourModelLoaded.predict(yourPredictionDataSet)

Lastly, reproducing results may take a long time; don't let that discourage you :). Here is the developers' comment on the issue: https://github.com/autodeployai/pypmml/issues/53

Goalie answered 13/12, 2022 at 14:8 Comment(0)
