I built a scikit-learn model and I want to reuse it in a daily Python cron job (NB: no other platforms are involved - no R, no Java &c).
I pickled it (actually, I pickled my own object whose one field is a GradientBoostingClassifier), and I un-pickle it in the cron job. So far so good (and has been discussed in "Save classifier to disk in scikit-learn" and "Model persistence in Scikit-Learn?").
However, I upgraded sklearn and now I get these warnings:
.../.local/lib/python2.7/site-packages/sklearn/base.py:315:
UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.18.1 when using version 0.18.2. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
.../.local/lib/python2.7/site-packages/sklearn/base.py:315:
UserWarning: Trying to unpickle estimator PriorProbabilityEstimator from version 0.18.1 when using version 0.18.2. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
.../.local/lib/python2.7/site-packages/sklearn/base.py:315:
UserWarning: Trying to unpickle estimator GradientBoostingClassifier from version 0.18.1 when using version 0.18.2. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
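Since the cron job runs unattended, these warnings are easy to miss. A minimal sketch of escalating them to exceptions, using only the standard library and assuming the mismatch is reported as a UserWarning, as in the messages above. The names call_strict and load_model are illustrative, not part of sklearn:

```python
import pickle
import warnings


def call_strict(func, *args, **kwargs):
    """Call func, turning any UserWarning it emits into an exception."""
    with warnings.catch_warnings():
        # Assumption: the version mismatch surfaces as a UserWarning,
        # as in the messages quoted above.
        warnings.simplefilter("error", UserWarning)
        return func(*args, **kwargs)


def load_model(path):
    """Unpickle the object stored at path, failing loudly on warnings."""
    with open(path, "rb") as f:
        return call_strict(pickle.load, f)
```

This does not fix the incompatibility; it only makes the job fail visibly instead of possibly producing invalid results.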
What do I do now?
- I can downgrade to 0.18.1 and stick with it until I am ready to rebuild the model. For various reasons I find this unacceptable.
- I can un-pickle the file and re-pickle it again. This worked with 0.18.2, but breaks with 0.19. NFG.
- joblib looks no better.
- I wish I could save the data in a version-independent ASCII format (e.g., JSON or XML). This is, obviously, the optimal solution, but there seems to be NO way to do that (see also "Sklearn - model persistence without pkl file").
- I could save the model to PMML, but its support is lukewarm at best: I can use sklearn2pmml to save the model (although not easily), and augustus/lightpmmlpredictor to apply (although not load) the model. However, none of those is available to pip directly, which makes deployment a nightmare. Also, the augustus & lightpmmlpredictor projects seem to be dead. "Importing PMML models into Python (Scikit-learn)" - nope.
- A variant of the above: save PMML using sklearn2pmml, and use openscoring for scoring. Requires interfacing with an external process. Yuk.
Suggestions?
If sklearn changes the Class definition (e.g., drops or renames a slot), I will have to rewrite the serialize_* and deserialize_* functions and, more important, write a deserializer that converts the serialization of the old version to the new version. I agree that this is probably better than the pickle nightmare, but hardly much. – Penknife
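The maintenance cost described in this comment can be sketched with a version-tagged JSON format and one migration step per format change. This is only an illustration under stated assumptions: the parameter names, the format numbers, and the renamed field ("rate" to "learning_rate") are hypothetical and nothing here is sklearn API.

```python
import json

CURRENT_FORMAT = 2  # hypothetical format number of the current writer


def serialize_model(params):
    """Dump a plain dict of model parameters with a format tag."""
    return json.dumps({"format": CURRENT_FORMAT, "params": params},
                      sort_keys=True)


def _migrate_1_to_2(params):
    # Hypothetical change between formats: "rate" became "learning_rate".
    params = dict(params)
    params["learning_rate"] = params.pop("rate")
    return params


# One converter per old format; each lifts params to the next format.
MIGRATIONS = {1: _migrate_1_to_2}


def deserialize_model(text):
    """Load params, upgrading older formats one step at a time."""
    doc = json.loads(text)
    version, params = doc["format"], doc["params"]
    if version > CURRENT_FORMAT:
        raise ValueError("format {} is newer than {}".format(
            version, CURRENT_FORMAT))
    while version < CURRENT_FORMAT:
        params = MIGRATIONS[version](params)
        version += 1
    return params
```

Every change to the class layout would add one entry to MIGRATIONS, which is exactly the ongoing work the comment objects to.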