I've developed a spam classifier using pandas and scikit-learn to the point where it's ready for integration into our Hadoop-based system. To this end, I need to export my classifier in a more common format than a pickle.
The Predictive Model Markup Language (PMML) is my preferred export format. It plays exceedingly well with Cascading, which we already use. Surprisingly, however, I cannot find any Python libraries that export scikit-learn models to PMML.
Has anyone had experience with this use case? Is there an alternative to PMML that would provide interoperability between scikit-learn and Hadoop? Or is there a solid PMML export library I have missed?