Apache Spark MLlib: How to import model from PMML

I have a PMML file which encodes a logistic regression model that was NOT exported from MLlib.

How can I import the model from PMML using MLlib in Java for evaluation/prediction?

(I know that MLlib can export to PMML, but I need to import from PMML)

Matteson asked 29/1, 2017 at 11:58

You could use PMML4S-Spark to import the PMML file as a Spark ML transformer and then make predictions with it in Scala, for example:

import org.pmml4s.spark.ScoreModel

val model = ScoreModel.fromFile("the/pmml/model/path")
val scoreDf = model.transform(df)

If you use PySpark, you could use PyPMML-Spark, for example:

from pypmml_spark import ScoreModel

model = ScoreModel.fromFile('the/pmml/model/path')
score_df = model.transform(df)
Supermundane answered 25/7, 2019 at 5:07

To import, you need to perform PMML export operations in the reverse order:

  1. Extract the intercept and feature coefficients from PMML's RegressionModel/RegressionTable element.
  2. Instantiate Spark ML's LogisticRegressionModel object using those values.
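
Step 1 is plain XML parsing. A minimal sketch in Python, using only the standard library's ElementTree, assuming a binary logistic regression encoded as a PMML RegressionModel whose RegressionTable for the positive class (here targetCategory="1") carries the intercept and one NumericPredictor per feature with the default exponent of 1. The function name extract_logistic_params and the sample document are hypothetical, for illustration only; in Java the same walk can be done with any DOM parser.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    # ElementTree reports namespaced tags as "{uri}Name"; keep only "Name".
    return tag.rsplit("}", 1)[-1]


def extract_logistic_params(pmml_text, target_category="1"):
    """Return (intercept, {predictor_name: coefficient}) from a PMML RegressionModel.

    Assumption: the first RegressionTable whose targetCategory equals
    `target_category` (or that has no targetCategory at all) holds the
    intercept and one NumericPredictor per feature, each with exponent 1.
    """
    root = ET.fromstring(pmml_text)
    for model in root.iter():
        if _local_name(model.tag) != "RegressionModel":
            continue
        for table in model:
            if _local_name(table.tag) != "RegressionTable":
                continue
            if table.get("targetCategory") not in (None, target_category):
                continue
            intercept = float(table.get("intercept", "0"))
            coefficients = {
                pred.get("name"): float(pred.get("coefficient"))
                for pred in table
                if _local_name(pred.tag) == "NumericPredictor"
            }
            return intercept, coefficients
    raise ValueError("no matching RegressionModel/RegressionTable found")


# Illustrative, trimmed-down input (not a complete, schema-valid PMML document).
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <RegressionTable intercept="-1.5" targetCategory="1">
      <NumericPredictor name="field_0" coefficient="0.8"/>
      <NumericPredictor name="field_1" coefficient="-0.25"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>"""

intercept, coefficients = extract_logistic_params(SAMPLE_PMML)
print(intercept, coefficients)  # -1.5 {'field_0': 0.8, 'field_1': -0.25}
```

For step 2, PySpark's RDD-based pyspark.mllib.classification.LogisticRegressionModel(weights, intercept, numFeatures, numClasses) accepts these values as constructor arguments, provided the weights are ordered like the Spark feature vector. As far as I know, the DataFrame-based pyspark.ml model offers no public constructor for this.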

This is my second time posting this answer. I wonder why the first one was deleted without any discussion or explanation.

Seaway answered 31/1, 2017 at 9:46
Maybe the OP deleted the question, and your answer was deleted along with it. This tends to happen when an answer hasn't been accepted. – Ledoux
@Seaway OP here. You didn't post an answer, only a comment. I deleted your comment because it was not constructive and was also very vague. Thanks for your answer, but it is not exactly what I was asking: I need a way to import PMML directly into MLlib without having to parse the features myself and then instantiate the model. – Matteson
@Matteson There is no more "direct" way. Apache Spark and PMML use different concepts and data structures to represent logistic regression models, so you must translate between the two manually; there's no magical "cast operator" for that. Alternatively, why not score the PMML models on Apache Spark as they are? There are ready-to-use Java libraries for that. – Seaway
@Seaway Hi, what is the best way to extract the intercept and feature coefficients from PMML in Java? – Picofarad
@Picofarad The best way is to perform the PMML export operations in reverse order, literally: open Apache Spark's logistic regression class, scroll to the PMMLExportable implementation, take the code block, and reverse its lines. There's no need to introduce third-party dependencies or invent new application logic; it's all there. – Seaway
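
Reversing the export can be sketched in Python, given the intercept and a {name: coefficient} mapping read from the positive-class RegressionTable. This assumes the PMML uses the predictor naming that, as far as I can tell from the Spark source, Spark's own exporter writes (field_0, field_1, ...) and normalizationMethod="logit". The sketch rebuilds a dense weight list in feature order and then checks a prediction with the logistic function. to_dense_weights and predict_probability are hypothetical helpers. PMML produced by other tools (sklearn2pmml, R, ...) will use real column names, so the ordering must then come from your own feature pipeline.

```python
import math
import re


def to_dense_weights(coefficients):
    """Turn {"field_<i>": w_i} into the list [w_0, ..., w_{n-1}].

    Assumes the field_0 ... field_{n-1} naming; any other name raises,
    because its position in the Spark feature vector cannot be inferred.
    """
    weights = [0.0] * len(coefficients)
    for name, value in coefficients.items():
        match = re.fullmatch(r"field_(\d+)", name)
        if match is None or int(match.group(1)) >= len(weights):
            raise ValueError(f"cannot place predictor {name!r} in the feature vector")
        weights[int(match.group(1))] = value
    return weights


def predict_probability(weights, intercept, features):
    """P(label = 1) under normalizationMethod="logit": 1 / (1 + exp(-(b + w . x)))."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    # Branch on the sign so math.exp never receives a large positive argument.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


# Hypothetical values, as if read from a RegressionTable with targetCategory="1".
weights = to_dense_weights({"field_1": -0.25, "field_0": 0.8})
print(weights)                                         # [0.8, -0.25]
print(predict_probability(weights, -1.5, [0.0, 0.0]))  # ≈ 0.1824
```

Comparing a few such probabilities against the output of the reconstructed Spark model (or of the original tool that produced the PMML) is a cheap way to confirm that the coefficient order survived the translation.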

Have you considered using a PMML loader such as jpmml-spark? You may run into interoperability issues depending on where you built the model and which PMML exporter you used. I believe sklearn2pmml is based on the JPMML library, so you should get good interoperability if you use the two together.

Cording answered 14/2, 2017 at 17:18
