How to load a PMML model?

Asked 15/6, 2016 at 14:35 Answered 9/12, 2021 at 11:11

scala apache-spark apache-spark-mllib pmml

I'm following the instructions in "PMML model export - spark.mllib" to create a K-means model.

val numClusters = 10
val numIterations = 10
val clusters = KMeans.train(data, numClusters, numIterations)
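// Print the model as a PMML string, then export it to a local file.
// toPMML(path) returns Unit, so its result should not be concatenated into println.
println("PMML Model:\n" + clusters.toPMML())
clusters.toPMML("/kmeans.xml")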

But I don't know how to load the PMML after that.

I'm trying

val sameModel = KMeansModel.load(sc, "/kmeans.xml")

and this exception appears:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/kmeans.xml/metadata

Any idea?

Best regards

Lodgings answered 15/6, 2016 at 14:35 Comment(3)

It seems there isn't any method to import a PMML model. So I changed the way I create the model and used KMeansModel.save instead. – Lodgings 20/6, 2016 at 11:24

Is it important to store the model in PMML format only? Because you can just save the model and then reload it. – Conductivity 25/5, 2017 at 11:21

I would recommend you look at this project, which adds more fully-featured PMML-functionality to Spark: github.com/jpmml/jpmml-spark. – Its 25/5, 2017 at 13:40

As stated in the documentation (both for the version you seem to be interested in, 1.6.1, and for the latest available, 2.1.0), Spark supports only exporting to PMML. The load method expects a model saved in Spark's own format, which is why it looks for a metadata path under /kmeans.xml and throws the exception when that path does not exist.

If you trained the model with Spark, you can save it and load it later.
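A minimal sketch of that save-and-reload round trip, using the spark.mllib calls already mentioned in the question (KMeansModel.save and KMeansModel.load). The object name KMeansSaveLoad, the helper saveAndReload, the sample points and the path /tmp/kmeans-model are only illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object KMeansSaveLoad {
  /** Saves the model in Spark's own format (a directory) and loads it back. */
  def saveAndReload(sc: SparkContext, model: KMeansModel, path: String): KMeansModel = {
    model.save(sc, path)        // writes <path>/metadata and <path>/data
    KMeansModel.load(sc, path)  // reads the same directory layout
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kmeans-save-load").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Tiny illustrative dataset with two obvious groups.
      val data = sc.parallelize(Seq(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
        Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
      val clusters = KMeans.train(data, 2, 10)

      // Hypothetical output directory; it must not exist yet.
      val sameModel = saveAndReload(sc, clusters, "/tmp/kmeans-model")
      println(sameModel.clusterCenters.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Note that the saved path is a directory, not a single file, which matches the /kmeans.xml/metadata lookup in the exception from the question.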

If you need to load a model that has not been trained in Spark and has been saved as PMML you can use jpmml-spark to load and evaluate it.

Inseverable answered 30/5, 2017 at 13:13 Comment(0)

My limited experience with spark.mllib's KMeans is that this is not possible out of the box, but you could develop the feature yourself.

spark.mllib's KMeansModel is PMMLExportable:

class KMeansModel @Since("1.1.0") (@Since("1.0.0") val clusterCenters: Array[Vector])
  extends Saveable with Serializable with PMMLExportable {

That's why you can use toPMML, which saves a model in the PMML XML format.

(Again, I have very little experience with Spark MLlib.) My understanding is that KMeans is all about centroids, and that's what is loaded when you call KMeansModel.load. It in turn uses KMeansModel.SaveLoadV1_0.load, which reads the centroids and creates a KMeansModel:

new KMeansModel(localCentroids.sortBy(_.id).map(_.point))

For KMeansModel.toPMML, Spark MLlib uses pmml-model's PMML (as you can see here):

new PMML("4.2", header, null)

I'd recommend exploring how pmml-model's PMML class handles saving and loading, as that's beyond Spark's realm.
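For K-means, one way to "develop the feature yourself" is to read the centroids back out of the exported XML. The following is only a sketch under stated assumptions, not Spark or pmml-model API: it uses the JDK's DOM parser instead of pmml-model, and it assumes the file follows the PMML ClusteringModel layout in which each Cluster element holds one whitespace-separated Array of coordinates. The names PmmlCentroids, parse and sample are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object PmmlCentroids {
  // Hand-written sample in the assumed ClusteringModel layout (not real Spark output).
  val sample: String =
    """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
      |  <ClusteringModel modelName="k-means" functionName="clustering"
      |                   modelClass="centerBased" numberOfClusters="2">
      |    <Cluster name="cluster_0"><Array n="2" type="real">1.0 2.0</Array></Cluster>
      |    <Cluster name="cluster_1"><Array n="2" type="real">3.0 4.5</Array></Cluster>
      |  </ClusteringModel>
      |</PMML>""".stripMargin

  /** Returns one coordinate array per Cluster element, in document order. */
  def parse(pmmlXml: String): Array[Array[Double]] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val bytes = pmmlXml.getBytes(StandardCharsets.UTF_8)
    val doc = builder.parse(new ByteArrayInputStream(bytes))
    val clusters = doc.getElementsByTagName("Cluster")
    (0 until clusters.getLength).map { i =>
      val cluster = clusters.item(i).asInstanceOf[Element]
      // Assumption: the first Array child carries the centroid coordinates.
      val coords = cluster.getElementsByTagName("Array").item(0).getTextContent
      coords.trim.split("\\s+").map(_.toDouble)
    }.toArray
  }

  def main(args: Array[String]): Unit =
    parse(sample).foreach(c => println(c.mkString(", ")))
}
```

With Spark on the classpath, the parsed arrays could then be passed to the public constructor shown above, for example new KMeansModel(centroids.map(c => Vectors.dense(c))). Anything else a PMML file may carry (data dictionary, transformations, other model types) is ignored by this sketch.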

Side notes

Why would you even want to use Spark to have the model after you trained it? It is indeed possible, but you may be wasting your cluster resources for Spark to host the model.

In my limited understanding, the sole purpose of Spark MLlib is to use Spark's features like distribution and parallelism to handle large datasets to build models and use them without the Spark machinery afterwards.

I must be missing something important in my narrow view...

Kaplan answered 31/5, 2017 at 7:8 Comment(1)

Hi Jacek, I was using it for near-real-time analysis with the Lambda Architecture: I built a batch model (which takes a lot of time), and when I needed to analyse the data I wanted to load that model. – Lodgings 2/6, 2017 at 15:49

You could use PMML4S-Spark to load a PMML model to evaluate it in Spark, for example:

import org.pmml4s.spark.ScoreModel

val model = ScoreModel.fromFile("/kmeans.xml")

The model is a Spark ML transformer, so you can make predictions against a DataFrame:

val scoreDf = model.transform(df)

Brockwell answered 22/8, 2019 at 1:29 Comment(0)

PMML files are actually XML files whose schema is defined by the Data Mining Group (DMG). For that reason, you can either write a deserializer based on the schema published on the DMG's PMML web page or use third-party libraries.

I am looking into the JPMML library for incorporating Python-trained models into a Spring application.

Information here: https://github.com/jpmml http://dmg.org/pmml/v4-1/GeneralStructure.html

Lundeen answered 9/12, 2021 at 11:11 Comment(0)
