My limited experience in this spark.mllib's KMeans space is that it is not possible, but you could develop the feature yourself.
spark.mllib's KMeansModel
is PMMLExportable
:
class KMeansModel @Since("1.1.0") (@Since("1.0.0") val clusterCenters: Array[Vector])
extends Saveable with Serializable with PMMLExportable {
That's why you can use toPMML that saves a model into the PMML XML format.
(Again I've got a very little experience in Spark MLlib) My understanding is that KMeans is all about centroids and that's what is loaded when you do KMeansModel.load that in turn uses KMeansModel.SaveLoadV1_0.load that reads the centroids and creates a KMeansModel
:
new KMeansModel(localCentroids.sortBy(_.id).map(_.point))
For KMeansModel.toPMML
, Spark MLlib uses pmml-model's PMML
(as you can see here):
new PMML("4.2", header, null)
I'd recommend exploring pmml-model's PMML
how to do saving and loading as that's beyond Spark's realm.
Side notes
Why would you even want to use Spark to have the model after you trained it? It is indeed possible, but you may be wasting your cluster resources for Spark to host the model.
In my limited understanding, the sole purpose of Spark MLlib is to use Spark's features like distribution and parallelism to handle large datasets to build models and use them without the Spark machinery afterwards.
I must be missing something important in my narrow view...
KMeansModel.save
– Lodgings