I'm building a Random Forest model with Spark ML and want to save it so I can reuse it later. I'm running this in pyspark (Spark 2.0.1) without HDFS, so the files are saved to the local file system.
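(The sqlContext below is the pre-created one from the pyspark shell; if this were run as a standalone script, the setup would be roughly the following, with the master URL and app name just placeholders:)

from pyspark.sql import SparkSession, SQLContext

# Rough standalone equivalent of the shell's built-in context (placeholder names)
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('rf-save-load') \
    .getOrCreate()
sqlContext = SQLContext(spark.sparkContext)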
I've tried to do it like so:
import pyspark.sql.types as T
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# Toy XOR-style data: two integer feature columns and a double label
data = [[0, 0, 0.],
        [0, 1, 1.],
        [1, 0, 1.],
        [1, 1, 0.]]
schema = T.StructType([
    T.StructField('a', T.IntegerType(), True),
    T.StructField('b', T.IntegerType(), True),
    T.StructField('label', T.DoubleType(), True)])
df = sqlContext.createDataFrame(data, schema)

# Combine the two feature columns into a single 'features' vector column
assembler = VectorAssembler(inputCols=['a', 'b'], outputCol='features')
df = assembler.transform(df)

# Train the forest and save it locally, overwriting any previous save
classifier = RandomForestClassifier(numTrees=10, maxDepth=15, labelCol='label', featuresCol='features')
model = classifier.fit(df)
model.write().overwrite().save('saved_model')
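The save itself doesn't raise anything; only the load below fails. To see what save() actually wrote locally I just walk the directory with plain Python (the comment about the expected layout is my assumption about the Spark ML writer, not something I've confirmed):

import os

# List everything under the saved_model directory.
# I'd expect a 'metadata' subdirectory (JSON) plus 'data' / 'treesMetadata'
# subdirectories (Parquet), but that's an assumption on my part.
for root, dirs, files in os.walk('saved_model'):
    for name in files:
        print(os.path.join(root, name))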
And then, to load the model:
from pyspark.ml.classification import RandomForestClassificationModel
loaded_model = RandomForestClassificationModel.load('saved_model')
But I get this error:
Py4JJavaError: An error occurred while calling o108.load.
: java.lang.UnsupportedOperationException: empty collection
I'm not sure which collection it's referring to. Any ideas on how to properly load (or save) the model?