Spark: add a new fitted stage to an existing PipelineModel without fitting again

I have a saved PipelineModel:

pipe_model = pipe.fit(df_train)
pipe_model.write().overwrite().save("/user/pipe_text_2")

Now I want to append a new, already fitted PipelineModel to this pipeline:

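from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.clustering import KMeans

# Load the fitted pipeline and apply it to get the input for KMeans
pipe_model = PipelineModel.load("/user/pipe_text_2")
df2 = pipe_model.transform(df1)

# Fit only the new stage on the transformed data
kmeans = KMeans(k=20)
pipe2 = Pipeline(stages=[kmeans])
pipe_model2 = pipe2.fit(df2)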

Is it possible to combine the two without fitting again? I want to end up with a new PipelineModel, not a new (unfitted) Pipeline. Ideally, something like the following would work:

pipe_model_new = pipe_model + pipe_model2
TypeError: unsupported operand type(s) for +: 'PipelineModel' and 'PipelineModel'

I've found "Join two Spark mllib pipelines together", but that solution requires fitting the whole pipeline again, which is what I'm trying to avoid.

Boult answered 17/3, 2018 at 14:8 Comment(0)

Since a PipelineModel is itself a valid stage of another PipelineModel, you should be able to use the following, which does not require fitting again:

pipe_model_new = PipelineModel(stages=[pipe_model, pipe_model2])
final_df = pipe_model_new.transform(df1)
Ordination answered 22/3, 2018 at 6:55 Comment(0)
