Model Probability Calibration in PySpark

I am using PySpark to implement a churn classification model for a business problem, and the dataset I have is imbalanced. So when I train the model, I randomly select a subset with equal numbers of 1's and 0's. Then I applied the model to real-time data, and the predicted 1's and 0's were, unsurprisingly, split roughly evenly.
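
For reference, a balanced training sample like the one described above could be drawn with `DataFrame.sampleBy`; the column name `label` and the data source below are illustrative assumptions, not details from the original post.

```python
# Rough sketch: build a balanced training subset by downsampling the
# majority class with DataFrame.sampleBy (column/file names are assumptions).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("churn_data.parquet")  # hypothetical dataset

# Count rows per class and work out the downsampling fraction.
label_counts = {row["label"]: row["count"] for row in df.groupBy("label").count().collect()}
minority = min(label_counts, key=label_counts.get)
majority = max(label_counts, key=label_counts.get)

# Keep every minority-class row; sample the majority class down to match.
fractions = {minority: 1.0, majority: label_counts[minority] / label_counts[majority]}
balanced_df = df.sampleBy("label", fractions=fractions, seed=42)
```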

Now I need to calibrate my trained model, but I couldn't find a way to do it in PySpark. Does anyone have an idea how to calibrate a model in PySpark, maybe something like scikit-learn's CalibratedClassifierCV?

Fritz answered 27/9, 2020 at 10:1 Comment(1)
Hey, this is an old question, but you can basically do the calibration by hand. Platt scaling is the most common calibration method; it consists of fitting a new logistic regression model where X = model scores and y = real target. – Pancreas
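
A minimal sketch of that "by-hand" Platt scaling in PySpark is below. It assumes a trained binary classifier `model` and a held-out calibration DataFrame `calib_df` with a `label` column; all of these names are placeholders, not part of the original question.

```python
# Platt scaling by hand: fit a logistic regression on the base model's scores.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.functions import vector_to_array  # Spark 3.0+

# 1. Score the calibration set with the already-trained (balanced-data) model.
scored = model.transform(calib_df)

# 2. Pull out the predicted probability of the positive class as a scalar column.
scored = scored.withColumn("raw_score", vector_to_array("probability").getItem(1))

# 3. Platt scaling: logistic regression with X = raw score, y = true label.
#    Output columns are renamed to avoid clashing with the base model's columns.
assembler = VectorAssembler(inputCols=["raw_score"], outputCol="score_vec")
platt = LogisticRegression(
    featuresCol="score_vec",
    labelCol="label",
    probabilityCol="calibrated_probability",
    rawPredictionCol="calibrated_rawPrediction",
    predictionCol="calibrated_prediction",
)
platt_model = platt.fit(assembler.transform(scored))

# 4. At prediction time, chain the two models: score, assemble, then calibrate.
def calibrate(df):
    scored_df = model.transform(df).withColumn(
        "raw_score", vector_to_array("probability").getItem(1)
    )
    return platt_model.transform(assembler.transform(scored_df))
```

Note that the calibration set should be a held-out sample that reflects the real (imbalanced) class distribution; calibrating on the balanced training sample would just reproduce the same bias.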
