How to set a custom loss function in Spark MLlib
I would like to use my own loss function instead of the squared loss for the linear regression model in Spark MLlib. So far I can't find anything in the documentation that mentions whether this is even possible.

Metzgar answered 14/11, 2017 at 17:34

TL;DR: it is not easy, because you cannot simply pass a loss function to Spark models. However, you can write a customized model yourself.

Long answer:
If you look at the code of LinearRegressionWithSGD, you will see:

class LinearRegressionWithSGD private[mllib] (
    private var stepSize: Double,
    private var numIterations: Int,
    private var regParam: Double,
    private var miniBatchFraction: Double)
  extends GeneralizedLinearAlgorithm[LinearRegressionModel] with Serializable {

  private val gradient = new LeastSquaresGradient()  // loss function
  private val updater = new SimpleUpdater()          // step updater (no regularization)
  @Since("0.8.0")
  override val optimizer = new GradientDescent(gradient, updater)  // optimizer
    .setStepSize(stepSize)
    .setNumIterations(numIterations)
    .setRegParam(regParam)
    .setMiniBatchFraction(miniBatchFraction)
  // ...
}
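The loss function (gradient) and updater are wired together by GradientDescent. To make that composition concrete, here is a minimal pure-Scala sketch of such a loop (full-batch rather than mini-batch for brevity; `Array[Double]` stands in for Spark's `Vector`, and all names here are mine, not Spark's):

```scala
object SgdSketch {
  // dot product of two dense vectors
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // least-squares gradient for one example: (w.x - y) * x
  def gradient(x: Array[Double], y: Double, w: Array[Double]): Array[Double] = {
    val diff = dot(x, w) - y
    x.map(_ * diff)
  }

  // gradient-descent loop with a SimpleUpdater-style step:
  // w -= (stepSize / sqrt(iter)) * averageGradient
  def fit(data: Seq[(Array[Double], Double)],
          stepSize: Double,
          numIterations: Int): Array[Double] = {
    var w = Array.fill(data.head._1.length)(0.0)
    for (iter <- 1 to numIterations) {
      val grads = data.map { case (x, y) => gradient(x, y, w) }
      val avg   = grads.transpose.map(_.sum / data.length).toArray
      val step  = stepSize / math.sqrt(iter)
      w = w.zip(avg).map { case (wi, gi) => wi - step * gi }
    }
    w
  }
}
```

Swapping `gradient` for any other differentiable loss changes the model while the loop stays the same, which is exactly the separation Spark's design encodes.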

So, let's look at how the least-squares loss function is implemented here:

class LeastSquaresGradient extends Gradient {
  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val diff = dot(data, weights) - label  // prediction error
    val loss = diff * diff / 2.0           // squared loss
    val gradient = data.copy
    scal(diff, gradient)                   // gradient = diff * data
    (gradient, loss)
  }

  // in-place variant: accumulates the gradient into cumGradient and returns the loss
  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    val diff = dot(data, weights) - label
    axpy(diff, data, cumGradient)          // cumGradient += diff * data
    diff * diff / 2.0
  }
}
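To check what those two overloads actually return, here is a pure-Scala sketch of the same least-squares math (no Spark dependencies; `Array[Double]` stands in for Spark's `Vector`):

```scala
object LeastSquaresSketch {
  // dot product of two dense vectors
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // same math as LeastSquaresGradient.compute:
  //   loss     = (w.x - y)^2 / 2
  //   gradient = (w.x - y) * x
  def compute(data: Array[Double], label: Double,
              weights: Array[Double]): (Array[Double], Double) = {
    val diff = dot(data, weights) - label
    val gradient = data.map(_ * diff)  // plays the role of scal(diff, data.copy)
    (gradient, diff * diff / 2.0)
  }
}
```

For example, with `data = (1.0, 2.0)`, `label = 3.0`, and `weights = (0.5, 0.5)`, the error is `0.5 + 1.0 - 3.0 = -1.5`, so the loss is `1.125` and the gradient is `(-1.5, -3.0)`.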

So, you can write your own class like LeastSquaresGradient, implement both compute overloads, and pass it to a GradientDescent optimizer. Note that since the gradient field of LinearRegressionWithSGD is a private val (as shown above), the practical route is to write your own algorithm class modeled on LinearRegressionWithSGD rather than modify it directly.
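As a sketch of what such a custom loss could look like, here is an absolute-error (L1) loss with the same `(gradient, loss)` contract as LeastSquaresGradient, again in plain Scala with `Array[Double]` standing in for Spark's `Vector` (in a real model you would extend `org.apache.spark.mllib.optimization.Gradient` and override both compute methods):

```scala
object AbsoluteErrorGradientSketch {
  // dot product of two dense vectors
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // L1 loss: |w.x - y|; subgradient: sign(w.x - y) * x
  def compute(data: Array[Double], label: Double,
              weights: Array[Double]): (Array[Double], Double) = {
    val diff = dot(data, weights) - label
    val sign = if (diff > 0) 1.0 else if (diff < 0) -1.0 else 0.0
    (data.map(_ * sign), math.abs(diff))
  }
}
```

The L1 loss is not differentiable at zero, so the sketch returns a subgradient there; GradientDescent only requires a gradient-like direction, so this still works in practice.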

Astto answered 9/7, 2019 at 2:29
