Need help implementing a custom loss function in LightGBM (zero-inflated lognormal loss)
I'm trying to implement the zero-inflated lognormal (ZILN) loss function from this paper (https://arxiv.org/pdf/1912.07753.pdf, page 5) as a custom loss in LightGBM, but admittedly I just don't know how. I don't understand how to derive the gradient and Hessian of this function, which LightGBM requires, and I've never had to implement a custom loss function before.

The authors of the paper have open-sourced their code, and the loss is available in TensorFlow (https://github.com/google/lifetime_value/blob/master/lifetime_value/zero_inflated_lognormal.py), but I'm unable to translate it into the form LightGBM requires for a custom loss function. As an example of how LightGBM accepts custom loss functions, a log-likelihood loss would be written as:

import numpy as np

def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))   # raw scores -> probabilities
    grad = preds - labels                # first derivative of the log loss w.r.t. the raw score
    hess = preds * (1. - preds)          # second derivative (Hessian diagonal)
    return grad, hess

Similarly, I would need to define a custom eval metric to accompany it, such as:

def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    # returns (metric name, metric value, is_higher_better)
    return 'error', np.mean(labels != (preds > 0.5)), False

Both of these examples are taken from the following file in the LightGBM repository:

https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
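For context, in the LightGBM version current at the time, these hooks would be wired into training roughly as below. This is only a sketch under the assumption that lgb_train and lgb_eval Datasets already exist; the fobj/feval keyword arguments are version-dependent (newer releases pass the objective through params instead):

import lightgbm as lgb

# sketch only -- assumes lgb_train / lgb_eval Datasets already exist;
# fobj/feval are version-dependent hooks
params = {'verbose': -1}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                fobj=loglikelihood,
                feval=binary_error,
                valid_sets=lgb_eval)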

I'd appreciate any help on this, and especially detailed guidance that would help me learn how to do this on my own.

According to the LGBM documentation for custom loss functions:

It should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:

y_true: numpy 1-D array of shape = [n_samples]
The target values.

y_pred: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.

group: numpy 1-D array
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.

grad: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.

hess: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.
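For illustration only (my own example, not the ZILN loss), an objective matching that sklearn-style signature for plain squared error would look roughly like:

import numpy as np

def squared_error_objective(y_true, y_pred):
    # L = 0.5 * (y_pred - y_true)**2, differentiated w.r.t. y_pred
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess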
Desiccant answered 25/5, 2022 at 17:6 Comment(3)
Can you post a minimal LightGBM example, with features and the desired output? That would let helpers focus on the custom loss. – Pieria
LightGBM has a built-in Tweedie objective that does something very similar to what your custom loss function would do (see the short sketch after these comments). You can read more about the Tweedie loss using the link below. medium.com/@royravid/… – Lavine
I have to admit I'm not entirely sure which part is the problem for you. I posted the whole translation step by step (i.e. including all the accessory functions) but didn't adapt it to your case, as it's unclear to me what exact output you need. I believe this should be enough to get you going, since everything is there (if you know your math). Please feel free to ask if you need extra details. – Allhallowtide
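A rough sketch of the built-in Tweedie route mentioned in the comment above (assuming an lgb_train Dataset already exists; the variance power value is just an example, not a recommendation):

import lightgbm as lgb

params = {
    'objective': 'tweedie',
    'tweedie_variance_power': 1.3,   # 1 < p < 2 mixes Poisson and Gamma behaviour
}
model = lgb.train(params, lgb_train, num_boost_round=100)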
This is the "translation", as you call it, of the TensorFlow implementation. Most of the work is just defining the helper functions yourself (e.g. softplus, cross-entropy, etc.).

The mean absolute percentage error is the metric used in the linked paper; I'm not sure whether that is the eval metric you want to use.

import numpy as np

epsilon = 1e-7

def sigmoid(x):
    # element-wise logistic function (np.exp so it works on arrays)
    return 1 / (1 + np.exp(-x))

def softplus(x, beta=1):
    # smooth approximation of max(0, x); keeps the scale parameter positive
    return 1 / beta * np.log(1 + np.exp(beta * x))

def BinaryCrossEntropy(y_true, y_pred):
    # y_pred are probabilities in (0, 1), not raw logits
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    term_0 = (1 - y_true) * np.log(1 - y_pred + epsilon)
    term_1 = y_true * np.log(y_pred + epsilon)
    return -np.mean(term_0 + term_1, axis=0)

def lognormal_logpdf(x, loc, scale):
    # log-density of a lognormal distribution
    # (numpy has no LogNormal class; this replaces tfd.LogNormal(...).log_prob)
    return (-np.log(x) - np.log(scale) - 0.5 * np.log(2 * np.pi)
            - 0.5 * np.square((np.log(x) - loc) / scale))

def zero_inflated_lognormal_pred(logits):
    # expected value: P(positive) * E[lognormal] = p * exp(loc + scale^2 / 2)
    positive_probs = sigmoid(logits[..., :1])
    loc = logits[..., 1:2]
    scale = softplus(logits[..., 2:])
    preds = positive_probs * np.exp(loc + 0.5 * np.square(scale))
    return preds

def mean_abs_pct_error(preds, train_data):
    labels = train_data.get_label()
    decile_labels = np.percentile(labels, np.linspace(10, 100, 10))
    decile_preds = np.percentile(preds, np.linspace(10, 100, 10))
    MAPE = sum(np.absolute(decile_preds - decile_labels) / decile_labels)
    return 'error', MAPE, False

def zero_inflated_lognormal_loss(train_data, logits):
    # labels as a column vector so they broadcast against the [n, 1] logit slices
    labels = train_data.get_label().reshape(-1, 1)
    positive = (labels > 0).astype(float)

    # classification head: cross-entropy on "is the label positive?"
    positive_logits = logits[..., :1]
    classification_loss = BinaryCrossEntropy(
        y_true=positive, y_pred=sigmoid(positive_logits))

    # regression head: negative lognormal log-likelihood on the positive labels
    loc = logits[..., 1:2]
    scale = np.maximum(softplus(logits[..., 2:]), np.sqrt(epsilon))
    safe_labels = positive * labels + (1 - positive) * np.ones(labels.shape)
    regression_loss = -np.mean(
        positive * lognormal_logpdf(safe_labels, loc, scale), axis=-1)
    return classification_loss + regression_loss
Allhallowtide answered 3/6, 2022 at 12:35 Comment(3)
Thank you for your response. LightGBM requires that any custom loss function return the gradient and the Hessian, similar to the example provided. In your answer, could you provide guidance on how to use your solution as a custom loss function for LightGBM regression? Based on the LightGBM documentation, zero_inflated_lognormal_loss needs to output the gradient and the Hessian. For clarity, I've updated my question to include the LightGBM requirements. – Desiccant
Why is there a negative sign in front of the regression loss? – Toxin
@NegativeCorrelation: You can likely compute the gradient and Hessian automatically using github.com/HIPS/autograd – Sidewalk
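Following up on the comments above, here is a rough, unverified sketch (mine, not from the paper or the answer) of how the translated loss could be turned into something with LightGBM's objective(preds, train_data) shape, by approximating the per-sample gradient and Hessian diagonal with central differences. It assumes the booster produces three raw scores per sample (e.g. via num_class=3), and all helper names are made up:

import numpy as np

EPS = 1e-7

def _per_sample_ziln_loss(logits, labels):
    # vectorised ZILN loss per sample; logits has shape [n_samples, 3]
    p_logit, loc, raw_scale = logits[:, 0], logits[:, 1], logits[:, 2]
    positive = (labels > 0).astype(float)
    # classification head: sigmoid cross-entropy on "is the label positive?"
    p = np.clip(1.0 / (1.0 + np.exp(-p_logit)), EPS, 1 - EPS)
    class_loss = -(positive * np.log(p) + (1 - positive) * np.log(1 - p))
    # regression head: negative lognormal log-likelihood on positive labels
    scale = np.maximum(np.log1p(np.exp(raw_scale)), np.sqrt(EPS))  # softplus
    log_y = np.log(np.where(positive > 0, labels, 1.0))
    reg_loss = positive * (log_y + np.log(scale) + 0.5 * np.log(2 * np.pi)
                           + 0.5 * np.square((log_y - loc) / scale))
    return class_loss + reg_loss

def ziln_objective(preds, train_data):
    labels = train_data.get_label()
    # the layout of multi-output preds is version-dependent; adjust this
    # reshape (and the final flatten) to match your LightGBM version
    logits = preds.reshape(-1, 3)
    n, k = logits.shape
    h = 1e-4
    grad = np.zeros_like(logits)
    hess = np.zeros_like(logits)
    base = _per_sample_ziln_loss(logits, labels)
    for j in range(k):
        step = np.zeros_like(logits)
        step[:, j] = h
        up = _per_sample_ziln_loss(logits + step, labels)
        down = _per_sample_ziln_loss(logits - step, labels)
        grad[:, j] = (up - down) / (2 * h)                            # central difference
        hess[:, j] = np.maximum((up - 2 * base + down) / h**2, EPS)   # keep Hessian positive
    return grad.reshape(-1), hess.reshape(-1)

An analytic gradient/Hessian (or the autograd route from the last comment) would be faster and more accurate; this is only meant to show where the per-sample derivatives plug into LightGBM's contract.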
