Need help implementing a custom loss function in LightGBM (zero-inflated lognormal loss)
I'm trying to implement the zero-inflated lognormal (ZILN) loss function from this paper (https://arxiv.org/pdf/1912.07753.pdf, page 5) as a custom loss in LightGBM, but admittedly I just don't know how. I don't understand how to derive the gradient and Hessian of this function, which LightGBM requires, and I've never had to implement a custom loss function before.

The authors of the paper have open-sourced their code, and the loss is available in TensorFlow (https://github.com/google/lifetime_value/blob/master/lifetime_value/zero_inflated_lognormal.py), but I'm unable to translate it into the form LightGBM requires for a custom loss function. As an example of how LightGBM accepts custom loss functions, a log-likelihood loss would be written as:

import numpy as np

def loglikelihood(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))   # raw scores -> probabilities
    grad = preds - labels                # first derivative of the log loss w.r.t. the raw score
    hess = preds * (1. - preds)          # second derivative (Hessian diagonal)
    return grad, hess

Similarly, I would need to define a custom eval metric to accompany it, such as:

def binary_error(preds, train_data):
    labels = train_data.get_label()
    preds = 1. / (1. + np.exp(-preds))
    # returns (metric name, metric value, is_higher_better)
    return 'error', np.mean(labels != (preds > 0.5)), False

Both of these examples are taken from the following file in the LightGBM repository:

https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
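For context, in the LightGBM version current at the time, these hooks would be wired into training roughly as below. This is only a sketch under the assumption that lgb_train and lgb_eval Datasets already exist; the fobj/feval keyword arguments are version-dependent (newer releases pass the objective through params instead):

import lightgbm as lgb

# sketch only -- assumes lgb_train / lgb_eval Datasets already exist;
# fobj/feval are version-dependent hooks
params = {'verbose': -1}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                fobj=loglikelihood,
                feval=binary_error,
                valid_sets=lgb_eval)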

I'd appreciate any help on this, and especially detailed guidance that would help me learn how to do this on my own.

According to the LGBM documentation for custom loss functions:

It should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:

y_true: numpy 1-D array of shape = [n_samples]
The target values.

y_pred: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.

group: numpy 1-D array
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.

grad: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.

hess: numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.
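For illustration only (my own example, not the ZILN loss), an objective matching that sklearn-style signature for plain squared error would look roughly like:

import numpy as np

def squared_error_objective(y_true, y_pred):
    # L = 0.5 * (y_pred - y_true)**2, differentiated w.r.t. y_pred
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess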
Desiccant answered 25/5, 2022 at 17:6 Comment(3)
Can you post a minimal LightGBM example, with features and the desired output? That would let helpers focus on the custom loss. – Pieria
LightGBM has a built-in Tweedie objective that does something very similar to what your custom loss function would do (see the short sketch after these comments). You can read more about the Tweedie loss using the link below. medium.com/@royravid/… – Lavine
I have to admit I'm not entirely sure which part is the problem for you. I posted the whole translation step by step (i.e. including all the accessory functions) but didn't adapt it to your case, as it's unclear to me what exact output you need. I believe this should be enough to get you going, since everything is there (if you know your math). Please feel free to ask if you need extra details. – Allhallowtide
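A rough sketch of the built-in Tweedie route mentioned in the comment above (assuming an lgb_train Dataset already exists; the variance power value is just an example, not a recommendation):

import lightgbm as lgb

params = {
    'objective': 'tweedie',
    'tweedie_variance_power': 1.3,   # 1 < p < 2 mixes Poisson and Gamma behaviour
}
model = lgb.train(params, lgb_train, num_boost_round=100)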
This is the "translation", as you call it, of the TensorFlow implementation. Most of the work is just defining the helper functions yourself (e.g. softplus, cross-entropy, etc.).

The mean absolute percentage error is the metric used in the linked paper; I'm not sure whether that is the eval metric you want to use.

import numpy as np

epsilon = 1e-7

def sigmoid(x):
    # element-wise logistic function (np.exp so it works on arrays)
    return 1 / (1 + np.exp(-x))

def softplus(x, beta=1):
    # smooth approximation of max(0, x); keeps the scale parameter positive
    return 1 / beta * np.log(1 + np.exp(beta * x))

def BinaryCrossEntropy(y_true, y_pred):
    # y_pred are probabilities in (0, 1), not raw logits
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    term_0 = (1 - y_true) * np.log(1 - y_pred + epsilon)
    term_1 = y_true * np.log(y_pred + epsilon)
    return -np.mean(term_0 + term_1, axis=0)

def lognormal_logpdf(x, loc, scale):
    # log-density of a lognormal distribution
    # (numpy has no LogNormal class; this replaces tfd.LogNormal(...).log_prob)
    return (-np.log(x) - np.log(scale) - 0.5 * np.log(2 * np.pi)
            - 0.5 * np.square((np.log(x) - loc) / scale))

def zero_inflated_lognormal_pred(logits):
    # expected value: P(positive) * E[lognormal] = p * exp(loc + scale^2 / 2)
    positive_probs = sigmoid(logits[..., :1])
    loc = logits[..., 1:2]
    scale = softplus(logits[..., 2:])
    preds = positive_probs * np.exp(loc + 0.5 * np.square(scale))
    return preds

def mean_abs_pct_error(preds, train_data):
    labels = train_data.get_label()
    decile_labels = np.percentile(labels, np.linspace(10, 100, 10))
    decile_preds = np.percentile(preds, np.linspace(10, 100, 10))
    MAPE = sum(np.absolute(decile_preds - decile_labels) / decile_labels)
    return 'error', MAPE, False

def zero_inflated_lognormal_loss(train_data, logits):
    # labels as a column vector so they broadcast against the [n, 1] logit slices
    labels = train_data.get_label().reshape(-1, 1)
    positive = (labels > 0).astype(float)

    # classification head: cross-entropy on "is the label positive?"
    positive_logits = logits[..., :1]
    classification_loss = BinaryCrossEntropy(
        y_true=positive, y_pred=sigmoid(positive_logits))

    # regression head: negative lognormal log-likelihood on the positive labels
    loc = logits[..., 1:2]
    scale = np.maximum(softplus(logits[..., 2:]), np.sqrt(epsilon))
    safe_labels = positive * labels + (1 - positive) * np.ones(labels.shape)
    regression_loss = -np.mean(
        positive * lognormal_logpdf(safe_labels, loc, scale), axis=-1)
    return classification_loss + regression_loss
Allhallowtide answered 3/6, 2022 at 12:35 Comment(3)
Thank you for your response. LightGBM requires that any custom loss function return the gradient and the Hessian, similar to the example provided. In your answer, could you provide guidance on how to use your solution as a custom loss function for LightGBM regression? Based on the LightGBM documentation, zero_inflated_lognormal_loss needs to output the gradient and the Hessian. For clarity, I've updated my question to include the LightGBM requirements. – Desiccant
Why is there a negative sign in front of the regression loss? – Toxin
@NegativeCorrelation: You can likely compute the gradient and Hessian automatically using github.com/HIPS/autograd – Sidewalk
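Following up on the comments above, here is a rough, unverified sketch (mine, not from the paper or the answer) of how the translated loss could be turned into something with LightGBM's objective(preds, train_data) shape, by approximating the per-sample gradient and Hessian diagonal with central differences. It assumes the booster produces three raw scores per sample (e.g. via num_class=3), and all helper names are made up:

import numpy as np

EPS = 1e-7

def _per_sample_ziln_loss(logits, labels):
    # vectorised ZILN loss per sample; logits has shape [n_samples, 3]
    p_logit, loc, raw_scale = logits[:, 0], logits[:, 1], logits[:, 2]
    positive = (labels > 0).astype(float)
    # classification head: sigmoid cross-entropy on "is the label positive?"
    p = np.clip(1.0 / (1.0 + np.exp(-p_logit)), EPS, 1 - EPS)
    class_loss = -(positive * np.log(p) + (1 - positive) * np.log(1 - p))
    # regression head: negative lognormal log-likelihood on positive labels
    scale = np.maximum(np.log1p(np.exp(raw_scale)), np.sqrt(EPS))  # softplus
    log_y = np.log(np.where(positive > 0, labels, 1.0))
    reg_loss = positive * (log_y + np.log(scale) + 0.5 * np.log(2 * np.pi)
                           + 0.5 * np.square((log_y - loc) / scale))
    return class_loss + reg_loss

def ziln_objective(preds, train_data):
    labels = train_data.get_label()
    # the layout of multi-output preds is version-dependent; adjust this
    # reshape (and the final flatten) to match your LightGBM version
    logits = preds.reshape(-1, 3)
    n, k = logits.shape
    h = 1e-4
    grad = np.zeros_like(logits)
    hess = np.zeros_like(logits)
    base = _per_sample_ziln_loss(logits, labels)
    for j in range(k):
        step = np.zeros_like(logits)
        step[:, j] = h
        up = _per_sample_ziln_loss(logits + step, labels)
        down = _per_sample_ziln_loss(logits - step, labels)
        grad[:, j] = (up - down) / (2 * h)                            # central difference
        hess[:, j] = np.maximum((up - 2 * base + down) / h**2, EPS)   # keep Hessian positive
    return grad.reshape(-1), hess.reshape(-1)

An analytic gradient/Hessian (or the autograd route from the last comment) would be faster and more accurate; this is only meant to show where the per-sample derivatives plug into LightGBM's contract.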
