Higher loss penalty for true non-zero predictions

I am building a deep regression network (CNN) to predict a (1000, 1) target vector from (7, 11) images. The target usually consists of about 90% zeros and only 10% non-zero values. The distribution of (non-)zero values in the targets varies from sample to sample (i.e. there is no global class imbalance).

Using mean squared error loss, this led to the network predicting only zeros, which I don't find surprising.

My best guess is to write a custom loss function that penalizes errors regarding non-zero values more than the prediction of zero-values.

I have tried the loss function below with the intent to implement what I guessed could work above. It is a mean squared error loss in which the predictions of zero targets are penalized less (w=0.1).

import tensorflow as tf
from tensorflow.keras import backend as K

def my_loss(y_true, y_pred):
    # weights errors on true zero targets less than errors on true non-zero targets
    w = 0.1
    # zero out the predictions wherever the target is zero, so the first term
    # only penalizes errors on non-zero targets
    y_pred_of_nonzeros = tf.where(tf.equal(y_true, 0), tf.zeros_like(y_pred), y_pred)
    return K.mean(K.square(y_true - y_pred_of_nonzeros)) + K.mean(K.square(y_true - y_pred)) * w

The network is able to learn without getting stuck with only-zero predictions. However, this solution seems quite unclean. Is there a better way to deal with this type of problem? Any advice on improving the custom loss function? Any suggestions are welcome, thank you in advance!

Best, Lukas

Walling answered 23/8, 2019 at 0:31 Comment(3)
Hey Lukas, thank you for asking this question! I am dealing with a similar problem. Can I ask the range of your target values, and whether you used any kind of normalization on the target vector cells? In my problem the vector cells have different scales, so I had to independently normalize each target vector cell to get a more balanced loss. Did you encounter a similar issue? Thank you! – Quadrisect
Hi dogadikbayir! Yes, I normalize the target vector to be between 0 and 1. My output vector is homogeneous, with every cell on the same scale, so I don't have that problem. Individual normalization seems fine, though. What is the problem you are facing? Best, Lukas – Walling
Thank you for the response! Since my target vector cell values can differ by several orders of magnitude, the loss function simply favours the contributions made by larger-magnitude values. By independently normalizing each cell, I have improved the performance. I was just wondering if you had a similar issue and a better solution :) – Quadrisect
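
For reference, a minimal sketch of the per-cell normalization described in these comments, assuming the targets are gathered in a (samples, 1000) NumPy array named y_train; the toy data and the epsilon guard are illustrative assumptions, not part of the original thread:

import numpy as np

# toy targets: (samples, 1000), columns on very different scales
y_train = np.random.rand(256, 1000) * np.logspace(0, 3, 1000)

# per-cell (per-column) min-max normalization, computed on the training set only
col_min = y_train.min(axis=0, keepdims=True)
col_max = y_train.max(axis=0, keepdims=True)
y_train_scaled = (y_train - col_min) / (col_max - col_min + 1e-8)

# reuse col_min/col_max for validation and test targets, and invert the scaling
# on the predictions to get back to the original units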

I'm not sure there is anything better than a custom loss like the one you wrote, but there is a cleaner way to do it:

from tensorflow.keras import backend as K

def weightedLoss(w):

    def loss(true, pred):

        # squared error per element
        error = K.square(true - pred)
        # down-weight the error wherever the target is zero
        error = K.switch(K.equal(true, 0), w * error, error)

        return error

    return loss

You may also return K.mean(error), but without the mean you can still profit from other Keras options, such as sample weights.

Select the weight when compiling:

model.compile(loss = weightedLoss(0.1), ...)

If you have the entire data in an array, you can do:

w = K.mean(y_train)
w = w / (1 - w)  # this compensates for the lack of the 90% weights for class 1
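
To put the pieces together, here is a minimal end-to-end sketch of this approach. The toy data, the tiny Conv2D/Dense architecture, the sigmoid output and the optimizer are illustrative assumptions that only mirror the shapes from the question ((7, 11) inputs, (1000,) targets); it also assumes a TF 2.x setup where tf.keras.backend still exposes the Keras-2-style ops used above:

import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras import layers, models

def weightedLoss(w):
    # the weighted loss defined above
    def loss(true, pred):
        error = K.square(true - pred)
        return K.switch(K.equal(true, 0), w * error, error)
    return loss

# toy data: (7, 11) single-channel images -> sparse (1000,) targets in [0, 1]
x_train = np.random.rand(128, 7, 11, 1).astype("float32")
mask = (np.random.rand(128, 1000) > 0.9).astype("float32")   # ~10% non-zero entries
y_train = mask * np.random.rand(128, 1000).astype("float32")

# derive the weight from the data, as in the snippet above (np mean for a plain float)
w = float(y_train.mean())
w = w / (1 - w)

model = models.Sequential([
    layers.Input(shape=(7, 11, 1)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(1000, activation="sigmoid"),   # targets are normalized to [0, 1]
])
model.compile(optimizer="adam", loss=weightedLoss(w))
model.fit(x_train, y_train, epochs=2, batch_size=16)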

Another solution, which avoids a custom loss but requires changes to both the data and the model, is:

  • Transform your y into a 2-class problem for each output. Shape = (batch, originalClasses, 2).
      • For the zero values, make the first of the two classes = 1.
      • For the one values, make the second of the two classes = 1.

newY = np.stack([1-oldY, oldY], axis=-1)    
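
For concreteness, a tiny worked example of this transform (the oldY values below are made up):

import numpy as np

oldY = np.array([[0., 1., 0., 0.]])          # one sample, 4 outputs, mostly zeros
newY = np.stack([1 - oldY, oldY], axis=-1)   # shape (1, 4, 2)
# newY[0] -> [[1, 0], [0, 1], [1, 0], [1, 0]]
#              zero    one     zero    zero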

Adjust the model to output this new shape.

...
model.add(Dense(2*classes))
model.add(Reshape((classes,2)))
model.add(Activation('softmax'))

Make sure you use a softmax activation and categorical_crossentropy as the loss.

Then use the argument class_weight={0: w, 1: 1} in fit.
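
A minimal sketch of this second approach end to end; the toy data and the small feature extractor in front of the Dense/Reshape/softmax head are illustrative assumptions, with classes set to 1000 to match the question:

import numpy as np
from tensorflow.keras import layers, models

classes = 1000                                                   # length of the original target vector
oldY = (np.random.rand(128, classes) > 0.9).astype("float32")    # toy binary targets, ~10% ones
newY = np.stack([1 - oldY, oldY], axis=-1)                       # shape (128, classes, 2)
x_train = np.random.rand(128, 7, 11, 1).astype("float32")

model = models.Sequential([
    layers.Input(shape=(7, 11, 1)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),     # placeholder feature extractor
    layers.Flatten(),
    layers.Dense(2 * classes),
    layers.Reshape((classes, 2)),
    layers.Activation("softmax"),                                # softmax over the last axis (the 2 classes)
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

w = float(oldY.mean())
w = w / (1 - w)                                                  # data-derived weight, as above
# class_weight={0: w, 1: 1} is then passed to fit(), as described above
model.fit(x_train, newY, epochs=2, batch_size=16)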

Communicant answered 23/8, 2019 at 0:44 Comment(0)
