How does keras handle multiple losses?

If I have something like:

model = Model(inputs=input, outputs=[y1, y2])

l1 = 0.5
l2 = 0.3
model.compile(loss=[loss1, loss2], loss_weights=[l1, l2], ...)

What does Keras do with these losses to obtain the final loss? Is it something like:

final_loss = l1*loss1 + l2*loss2

Also, what does it mean during training? Is loss2 only used to update the weights of the layers that y2 comes from, or is it used for all of the model's layers?

Ricard answered 21/3, 2018 at 10:48

From the Model documentation:

loss: String (name of objective function) or objective function. See losses. If the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses.

...

loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.

So, yes, the final loss will be the "weighted sum of all individual losses, weighted by the loss_weights coefficients".

You can check the code where the loss is calculated.
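
For example, here is a minimal sketch of a two-output model compiled this way (the layer sizes and loss choices are made up for illustration); the minimized loss is the weighted sum of the two per-output losses:

from tensorflow import keras

# Hypothetical two-output model: a shared trunk with two heads.
inp = keras.Input(shape=(10,))
shared = keras.layers.Dense(8, activation="relu")(inp)
y1 = keras.layers.Dense(1, name="y1")(shared)
y2 = keras.layers.Dense(1, name="y2")(shared)
model = keras.Model(inputs=inp, outputs=[y1, y2])

# Keras will minimize 0.5 * mse(y1) + 0.3 * mae(y2).
model.compile(optimizer="adam", loss=["mse", "mae"], loss_weights=[0.5, 0.3])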

Also, what does it mean during training? Is loss2 only used to update the weights of the layers that y2 comes from, or is it used for all of the model's layers?

The weights are updated through backpropagation, so each loss affects only the layers that connect the input to that loss's output.

For example:

                        +----+
                        | C  |-->loss1
                       /+----+
                      /
                     /
    +----+    +----+/
 -->| A  |--->| B  |\
    +----+    +----+ \
                      \
                       \+----+
                        | D  |-->loss2
                        +----+
  • loss1 will affect A, B, and C.
  • loss2 will affect A, B, and D.
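
You can verify this with tf.GradientTape (a sketch assuming A/B/C/D are Dense layers wired as in the diagram): the gradient of loss1 with respect to D's weights comes back as None, because there is no path from loss1 to D.

import tensorflow as tf
from tensorflow import keras

inp = keras.Input(shape=(4,))
a = keras.layers.Dense(8, activation="relu", name="A")(inp)
b = keras.layers.Dense(8, activation="relu", name="B")(a)
c = keras.layers.Dense(1, name="C")(b)
d = keras.layers.Dense(1, name="D")(b)
model = keras.Model(inp, [c, d])

x = tf.random.normal((2, 4))
with tf.GradientTape() as tape:
    out_c, out_d = model(x)
    loss1 = tf.reduce_mean(tf.square(out_c))  # a loss on output C only

g_d, g_a = tape.gradient(loss1, [model.get_layer("D").kernel,
                                 model.get_layer("A").kernel])
print(g_d)              # None: loss1 is not connected to D
print(g_a is not None)  # True: A is on the path to loss1
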
Bairam answered 21/3, 2018 at 12:20
PAY ATTENTION!!! From the documentation: 'If the model has multiple outputs... The loss value that will be minimized by the model will then be the sum of all individual losses' (as reasonable): tensorflow.org/api_docs/python/tf/keras/Model#compile. So the final loss will affect ALL layers. (Whiplash)
@Whiplash Notice that in "the sum of all individual losses", the loss1 part of the total loss has a gradient only with respect to A, B, and C, not D, and the loss2 part of the total loss has a gradient only with respect to A, B, and D, not C. (Jit)

Regarding how multiple outputs are backpropagated, I think the accepted answer above is not complete.

Also, what does it mean during training? Is loss2 only used to update the weights of the layers that y2 comes from, or is it used for all of the model's layers?

For output C and output D, Keras computes a single final loss F_loss = w1 * loss1 + w2 * loss2, and then backpropagates from both output C and output D using this same F_loss.
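
A manual training step makes this concrete (a sketch with made-up shapes and losses, not Keras' exact internals): the two weighted losses are summed into one scalar, and a single backward pass updates everything reachable from either output.

import tensorflow as tf
from tensorflow import keras

# Toy two-output model, purely for illustration.
inp = keras.Input(shape=(4,))
h = keras.layers.Dense(8, activation="relu")(inp)
out1 = keras.layers.Dense(1)(h)
out2 = keras.layers.Dense(1)(h)
model = keras.Model(inp, [out1, out2])

w1, w2 = 0.5, 0.3
mse = keras.losses.MeanSquaredError()
optimizer = keras.optimizers.Adam()

x = tf.random.normal((8, 4))
t1, t2 = tf.zeros((8, 1)), tf.ones((8, 1))

with tf.GradientTape() as tape:
    p1, p2 = model(x, training=True)
    F_loss = w1 * mse(t1, p1) + w2 * mse(t2, p2)  # one combined scalar

# One backward pass on the combined scalar; each term only
# contributes gradients to the layers it is connected to.
grads = tape.gradient(F_loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))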

Snap answered 13/5, 2018 at 3:3
Only the w1*loss1 part has a gradient with respect to C, and only the w2*loss2 part has a gradient with respect to D. (Jit)

What happens behind model.fit?

Loss calculation (simplified from the Keras source):

# Simplified: loss_objs, loss_weights, y_true, y_pred and
# regularization_losses come from the surrounding Keras internals.
loss_values = []
for loss_obj, loss_weight, y_t, y_p in zip(loss_objs, loss_weights, y_true, y_pred):
    loss_value = loss_obj(y_t, y_p)      # per-output loss
    if loss_weight is not None:
        loss_value *= loss_weight        # scale by its loss_weight
    loss_values.append(loss_value)
if regularization_losses:
    loss_values.append(math_ops.add_n(regularization_losses))
total_loss = math_ops.add_n(loss_values)  # weighted sum of all losses

Gradient calculation and weight update:

gradients = tape.gradient(loss, trainable_variables)            # autodiff through the whole graph
optimizer.apply_gradients(zip(gradients, trainable_variables))  # update every variable that has a gradient

Notice that tape.gradient performs automatic differentiation: loss is a node in the graph, and autodiff traces gradients through the whole graph. With total_loss = w1*loss1 + w2*loss2:

$$
\begin{aligned}
\frac{\partial\,\mathrm{loss1}}{\partial D} &= 0 \\
\frac{\partial\,\mathrm{loss2}}{\partial C} &= 0
\end{aligned}
$$

so:

  • loss1 will affect A, B, and C.
  • loss2 will affect A, B, and D.
Jit answered 26/3 at 2:36
