What is the purpose of the add_loss function in Keras?
I recently stumbled across variational autoencoders and tried to make them work on MNIST using Keras. I found a tutorial on GitHub.

My question concerns the following lines of code:

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)

# Compile
vae.add_loss(vae_loss)
vae.compile(optimizer='rmsprop')

Why is add_loss used instead of specifying the loss as a compile option? Something like vae.compile(optimizer='rmsprop', loss=vae_loss) does not seem to work and throws the following error:

ValueError: The model cannot be compiled because it has no loss to optimize.

What is the difference between this function and a custom loss function that I can pass as the loss argument to Model.compile()?

Thanks in advance!

P.S.: I know there are several issues concerning this on GitHub, but most of them were open and uncommented. If this has been resolved already, please share the link!


Edit 1

I removed the line which adds the loss to the model and used the loss argument of the compile function. It looks like this now:

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)

# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)

This throws a TypeError:

TypeError: Using a 'tf.Tensor' as a Python 'bool' is not allowed. Use 'if t is not None:' instead of 'if t:' to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.

Edit 2

Thanks to @MarioZ's efforts, I was able to figure out a workaround for this.

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss in separate function
def vae_loss(x, x_decoded_mean):
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    vae_loss = K.mean(xent_loss + kl_loss)
    return vae_loss

# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)

...

vae.fit(x_train, 
    x_train,        # <-- did not need this previously
    shuffle=True,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(x_test, x_test))     # <-- worked with (x_test, None) before

For some strange reason, I had to explicitly specify the targets (y and the validation targets) while fitting the model. Originally, I didn't need to do this. The produced samples seem reasonable to me.

Although I could resolve this, I still don't know what the differences and disadvantages of these two methods are (other than needing a different syntax). Can someone give me more insight?

Hadden answered 27/4, 2018 at 13:35 Comment(2)
Since I struggled a bit with this - my version of Keras refused to compile without specifying a loss, and the solution apparently was to add loss=None to the compile() statement.Buckra
The link to the original code is broken. I think this is where the original code came from.Jarboe

I'll try to answer the original question of why model.add_loss() is being used instead of specifying a custom loss function to model.compile(loss=...).

All loss functions in Keras take exactly two parameters, y_true and y_pred. Have a look at the definitions of the various standard loss functions available in Keras; they all have these two parameters. They are the 'targets' (the Y variable in many textbooks) and the actual output of the model. Most standard loss functions can be written as an expression of these two tensors, but some more complex losses cannot. For your VAE example this is the case, because the loss function also depends on additional tensors, namely z_log_var and z_mean, which are not available to a loss function passed to compile(). Using model.add_loss() has no such restriction and allows you to write much more complex losses that depend on many other tensors, but it has the inconvenience of being more dependent on the model, whereas the standard loss functions work with just any model.
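To make the contrast concrete, here is a minimal sketch reusing the names from the question's code (x, x_decoded_mean, z_mean, z_log_var, original_dim, and the metrics/K imports are assumed to be in scope, as in the question):

# Variant A: add_loss() accepts an arbitrary scalar tensor, so internal
# tensors such as z_mean and z_log_var can enter the loss directly.
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae.add_loss(K.mean(xent_loss + kl_loss))
vae.compile(optimizer='rmsprop')  # no loss= argument needed

# Variant B: a compile() loss is a function of (y_true, y_pred) only;
# z_mean and z_log_var can enter only through the enclosing scope.
def vae_loss(y_true, y_pred):
    xent = original_dim * metrics.binary_crossentropy(y_true, y_pred)
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return K.mean(xent + kl)

vae.compile(optimizer='rmsprop', loss=vae_loss)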

(Note: The code proposed in other answers — essentially Variant B above — is somewhat cheating, in that it uses global variables to sneak in the additional required dependencies. This makes the loss function not a true function in the mathematical sense. I consider this to be much less clean code and I expect it to be more error-prone.)

Cola answered 6/10, 2018 at 21:32 Comment(4)
An even more model-dependent template for a loss can be found in the image_ocr example. There a loss function is wrapped in a Lambda loss layer, an extra model is instantiated with the loss layer as output (using extra inputs to the loss calculation), and this model is compiled with a dummy lambda loss function that just returns the model's output as the loss. All the while, the data generator produces dummy y samples for the loss.Bawbee
But if z_log_var and z_mean are set as variables that the custom loss function can access, is add_loss then the same as model.compile(loss=...)?Geomorphic
@Geomorphic Yes it would result in the same. It's just not as clean since the loss function is dependent on the model.Cola
In more complex models, is there a way to use both model.add_loss() for 1 loss that needs internal tensors (e.g., for the KL-divergence here) and model.compile(loss=...) for 1 loss that needs the user to pass in target y_true? For example, in a modified VAE where the latent code z is also used to regress against a target.Felic

JIH's answer is right, of course, but maybe it is useful to add:

model.add_loss() has no restrictions, but it also removes the comfort of, for example, using targets in model.fit().

If you have a loss that depends on additional parameters of the model, of other models or external variables, you can still use a Keras type encapsulated loss function by having an encapsulating function where you pass all the additional parameters:

from keras import backend as K

def loss_carrier(extra_param1, extra_param2):
    def loss(y_true, y_pred):
        # Complicated math involving extra_param1, extra_param2, y_true and
        # y_pred goes here; the expression below is just an illustration.
        # Remember to use tensor ops (K.mean, K.square, K.sum, ...).
        # If extra_param1 and extra_param2 are variable tensors instead of
        # simple floats, they need to be defined as keras.Input (or
        # tf.placeholder) with the right shape and passed as
        # inputs=(main, extra_param1, extra_param2) when instantiating keras.Model.
        x = K.mean(K.square(y_true - y_pred)) + extra_param1 * K.mean(K.square(y_pred - extra_param2))
        return x
    return loss

model.compile(optimizer='adam', loss=loss_carrier(extra_param1, extra_param2))

The trick is the last row: loss_carrier returns an inner function with just the two parameters y_true and y_pred, which is what Keras expects, while the extra parameters are captured in the closure. Note that compile receives the result of calling loss_carrier(...), not the bare wrapper itself.

It possibly looks more complicated than the model.add_loss version, but the loss stays modular.

Phia answered 9/6, 2019 at 12:18 Comment(3)
But how do you pass the parameters extra_param1 and extra_param2? Can you provide a complete and working example that can be executed?Bolme
Note that you pass the result of calling the wrapper, i.e. model.compile(optimizer='adam', loss=loss_carrier(1.0, 2.0)), not the wrapper itself. You can also pass layers or intermediate tensors, e.g. for VAEs; however, you then need to set experimental_run_tf_function=False in compile. This method no longer works in tf2.2 when passing a tf.Tensor to this kind of wrapper loss function.Stucker
Not working in tf2.4; it throws this error: "Cannot convert a symbolic Keras input/output to a numpy array". The model.add_loss version works.Gonfalon

I was also wondering about the same question, and about related things like how to add a loss function within intermediate layers. Here I'm sharing what I observed; I hope it helps others. It's true that standard Keras loss functions only take two arguments, y_true and y_pred. But in practice there can be cases where we need an external parameter or coefficient in addition to these two values. This can be needed at the last layer, as usual, or somewhere in the middle of the model.

model.add_loss()

The accepted answer correctly describes model.add_loss(): a loss added this way can depend on the layer's inputs (tensors). According to the official doc, when writing the call method of a custom layer or a subclassed model, we may want to compute scalar quantities that we want to minimize during training (e.g. regularization losses), and we can use the add_loss() layer method to keep track of such loss terms. For instance, activity regularization losses depend on the inputs passed when calling a layer. Here's an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs:

import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyActivityRegularizer(Layer):
  """Layer that creates an activity sparsity regularization loss."""

  def __init__(self, rate=1e-2):
    super(MyActivityRegularizer, self).__init__()
    self.rate = rate

  def call(self, inputs):
    # We use `add_loss` to create a regularization loss
    # that depends on the inputs.
    self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
    return inputs

Loss values added via add_loss can be retrieved in the .losses list property of any Layer or Model (they are recursively retrieved from every underlying layer):

from tensorflow.keras import layers

class SparseMLP(Layer):
  """Stack of Linear layers with a sparsity regularization loss."""

  def __init__(self, output_dim):
      super(SparseMLP, self).__init__()
      self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
      self.regularization = MyActivityRegularizer(1e-2)
      self.dense_2 = layers.Dense(output_dim)

  def call(self, inputs):
      x = self.dense_1(inputs)
      x = self.regularization(x)
      return self.dense_2(x)


mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))

print(mlp.losses)  # List containing one float32 scalar

Also note that when using model.fit(), such loss terms are handled automatically. When writing a custom training loop, we should retrieve these terms by hand from model.losses, like this:

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
for x, y in dataset:
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)
        # Add extra loss terms to the loss value.
        loss_value += sum(model.losses) # < ------------- HERE ---------

    # Update the weights of the model to minimize the loss value.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

Custom losses

model.add_loss() (AFAIK) can be used anywhere in the network, and with it we are no longer bound to only the two parameters y_true and y_pred. But what if we also want to pass an external parameter or coefficient to the loss function at the last layer of the network? Nric's answer is correct. But such a loss can also be implemented by subclassing the tf.keras.losses.Loss class and implementing the following two methods:

  • __init__(self): accept parameters to pass during the call of your loss function
  • call(self, y_true, y_pred): use the targets (y_true) and the model predictions (y_pred) to compute the model's loss

Here is an example of a custom MSE implemented by subclassing the tf.keras.losses.Loss class. Again, we are no longer bound to only the two parameters y_true and y_pred.

import tensorflow as tf
from tensorflow import keras

class CustomMSE(keras.losses.Loss):
    def __init__(self, regularization_factor=0.1, name="custom_mse"):
        super().__init__(name=name)
        self.regularization_factor = regularization_factor

    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true - y_pred))
        reg = tf.math.reduce_mean(tf.square(0.5 - y_pred))
        return mse + reg * self.regularization_factor

model.compile(optimizer=..., loss=CustomMSE())
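Finally, the two mechanisms compose: losses registered with add_loss() are simply added to the loss passed to compile() during training. A minimal sketch (the model, layer names, and shapes below are made up for illustration):

import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(784,))
z_log_var = keras.layers.Dense(2, name='z_log_var')(inputs)  # stand-in for an internal tensor
outputs = keras.layers.Dense(784, activation='sigmoid')(inputs)
model = keras.Model(inputs, outputs)

# Term built from an internal tensor, attached via add_loss() ...
model.add_loss(-0.5 * tf.reduce_mean(1 + z_log_var - tf.exp(z_log_var)))
# ... plus a target-dependent term via compile(loss=...); Keras sums both.
model.compile(optimizer='adam', loss='binary_crossentropy')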
Everetteverette answered 5/3, 2021 at 4:0 Comment(4)
Your answer is really incredible. Thank you.Priestess
Thank you @M.Innat ! This was extremely helpful! Your final example is just what I needed but had not been able to find. (Suggestion: for completeness, perhaps add from tensorflow.keras.losses import Loss and then use class CustomMSE(Loss):)Isaiasisak
@Everetteverette But what if you need both things: the access to y_true & y_pred like in a standard loss in compile(loss=...), and the access to previous layers, say, inputs of a model - which you have when using add_loss(), but then you lose the access to y_true (!?)Pencel
Both can be used, I think. If you have a posted question to look at, that would be great. See this doc; in the last sections, it has an end-to-end example.Everetteverette

Try this:

import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda, Dense, Dropout
from keras import backend as K
from keras import objectives




np.random.seed(22)



data1 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

data2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')


data3 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

#train = np.zeros(shape=(992,54))
#test = np.zeros(shape=(921,54))

train = np.zeros(shape=(300,54))
test = np.zeros(shape=(300,54))

for n, i in enumerate(train):
    if (n<=100):
        train[n] = data1
    elif (n>100 and n<=200):
        train[n] = data2
    elif(n>200):
        train[n] = data3


for n, i in enumerate(test):
    if (n<=100):
        test[n] = data1
    elif(n>100 and n<=200):
        test[n] = data2
    elif(n>200):
        test[n] = data3


batch_size = 5
original_dim = train.shape[1]

intermediate_dim45 = 45
intermediate_dim35 = 35
intermediate_dim25 = 25
intermediate_dim15 = 15
intermediate_dim10 = 10
intermediate_dim5 = 5
latent_dim = 3
epochs = 50
epsilon_std = 1.0

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0.,
                              stddev=epsilon_std)
    return z_mean + K.exp(z_log_var / 2) * epsilon

x = Input(shape=(original_dim,), name = 'first_input_mario')

h1 = Dense(intermediate_dim45, activation='relu', name='h1')(x)
hD = Dropout(0.5)(h1)
h2 = Dense(intermediate_dim25, activation='relu', name='h2')(hD)
h3 = Dense(intermediate_dim10, activation='relu', name='h3')(h2)
h = Dense(intermediate_dim5, activation='relu', name='h')(h3)  # was relu
h = Dropout(0.1)(h)

z_mean = Dense(latent_dim, activation='relu')(h)
z_log_var = Dense(latent_dim, activation='relu')(h)

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

decoder_h = Dense(latent_dim, activation='relu')
decoder_h1 = Dense(intermediate_dim5, activation='relu')
decoder_h2 = Dense(intermediate_dim10, activation='relu')
decoder_h3 = Dense(intermediate_dim25, activation='relu')
decoder_h4 = Dense(intermediate_dim45, activation='relu')

decoder_mean = Dense(original_dim, activation='sigmoid')


h_decoded = decoder_h(z)
h_decoded1 = decoder_h1(h_decoded)
h_decoded2 = decoder_h2(h_decoded1)
h_decoded3 = decoder_h3(h_decoded2)
h_decoded4 = decoder_h4(h_decoded3)

x_decoded_mean = decoder_mean(h_decoded4)

vae = Model(x, x_decoded_mean)


def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
    loss = xent_loss + kl_loss
    return loss

vae.compile(optimizer='rmsprop', loss=vae_loss)

vae.fit(train, train, batch_size = batch_size, epochs=epochs, shuffle=True,
        validation_data=(test, test))



encoder = Model(x, z_mean)

decoder_input = Input(shape=(latent_dim,))

_h_decoded = decoder_h(decoder_input)
_h_decoded1 = decoder_h1(_h_decoded)
_h_decoded2 = decoder_h2(_h_decoded1)
_h_decoded3 = decoder_h3(_h_decoded2)
_h_decoded4 = decoder_h4(_h_decoded3)

_x_decoded_mean = decoder_mean(_h_decoded4)
generator = Model(decoder_input, _x_decoded_mean)
generator.summary()
Helli answered 3/5, 2018 at 8:29 Comment(3)
Thanks, but unfortunately your script doesn't work. You don't seem to define X_train. Please edit your example so I can run it as a standalone script.Hadden
I edited the code and tried it in a Jupyter notebook with Python 3. Now it is working.Helli
Thanks for the update. It runs on my machine now, but unfortunately, the autoencoder does not seem to encode the digits in a meaningful way. When I sample from the learned distribution, ALL "digits" look like a mix of all digits stacked on top of each other and very similar. However, thanks to your effort, I was able to figure out the probable cause of the problem. See question edit.Hadden

You need to change the compile row to

vae.compile(optimizer='rmsprop', loss=vae_loss)
Helli answered 29/4, 2018 at 12:7 Comment(6)
I already mentioned that it doesn't work. Thanks for participating, though.Hadden
Use vae.compile(optimizer='rmsprop', loss=vae_loss) without vae.add_loss(...), or vae.add_loss(vae_loss) followed by vae.compile(optimizer='rmsprop', loss=None).Helli
For my tests, I had already removed vae.add_loss(vae_loss) and just specified the loss during the compile operation. It throws a TypeError. I edited the error into my question.Hadden
def vae_loss(x, x_decoded_mean): xent_loss = objectives.binary_crossentropy(x, x_decoded_mean) kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)) loss = xent_loss + kl_loss return loss and then vae.compile(optimizer='rmsprop', loss=vae_loss)Helli
I have also tried this, but defining the custom loss this way throws another error: AttributeError: 'NoneType' object has no attribute 'shape'. I am currently researching how to implement custom loss functions. This has already been discussed here. Unfortunately, it gives me no insight into what the difference between the two methods is.Hadden
I will paste the code which definitely works in my Jupyter notebook. Just check the Python version; it should be 3. Maybe some imports are not necessary.Helli
