Change loss function dynamically during training in Keras, without recompiling other model properties like optimizer

Is it possible to change model.loss from a callback without calling model.compile(...) again (which resets the optimizer state), so that only the loss is recompiled? For example:

class NewCallback(Callback):

    def __init__(self):
        super(NewCallback, self).__init__()

    def on_epoch_end(self, epoch, logs=None):
        self.model.loss = [loss_wrapper(t_change, current_epoch=epoch)]
        self.model.compile_only_loss()  # hypothetical: is there a version or hack of
                                        # model.compile(...) like this?

To expand on this with earlier examples from Stack Overflow:

To get a loss function that depends on the epoch number, one can write (as in this Stack Overflow question):

def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        c_epoch = K.get_value(current_epoch)
        if c_epoch < t_change:
            # compute loss_1
        else:
            # compute loss_2
    return custom_loss

where "current_epoch" is a Keras variable updated with a callback:

current_epoch = K.variable(0.)
model.compile(optimizer=opt, loss=loss_wrapper(5, current_epoch), 
metrics=...)

class NewCallback(Callback):
    def __init__(self, current_epoch):
        super(NewCallback, self).__init__()
        self.current_epoch = current_epoch

    def on_epoch_end(self, epoch, logs=None):
        K.set_value(self.current_epoch, epoch)

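In graph mode, the Python if in custom_loss is evaluated when the loss graph is built rather than at every training step, so for the switch to take effect the branch has to be expressed as a composition of backend functions: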

def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        # compute loss_1 and loss_2
        bool_case_1=K.less(current_epoch,t_change)
        num_case_1=K.cast(bool_case_1,"float32")
        loss = (num_case_1)*loss_1 + (1-num_case_1)*loss_2
        return loss
    return custom_loss

This works.
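
A minimal sketch of the backend-function approach above, written against tf.keras in TensorFlow 2.x (an assumption; the question uses the older Keras backend K). The names make_switching_loss and EpochTracker, and the choice of mean squared error and mean absolute error as the two losses, are illustrative and do not come from the original post.

```python
import tensorflow as tf


def make_switching_loss(t_change, current_epoch):
    """Return a loss using MSE before epoch t_change and MAE from then on."""
    def custom_loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Illustrative stand-ins for loss_1 and loss_2 (per-sample values).
        loss_1 = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
        loss_2 = tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)
        # 1.0 while current_epoch < t_change, 0.0 afterwards. The variable is
        # read each time the loss runs, so updates from the callback are seen.
        use_first = tf.cast(tf.less(current_epoch, float(t_change)), y_pred.dtype)
        return use_first * loss_1 + (1.0 - use_first) * loss_2
    return custom_loss


class EpochTracker(tf.keras.callbacks.Callback):
    """Copies the epoch index into a tf.Variable that the loss reads."""
    def __init__(self, current_epoch):
        super().__init__()
        self.current_epoch = current_epoch

    def on_epoch_begin(self, epoch, logs=None):
        self.current_epoch.assign(float(epoch))


current_epoch = tf.Variable(0.0, trainable=False, dtype=tf.float32)
loss_fn = make_switching_loss(t_change=5, current_epoch=current_epoch)

# Direct check of the switch, without training a model.
y_true = tf.constant([[0.0, 2.0]])
y_pred = tf.constant([[0.0, 0.0]])
early = float(loss_fn(y_true, y_pred)[0])  # epoch 0: MSE = (0 + 4) / 2 = 2.0
current_epoch.assign(5.0)
late = float(loss_fn(y_true, y_pred)[0])   # epoch 5: MAE = (0 + 2) / 2 = 1.0
```

To use it in training, loss_fn would be passed as loss= to model.compile and EpochTracker(current_epoch) as a callback to model.fit. The sketch updates the variable in on_epoch_begin so that it holds the epoch being trained; the callback in the question updates it in on_epoch_end instead. Both losses are computed at every step, so a NaN in the inactive one would still propagate through the multiplication.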

I am not satisfied with these hacks. Is it possible to set model.loss in a callback without calling model.compile(...) afterwards (which resets the optimizer state), recompiling only the loss?

Soilasoilage answered 4/5, 2019 at 3:22 Comment(4)
Did you solve this? Do you need to keep the whole optimizer state or just the weights? If just the weights, perhaps save them, recompile, then load them. There's Model.load_weights(..., by_name=True) to load into a different model from the one they were saved from. There's also saving/loading the whole state, as in #49504248, but I'm not sure whether it allows you to change the architecture at all. - Unbreathed
Did you find any solution to this? I have exactly the same problem. - Benitez
I think using a dynamic computational graph, i.e. eager execution mode in TF 2.0, will solve this issue. - Allhallowtide
I don't find it too hacky to have a single loss function cased out by epoch, per your last approach. You can also use model.add_loss() to do a similar thing without using a wrapper. - Dorladorlisa

I hope you have found a solution by now. With TensorFlow, I think you can solve this by writing a custom training loop (described in the TensorFlow documentation). This does not override the loss attribute as you requested, but it can probably achieve what you are looking for.

example

initializing variable

Modifying the example from the documentation, with a model and dataset such as:

import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,), name="digits")
x1 = tf.keras.layers.Dense(64, activation="relu")(inputs)
x2 = tf.keras.layers.Dense(64, activation="relu")(x1)
outputs = tf.keras.layers.Dense(10, name="predictions")(x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)


# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

We can then define the optimizer and our two loss functions (the two I chose make no scientific sense, but they let us check that the code works):

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
# Instantiate a loss function.
loss_1 = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_2 = lambda y_true, y_pred: -1 * loss_1(y_true, y_pred)

training loop

We can then run our custom training loop:

epochs = 10
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Pick the loss for this epoch: loss_2 on even epochs, loss_1 on odd ones.
        loss_fn = loss_1 if epoch % 2 else loss_2

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * batch_size))

We can then check that the output is what we want (losses alternating between negative and positive from one epoch to the next):

Start of epoch 0
Training loss (for one batch) at step 0: -96.1003
Seen so far: 64 samples
Training loss (for one batch) at step 200: -3383849.5000
Seen so far: 12864 samples
Training loss (for one batch) at step 400: -40419124.0000
Seen so far: 25664 samples
Training loss (for one batch) at step 600: -149133008.0000
Seen so far: 38464 samples
Training loss (for one batch) at step 800: -328322816.0000
Seen so far: 51264 samples

Start of epoch 1
Training loss (for one batch) at step 0: 580457984.0000
Seen so far: 64 samples
Training loss (for one batch) at step 200: 297710528.0000
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 213328544.0000
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 159328976.0000
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 105737024.0000
Seen so far: 51264 samples

drawbacks and further improvements

The drawback of writing a custom loop like this is that you lose the convenience of Keras's fit method. I think you can get around this by defining a custom model and overriding train_step, as shown in the documentation.
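
One way the train_step idea could look is sketched below, assuming TensorFlow 2.x with tf.keras. The names SwitchingModel, EpochTracker, current_epoch and switch_epoch, the toy data, and the MSE/MAE pair are all hypothetical. The loss is reported under the illustrative key switched_loss rather than loss, as a precaution against clashing with any loss tracker Keras maintains itself.

```python
import numpy as np
import tensorflow as tf

# Two illustrative losses; any pair of (y_true, y_pred) -> scalar callables would do.
loss_1 = tf.keras.losses.MeanSquaredError()
loss_2 = tf.keras.losses.MeanAbsoluteError()

# Non-trainable variable holding the current epoch; updated by the callback below.
current_epoch = tf.Variable(0, dtype=tf.int64, trainable=False)
switch_epoch = 2  # hypothetical: loss_1 for epochs 0 and 1, loss_2 afterwards


class SwitchingModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            # tf.cond reads the variable at run time, so the choice is not
            # frozen when fit() traces train_step into a graph.
            loss = tf.cond(
                current_epoch < switch_epoch,
                lambda: loss_1(y, y_pred),
                lambda: loss_2(y, y_pred),
            )
        grads = tape.gradient(loss, self.trainable_variables)
        # The same optimizer instance is used throughout, so its state is kept.
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"switched_loss": loss}


class EpochTracker(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        current_epoch.assign(epoch)


inputs = tf.keras.Input(shape=(3,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = SwitchingModel(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam")  # no loss here: train_step computes it

# Toy regression data, only to exercise the loop.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3)).astype("float32")
y = rng.normal(size=(8, 1)).astype("float32")
history = model.fit(x, y, epochs=3, batch_size=4, verbose=0,
                    callbacks=[EpochTracker()])
```

Because the optimizer is never recreated, its state carries over when the loss switches at switch_epoch. As written, the value logged for each epoch is simply the loss of the last batch; a tf.keras.metrics.Mean tracker could be added if a running average is wanted.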

If you really need the loss attribute of your model changed, you can set the model's compiled_loss attribute to a keras.engine.compile_utils.LossesContainer (see the reference) and then set model.train_function to model.make_train_function(), so that the new loss is taken into account. Note that keras.engine.compile_utils is an internal module rather than public API, so this may not work across Keras versions.

Sansculotte answered 29/11, 2021 at 20:22 Comment(0)
