Save and load model optimizer state
A

7

26

I have a set of fairly complicated models that I am training, and I am looking for a way to save and load the model optimizer states. The "trainer models" consist of different combinations of several other "weight models", some of which have shared weights, some of which have frozen weights depending on the trainer, and so on. The setup is too complicated to share as an example, but in short, I am not able to use model.save('model_file.h5') and keras.models.load_model('model_file.h5') when stopping and restarting my training.

Using model.load_weights('weight_file.h5') works fine for testing my model once training has finished, but if I try to continue training this way, the loss does not come even close to where it left off. I have read that this is because load_weights does not restore the optimizer state, which makes sense. I therefore need a way to save and load the optimizer states of my trainer models. It seems that Keras once had model.optimizer.get_state() and model.optimizer.set_state(), which would do what I am after, but they no longer seem to exist (at least for the Adam optimizer). Are there any other solutions with the current Keras?

Abandon answered 27/3, 2018 at 3:6 Comment(4)
Will obtaining the states using model.optimizer.get_config(), saving this dictionary, and then setting each of these values on the trainer model optimizers before retraining accomplish this? – Abandon
Not likely. get_config() only returns hyperparameters such as lr and decay. It does not return the optimizer's internal weights. – Couscous
I can't see get_state() in Keras 2.1.6 (keras.__version__) or on master: github.com/keras-team/keras/blob/… It looks like they were removed in github.com/keras-team/keras/pull/437 – Gilliam
As of TensorFlow 2.5, if you set the optimizer of a Keras model with model.compile, then model.save_weights and model.load_weights seem to preserve the optimizer state with no problem. – Stemware
C
38

You can extract the important lines from the load_model and save_model functions.

For saving optimizer states, in save_model:

# Save optimizer weights.
symbolic_weights = getattr(model.optimizer, 'weights')
if symbolic_weights:
    optimizer_weights_group = f.create_group('optimizer_weights')
    weight_values = K.batch_get_value(symbolic_weights)

For loading optimizer states, in load_model:

# Set optimizer weights.
if 'optimizer_weights' in f:
    # Build train function (to get weight updates).
    if isinstance(model, Sequential):
        model.model._make_train_function()
    else:
        model._make_train_function()

    # ...

    try:
        model.optimizer.set_weights(optimizer_weight_values)

Combining the lines above, here's an example:

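  1. First fit the model for 5 epochs.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Random data for a toy binary-classification problem
X, y = np.random.rand(100, 50), np.random.randint(2, size=100)
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5)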

Epoch 1/5
100/100 [==============================] - 0s 4ms/step - loss: 0.7716
Epoch 2/5
100/100 [==============================] - 0s 64us/step - loss: 0.7678
Epoch 3/5
100/100 [==============================] - 0s 82us/step - loss: 0.7665
Epoch 4/5
100/100 [==============================] - 0s 56us/step - loss: 0.7647
Epoch 5/5
100/100 [==============================] - 0s 76us/step - loss: 0.7638
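  2. Now save the weights and optimizer states.
import pickle
from keras import backend as K

model.save_weights('weights.h5')
# model.optimizer.weights holds the optimizer's variables (for Adam: iteration count and moment estimates)
symbolic_weights = getattr(model.optimizer, 'weights')
weight_values = K.batch_get_value(symbolic_weights)
with open('optimizer.pkl', 'wb') as f:
    pickle.dump(weight_values, f)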
  3. Rebuild the model in another Python session, and load the weights.
import pickle
from keras.layers import Input, Dense
from keras.models import Model

x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')

model.load_weights('weights.h5')
# Build the train function so that the optimizer creates its weights
model._make_train_function()
with open('optimizer.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)
  4. Continue model training.
model.fit(X, y, epochs=5)

Epoch 1/5
100/100 [==============================] - 0s 674us/step - loss: 0.7629
Epoch 2/5
100/100 [==============================] - 0s 49us/step - loss: 0.7617
Epoch 3/5
100/100 [==============================] - 0s 49us/step - loss: 0.7611
Epoch 4/5
100/100 [==============================] - 0s 55us/step - loss: 0.7601
Epoch 5/5
100/100 [==============================] - 0s 49us/step - loss: 0.7594
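To see why restoring only the model weights is not enough, here is a minimal sketch of a textbook scalar Adam in plain Python. It is not the Keras implementation, and the objective, the learning rate and the names adam_step and train are made up for illustration. The tuple (t, m, v) plays the role of the "optimizer weights" saved above: the step count and the two moment estimates.

```python
import math


def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter; state is (t, m, v)."""
    t, m, v = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, (t, m, v)


def train(w, state, steps):
    """Minimise the toy objective f(w) = (w - 3)^2 for a number of steps."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w, state = adam_step(w, grad, state)
    return w, state


fresh = (0, 0.0, 0.0)

w5, state5 = train(0.0, fresh, 5)            # first session: 5 steps
w_uninterrupted, _ = train(0.0, fresh, 10)   # reference: 10 steps, never stopped

w_restored, _ = train(w5, state5, 5)         # resume with weights AND optimizer state
w_reset, _ = train(w5, fresh, 5)             # resume with weights only

print(w_restored == w_uninterrupted)  # True
print(w_reset == w_uninterrupted)     # False
```

Resuming with the saved (t, m, v) reproduces the uninterrupted run exactly, while resuming with a fresh state follows a different trajectory. On this toy problem the difference is small; on a real model it can be much larger.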
Couscous answered 27/3, 2018 at 4:29 Comment(8)
I believe this is working; at least the loss is not blowing up as it was before. It now seems to start a bit higher than where it left off and then descends a bit faster. Thanks @Yu-Yang. I ended up using the save_model and load_model functions and just removed the saving and loading of the weights. – Abandon
What is K here? import keras.backend as K? – Umbles
@Umbles Yes, it's the Keras backend module. – Couscous
What are optimizer weights? – Gilliam
What is model._make_train_function() here? I get the error: "AttributeError: 'Model' object has no attribute '_make_train_function'" – Casia
@Couscous - following up on @DvD_95's comment: I think _make_train_function no longer exists (at least in TF 2.3). That said, there is model.make_train_function() (without the underscore). But when I use it with an Adam optimizer I get: ValueError: You called set_weights(weights) on optimizer Adam with a weight list of length 255, but the optimizer was expecting 0 weights. I checked the source code and it does seem like set_weights should work. Any thoughts on this? – Inextricable
@Inextricable Have you solved this issue? I have the same problem. – Modernistic
TF2 is becoming more and more confusing and buggy, with terrible documentation. I will switch to PyTorch soon! This is a waste of time and energy. Why did they have to make things complicated when Keras was so beautifully simple? – Heterochromatic
R
18

For those who are not using model.compile and instead performing automatic differentiation to apply the gradients manually with optimizer.apply_gradients, I think I have a solution.

First, save the optimizer weights: np.save(path, optimizer.get_weights())

Then, when you are ready to reload the optimizer, show the newly instantiated optimizer the shapes of the variables it will update by calling optimizer.apply_gradients once with a list of zero tensors shaped like the variables for which you compute gradients. It is extremely important to set the weights of the model AFTER you set the weights of the optimizer, because momentum-based optimizers like Adam can update the weights of the model even when the gradients they are given are zero.

import tensorflow as tf
import numpy as np

model = ...  # instantiate the model (functional or subclass of tf.keras.Model)

# Get saved weights
opt_weights = np.load('/path/to/saved/opt/weights.npy', allow_pickle=True)

grad_vars = model.trainable_weights
# This need not be model.trainable_weights; it must be a correctly-ordered list of 
# grad_vars corresponding to how you usually call the optimizer.

optimizer = tf.keras.optimizers.Adam(lrate)  # lrate: your learning rate

zero_grads = [tf.zeros_like(w) for w in grad_vars]

# Apply zero gradients once so that the optimizer creates its weights
optimizer.apply_gradients(zip(zero_grads, grad_vars))

# Set the weights of the optimizer
optimizer.set_weights(opt_weights)

# NOW set the trainable weights of the model
model_weights = np.load('/path/to/saved/model/weights.npy', allow_pickle=True)
model.set_weights(model_weights)

Note that if we try to set the weights before calling apply_gradients for the first time, an error is thrown that the optimizer expects a weight list of length zero.
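The ordering advice above can be illustrated with a small sketch of a textbook Adam update for a single scalar weight. This is plain Python, not the TensorFlow implementation, and the function name adam_update and the moment values are made up. It compares a zero-gradient step on a fresh optimizer with one on an optimizer that already carries non-zero moments.

```python
import math


def adam_update(grad, t, m, v, lr=0.001, b1=0.9, b2=0.999, eps=1e-7):
    """Return the amount a textbook Adam subtracts from a scalar weight,
    together with the updated (t, m, v)."""
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return lr * m_hat / (math.sqrt(v_hat) + eps), (t, m, v)


# Freshly created optimizer: all moments are zero, so a zero gradient changes nothing.
delta_fresh, _ = adam_update(0.0, t=0, m=0.0, v=0.0)

# Optimizer carrying moments from earlier training (made-up values):
# the decayed first moment still moves the weight although the gradient is zero.
delta_restored, _ = adam_update(0.0, t=100, m=0.5, v=0.25)

print(delta_fresh)         # 0.0
print(delta_restored > 0)  # True
```

In this sketch the dummy step is harmless on a fresh optimizer, but any step taken while non-zero moments are in place does move the weights. Setting the model weights last is therefore the safe order either way.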

Regazzi answered 25/7, 2020 at 13:57 Comment(6)
This was helpful and saved me many hours of re-training, thanks! – Lawful
Yes, it should work for any optimizer, but it only makes sense for optimizers whose weights depend on the shapes of the variables being optimized. – Regazzi
By the way, I found a way to avoid the apply_gradients call and the zero_grads computation: call optimizer._create_all_weights(model.trainable_variables) inside with tf.name_scope(optimizer._name): and with tf.init_scope():. This is taken from the source code of the apply_gradients() method; see lines 516-519 of the source. – Sybilsybila
Works flawlessly :) Thank you! – Heterochromatic
BEWARE: this does NOT work with multi-GPU training in TF 2.4.1! Any ideas, please? – Heterochromatic
optimizer.get_weights() is no longer accessible in version 2.11. – Bookrack
B
4

From version 2.11 on, optimizer.get_weights() is no longer available. If necessary, you can switch to the tf.optimizers.legacy classes, but this is not recommended.

Instead, the class tf.train.Checkpoint is designed specifically for saving both model and optimizer weights:

checkpoint = tf.train.Checkpoint(model=model, optim=optim)
# save() takes a file prefix and returns the path of the numbered checkpoint it wrote
save_path = checkpoint.save(file_prefix='saved_model/ckpt')
...
checkpoint.restore(save_path)

Finally, the class tf.train.CheckpointManager manages multiple checkpoint versions and makes this very easy:

checkpoint = tf.train.Checkpoint(model=model,optim=optim)
checkpoint_manager = tf.train.CheckpointManager(checkpoint, 'saved_model', max_to_keep = 5)
checkpoint_manager.restore_or_initialize()
...
checkpoint_manager.save()
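For readers who want to see what "managing multiple checkpoint versions" means, here is a conceptual sketch in plain Python. It is not TensorFlow's implementation: TinyCheckpointManager, its file naming and the pickled state dictionary are all hypothetical and only mimic the keep-the-last-N and restore-or-initialize behaviour.

```python
import os
import pickle
import tempfile


class TinyCheckpointManager:
    """Hypothetical stand-in that mimics keep-last-N checkpoint rotation."""

    def __init__(self, directory, max_to_keep=5):
        self.directory = directory
        self.max_to_keep = max_to_keep  # assumed to be a positive integer
        os.makedirs(directory, exist_ok=True)

    def _path(self, n):
        return os.path.join(self.directory, f'ckpt-{n}.pkl')

    def _numbers(self):
        # Files are named ckpt-<n>.pkl; return the sorted <n> values.
        names = [f for f in os.listdir(self.directory)
                 if f.startswith('ckpt-') and f.endswith('.pkl')]
        return sorted(int(f[len('ckpt-'):-len('.pkl')]) for f in names)

    def save(self, state):
        numbers = self._numbers()
        n = numbers[-1] + 1 if numbers else 1
        with open(self._path(n), 'wb') as f:
            pickle.dump(state, f)
        # Delete the oldest checkpoints beyond max_to_keep.
        for old in (numbers + [n])[:-self.max_to_keep]:
            os.remove(self._path(old))
        return self._path(n)

    def restore_or_initialize(self, default):
        numbers = self._numbers()
        if not numbers:
            return default  # nothing saved yet: start from scratch
        with open(self._path(numbers[-1]), 'rb') as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    manager = TinyCheckpointManager(tmp, max_to_keep=2)
    state = manager.restore_or_initialize({'step': 0})
    for step in range(1, 5):
        state = {'step': step, 'adam_m': [0.1 * step]}
        manager.save(state)
    kept = sorted(os.listdir(tmp))
    resumed = manager.restore_or_initialize({'step': 0})

print(kept)             # ['ckpt-3.pkl', 'ckpt-4.pkl']
print(resumed['step'])  # 4
```

The training loop only ever calls restore_or_initialize once at start-up and save periodically, which is the same calling pattern as in the snippet above.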
Bookrack answered 10/2, 2023 at 1:13 Comment(0)
R
3

To complement Alex Trevithick's answer: you can avoid calling model.set_weights again by saving the current values of the variables before applying the zero gradients and restoring them afterwards. This can be useful when loading a model from an h5 file, and it looks cleaner (imo).

The saving/loading functions are the following (thanks Alex again):

import os

import numpy as np
import tensorflow as tf


def save_optimizer_state(optimizer, save_path, save_name):
    '''
    Save keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object.
    save_path --- Path to save location.
    save_name --- Name of the .npy file to be created.

    '''

    # Create the folder if it does not exist
    os.makedirs(save_path, exist_ok=True)

    # Save the optimizer weights (np.save appends '.npy' if save_name lacks it)
    np.save(os.path.join(save_path, save_name), optimizer.get_weights())

    return

def load_optimizer_state(optimizer, load_path, load_name, model_train_vars):
    '''
    Loads keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object to be loaded.
    load_path --- Path to save location.
    load_name --- Name of the .npy file to be read.
    model_train_vars --- List of model variables (obtained using Model.trainable_variables)

    '''

    # Load the optimizer weights (load_name is given without the '.npy' extension)
    opt_weights = np.load(os.path.join(load_path, load_name) + '.npy', allow_pickle=True)

    # Dummy zero gradients
    zero_grads = [tf.zeros_like(w) for w in model_train_vars]
    # Save the current values of the variables
    saved_vars = [tf.identity(w) for w in model_train_vars]

    # Apply zero gradients once so that the optimizer creates its weights
    optimizer.apply_gradients(zip(zero_grads, model_train_vars))

    # Restore the variables in case the zero-gradient step changed them
    for var, saved in zip(model_train_vars, saved_vars):
        var.assign(saved)

    # Set the weights of the optimizer
    optimizer.set_weights(opt_weights)


    return
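One caveat about the np.save call: it first converts its argument to a single array. Newer NumPy releases may refuse to do that for a list of differently shaped arrays unless an object dtype is requested. A possible alternative, sketched below under that assumption, is to store each array separately in one .npz archive. The helper names save_weight_list and load_weight_list and the example weights are made up; the list merely stands in for what optimizer.get_weights() returns.

```python
import os
import tempfile

import numpy as np


def save_weight_list(path, weights):
    """Store a list of arrays of arbitrary shapes in one .npz archive."""
    # Positional arrays are stored under the keys arr_0, arr_1, ...
    np.savez(path, *weights)


def load_weight_list(path):
    """Return the stored arrays in their original order."""
    with np.load(path) as data:
        return [data[f'arr_{i}'] for i in range(len(data.files))]


# Made-up stand-in for optimizer.get_weights(): a step counter plus slot arrays.
weights = [np.array(10, dtype=np.int64), np.ones((50, 1)), np.zeros((1,))]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'optimizer_state.npz')
    save_weight_list(path, weights)
    restored = load_weight_list(path)

print([w.shape for w in restored])  # [(), (50, 1), (1,)]
```

The restored list can then be passed to optimizer.set_weights in place of the array loaded with allow_pickle=True, and no pickling is involved.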
Relume answered 3/11, 2020 at 21:47 Comment(0)
H
2

Upgrading Keras to 2.2.4 and using pickle solved this issue for me. As of release 2.2.3, Keras models can be safely pickled.

Hecker answered 7/10, 2018 at 18:48 Comment(0)
T
2

Anyone trying to use @Yu-Yang's solution in a distributed setting might run into the following error:


ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy object at 0x7fdf357726d8>), which is different from the scope used for the original variable (MirroredVariable:{
  0: <tf.Variable 'conv2d_1/kernel:0' shape=(1, 1, 1, 1) dtype=float32, numpy=array([[[[-0.9592359]]]], dtype=float32)>
}). Make sure the slot variables are created under the same strategy scope. This may happen if you're restoring from a checkpoint outside the scope

or similar.

To solve this problem, you simply need to run the code that sets the optimizer weights on each replica, as follows:

import pickle

import tensorflow as tf

strat = tf.distribute.MirroredStrategy()

with strat.scope():
    model = tf.keras.models.Sequential([tf.keras.layers.Conv2D(1, 1, padding='same')])
    model.compile(optimizer='adam', loss='mse')
    model(tf.random.normal([1, 16, 16, 1]))

    model.load_weights('model_weights.hdf5')

def model_weight_setting():
    grad_vars = model.trainable_weights
    zero_grads = [tf.zeros_like(w) for w in grad_vars]
    model.optimizer.apply_gradients(zip(zero_grads, grad_vars))
    with open('optimizer.pkl', 'rb') as f:
        weight_values = pickle.load(f)
    model.optimizer.set_weights(weight_values)

strat.run(model_weight_setting)

For some reason, this isn't needed for setting the model weights, but make sure that you create (via the call here) and load the weights of the model within the strategy scope or you might get an error along the lines of ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.collective_all_reduce_strategy.CollectiveAllReduceStrategy object at 0x14ffdce82c50>), which is different from the scope used for the original variable.

If you want the full-on example, I created a colab showcasing this solution.

Trotline answered 18/2, 2021 at 13:47 Comment(0)
P
0

The code below works for me (Tensorflow 2.5).
I'm using the universal sentence encoder as model, together with an Adam optimizer.

Basically what I do is: I make use of a dummy input which sets the optimizer correctly.
Afterwards I set the weights.

Save the weights of the optimizer:

np.save(f'{path}/optimizer.npy', optimizer.get_weights())

Load the optimizer:

# Create a new optimizer
optimizer = tf.keras.optimizers.Adam()

# Load the optimizer weights
opt_weights = np.load(f'{path}/optimizer.npy', allow_pickle=True)

# Train on a dummy record
# I'm using the universal sentence encoder, which requires a string as input
with tf.GradientTape() as tape:
    # predict a dummy record
    tmp = model('')
    # create a dummy loss
    loss = tf.reduce_mean((tmp - tmp)**2)

# calculate the gradients and apply them
# the gradients should be near 0
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# set the weights
optimizer.set_weights(opt_weights)
Pastiness answered 22/7, 2021 at 20:0 Comment(0)
