tensorflow:Can save best model only with val_acc available, skipping

I have an issue with tf.keras.callbacks.ModelCheckpoint. As you can see in my log below, the warning always comes before the last iteration of the epoch, which is when val_acc is calculated. As a result, ModelCheckpoint never finds val_acc:

Epoch 1/30
1/8 [==>...........................] - ETA: 19s - loss: 1.4174 - accuracy: 0.3000
2/8 [======>.......................] - ETA: 8s - loss: 1.3363 - accuracy: 0.3500 
3/8 [==========>...................] - ETA: 4s - loss: 1.3994 - accuracy: 0.2667
4/8 [==============>...............] - ETA: 3s - loss: 1.3527 - accuracy: 0.3250
6/8 [=====================>........] - ETA: 1s - loss: 1.3042 - accuracy: 0.3333
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 4s 482ms/step - loss: 1.2846 - accuracy: 0.3375 - val_loss: 1.3512 - val_accuracy: 0.5000

Epoch 2/30
1/8 [==>...........................] - ETA: 0s - loss: 1.0098 - accuracy: 0.5000
3/8 [==========>...................] - ETA: 0s - loss: 0.8916 - accuracy: 0.5333
5/8 [=================>............] - ETA: 0s - loss: 0.9533 - accuracy: 0.5600
6/8 [=====================>........] - ETA: 0s - loss: 0.9523 - accuracy: 0.5667
7/8 [=========================>....] - ETA: 0s - loss: 0.9377 - accuracy: 0.5714
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 1s 98ms/step - loss: 0.9229 - accuracy: 0.5750 - val_loss: 1.2507 - val_accuracy: 0.5000

This is my code for training the CNN.

callbacks = [
        TensorBoard(log_dir=r'C:\Users\reda\Desktop\logs\{}'.format(Name),
                    histogram_freq=1),
        ModelCheckpoint(filepath=r"C:\Users\reda\Desktop\checkpoints\{}".format(Name), monitor='val_acc',
                        verbose=2, save_best_only=True, mode='max')]
history = model.fit_generator(
        train_data_gen, 
        steps_per_epoch=total_train // batch_size,
        epochs=epochs,
        validation_data=val_data_gen,
        validation_steps=total_val // batch_size,
        callbacks=callbacks)
Styptic answered 29/4, 2020 at 15:38 Comment(2)
Change the monitor value from 'val_acc' to 'val_accuracy'. – Dalessio
Is this for TensorFlow 2.x? TensorFlow 1.x also has this issue, and changing to 'val_accuracy' won't solve it. – Ichneumon

I know how frustrating these things can be sometimes, but TensorFlow requires that you explicitly write out the name of the metric you want to monitor.

You will need to actually say 'val_accuracy':

metric = 'val_accuracy'
ModelCheckpoint(filepath=r"C:\Users\reda.elhail\Desktop\checkpoints\{}".format(Name), monitor=metric,
                verbose=2, save_best_only=True, mode='max')

Hope this helps =)

*** As later noted by BlueTurtle (please give their answer a thumbs up; it is likely still below this one), you also need the full metric name to match across your model.compile, ModelCheckpoint, and EarlyStopping calls.

Heigho answered 29/4, 2020 at 19:49 Comment(3)
Yes it did. Thank you! – Styptic
@Brain_Mark_Anderson, I tried your steps but the issue still remains the same: filepath = "/content/table_net.h5" model_checkpoint = tf.keras.callbacks.ModelCheckpoint( "/content/table_net.h5", monitor = "val_accuracy", save_best_only=True, verbose = 0, mode="min") es = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', mode='min', patience=5,) – Mistakable
Hi Pravin, could you please look at the answer submitted by BlueTurtle below? That could help @Mistakable. – Heigho

To add to the accepted answer, as I just struggled with this: not only do you have to use the full metric name, it must also match across your model.compile, ModelCheckpoint, and EarlyStopping calls. I had one set to accuracy and the other two set to val_accuracy, and it did not work.
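
A minimal sketch of consistent naming (assuming an existing Keras model named model; the file name, optimizer, loss, and patience value are placeholders, not from the answer):

import tensorflow as tf

# model: an existing tf.keras model (assumed).
# metrics=['accuracy'] is logged as 'accuracy' during training and
# 'val_accuracy' on validation data, so both callbacks monitor 'val_accuracy'.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                                       save_best_only=True, mode='max'),
    tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', mode='max',
                                     patience=5)]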

Antrum answered 11/3, 2021 at 16:43 Comment(2)
Good point. In my case I had to pass 'loss' for the "monitor" argument in ModelCheckpoint. My loss argument in compile was 'mean_squared_error', and with that same value for the monitor argument I got the same warning. When I changed it to 'loss' for ModelCheckpoint, it worked. – Giles
Is it not possible to monitor different metrics for early stopping and checkpointing? – Giustino

Print the metrics after training for one epoch, like below. This will print the names of the metrics defined for your model:

hist = model.fit(...)
for key in hist.history:
    print(key)

Now use those exact names for your monitor argument. It will work like a charm.

This hack was given by the author of the comment linked below. Thanks to them! https://github.com/tensorflow/tensorflow/issues/33163#issuecomment-540451749

Jesus answered 19/4, 2022 at 8:58 Comment(0)

Using monitor='val_loss' in both the ModelCheckpoint and EarlyStopping callbacks worked for me.
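
A minimal sketch of that setup (the file name and patience value are placeholders): val_loss is logged whenever validation data is passed to fit, so it is always available to monitor.

import tensorflow as tf

callbacks = [
    tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_loss',
                                       save_best_only=True, mode='min'),
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min',
                                     patience=5)]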

Kasher answered 5/9, 2022 at 14:27 Comment(0)

I had the same issue: even after setting monitor='val_accuracy' it did not work. So I just changed it to monitor='val_acc' and it worked.

Pich answered 26/2, 2022 at 20:5 Comment(1)
It's not working. Maybe you had specified some custom function in metrics. – Desiraedesire

If you are using validation_steps or steps_per_epoch in the model.fit() call, remove those parameters. The validation loss and accuracy will then start appearing. Pass as few parameters as possible:

model_history = model.fit(x=aug.flow(X_train, y_train, batch_size=16), epochs=EPOCHS, validation_data=(X_val, y_val), callbacks=callbacks_list)
Cartulary answered 6/5, 2022 at 11:51 Comment(0)

If you are using ModelCheckpoint and EarlyStopping together, the "monitor" metric should be the same in both, e.g. 'accuracy'.

Also, EarlyStopping doesn't support every metric in some TensorFlow versions, so you have to choose a metric that is available to both and best suits your model.

Maddy answered 4/6, 2022 at 22:3 Comment(0)

I still had the issue even after changing the argument from monitor='val_acc' to monitor='val_accuracy'.

You can check the ModelCheckpoint documentation from Keras and make sure the arguments and values you pass are as documented. I removed the extra arguments I was passing and it worked for me!

Before

checkpoint = ModelCheckpoint("mnist-cnn-keras.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', save_freq=1)

After

checkpoint = ModelCheckpoint("./", monitor='val_accuracy', verbose=2, save_best_only=True, mode='max')
Ermines answered 7/6, 2022 at 5:42 Comment(0)

You have to write the name exactly as it appears when you run training. You are probably using a metric other than 'accuracy' in the metrics section: BinaryAccuracy, SparseCategoricalAccuracy, CategoricalAccuracy, etc. For example, when you use BinaryAccuracy, 'binary_accuracy' is written instead of 'accuracy' in the training output. That is the name you should write in the monitor argument, as in the sketch below.
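
A minimal sketch of the idea (assuming an existing Keras model named model; the file name is a placeholder):

import tensorflow as tf

# Compiling with a metric object means the logs use that metric's own name.
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.BinaryAccuracy()])  # logged as 'binary_accuracy'

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.h5',
    monitor='val_binary_accuracy',  # matches the logged name, not 'val_accuracy'
    save_best_only=True,
    mode='max')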

Intestine answered 18/7, 2022 at 17:52 Comment(0)

You may also find that your model metrics have an incrementing number appended to them after the first run, e.g.:

for key in history.history:
    print(key)

This prints:

loss
accuracy
auc_4
precision_4
recall_4
true_positives_4
true_negatives_4
false_positives_4
false_negatives_4
val_loss
val_accuracy
val_auc_4

If that is the case, you can reset the session before each run so that the numbers aren't appended.

for something in something_else:      # your repeated-training loop
    tf.keras.backend.clear_session()  # resets the session
    model = define_model(...)         # rebuild a fresh model each run
    history = train_model(...)
Joejoeann answered 11/2, 2023 at 18:52 Comment(0)

Set the argument save_freq='epoch' in the ModelCheckpoint so the callback only evaluates the monitored metric at the end of each epoch, when the validation metrics exist:

ModelCheckpoint('best_model', monitor='val_accuracy', save_best_only=True, mode='max', save_freq='epoch')

Thorwald answered 3/7 at 22:42 Comment(0)
