Object detection classification: "A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights)" warning
I am trying to do classification with object detection on Colab, using "ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config". When I start training, I get a warning. Training command:

!python model_main_tf2.py \
    --pipeline_config_path=training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config \
    --model_dir=training \
    --alsologtostderr
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W1130 13:39:27.991891 140559633127296 util.py:158] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
Glabrate answered 30/11, 2020 at 13:41 Comment(0)

I was dealing with the same warning. I assume that the training stopped when you got the message you cited above. If so, you might want to check your folder paths.

I was able to get rid of the error myself when I figured out that I was trying to create a new model, but TF was looking at a 'model_dir' folder that contained checkpoints from my previous model. Because my num_steps was not greater than the num_steps used in the previous model, TF effectively stopped the training, since that number of steps had already been completed.

By changing the model_dir to a brand new folder, I was able to overcome this error and begin training a new model. Hopefully this works for you as well.
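The check described above can be sketched as a small helper. This is a minimal sketch: the function name is made up, and the `ckpt-*` filename pattern is an assumption based on the TF2 Object Detection API's default checkpoint naming.

```python
import glob
import os

def has_existing_checkpoints(model_dir):
    """Return True if model_dir already contains TF2 checkpoint files.

    The Object Detection API names checkpoints 'ckpt-N.index' /
    'ckpt-N.data-*' by default. If any are present, training resumes
    from them (and may stop immediately if num_steps was already
    reached) instead of starting fresh.
    """
    return bool(glob.glob(os.path.join(model_dir, "ckpt-*")))
```

Pointing --model_dir at a directory where this returns False (i.e. a brand-new folder) avoids silently resuming the old run.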

Grainfield answered 5/2, 2021 at 18:7 Comment(0)

If anyone is trying to continue their training, the solution, as @GbG mentioned, is to update the num_steps value in your pipeline.config:

Original:

  num_steps: 25000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 25000

Updated:

  num_steps: 50000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 50000
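The edit above is a plain text substitution, so it can be scripted. A minimal sketch, assuming the plain-text pipeline.config format shown above (the helper name is made up; the Object Detection API also ships its own config utilities, which this deliberately avoids to keep the example dependency-free):

```python
import re

def bump_steps(config_text, new_steps):
    """Raise num_steps and total_steps in a pipeline.config string so a
    resumed run trains beyond the previously completed step count.

    Keeping total_steps in sync matters here because the cosine decay
    learning-rate schedule is defined over total_steps.
    """
    config_text = re.sub(r"num_steps: \d+", f"num_steps: {new_steps}", config_text)
    config_text = re.sub(r"total_steps: \d+", f"total_steps: {new_steps}", config_text)
    return config_text
```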
Vainglory answered 1/7, 2021 at 10:38 Comment(0)

It means you have already trained your model for the full num_steps specified in your config file, so the training loop exits immediately instead of running further.

Altonaltona answered 7/12, 2021 at 1:5 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Define
