Estimator's model_fn includes params argument, but params are not passed to Estimator

5

7

I'm trying to run Object Detection API locally.

I believe I have everything set up as described in the TensorFlow Object Detection API documentation. However, when I try to run model_main.py, the warning below appears and the model doesn't train. (I can't really tell whether the model is training or not, because the process doesn't terminate, but no further logs appear.)

WARNING:tensorflow:Estimator's model_fn (.model_fn at 0x0000024BDBB3D158>) includes params argument, but params are not passed to Estimator.

The code I'm passing in is:

python tensorflow-models/research/object_detection/model_main.py \
--model_dir=training \
--pipeline_config_path=ssd_mobilenet_v1_coco.config \
--checkpoint_dir=ssd_mobilenet_v1_coco_2017_11_17/model.ckpt \
--num_train_steps=2000 \
--num_eval_steps=200 \
--alsologtostderr

What could be causing this warning?

Why would the code seem stuck?

Please help!

Flimflam answered 4/9, 2018 at 9:3 Comment(2)
Sounds like your tensorflow version is out of sync with your version of models. Are you using the most recent version of models repo? What version of tensorflow are you using? – Lara
@Lara I'm using Tensorflow version 1.10.0 and the most recent object detection API. Maybe I need to downgrade my tensorflow? – Flimflam
14

I ran into the same problem and found that this warning has nothing to do with the model not training. The model trains fine even while the warning is shown.

My mistake was that I misunderstood this line in running_locally.md:

"${MODEL_DIR} points to the directory in which training checkpoints and events will be written to"

I changed the MODEL_DIR to the {project directory}/models/model where the structure of the directory is:

+data
  -label_map file
  -train TFRecord file
  -eval TFRecord file
+models
  + model
    -pipeline config file
    +train
    +eval

And it worked. I hope this helps.

Edit: while this may work, model_dir in this case does not contain any saved checkpoint files. If you stop the training after some checkpoint files have been saved and then restart, the training will still be skipped. The doc specifies a recommended directory structure, but you do not have to follow it exactly, since the paths to the TFRecord files and pretrained checkpoints can all be configured in the config file.

The actual reason is that when model_dir contains checkpoint files that have already reached NUM_TRAIN_STEPS, the script assumes training is finished and exits. Removing the checkpoint files and restarting training will work.
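
One way to check whether this is what happened is to look at the step number of the newest checkpoint in model_dir before launching training. The following is only a minimal sketch: it assumes checkpoints use the usual model.ckpt-<step>.index file naming, and latest_checkpoint_step and training_would_be_skipped are hypothetical helper names, not part of the Object Detection API.

```python
import os
import re

# Assumption: checkpoints are saved as model.ckpt-<step>.index (alongside
# .meta and .data-* files). Adjust the pattern if your files are named differently.
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(model_dir):
    """Return the highest checkpoint step found in model_dir, or None."""
    steps = []
    for name in os.listdir(model_dir):
        match = _CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


def training_would_be_skipped(model_dir, num_train_steps):
    """True if an existing checkpoint has already reached num_train_steps."""
    step = latest_checkpoint_step(model_dir)
    return step is not None and step >= num_train_steps
```

For the command in the question, this would be called as training_would_be_skipped("training", 2000). If it returns True, either clear the old checkpoints out of model_dir or raise the number of training steps.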

Leis answered 6/9, 2018 at 9:37 Comment(2)
Do you still get the warning? – Christyna
Regarding {project directory}/models/model: should we create a new directory named model? I created model, but it didn't work for me. Could you explain using the Python command you ran? – Circular
2

In my case, I had the same error because the folder holding my .ckpt files also contained the checkpoint file of the pre-trained model.

After I removed that file, which came inside the .tar.gz archive, the training worked.


Rafat answered 23/11, 2018 at 13:57 Comment(0)
2

I also received this error. It happened because I had previously trained a model with a different dataset/model/config file, and the old ckpt files still existed in the directory I was working in. Moving the old ckpt training data to a different directory fixed the issue.
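
The move can be scripted. The sketch below is an illustration only: move_old_checkpoints is a hypothetical helper, and it assumes the stale training state consists of the "checkpoint" index file plus files whose names start with "model.ckpt".

```python
import os
import shutil


def move_old_checkpoints(model_dir, backup_dir):
    """Move checkpoint files from model_dir into backup_dir; return moved names."""
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(model_dir)):
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as eval output
        # Assumption: only these names belong to the old training run.
        if name == "checkpoint" or name.startswith("model.ckpt"):
            shutil.move(path, os.path.join(backup_dir, name))
            moved.append(name)
    return moved
```

Moving rather than deleting keeps the earlier run recoverable in case the files turn out to be needed.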

Seminarian answered 31/12, 2018 at 17:46 Comment(2)
This doesn't qualify as an answer. It should be a comment instead. – Carilyn
You must have 50 reputation to comment. – Seminarian
0

Your script looks fine. One thing to note is that the new model_main.py does not print training logs (training step, learning rate, loss and so on). It only prints the evaluation results after one or more epochs, which can take a long time.
So "the process isn't terminated, but no further logs appear" is normal. You can confirm that it is running by checking GPU usage with "nvidia-smi", or by using TensorBoard.
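
If neither tool is handy, a rough alternative is to check whether files in model_dir are still being written. This is only a sketch under the assumption that a running training job periodically writes event files and checkpoints there; seconds_since_last_write is a hypothetical helper, not an Object Detection API function.

```python
import os
import time


def seconds_since_last_write(model_dir, now=None):
    """Return seconds since the newest file in model_dir was modified, or None."""
    mtimes = [
        os.path.getmtime(os.path.join(model_dir, name))
        for name in os.listdir(model_dir)
        if os.path.isfile(os.path.join(model_dir, name))
    ]
    if not mtimes:
        return None  # nothing has been written yet
    now = time.time() if now is None else now
    return now - max(mtimes)
```

A value that keeps resetting to a small number suggests the job is still making progress. A value that only grows over a long period suggests that it is not, although how often files are written depends on the run configuration.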

Kroeger answered 5/12, 2018 at 3:50 Comment(0)
0

I also encountered this warning message. I checked nvidia-smi and it seemed that training hadn't started. I also tried reorganizing the output directory, but that didn't help. After reading "Configuring the Object Detection Training Pipeline" (official TensorFlow documentation), I found that it was a configuration problem. I solved it by adding load_all_detection_checkpoint_vars: true.

Rakia answered 25/1, 2019 at 22:47 Comment(0)
