Google Colab setting a '^C' in the process
S

4

8

I'm running code that I got from this tutorial to try out the TensorFlow Object Detection API. If you run all the cells, everything works, and in the end my images are classified.

But one cell doesn't behave properly: it runs, but not the way it should.

When I train my model with !python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config, TensorFlow starts up and begins training, but it only runs 3 or 4 steps, sometimes 20, 21, or 23 steps, and then Google Colab puts a ^C in the process output.

I can never finish my training because Google Colab kills my process. Does anyone know what's happening?

I have already tried both GPU and TPU instances.

[...]
INFO:tensorflow:Restoring parameters from training/model.ckpt-0
I1022 20:41:48.368024 139794549495680 tf_logging.py:115] Restoring parameters from training/model.ckpt-0
INFO:tensorflow:Running local_init_op.
I1022 20:41:52.779153 139794549495680 tf_logging.py:115] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1022 20:41:52.997912 139794549495680 tf_logging.py:115] Done running local_init_op.
INFO:tensorflow:Starting Session.
I1022 20:41:59.072830 139794549495680 tf_logging.py:115] Starting Session.
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
I1022 20:41:59.245162 139793493063424 tf_logging.py:115] Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Starting Queues.
I1022 20:41:59.252097 139794549495680 tf_logging.py:115] Starting Queues.
INFO:tensorflow:global_step/sec: 0
I1022 20:42:10.151180 139793484670720 tf_logging.py:159] global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
I1022 20:42:16.119055 139793476278016 tf_logging.py:115] Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 14.0911 (28.770 sec/step)
I1022 20:42:28.496783 139794549495680 tf_logging.py:115] global step 1: loss = 14.0911 (28.770 sec/step)
INFO:tensorflow:global step 2: loss = 12.4958 (10.529 sec/step)
I1022 20:42:39.334129 139794549495680 tf_logging.py:115] global step 2: loss = 12.4958 (10.529 sec/step)
INFO:tensorflow:global step 3: loss = 11.6073 (8.267 sec/step)
I1022 20:42:47.601801 139794549495680 tf_logging.py:115] global step 3: loss = 11.6073 (8.267 sec/step)
^C
Seibert answered 22/10, 2018 at 20:59 Comment(4)
Sounds like out of memory. Do things progress if you sample your data or run with a lower batch size? Can you share a notebook that reproduces the problem? Plummet
I'm using a batch size of 24, but I only have 50 images to train on (I'm only testing it; it won't go to production). Rawdin
The code link in the original question 404s. Plummet
drive.google.com/open?id=1gZTADeRnAX4li-yK-BlKk7qMHHLWlJoQ Rawdin
N
4

I agree with Bob Smith that this looks like an 'out of memory' issue. You can work around it by upgrading your runtime from 12GB to 25GB of RAM with a simple trick from Haohui. Run the following code in Colab:

a = []
# Intentionally never terminates: the list grows until the runtime runs out of RAM.
while True:
    a.append('1')

It will crash the session and you'll get a message 'Would you like to switch to a high-RAM runtime...' in the lower left side of the screen.
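Before changing the runtime, it may be worth confirming that memory really is the limit. The following is only a rough sketch, assuming a Linux runtime that exposes /proc/meminfo (Colab VMs normally do); the helper names parse_meminfo and available_gib are made up for this example.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(info: dict) -> float:
    """Convert the MemAvailable field (kB) to GiB; 0.0 if the field is missing."""
    return info.get("MemAvailable", 0) / (1024 ** 2)


meminfo_path = Path("/proc/meminfo")  # only present on Linux
if meminfo_path.exists():
    info = parse_meminfo(meminfo_path.read_text())
    total_gib = info.get("MemTotal", 0) / (1024 ** 2)
    print(f"total: {total_gib:.1f} GiB, available: {available_gib(info):.1f} GiB")
```

Running this in a cell before training, and again in a second notebook tab while training is running, gives a rough idea of how close the process gets to the limit before the ^C appears.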

Newsstand answered 17/4, 2020 at 18:31 Comment(2)
They seem to have fixed this 'hack' now, so it is no longer possible to increase the RAM this way. Sands
This solution doesn't work anymore; they have probably removed this feature. Outland
L
4

You can also try to reduce the "batch_size" in the .config file.
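If you would rather change it from a notebook cell than edit the file by hand, something like the following might work. It is a minimal sketch that assumes the config is a text-format file with a batch_size field inside a train_config block, as in the sample pipeline configs; the function name set_train_batch_size is hypothetical, and it uses a plain regex instead of the protobuf parser.

```python
import re


def set_train_batch_size(config_text: str, new_size: int) -> str:
    """Return config_text with the first batch_size after `train_config` set to new_size."""
    start = config_text.find("train_config")
    if start == -1:
        raise ValueError("no train_config block found")
    head, tail = config_text[:start], config_text[start:]
    # Replace only the first match so later blocks are left untouched.
    tail, count = re.subn(r"batch_size\s*:\s*\d+", f"batch_size: {new_size}", tail, count=1)
    if count == 0:
        raise ValueError("no batch_size field found after train_config")
    return head + tail


# Tiny stand-in for the real config file, for illustration only.
sample = """train_config: {
  batch_size: 24
  num_steps: 200000
}
"""
print(set_train_batch_size(sample, 12))
```

For the real file you would read training/ssd_mobilenet_v1_pets.config (the path from the question), pass its contents through the function, and write the result back before starting train.py.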

Lobelia answered 8/9, 2020 at 16:4 Comment(1)
Worked for me. My batch_size was 24, then I changed it to 12. Thanks :) Yetac
C
0

You can use the following GitHub repo for training a tensorflow object detection model on Google Colab. It has a readme, a .ipynb file, a model config file and a sample label_map file. Please do let me know if you face any issues. Hope this helps

Chordate answered 30/7, 2019 at 2:56 Comment(0)
A
0

I know this is old, but I stumbled upon the same problem and couldn't find a solution. This happened to me because I forgot to enable the GPU under

Runtime->Change runtime type

, and my code uses the GPU (PyTorch with CUDA).

More info: https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d Enable GPU in Colab
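A quick way to catch this before starting a long run is to check whether a GPU is visible at all. This is just a sketch using the standard library, assuming an NVIDIA runtime where the nvidia-smi tool is on the PATH; the function name nvidia_gpu_visible is made up for this example. With PyTorch you can also call torch.cuda.is_available() directly.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: <name> (UUID: ...)".
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if not nvidia_gpu_visible():
    print("No NVIDIA GPU visible; check Runtime -> Change runtime type.")
```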

Administrator answered 5/2, 2020 at 0:1 Comment(0)
