Why does my Google Colab session keep crashing?
I am using Google Colab on a dataset with 4 million rows and 29 columns. When I run the statement sns.heatmap(dataset.isnull()), it runs for some time, but after a while the session crashes and the instance restarts. It has been happening a lot, and so far I haven't seen any output. What could be the possible reason? Is the data/calculation too much? What can I do?

Painter answered 24/1, 2019 at 10:11 Comment(1)
Please share a notebook that reproduces the problem you describe. – Antihero
I'm not sure what is causing your specific crash, but a common cause is an out-of-memory error. It sounds like you're working with a large enough dataset that this is probable. You might try working with a subset of the dataset and see if the error recurs.
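
For example, a minimal sketch of testing on a subset first (assuming dataset is the 4-million-row DataFrame from the question):

import seaborn as sns

# Plot a random sample first; if this succeeds but the full frame
# crashes the session, memory is the likely culprit.
sample = dataset.sample(n=100_000, random_state=0)
sns.heatmap(sample.isnull())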

Otherwise, Colab keeps logs in /var/log/colab-jupyter.log. You may be able to get more insight into what is going on by printing its contents. Either run:

!cat /var/log/colab-jupyter.log

Or, to get the messages alone (easier to read):

import json

# Each log line is a JSON object; print just its "msg" field.
with open("/var/log/colab-jupyter.log", "r") as fo:
  for line in fo:
    print(json.loads(line)['msg'])
Giordano answered 5/2, 2019 at 0:19 Comment(1)
Has something changed in how logging is done on Colab? For me these are the contents of the /var/log folder: alternatives.log bootstrap.log dpkg.log fontconfig.log lastlog pip.log.bak-run.sh wtmp apt btmp faillog journal pip.log private – Powdery
Another cause: if you're using PyTorch and move your model to the GPU, but don't move an internal tensor (e.g. a hidden state) to the GPU as well.
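
A minimal sketch of that mismatch (the LSTM and shapes here are made up for illustration):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.LSTM(input_size=8, hidden_size=16).to(device)
x = torch.randn(5, 3, 8, device=device)

# Bug: the initial hidden/cell states are created on the CPU while the
# model and input live on the GPU, so the forward pass raises an error.
h0 = torch.zeros(1, 3, 16)
c0 = torch.zeros(1, 3, 16)
# out, _ = model(x, (h0, c0))  # RuntimeError: tensors on different devices

# Fix: move every tensor the model consumes to the same device.
out, _ = model(x, (h0.to(device), c0.to(device)))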

Zephan answered 21/4, 2020 at 4:56 Comment(0)
This error often occurs if you enable the GPU but do not actually use it. Change your runtime type to "None" and you should not face this issue again.

Clutter answered 23/10, 2021 at 13:36 Comment(1)
Hmm, I mean I need more than two seconds to start using it :shrug: – Sherrillsherrington
I would first suggest closing your browser and restarting the notebook. Look at the runtime logs and check whether CUDA is mentioned anywhere. If not, do a factory runtime reset and run the notebook, then check your logs again; you should find CUDA in there.

Cherish answered 7/11, 2021 at 11:50 Comment(0)
For me, passing certain arguments to the tfms augmentations broke the dataloader and crashed the session. I wasted a lot of time checking that the images weren't corrupt, running garbage collection, and more...

Rounds answered 29/8, 2020 at 16:37 Comment(0)
What worked for me was to click on the RAM/Disk resources drop-down menu, then 'Manage Sessions', and terminate my current session, which had been active for days. Then reconnect and run everything again.

Before that, my code kept crashing even though it had been working perfectly the previous day, so I knew there was nothing wrong code-wise.

After doing this, I also realized that the n_jobs parameter in GridSearchCV (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) plays a massive role in RAM consumption. For example, for me execution works fine and doesn't crash if n_jobs is set to None, 1 (same as None), or 2. Setting it to -1 (use all processors) or >3 crashes everything.
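
A minimal sketch of capping n_jobs (the estimator and parameter grid here are arbitrary examples):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
params = {"n_estimators": [50, 100], "max_depth": [None, 10]}

# n_jobs controls how many fits run in parallel; each parallel worker
# holds its own copy of the data, so high values multiply memory use.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      params,
                      n_jobs=2)  # keep this small on a memory-limited Colab VM
search.fit(X, y)
print(search.best_params_)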

Tori answered 25/8, 2022 at 20:24 Comment(0)
A common cause is an out-of-memory error. One possible reason is that you specified too large a batch size while training your model; try reducing the batch size.
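
A minimal sketch, assuming a Keras model (the model and data here are placeholders):

import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 28, 28).astype("float32")
y = np.random.randint(0, 10, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# If batch_size=512 exhausts RAM/VRAM, halve it until training fits.
model.fit(x, y, batch_size=64, epochs=1)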

Repudiation answered 17/3 at 9:18 Comment(1)
Welcome to StackOverflow! The accepted answer already mentions out-of-memory errors. Could you edit your answer to explain how to change the batch size? – Checkered