Nvidia GPU memory allocated but by no process?

I am frequently rerunning the same mxnet script while I try to iron out some bugs in a new script (and I am new to mxnet). Pretty often when I try to run my script I get an error that the GPU is out of memory, and when I use nvidia-smi to check, this is what I see:

Wed Dec  5 15:41:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24.02              Driver Version: 396.24.02                 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:65:00.0  On |                  N/A |
|  0%   54C    P2    68W / 300W |  10891MiB / 11144MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1446      G   /usr/lib/xorg/Xorg                            40MiB |
|    0      1481      G   /usr/bin/gnome-shell                         114MiB |
|    0     10216      G   ...-token=8422C9FC67F51AEC1893FEEBE9DB68C6    31MiB |
|    0     18221      G   /usr/lib/xorg/Xorg                           458MiB |
|    0     18347      G   /usr/bin/gnome-shell                         282MiB |
+-----------------------------------------------------------------------------+

So it seems like most of the memory is in use (10891/11144 MiB), BUT I don't see any process in the list taking up a large portion of the GPU, so there doesn't seem to be anything to kill. And my mxnet script has already exited, so I assume it shouldn't be that. I would understand a few seconds, or even tens of seconds, of lag if the GPU does not know right away that the script no longer needs the memory, but I have been waiting many minutes and still see the same display. What gives, and is there some memory cleanup I should do? If so, how? Thank you for any tips to a newbie.

Straticulate asked 5/12, 2018 at 20:45 Comment(1)

GPU memory usage is completely bound to the lifetime of the process that allocated it. If you see GPU memory in use, there must be a process that is still alive and holding on to that memory. If you run ps -a | grep python you should see the remaining Python processes, and that will tell you which process is still alive.
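
A minimal sketch of that cleanup, assuming the stale process is a Python script (the <PID> below is a placeholder for whatever PID ps actually reports):

ps -a | grep python    # list Python processes that may still be holding GPU memory
kill <PID>             # ask the stale process to exit
kill -9 <PID>          # force it to exit if it ignores the first signal
nvidia-smi             # confirm the memory has been released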

Briolette answered 6/12, 2018 at 3:58 Comment(2)
How could I use this command in Windows? - Thyestes
Using ps -a | grep python I did find and kill some processes which were consuming GPU memory. However, after killing all the processes returned by ps -a | grep python, there is still some GPU memory being used according to nvidia-smi. I'm using detectron2 and I'm wondering if this has to do with multiprocessing. - Laundes

There must be a process occupying framebuffer (FB) memory that is not listed by nvidia-smi.

You can use fuser -v /dev/nvidia0 to find out whether any process is still accessing GPU 0 and probably holding the FB memory, even though it is not shown under nvidia-smi.

Then you can kill that process to release the FB memory.
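
A minimal example of that check, assuming GPU 0, the fuser tool from the psmisc package, and root privileges to see processes owned by other users (<PID> is a placeholder for whatever PID fuser reports):

sudo fuser -v /dev/nvidia0    # show every process that still has the GPU 0 device node open
sudo fuser -v /dev/nvidia*    # or check all NVIDIA device nodes at once
sudo kill -9 <PID>            # kill the offending process
nvidia-smi                    # the framebuffer memory should now be freed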

Harwell answered 31/5 at 1:37 Comment(0)
