How can I flush GPU memory using CUDA (physical reset is unavailable)

My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.

I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.

Placing cudaDeviceReset() at the beginning of the program only affects the current context created by the process and doesn't flush the memory allocated before it.

I'm accessing a Fedora server with that GPU remotely, so physical reset is quite complicated.

So the question is: is there any way to flush the device memory in this situation?

Sandbox answered 4/3, 2013 at 8:22 Comment(12)
"As a result, device memory remains occupied" - How do you know this to be true?Psychoneurosis
Although nvidia-smi --gpu-reset is not available, I can still get some information with nvidia-smi -q. In most fields it gives 'N/A', but some information is useful. Here is the relevant output: Memory Usage Total : 1535 MB Used : 1227 MB Free : 307 MBSandbox
Plus, I fail to allocate memory for variables, which are small enoughSandbox
Is the process which was holding the context on the GPU still alive? Even catastrophic termination of a process should result in the driver destroying the context and releasing resources.Psychoneurosis
It doesn't look like it is alive. At least, I don't see it alive on CPU. I guess, the process on GPU cannot be alive as well, since I can launch another kernel (concurrent execution is not available on my GPU). But the memory is still occupied, I can be sure about it because of the reasons described aboveSandbox
If you have root access, you can unload and reload the nvidia driver.Zenia
Did it crash oh host side or while kernel was running?Candracandy
If you do ps -ef |grep 'whoami' and the results show any processes that appear to be related to your crashed session, kill those. (the single quote ' should be replaced with backtick ` )Equable
Have you tried sudo rmmod nvidia?Piggish
ksooklall has a great answer to find what is hogging the memory, even if it doesn't show on nvidia-smi.Dermatitis
nvidia-smi -caa worked great for me to release memory on all GPUs at once.Countermove
How do you clear the NVIDIA GPU memory in Windows?Seadon
16

Although it should be unnecessary to do this in anything other than exceptional circumstances, the recommended way to do it on Linux hosts is to unload the nvidia driver by running

$ rmmod nvidia 

with suitable root privileges and then reloading it with

$ modprobe nvidia

If the machine is running X11, you will need to stop it manually beforehand, and restart it afterwards. The driver initialisation process should eliminate any prior state on the device.
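Put together, and assuming a systemd-based setup with the gdm display manager (a sketch only; service names vary by distribution, so adjust accordingly), the whole sequence looks roughly like this:

$ sudo systemctl stop gdm     # stop X11 first, if it is running
$ sudo rmmod nvidia           # unload the driver; fails if a process still holds the device
$ sudo modprobe nvidia        # reload it; initialisation clears any prior device state
$ sudo systemctl start gdm    # restart X11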

This answer has been assembled from comments and posted as a community wiki to get this question off the unanswered list for the CUDA tag.

Psychoneurosis answered 4/3, 2013 at 8:22 Comment(1)
Could not run the above command; the error says CUDA is in use. So I killed the PID using the solution provided by https://mcmap.net/q/187966/-how-can-i-flush-gpu-memory-using-cuda-physical-reset-is-unavailable. It works for me. – Neurophysiology
208

Check what is using your GPU memory with

sudo fuser -v /dev/nvidia*

Your output will look something like this:

                     USER        PID  ACCESS COMMAND
/dev/nvidia0:        root       1256  F...m  Xorg
                     username   2057  F...m  compiz
                     username   2759  F...m  chrome
                     username   2777  F...m  chrome
                     username   20450 F...m  python
                     username   20699 F...m  python

Then kill the PID that you no longer need, either in htop or with

sudo kill -9 PID

In the example above, PyCharm was eating a lot of memory, so I killed 20450 and 20699.
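If you are not sure what a given PID actually is (for example, which Jupyter notebook it belongs to), a plain ps lookup before killing helps; the PIDs below are just the ones from the example output above:

ps -o pid,ppid,etime,cmd -p 20450,20699   # show the full command line and age of each candidate
sudo kill -9 20450 20699                  # kill them once you are sure they are the right ones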

Ahem answered 6/10, 2017 at 2:7 Comment(9)
Thank you! For some reason, I had a process hogging all my VRAM, not showing on nvidia-smi. – Dermatitis
I need to use this a lot when running deep learning in different Jupyter notebooks. The only issue is knowing exactly which PID is which. Any tips on this? – Pip
Is chrome the Google Chrome browser? If so, what business does it have using a GPU? – Chigoe
@josh I kill them one at a time, making a mental note of the COMMAND. – Ahem
@kRazzyR It uses it for speeding up computations, I assume, for rendering graphics, but maybe also other things. This did cause me a lot of issues when I installed Nvidia drivers, CUDA and cuDNN. I had to turn a lot of it off. See here. – Pip
@ksooklall I meant: how do we know which Jupyter notebook/process corresponds to which PID, and therefore which one to kill... thanks. – Pip
I notice that when I kill one, it starts a new one. I tried pkill -9 -t pts/1 to log out; still not working. – Cape
Try kill -9 or kill -15, not pkill. – Ahem
In my case, sudo is not necessary. – Waylonwayman
69

First type

nvidia-smi

then find the PID of the process you want to kill and run

sudo kill -9 PID
Trilbi answered 14/12, 2018 at 8:24 Comment(3)
Brilliant, this one actually worked for me. PID should be replaced with the... PID number of the process that uses the GPU (which you can find with nvidia-smi). – Cockneyfy
The command nvidia-smi returns Failed to initialize NVML: Driver/library version mismatch. – Gastrotomy
nvidia-smi gives me two processes, and when I go to kill them, it says no such process. The processes are both called Xwayland. – Circus
18

For those using Python:

import torch, gc
gc.collect()               # drop Python references so unused tensors become collectable
torch.cuda.empty_cache()   # release PyTorch's cached, unoccupied memory back to the device
Westberg answered 31/3, 2020 at 9:34 Comment(3)
This does not in any way relate to what the questioner was asking about. – Psychoneurosis
Nevertheless, it solved my problem (which is admittedly not exactly what the OP asked, but it matches the title when searching). – Dreadfully
Same here, any help is appreciated :) – Quartern
12

I also had the same problem, and I saw a good solution on Quora, using

sudo kill -9 PID.

see https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi

Gunderson answered 19/7, 2017 at 13:26 Comment(1)
Worked a treat when I accidentally opened and loaded two different Jupyter notebooks with VGG16. Warning: it kills the notebooks. I guess you could pick one to free up some memory for the other, but I don't know how you select the PID for a given notebook. – Pip
11

One can also use nvtop, which gives an interface very similar to htop but shows your GPU(s) usage instead, with a nice graph. You can also kill processes directly from there.

Here is a link to its GitHub repository: https://github.com/Syllo/nvtop

[Screenshot: the NVTOP interface]
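If your distribution packages it (recent Debian and Ubuntu releases do; whether yours does is an assumption here), installing and starting it is as simple as:

sudo apt install nvtop
nvtop

Otherwise, build it from the repository linked above.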

Tom answered 10/4, 2020 at 9:57 Comment(0)
6

To kill all processes using the GPU in one go (the -k flag makes fuser kill every process that holds /dev/nvidia* open):

sudo fuser -v /dev/nvidia* -k
Alessandro answered 17/12, 2022 at 11:19 Comment(1)
Instead of simply providing the answer directly, try writing a detailed explanation of the solution, as long as the explanation is not too lengthy. @Alessandro – Mess
5

On macOS (/ OS X), if someone else is having trouble with the OS apparently leaking memory:

  • https://github.com/phvu/cuda-smi is useful for quickly checking free memory
  • Quitting applications seems to free the memory they use. Quit everything you don't need, or quit applications one-by-one to see how much memory they used.
  • If that doesn't cut it (quitting about 10 applications freed about 500 MB / 15% for me), the biggest consumer by far is WindowServer. You can force quit it, which will also kill all applications you have running and log you out. But it's a bit faster than a restart and got me back to 90% free memory on the CUDA device.
Immunogenetics answered 12/10, 2016 at 21:54 Comment(0)
4

Normally I just use nvidia-smi, but for some problems it's not enough (something is still held in CUDA memory).

The nvidia-smi "kill all" one-liner is:

nvidia-smi | grep 'python' | awk '{ print $5 }' | xargs -n1 kill -9

If you're still hitting unexpected memory errors or similar problems then try:

sudo fuser -v /dev/nvidia* | cut -d' ' -f2- | sudo xargs -n1 kill -9
Maryn answered 18/7, 2023 at 21:9 Comment(0)
1

For Ubuntu 20.04: in the terminal, type

nvtop

If killing the offending process directly from nvtop doesn't work, find and note the PID of the process with the most GPU usage, then run

sudo kill PID
Kauffman answered 1/12, 2021 at 5:1 Comment(0)
0

If you have the problem that after killing one process the next one starts (see the comment above), for example when you have a bash script that calls multiple Python scripts and you can't find their PIDs, you can use ps -ef, where you'll find the PID of your "problematic" process and also its PPID (parent PID). Use kill PPID, kill -9 PPID, or sudo kill PPID to stop the processes.
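A minimal sketch of that, assuming the child processes are Python scripts (grep for whatever matches your case):

ps -ef | grep python    # the PID is the second column, the PPID the third
sudo kill -9 PPID       # kill the parent so it stops respawning children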

Bayberry answered 28/9, 2022 at 9:32 Comment(0)
0

If all of this does not work, I found another answer here:

How to kill process on GPUs with PID in nvidia-smi using keyword?

nvidia-smi | grep 'python' | awk '{ print $X }' | xargs -n1 kill -9

Note that X (in the awk expression) corresponds to the column of the nvidia-smi output that contains the PID. If the PID sits in the fifth column of your nvidia-smi output, replace X with 5.

Polder answered 24/2, 2023 at 11:53 Comment(0)
-2

I just started a new terminal and closed the old one and it worked out pretty well for me.

Brag answered 18/9, 2022 at 17:30 Comment(0)
