NVIDIA Cuda error "all CUDA-capable devices are busy or unavailable" on OSX
Quite often, the CUDA library fails completely and returns error 46 ("all CUDA-capable devices are busy or unavailable"), even for simple calls like cudaMalloc. The code runs successfully if I restart the computer, but that is far from ideal. This problem is apparently quite common.

My setup is the following:

  • OSX 10.6.8
  • NVIDIA CUDA drivers : CUDA Driver Version: 4.0.31 (latest)
  • GPU Driver Version: 1.6.36.10 (256.00.35f11)

I tried many solutions from the Nvidia forum, but none of them worked. I don't want to reboot every time it happens. I also tried to unload and reload the driver with a procedure I assume to be correct (it may not be):

kextunload -b com.nvidia.CUDA
kextload -b com.nvidia.CUDA

But it still does not work. How can I kick the GPU (or CUDA) back into sanity?
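A quick way to tell whether the driver has recovered, without running a full program, is a minimal probe that forces context creation and prints the runtime's own error string. This is only a sketch; calling cudaFree(0) is a common idiom for triggering lazy context creation without allocating anything.

```cuda
// probe.cu -- minimal CUDA availability check (sketch).
// Build with: nvcc probe.cu -o probe
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    // cudaFree(0) forces context creation without allocating memory,
    // so it surfaces "devices busy or unavailable" (error 46) immediately.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA unavailable: %s (code %d)\n",
                cudaGetErrorString(err), (int)err);
        return 1;
    }
    printf("CUDA context created OK\n");
    return 0;
}
```

Running this after killing suspect applications tells you whether a reboot is actually needed.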

This is the deviceQuery result:

 CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "GeForce 9400M"
  CUDA Driver Version / Runtime Version          4.0 / 4.0
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 254 MBytes (265945088 bytes)
  ( 2) Multiprocessors x ( 8) CUDA Cores/MP:     16 CUDA Cores
  GPU Clock Speed:                               1.10 GHz
  Memory Clock rate:                             1075.00 Mhz
  Memory Bus Width:                              128-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and execution:                 No with 0 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   No
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = GeForce 9400M
[deviceQuery] test results...
PASSED

This is an example of code that may fail (although under normal conditions it does not):

#include <stdio.h>

__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main(void) {
    int c;
    int *dev_c;

    cudaMalloc((void **) &dev_c, sizeof(int)); // fails here, returning 46

    add<<<1,1>>>(2, 7, dev_c);
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("hello world, %d\n", c);
    cudaFree(dev_c);
    return 0;
}
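A variant of the same program that checks every runtime call and reports the runtime's error string makes the failure point explicit. This is a sketch: the CHECK macro name is my own, but the runtime calls (cudaGetErrorString, cudaGetLastError) are standard CUDA runtime API.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Report any CUDA runtime error with file/line context and bail out.
#define CHECK(call) do {                                      \
    cudaError_t err_ = (call);                                \
    if (err_ != cudaSuccess) {                                \
        fprintf(stderr, "%s:%d: %s (code %d)\n",              \
                __FILE__, __LINE__,                           \
                cudaGetErrorString(err_), (int)err_);         \
        return 1;                                             \
    }                                                         \
} while (0)

__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main(void) {
    int c;
    int *dev_c;

    CHECK(cudaMalloc((void **)&dev_c, sizeof(int)));  // error 46 shows up here
    add<<<1,1>>>(2, 7, dev_c);
    CHECK(cudaGetLastError());                        // catch launch failures too
    CHECK(cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost));
    printf("hello world, %d\n", c);
    CHECK(cudaFree(dev_c));
    return 0;
}
```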

I also found out that occasionally the system reverts to sane behavior without a reboot. I still don't know what triggers it.

Onomastics answered 6/8, 2011 at 11:28 Comment(9)
Are you running on a Macbook Pro with a discrete GPU? If so, check out gfxCardStatus, which allows you to force OS X to use your discrete GPU.Jiles
GPUs have finite hardware resources, and Mac OS X itself uses the GPU (the same hardware is used for 3D rendering anyway). So it may be a case of the GPU being too weak for the CUDA tasks you give it :( Post your GPU info!Rania
@Rania I am allocating very small amounts of memory in my tests, yet they fail. Added infoOnomastics
What compile flags are you using, and which architecture are you targeting? I have a similar Mac and I generate for sm_10 and sm_20 - never seen that error, but I'm only on CUDA 3.2.Tallboy
@Tallboy: I'm not specifying any flags.Onomastics
OK, it seems the default is sm_10. Are you certain you aren't running another GPU intensive process, screensaver, etc ? Your links don't really prove it is "quite common". I'm not on CUDA 4.0, but I suspect an error in your code.Tallboy
@Tallboy To my knowledge, I am not running any GPU-intensive stuff. No screensaver (I just display the time and the Apple logo). The code is trivial. A simple cudaMalloc of a handful of integers fails.Onomastics
If your code is trivial then give an example that fails.Tallboy
Even if you're not aware of having any GPU using programs, maybe there is one? A browser rendering on the GPU or something. Does it start working again if you kill a few apps (and stuff like SystemUIServer)? Maybe you can isolate the one by testing.Roanne

I can confirm the statements made by the commenters on my post. The GPU may not work if other applications are taking control of it. In my case, the Flash player in Firefox was apparently occupying all the available resources on the card. I killed the Firefox Flash plugin and the card immediately started working again.
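Since the device becomes free again once the offending process releases it, a retry loop (rather than an immediate abort) is one way to cope while you hunt down the culprit. A sketch, assuming the busy state is transient; wait_for_device and its parameters are hypothetical names of my own:

```cuda
#include <stdio.h>
#include <unistd.h>
#include <cuda_runtime.h>

// Try to create a context for up to `attempts` tries, sleeping in between.
// Returns cudaSuccess once the device is free, or the last error otherwise.
static cudaError_t wait_for_device(int attempts, unsigned seconds) {
    cudaError_t err = cudaErrorDevicesUnavailable;
    for (int i = 0; i < attempts; ++i) {
        err = cudaFree(0);               // forces context creation, no allocation
        if (err == cudaSuccess) return cudaSuccess;
        fprintf(stderr, "attempt %d: %s\n", i + 1, cudaGetErrorString(err));
        sleep(seconds);                  // give the other process time to let go
    }
    return err;
}

int main(void) {
    if (wait_for_device(5, 2) != cudaSuccess) {
        fprintf(stderr, "device still busy; kill GPU-using apps and retry\n");
        return 1;
    }
    printf("device available\n");
    return 0;
}
```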

Onomastics answered 25/8, 2011 at 8:1 Comment(3)
I'm glad you and the others posted about this. It saved me a chunk of time and kept me from tearing my hair out. Thanks!Tallu
On OSX 10.7, stopping Firefox indeed makes cudaMalloc work. I found that one can also uncheck the 'Use hardware acceleration when available' option in Preferences -> 'Advanced' tab -> 'General' subtab of Firefox to run Firefox alongside another GPU application.Ideography
I have a similar problem.Punk

Restarting my computer did the trick for me.

Matildematin answered 10/4, 2020 at 15:43 Comment(3)
It always works, but that's not a permanent solution.Rental
Totally agreed; someone looking for a quick fix could try it, though.Matildematin
Every time you "awaken" the computer you have to shut down Jupyter and run this command: sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm. It's a bug in nvidia_uvm that has been there for years (or in the way some of these tools interact with it). The bug still exists in version 465.19.01.Rickert

In the case of VS Code, restarting VS Code itself works for me.

Merissameristem answered 1/3, 2023 at 16:9 Comment(1)
Not for me, unfortunately.Exum

Changing the Nvidia driver version on Ubuntu (to Nvidia 450) worked for me.

Hamhung answered 1/5, 2021 at 7:43 Comment(0)

As noted in a comment above, removing and reloading the nvidia_uvm module works.

This was the only method (other than a restart) that helped me. However, keep in mind to:

  • exit Jupyter (or close VS Code)
  • exit Firefox (as it might hold GPU resources)
  • exit any TensorBoard server
  • exit any other processes that use the GPU

Otherwise the module cannot be removed, and rmmod will fail with "ERROR: Module nvidia_uvm is in use".

sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
Exum answered 24/1 at 22:7 Comment(0)

You can do this to save your day:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

It worked for me.

Thordia answered 24/11, 2019 at 22:23 Comment(1)
How would that help? This only restricts the program to the specified devices (in your example, the device with ID 0).Lorollas

© 2022 - 2024 — McMap. All rights reserved.