Add nvidia runtime to docker runtimes

I'm running a virtual machine on GCP with a Tesla GPU, and I'm trying to deploy a PyTorch-based app there so that it can use the GPU.

I want Docker to use this GPU and make it accessible from containers.

I managed to install all the drivers on the host machine, and the app runs fine there, but when I try to run it in Docker (based on the nvidia/cuda container), PyTorch fails:

  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 82, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from

To get some info about nvidia drivers visible to the container, I run this:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
But it complains: docker: Error response from daemon: Unknown runtime specified nvidia.

On the host machine nvidia-smi output looks like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    35W / 250W |    873MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

When I list the runtimes Docker knows about, I see only runc, not the nvidia runtime shown in examples around the internet.

$ docker info|grep -i runtime
 Runtimes: runc
 Default Runtime: runc

How can I add this nvidia runtime environment to my docker?

Most posts and questions I have found so far say something like "I just forgot to restart my docker daemon, then it worked", but restarting does not help me. What should I do?

I checked many issues on GitHub and StackOverflow questions #1, #2 and #3, but none of them helped.

Cicerone answered 23/11, 2019 at 13:45 Comment(2)
1. Install the latest driver for your NVIDIA GPU on the base system. 2. Install docker-ce 19.03 or newer. 3. Launch your containers with the --gpus all switch. 4. Profit! See here: docs.nvidia.com/ngc/ngc-titan-setup-guide/index.html - Flong
To share some experience: some older projects rely on the --runtime nvidia option from the deprecated nvidia-docker2. The accepted answer's approach (install nvidia-container-runtime and edit /etc/docker/daemon.json) can be applied on top of the newer nvidia-container-toolkit, seems compatible with it, and achieves the required backwards compatibility with just a very small package (600 kB on Ubuntu). - Pibgorn
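
The two invocation styles mentioned in the comments (--gpus all on Docker 19.03 or newer, and the older --runtime=nvidia) can be compared with a quick smoke test. Below is a minimal Python sketch, not an official tool: the helper names gpu_smoke_test_command and run_gpu_smoke_test are hypothetical, and the default image name is simply the one used in the question (you may need to pin a specific tag for it).

```python
import subprocess


def gpu_smoke_test_command(image="nvidia/cuda", use_gpus_flag=True):
    """Build a `docker run` command that runs nvidia-smi inside a container.

    use_gpus_flag=True  -> uses `--gpus all` (Docker 19.03 or newer)
    use_gpus_flag=False -> uses `--runtime=nvidia` (needs the nvidia runtime registered)
    """
    gpu_args = ["--gpus", "all"] if use_gpus_flag else ["--runtime=nvidia"]
    return ["docker", "run", "--rm"] + gpu_args + [image, "nvidia-smi"]


def run_gpu_smoke_test(image="nvidia/cuda", use_gpus_flag=True):
    """Run the smoke test and return docker's exit code (0 means nvidia-smi ran)."""
    return subprocess.call(gpu_smoke_test_command(image, use_gpus_flag))
```

Calling run_gpu_smoke_test(use_gpus_flag=False) reproduces the command from the question, so a non-zero exit code there together with a zero exit code for the --gpus all variant would suggest that only the runtime registration is missing.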

The nvidia runtime you need is nvidia-container-runtime.

Follow the installation instructions here:
https://github.com/NVIDIA/nvidia-container-runtime#installation

Basically, you install it with your package manager first, if it's not present:

sudo apt-get install nvidia-container-runtime

Then you add it to docker runtimes:
https://github.com/nvidia/nvidia-container-runtime#daemon-configuration-file

This option worked for me:

$ sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
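
Note that the tee command above overwrites /etc/docker/daemon.json, so any settings already in that file are lost. If you have an existing configuration, a merge is safer. Here is a minimal Python sketch of that idea, assuming the same runtime path as above; the function names add_nvidia_runtime and update_daemon_json are hypothetical, and the script needs to run as root to write under /etc/docker.

```python
import copy
import json
import os

# Same runtime entry as in the daemon.json shown above.
NVIDIA_RUNTIME = {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}


def add_nvidia_runtime(config):
    """Return a copy of a daemon.json dict with the nvidia runtime added.

    Other top-level keys and other runtimes are preserved; the input is not modified.
    """
    merged = copy.deepcopy(config)
    merged.setdefault("runtimes", {})["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    return merged


def update_daemon_json(path="/etc/docker/daemon.json"):
    """Read daemon.json if it exists, merge in the nvidia runtime, write it back."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    with open(path, "w") as f:
        json.dump(add_nvidia_runtime(config), f, indent=4)
        f.write("\n")
```

After calling update_daemon_json(), the daemon still has to reload its configuration, for example with the sudo pkill -SIGHUP dockerd line above.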

Check that it's added:

$ docker info|grep -i runtime
 Runtimes: nvidia runc
 Default Runtime: runc
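
If you want to script this check (say, in a deployment step), the same docker info output can be parsed programmatically. A minimal Python sketch, assuming the plain-text "Runtimes:" line format shown above; parse_runtimes and nvidia_runtime_registered are hypothetical helper names, not part of any Docker tooling.

```python
import subprocess


def parse_runtimes(docker_info_text):
    """Return the runtime names listed on the 'Runtimes:' line of `docker info` output."""
    for line in docker_info_text.splitlines():
        stripped = line.strip()
        # 'Default Runtime:' does not match because it does not start with 'Runtimes:'.
        if stripped.startswith("Runtimes:"):
            return stripped[len("Runtimes:"):].split()
    return []


def nvidia_runtime_registered():
    """Run `docker info` and report whether 'nvidia' is among the runtimes."""
    output = subprocess.check_output(["docker", "info"], universal_newlines=True)
    return "nvidia" in parse_runtimes(output)
```

nvidia_runtime_registered() returns True once the daemon has picked up the new configuration; if it still returns False after editing daemon.json, the daemon most likely has not reloaded its configuration yet.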
Resnatron answered 23/11, 2019 at 13:53 Comment(1)
Or, this modification can be done with: sudo nvidia-ctk runtime configure --runtime=docker - Soberminded

As an update to @Viacheslav Shalamov's answer, the nvidia-container-runtime package is now part of the nvidia-container-toolkit which can also be installed with:

sudo apt install nvidia-container-toolkit

and then follow the same instructions as above to register nvidia as a Docker runtime.

Aristotelianism answered 26/2, 2023 at 17:51 Comment(0)
