I’m running a virtual machine on GCP with a Tesla GPU, and I’m trying to deploy a PyTorch-based app there so that it is accelerated by the GPU.
I want Docker to use this GPU and make it accessible from containers.
I managed to install all the drivers on the host machine, and the app runs fine there, but when I try to run it in Docker (based on the nvidia/cuda container), PyTorch fails:
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 82, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
To get some info about nvidia drivers visible to the container, I run this:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
But it complains: docker: Error response from daemon: Unknown runtime specified nvidia.
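Since the nvidia-smi check above cannot even start without the nvidia runtime, a rough alternative is to look for the driver's files directly from inside the container. The following is only a sketch: it assumes the usual Linux layout, where a loaded NVIDIA driver exposes /proc/driver/nvidia/version and device nodes such as /dev/nvidia0 and /dev/nvidiactl. The function name nvidia_driver_visible and its root parameter are hypothetical, introduced here for illustration.

```python
import glob
import os


def nvidia_driver_visible(root="/"):
    """Return (version_file_present, device_nodes) as seen under `root`.

    `root` is a hypothetical parameter that only exists so the check can be
    pointed at a different filesystem root; inside a container, use "/".
    """
    # Assumption: a loaded NVIDIA kernel driver publishes this file.
    version_file = os.path.join(root, "proc/driver/nvidia/version")
    # Assumption: GPU device nodes are named /dev/nvidia*.
    device_nodes = sorted(glob.glob(os.path.join(root, "dev/nvidia*")))
    return os.path.exists(version_file), device_nodes


if __name__ == "__main__":
    has_version, nodes = nvidia_driver_visible()
    print("driver version file present:", has_version)
    print("device nodes:", nodes or "none")
```

If both come back empty inside the container while the host shows them, the container was most likely started without any GPU passthrough, which would be consistent with the PyTorch assertion above.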
On the host machine, the nvidia-smi output looks like this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    35W / 250W |    873MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
If I check the runtimes in Docker, I get only the runc runtime and no nvidia runtime, unlike the examples around the internet.
$ docker info|grep -i runtime
Runtimes: runc
Default Runtime: runc
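The same check can be scripted. This is a minimal sketch that parses the text printed by docker info, assuming the "Runtimes:" and "Default Runtime:" lines keep the format shown above. The helper name parse_runtimes is hypothetical.

```python
def parse_runtimes(docker_info_text):
    """Extract (list_of_runtimes, default_runtime) from `docker info` text."""
    runtimes, default = [], None
    for line in docker_info_text.splitlines():
        # Newer Docker versions indent these lines, so strip before matching.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        if key == "Runtimes":
            runtimes = value.split()
        elif key == "Default Runtime":
            default = value.strip()
    return runtimes, default


if __name__ == "__main__":
    # Sample text copied from the output above.
    sample = "Runtimes: runc\nDefault Runtime: runc"
    runtimes, default = parse_runtimes(sample)
    print("nvidia registered:", "nvidia" in runtimes)
    print("default runtime:", default)
```

In a real script the text would come from something like subprocess.run(["docker", "info"], capture_output=True, text=True).stdout rather than a hard-coded sample.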
How can I add this nvidia runtime environment to my Docker?
Most posts and questions I have found so far say something like "I just forgot to restart my docker daemon, it worked", but that does not help me. What should I do?
I checked many issues on GitHub, as well as StackOverflow questions #1, #2 and #3, but none of them helped.
[...] --gpus all switch. 4. Profit! See here: docs.nvidia.com/ngc/ngc-titan-setup-guide/index.html – Flong
[...] --runtime nvidia, of the deprecated nvidia-docker2. The checked answer (install nvidia-container-runtime and edit /etc/docker/daemon.json) can be installed on top of the new nvidia-docker-toolkit, seems compatible with it, and achieves the required backwards compatibility with just a very small package (600kB on Ubuntu). – Pibgorn
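To make the daemon.json edit mentioned in the comment above more concrete, here is a hedged sketch of the "runtimes" entry that registers an nvidia runtime. The binary path /usr/bin/nvidia-container-runtime is an assumption about where the nvidia-container-runtime package installs it, and the helper name with_nvidia_runtime is hypothetical. The sketch only builds the JSON; it does not write to /etc/docker/daemon.json.

```python
import json


def with_nvidia_runtime(config, runtime_path="/usr/bin/nvidia-container-runtime"):
    """Return a copy of a daemon.json dict with an `nvidia` runtime added.

    `runtime_path` is an assumed install location; verify it on your system,
    for example with `which nvidia-container-runtime`.
    """
    merged = dict(config)
    # Copy the existing runtimes so the caller's dict is left untouched.
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["nvidia"] = {"path": runtime_path, "runtimeArgs": []}
    merged["runtimes"] = runtimes
    return merged


if __name__ == "__main__":
    # Starting from an empty config; an existing daemon.json would be loaded
    # with json.load and passed in instead.
    print(json.dumps(with_nvidia_runtime({}), indent=4))
```

The printed JSON is what would go into /etc/docker/daemon.json. The Docker daemon has to be restarted afterwards for docker info to list the new runtime.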