Cannot run nvidia-smi inside the docker without sudo

I installed nvidia-docker2 following the instructions here. Running the following command with sudo gives the expected output:

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0B:00.0  On |                  N/A |
| 24%   31C    P8    13W / 250W |    222MiB / 11011MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                           
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

However, running the same command without sudo fails with the following error:

$ docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create 
failed: runc create failed: unable to start container process: error during container 
init: error running hook #0: error running hook: exit status 1, stdout: , stderr: 
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: 
cannot open shared object file: no such file or directory: unknown.

How can I solve this problem?

Egidio answered 15/7, 2022 at 22:22 Comment(0)

Add your user to the docker group:

sudo usermod -aG docker your_user
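A group change made with usermod only takes effect in new login sessions, so it can help to confirm that the current shell actually sees the docker group. The following is a minimal sketch of such a check; the helper name has_group is hypothetical and only used for illustration.

```shell
#!/bin/sh
# has_group GROUP "LIST": succeed if GROUP appears as a whole word
# in the space-separated LIST. (Hypothetical helper for illustration.)
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Groups active in the current session; these can lag behind
# /etc/group until the user logs out and back in.
current_groups=$(id -nG)

if has_group docker "$current_groups"; then
    echo "docker group is active in this session"
else
    echo "docker group is NOT active in this session;"
    echo "log out and back in, or run: newgrp docker"
fi
```

If the group is reported as inactive right after running usermod, logging out and back in (or starting a subshell with newgrp docker) should pick it up.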

Update:

Check here https://github.com/NVIDIA/nvidia-docker/issues/539

Maybe something from the comments will help you.

Phenylalanine answered 16/7, 2022 at 16:27 Comment(2)
Thanks for answering, but I already did that as part of my Docker installation. It does not help and I still get the error. Do you recommend anything else? - Egidio
In fact, commands like "docker run hello-world" work without sudo, which confirms that my user is in the docker group. But my problem with running nvidia-smi is still not resolved. - Egidio
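Since hello-world runs without sudo but the GPU hook fails, one possibility (an assumption, not confirmed anywhere in this thread) is that the plain and the sudo invocations are not talking to the same Docker daemon. A rough sketch for comparing the two follows; the function name daemon_summary is hypothetical, and the output depends entirely on the local installation.

```shell
#!/bin/sh
# daemon_summary [PREFIX...]: print the operating system and data root
# reported by the daemon that "PREFIX docker" talks to, or "unreachable"
# if the query fails. (Hypothetical helper for illustration.)
daemon_summary() {
    "$@" docker info --format '{{.OperatingSystem}} | {{.DockerRootDir}}' 2>/dev/null \
        || echo "unreachable"
}

if command -v docker >/dev/null 2>&1; then
    # Which docker binary the current user resolves.
    echo "docker binary: $(command -v docker)"
    # Daemon reached without sudo.
    echo "without sudo:  $(daemon_summary)"
    # Contexts known to the current user; the active one is marked with *.
    docker context ls
else
    echo "docker not found on PATH"
fi
```

Running daemon_summary sudo afterwards gives the same summary for the sudo case. If the two lines differ, the non-sudo client is reaching a different daemon than the one where the NVIDIA runtime was set up, which would be worth investigating before anything else.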

Try adding "sudo" to your docker command, e.g. sudo docker-compose ...

Andromeda answered 19/8, 2022 at 1:22 Comment(0)
