Failed to install CUDA: installer warns that the driver is not selected, but nvidia-smi works

  • OS: Ubuntu 22.04.1
  • Python 3.8.1 (Conda)
  • GPU: RTX4090
  • Nvidia driver: 530.30.02

While setting up a deep learning environment, I found that torch.cuda.is_available() always returns False in PyTorch. I tried several PyTorch versions: the CPU build installs fine, but I could not install the GPU build. CUDA may have been installed incorrectly on this server before (nvcc --version does not work, but I can see many files like CUDA-11.4), so I deleted those files and tried to install CUDA 12.1. That installation also failed.

When I first ran nvidia-smi, the output was:

Mon Apr 24 11:16:34 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090         On | 00000000:05:00.0 Off |                  Off |
|  0%   42C    P8               12W / 450W|      1MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

This shows that the current NVIDIA driver version is 530.30.02 and that the highest CUDA version it supports is 12.1. I then downloaded CUDA 12.1 and ran the installer with the following commands:

wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run

The installer then showed a component selection screen (screenshot: CUDA Installer). I continued without changing any of the default selections and got:

Installation failed. See log at /var/log/cuda-installer.log for details.

Then I opened cuda-installer.log (screenshot: cuda-installer.log). The first line said 'Driver not installed', but nvidia-smi shows that the driver is installed. Why?

Next I ran the installer again with the driver deselected (screenshot: Not installing Driver). It printed the following warnings:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.1/

Please make sure that
 -   PATH includes /usr/local/cuda-12.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.1/lib64, or, add /usr/local/cuda-12.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 530.00 is required for CUDA 12.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver
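The warning above names a minimum driver version (530.00). A minimal sketch of how that requirement could be checked against the installed driver, assuming GNU `sort -V` is available and that `nvidia-smi` supports `--query-gpu=driver_version`; the helper name `version_at_least` is made up for illustration:

```shell
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Relies on GNU sort's version ordering (-V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="530.00"   # minimum taken from the installer warning

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its version; take the first GPU only.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    if version_at_least "$installed" "$required"; then
        echo "driver $installed satisfies >= $required"
    else
        echo "driver '$installed' is older than $required or could not be read"
    fi
else
    echo "nvidia-smi not found on PATH"
fi
```

If the driver already satisfies the requirement, the "Incomplete installation" warning only reflects that this particular run did not install a driver.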

At this point nvidia-smi still works, but nvcc --version prints "command not found".
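The installer summary asks for two environment variables to be set, which is a likely reason for nvcc not being found. A minimal sketch for the current shell session, assuming the toolkit really is in /usr/local/cuda-12.1 as the summary reports (the variable name `CUDA_HOME` is only a convention used here, not something the installer sets):

```shell
# Toolkit location as reported by the installer summary (assumption:
# adjust if your toolkit was installed somewhere else).
CUDA_HOME=/usr/local/cuda-12.1

# Prepend the toolkit directories; the ${VAR:+...} form avoids a
# trailing colon when the variable was previously empty.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Check whether the shell can now find the compiler.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc still not found; check that $CUDA_HOME/bin exists"
fi
```

These exports last only for the current shell; adding the two export lines to ~/.bashrc would make them persistent.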

Then I tried another installation method, the local deb repository:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

That does not work either. The output is:

(base) root@6f0f4f1d5e21:~/zyx/test# sudo apt-get -y install cuda
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
N answered 24/4, 2023 at 3:48 Comment(1)
Does your PATH include the CUDA bin location, i.e. /usr/local/cuda-12.1/bin? - Stockbroker
4

I had the same problem with the APT package. I walked through the "not going to be installed" dependency tree by trying to apt install each package that was reported as not going to be installed, until I reached one that I could install. This turned out to be libnvidia-extra-530. So the following worked (following the documentation):

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt upgrade
sudo apt install libnvidia-extra-530
sudo apt install cuda
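The manual walk down the dependency tree described above can be partly automated. The following is only a sketch: it extracts the blocked package names from apt's English error text, and the function name `blocked_deps` is made up for illustration.

```shell
# Hypothetical helper: read apt output on stdin and print the name of
# each dependency reported as "not going to be installed".
blocked_deps() {
    sed -n 's/^.*Depends: \([^ ]*\) .*but it is not going to be installed.*$/\1/p'
}

# Demonstration on the error line quoted in the question.
sample=' cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed'
printf '%s\n' "$sample" | blocked_deps   # prints: cuda-12-1
```

On a real system the input could come from a simulated install, for example `LC_ALL=C apt-get install -s cuda 2>&1 | blocked_deps`, repeated with each printed package name until apt reports a package it can actually install.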
Unhand answered 1/6, 2023 at 10:15 Comment(2)
I still have the problem. - Seeing
Ah, this worked for me, but I needed sudo apt install libnvidia-extra-550 instead. - Matted
1

For me, the problem was installing CUDA inside Docker: the drivers cannot be installed there, so instead of apt install cuda I ran apt install cuda-toolkit and got the CUDA libraries.
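A minimal sketch of that choice, assuming the presence of /.dockerenv is an acceptable heuristic for "running inside Docker" (it is not a guaranteed marker); the function name `pick_cuda_package` is hypothetical:

```shell
# Hypothetical helper: choose the toolkit-only metapackage inside a
# container, where the driver belongs to the host, and the full
# metapackage otherwise.
pick_cuda_package() {
    if [ -f /.dockerenv ]; then
        echo "cuda-toolkit"
    else
        echo "cuda"
    fi
}

pkg="$(pick_cuda_package)"
# Only print the command here; run it yourself once it looks right.
echo "would run: apt-get install -y $pkg"
```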

Gleason answered 19/8, 2023 at 5:41 Comment(0)
0

I had a similar problem, where the output from the installer was:

The following packages have unmet dependencies:
 cuda : Depends: cuda-12-0 (>= 12.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

In my case this was because I had already installed an older version of CUDA, specifically CUDA 11.4. I had to remove the previous installation using:

sudo apt-get remove --auto-remove cuda
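Before removing anything, it can help to see which CUDA installations are present. A small sketch, assuming toolkits live in directories named cuda-* under /usr/local (the default for NVIDIA's installers) and that dpkg is the package database; the function name `list_cuda_installs` is hypothetical:

```shell
# Hypothetical helper: list CUDA toolkit directories under a prefix
# (default /usr/local) and any installed packages named cuda*.
list_cuda_installs() {
    prefix="${1:-/usr/local}"
    for dir in "$prefix"/cuda-*; do
        # The glob stays literal when nothing matches, so test first.
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
    if command -v dpkg-query >/dev/null 2>&1; then
        # dpkg-query exits non-zero when no package matches; ignore that.
        dpkg-query -W -f='${Package} ${Version}\n' 'cuda*' 2>/dev/null || true
    fi
    return 0
}

list_cuda_installs
```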

Then I reinstalled CUDA 12.0 as explained in the official NVIDIA CUDA download section:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-repo-ubuntu2004-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
Lanita answered 30/9 at 6:41 Comment(0)
-1

I solved the unmet dependency issue by using the aptitude package manager. The following worked for me to install the latest CUDA, 12.2:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
# ************** Do not delete any key even if message says so *********
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
# use aptitude instead of apt-get because of dependency issues
sudo aptitude install cuda
rm cuda-keyring_1.1-1_all.deb
Weft answered 31/8, 2023 at 0:26 Comment(0)
