Error "Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice" in the TensorFlow Object Detection API

Windows version: Windows 10 Pro 21H2 (19044.1706). GPU: RTX 2070.

import tensorflow as tf
import torch
print(torch.__version__) #1.10.1+cu113
print(torch.version.cuda) #11.3
print(tf.__version__) #2.9.1

I then run

python .\object_detection\builders\model_builder_tf2_test.py

This gives the result 'Ran 24 tests in 18.279s OK (skipped=1)'.

But when I want to train my model, I use

feature_extractor {
   type: 'faster_rcnn_inception_resnet_v2_keras'
}

in my pipeline config, and I run

python .\object_detection\model_main_tf2.py --logtostderr --pipeline_config_path=LOCATION_OF_MY_PIPECONFIG --model_dir=LOCATION_OF_MY_MODEL_DIR

I then get the error shown in the title. CUDA_DIR is set as a system environment variable and is accessible.

Crossland answered 4/6, 2022 at 11:36 Comment(0)
B
12

I had the same problem and just fixed it. The library can't find the folder even if you set "CUDA_DIR", because it doesn't use that variable (or any other I tried). This post is helpful in understanding the issue. The only solution I was able to find is to copy the required files.

Steps for a quick fix:

  1. Find where your CUDA nvvm is installed (for me it is "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6").
  2. Find the working directory for your script (the environment or the directory you are running the script in).
  3. Copy the entire nvvm folder into the working directory and your script should work.

This is not a great solution, but until someone else posts an answer you can at least run your code.
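The three steps above can be scripted. This is only a minimal sketch of the copy workaround: the helper name copy_nvvm is made up for illustration, and the CUDA path in the usage comment is the one from this answer, so adjust it to your own install.

```python
import shutil
from pathlib import Path


def copy_nvvm(cuda_root, work_dir):
    """Copy <cuda_root>/nvvm into <work_dir>/nvvm and return the new path.

    copy_nvvm is a hypothetical helper, not part of TensorFlow or CUDA.
    """
    src = Path(cuda_root) / "nvvm"
    dst = Path(work_dir) / "nvvm"
    # Fail early if the CUDA install does not contain nvvm/libdevice.
    if not (src / "libdevice").is_dir():
        raise FileNotFoundError(f"no libdevice directory under {src}")
    # dirs_exist_ok (Python 3.8+) allows the copy to be re-run safely.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Usage (assumed path, taken from the answer above):
# copy_nvvm(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6", Path.cwd())
```

Run it from the directory you start the training script in, since that is the working directory the answer refers to.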

Bowens answered 15/6, 2022 at 0:15 Comment(3)
I also get this error, but I don't really understand your solution. Where exactly did you copy your nvvm folder to? How could that solve anything, since the error still says it is trying to read "${CUDA_DIR}/nvvm/libdevice"? Without knowing the value of "CUDA_DIR" I have no way of knowing where to put the nvvm folder. - Bosom
Never mind, I solved it by also setting the CUDA directory as an XLA flag. Check this: #54189768 - Bosom
@MartinSonesson that solution doesn't apply to a Windows environment, just a clarification for any other reader. - Thromboplastin
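For readers wondering what "setting CUDA as an XLA flag" looks like: the usual form is the XLA_FLAGS environment variable with --xla_gpu_cuda_data_dir pointing at the CUDA root. A rough sketch follows; the helper name with_cuda_data_dir and the /usr/local/cuda path are assumptions for illustration, and per the comment above this may not help on Windows.

```python
import os


def with_cuda_data_dir(cuda_dir, env=None):
    """Return a copy of env with --xla_gpu_cuda_data_dir appended to XLA_FLAGS.

    with_cuda_data_dir is a hypothetical helper; only the XLA_FLAGS variable
    and the --xla_gpu_cuda_data_dir flag come from XLA itself.
    """
    env = dict(os.environ if env is None else env)
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    existing = env.get("XLA_FLAGS", "")
    # Keep any flags that were already set.
    env["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return env


# Usage (assumed path). To be safe, set the variable before importing TensorFlow:
# os.environ.update(with_cuda_data_dir("/usr/local/cuda"))
# import tensorflow as tf
```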
R
2

Here is how I solved the same problem:

  1. Go to your CUDA installation folder and open its v10.x or v11.x sub-folder.

    In my case the path is: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7

  2. Copy the whole nvvm folder.

  3. Paste it into your current Python working directory.

    In my case, using PyCharm, that is the directory containing the .idea, __pycache__, and venv folders.

  4. Run your code again. It should now work without the error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice.

Rolfe answered 13/2, 2023 at 15:30 Comment(0)
P
1

Solution 1

If you followed the official tensorflow guide and installed tensorflow using anaconda/miniconda + pip, here is the location of CUDA_DIR:

/root/miniconda3/{YOUR_ENVIRONMENT_NAME}

If you properly installed CUDA, this folder should contain the subfolder nvvm which is exactly what tensorflow looks for in order to find the libdevice.
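A quick way to check whether a given folder would work as CUDA_DIR is to look for nvvm/libdevice inside it. The sketch below is only an illustration: has_libdevice and candidate_dirs are made-up helper names, and checking CUDA_PATH and CONDA_PREFIX alongside CUDA_DIR is my assumption about where a CUDA root might be recorded.

```python
import os
from pathlib import Path


def has_libdevice(cuda_dir):
    """Return True if <cuda_dir>/nvvm/libdevice holds at least one libdevice*.bc file."""
    libdevice = Path(cuda_dir) / "nvvm" / "libdevice"
    return libdevice.is_dir() and any(libdevice.glob("libdevice*.bc"))


def candidate_dirs():
    """Yield (variable name, value) for environment variables that may point at CUDA."""
    # Assumption: these three variables are the likely places to look.
    for name in ("CUDA_DIR", "CUDA_PATH", "CONDA_PREFIX"):
        value = os.environ.get(name)
        if value:
            yield name, value


for name, value in candidate_dirs():
    print(f"{name}={value} -> libdevice found: {has_libdevice(value)}")
```

If none of the candidates reports True, the nvvm folder is probably not where the environment variables say it is.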

Edit:

Solution 2

After some experimentation, I found that setting the CUDA_DIR path sometimes does not work, and that TensorFlow also looks for CUDA inside the current working directory.

Solution:

  1. Determine your current working directory (just execute pwd)
  2. Copy the libdevice.10.bc file into your current working directory. If you used conda as mentioned above, you should find the file in this directory: /{path_to_your_conda_environment}/nvvm/libdevice/libdevice.10.bc
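The two steps above could be sketched as follows. The helper name copy_libdevice is made up, and the environment root is whatever {path_to_your_conda_environment} is on your machine; nothing here is TensorFlow API.

```python
import shutil
from pathlib import Path


def copy_libdevice(env_root, work_dir, name="libdevice.10.bc"):
    """Copy <env_root>/nvvm/libdevice/<name> into <work_dir> and return the new path.

    copy_libdevice is a hypothetical helper for illustration only.
    """
    src = Path(env_root) / "nvvm" / "libdevice" / name
    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist")
    # copy2 also preserves file metadata such as timestamps.
    return Path(shutil.copy2(src, Path(work_dir) / name))


# Usage (replace the first argument with your conda environment path):
# copy_libdevice("/path/to/your/conda/environment", Path.cwd())
```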

Obviously it's a terrible solution, but hey, after 5 hours I don't care anymore. Hope this helps

Paddock answered 30/8, 2023 at 13:19 Comment(0)
R
0

Copy the CUDA nvvm folder into the directory where your virtual environment lives.

Riley answered 27/9, 2022 at 14:23 Comment(0)
G
0

I had the same error using tensorflow 2.14.0 from within a Docker install. An upgrade did the trick for me:

pip install --no-cache-dir tensorflow==2.15.0 keras==2.15.0

Hope this helps!

Glomma answered 14/12, 2023 at 13:38 Comment(0)
