Tensorflow not running on GPU

I have already spent a considerable amount of time digging around on Stack Overflow and elsewhere looking for the answer, but couldn't find anything.

Hi all,

I am running TensorFlow with Keras on top. I am 90% sure I installed TensorFlow GPU; is there any way to check which install I did?

I was trying to run some CNN models from a Jupyter notebook and I noticed that Keras was running the model on the CPU (I checked Task Manager; the CPU was at 100%).

I tried running this code from the tensorflow website:

# TensorFlow 1.x API (in 2.x, Session and ConfigProto live under tf.compat.v1).
import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

And this is what I got:

MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.783183: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.784779: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.786128: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/cpu:0
[[ 22.  28.]
 [ 49.  64.]]

To me, this shows that I am running on my CPU, for some reason.

I have a GTX 1050 (driver version 382.53). I installed CUDA and cuDNN, and TensorFlow installed without any problems. I also installed Visual Studio 2015, since it was listed as a compatible version.

I remember the CUDA installer mentioning something about an incompatible driver being installed, but if I recall correctly, CUDA should have installed its own driver.

Edit: I ran these commands to list the available devices:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

and this is what I get:

[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14922788031522107450
]

along with a whole lot of warnings like this:

2017-06-29 17:32:45.401429: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.

Edit 2

Tried running

pip3 install --upgrade tensorflow-gpu

and I get

Requirement already up-to-date: tensorflow-gpu in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages
Requirement already up-to-date: markdown==2.2.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: html5lib==0.9999999 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: werkzeug>=0.11.10 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: wheel>=0.26 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: bleach==1.5.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: six>=1.10.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: protobuf>=3.2.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: backports.weakref==1.0rc1 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: numpy>=1.11.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: setuptools in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from protobuf>=3.2.0->tensorflow-gpu)

Solved: Check comments for solution. Thanks to all who helped!

I am new to this, so any help is greatly appreciated! Thank you.

Cobra answered 29/6, 2017 at 15:17 Comment(18)
Did you install tf with pip? – Marras
Could you list the available devices using stackoverflow.com/documentation/tensorflow/10621/… ? – Psychokinesis
Yes, I installed TensorFlow using pip3; I'm running Python 3. – Cobra
According to your edit, it's likely that you don't have the GPU version installed, or at least that your GPU card is not supported. – Psychokinesis
I tried running pip3 install --upgrade tensorflow-gpu and it tells me all the requirements are there: Requirement already up-to-date: tensorflow-gpu in c:\users\goofynose\appdata\local\programs\python\python35\lib\site-packages – Cobra
I don't know much about TensorFlow, but did you install tensorflow-gpu? github.com/fchollet/keras/issues/5712 – Mogerly
Pretty sure I did, and I ran the command again to be safe and it shows it's installed. I checked with NVIDIA and the GTX 1050 is listed as supported (mine is in a laptop, but they list it as a desktop card). – Cobra
From my link, Dahlasam's comment: "Then I installed tensorflow-gpu by copy-pasting "pip3 install --upgrade tensorflow-gpu" from Tensorflow pages. This didn't work and I needed to install tensorflow-gpu with "pip install tensorflow-gpu". Then GPU is activated as expected:" – Mogerly
Can you check that you do not have several TensorFlow versions installed by running pip list and checking all lines with tensorflow? – Udela
I ran pip list and I get tensorflow (1.2.0) and tensorflow-gpu (1.2.0). Is that normal, or does it mean I have both a normal TensorFlow and a GPU one installed? If so, can I uninstall the standard one? – Cobra
You should uninstall tensorflow and keep tensorflow-gpu: pip uninstall tensorflow – Delight
Okay, I uninstalled tensorflow using pip, and now pip list shows only tensorflow-gpu, but now I get import errors "no module named tensorflow" when I run my code. – Cobra
I also tried pip install tensorflow-gpu instead of pip3, but it says it's all already installed. I'm still getting No module named tensorflow, however. Any ideas? – Cobra
Okay, I think I fixed it. I think when I uninstalled tensorflow it deleted the __init__.py file or something. So I ran pip install --ignore-installed --upgrade and now running from tensorflow.python.client import device_lib; print(device_lib.list_local_devices()) shows the GPU as one of the devices. – Cobra
I tried the above steps, but it doesn't show the GPU as a device. tensorflow-gpu and tensorflow-tensorboard are shown in the list of installed packages. Any help? – Spine
I think it's worth mentioning this link for Ubuntu users; it was super helpful for me: https://github.com/williamFalcon/tensorflow-gpu-install-ubuntu-16.04 – Inclined
Possible duplicate of Keras with TensorFlow backend not using GPU – Attest
For versions > 1.15, tensorflow-gpu is included with tensorflow: tensorflow.org/install/gpu – Clinic
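The duplicate-install situation found in these comments (pip listing both tensorflow and tensorflow-gpu) can also be checked from Python instead of scanning pip list by eye. A minimal sketch using only the standard library, assuming Python 3.8+ for importlib.metadata; the function name find_tensorflow_distributions is hypothetical:

```python
from importlib import metadata


def find_tensorflow_distributions():
    """Return {distribution name: version} for every installed distribution
    whose name starts with "tensorflow" (e.g. tensorflow, tensorflow-gpu)."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith("tensorflow"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    dists = find_tensorflow_distributions()
    for name, version in sorted(dists.items()):
        print(name, version)
    # Normalise names, since metadata may spell them tensorflow_gpu or tensorflow-gpu.
    names = {n.lower().replace("_", "-") for n in dists}
    if {"tensorflow", "tensorflow-gpu"} <= names:
        # In the 1.x era both packages installed the same "tensorflow" module,
        # so uninstalling one could break the other (as in the comments above).
        print("Both tensorflow and tensorflow-gpu are installed.")
```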

To check which devices are available to TensorFlow you can use this and see if the GPU cards are available:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

More info

There are also C++ logs available controlled by the TF_CPP_MIN_VLOG_LEVEL env variable, e.g.:

import os
# Must be set before TensorFlow is imported to take effect.
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "2"
import tensorflow as tf

should allow them to be printed when running import tensorflow as tf.

You should see logs like these if you use GPU-enabled TensorFlow with proper access to the GPU machine:

successfully opened CUDA library libcublas.so.*.* locally
successfully opened CUDA library libcudnn.so.*.*  locally
successfully opened CUDA library libcufft.so.*.*  locally

On the other hand, if there are no CUDA libraries in the system / container, you will see:

Could not find cuda drivers on your machine, GPU will not be used.

And where CUDA is installed but there is no GPU physically available, TF will import cleanly and only report an error later, when you run device_lib.list_local_devices(), with this:

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
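On TensorFlow 2.x, the situations described above can also be told apart from Python without reading the logs. A rough sketch, assuming TF 2.1 or later (where tf.config.list_physical_devices exists); the function names are hypothetical, and the decision logic is kept separate from the TF calls so it can be reused or tested on its own:

```python
def classify_gpu_setup(built_with_cuda, gpu_devices):
    """Map TensorFlow's CUDA build flag and visible GPU names to a short diagnosis."""
    if not built_with_cuda:
        return "CPU-only TensorFlow build: install a GPU-enabled package"
    if not gpu_devices:
        return "CUDA-enabled build, but no GPU visible: check driver, CUDA and cuDNN"
    return "GPU available: " + ", ".join(gpu_devices)


def diagnose():
    # Imported here so classify_gpu_setup() has no TensorFlow dependency.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return classify_gpu_setup(tf.test.is_built_with_cuda(), [d.name for d in gpus])


if __name__ == "__main__":
    print(diagnose())
```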

Psychokinesis answered 29/6, 2017 at 15:30 Comment(2)
They are C++ logs and are controlled by the TF_CPP_MIN_VLOG_LEVEL env variable, e.g.: export TF_CPP_MIN_VLOG_LEVEL=2 should allow them to be printed when running import tensorflow as tf. – Psychokinesis
In my case, TF_CPP_MAX_VLOG_LEVEL=2 works instead of TF_CPP_MIN_VLOG_LEVEL=2. – Ichthyornis

It may sound dumb, but try rebooting. It helped me and some other folks on GitHub.

Tack answered 18/3, 2018 at 15:16 Comment(3)
Same here. WTF. Been struggling for two days, a single reboot helped :| – Mcclung
I love you. In my case it was probably due to not rebooting after changing the driver from X to NVidia. – Septuple
I was using TensorFlow GPU for months and suddenly it stopped using the GPU. Reboot solved it. Thanks. – Eglantine

I was still having trouble getting GPU support even after correctly installing tensorflow-gpu via pip. My problem was that I had installed TensorFlow 1.5 and CUDA 9.1 (the default version NVIDIA directs you to), whereas the precompiled TensorFlow 1.5 works with CUDA versions <= 9.0. Here is the download page on NVIDIA's site to get the correct CUDA 9.0:

https://developer.nvidia.com/cuda-90-download-archive

Also make sure to update your cuDNN to a version compatible with CUDA 9.0: https://developer.nvidia.com/cudnn https://developer.nvidia.com/rdp/cudnn-download
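The mismatch in this answer (CUDA 9.1 installed, while the prebuilt TensorFlow 1.5 needs CUDA <= 9.0) comes down to a version comparison. A small sketch; the helper names are hypothetical. Comparing versions as integer tuples rather than strings matters, because as strings "10.0" sorts before "9.0". This follows the answer's "<= 9.0" wording; in practice the prebuilt wheels are linked against one specific CUDA release, so matching the tested configuration exactly is the safer check:

```python
def parse_version(text):
    """Turn a version string such as "9.1" into a tuple of ints, (9, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_is_supported(installed, max_supported):
    """True if the installed CUDA version does not exceed the maximum
    version the prebuilt TensorFlow package supports."""
    return parse_version(installed) <= parse_version(max_supported)


if __name__ == "__main__":
    print(cuda_is_supported("9.1", "9.0"))  # False: 9.1 is newer than 9.0
    print(cuda_is_supported("9.0", "9.0"))  # True
    print("10.0" <= "9.0")                  # True, which is why strings are not compared directly
```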

Qualifier answered 1/2, 2018 at 21:16 Comment(0)

If you happen to be using Anaconda to manage your environments, first uninstall all existing versions of tensorflow:

pip uninstall tensorflow
pip3 uninstall tensorflow

Then install tensorflow-gpu using conda:

conda install tensorflow-gpu

If you don't mind starting from a new environment, though, the easiest way to do so is:

conda create --name tf_gpu tensorflow-gpu 

This creates a new conda environment named tf_gpu with the GPU version of TensorFlow installed.

Trichromatic answered 10/8, 2021 at 6:54 Comment(1)
I believe the GPU-enabled version is now tensorflow, and tensorflow-gpu is outdated: "For the CPU-only build use the pip package named tensorflow-cpu." – Invincible

If you have problems running TensorFlow on the GPU, you should check whether you have compatible versions of CUDA and cuDNN installed, or any versions at all.

Ideally, these versions should be exactly the same as those the developers have tested to work (TensorFlow's tested build configurations). For example, for tensorflow==2.8.0 you should have CUDA v11.2 and cuDNN v8.1.
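A quick way to check whether any CUDA runtime or cuDNN library can be found at all is to ask the dynamic loader. A rough sketch using ctypes.util.find_library; the base names "cudart" and "cudnn" are assumptions matching the usual Linux file names (libcudart.so, libcudnn.so). On Windows the DLLs carry version suffixes (e.g. cudart64_110.dll), so this lookup may miss them even when they are installed:

```python
import ctypes.util


def find_cuda_libraries(names=("cudart", "cudnn")):
    """Return {library base name: file name found by the loader, or None}."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in find_cuda_libraries().items():
        print(f"{name}: {found or 'NOT FOUND'}")
```

A None result for cudnn is consistent with the silent CPU fallback described in the comment on this answer, but it only shows what this lookup can see, not what TensorFlow itself loads.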

Also, you should add the CUDA bin and libnvvp folders to the system PATH.
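Whether those two folders actually made it onto PATH can be checked from Python. A minimal sketch, assuming a CUDA install root that contains bin and libnvvp subfolders; the install path in the usage section is only the usual Windows default for v11.2 and may differ on your machine, and the function name is hypothetical:

```python
import os


def missing_cuda_path_entries(cuda_root, path_value=None):
    """Return the CUDA subfolders (bin, libnvvp) under cuda_root that are
    not listed in path_value (defaults to the current PATH)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # normcase makes the comparison case-insensitive on Windows;
    # normpath removes trailing separators and redundant parts.
    entries = {
        os.path.normcase(os.path.normpath(p))
        for p in path_value.split(os.pathsep)
        if p
    }
    missing = []
    for sub in ("bin", "libnvvp"):
        folder = os.path.join(cuda_root, sub)
        if os.path.normcase(os.path.normpath(folder)) not in entries:
            missing.append(folder)
    return missing


if __name__ == "__main__":
    # Example root only; adjust to wherever CUDA is installed.
    root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2"
    print(missing_cuda_path_entries(root) or "Both folders are on PATH")
```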

This answer is based on this tutorial: Tensorflow 2021 install tutorial.

Bowse answered 31/3, 2022 at 23:27 Comment(1)
VERY good point indeed: TF is the complete opposite of PyTorch (which comes with its own cuDNN library bundled in). TF will not complain at all (silently falling back to CPU) if you don't have cuDNN installed (e.g. using the 11.8.0-devel-ubuntu22.04 container image instead of 11.8.0-cudnn8-devel-ubuntu22.04). What saved you 1.5 GB per image before will later cost you a day or so of debugging. – Skantze

You may also have a CUDA version mismatch that needs to be solved one way or the other (downgrading or pinning TensorFlow to the latest version supported by your system CUDA is arguably quicker, but only the opposite, upgrading CUDA, is future-proof).

To verify, check the CUDA version your installed TensorFlow package was built against:

>>> import tensorflow as tf
>>> tf.sysconfig.get_build_info()['cuda_version']
'11.8'

... and compare it with the CUDA version installed on the host / in the container / VM:

>>> import os
>>> os.system("nvcc --version")

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
0
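Both checks can be scripted together. A minimal sketch, assuming TensorFlow 2.3 or later (for tf.sysconfig.get_build_info()) and nvcc on PATH (otherwise subprocess.run raises FileNotFoundError); it uses subprocess.run instead of os.system so the output can be parsed, and the helper names are hypothetical. The format of the cuda_version entry may differ between platforms, so if it does not look like "major.minor", compare the two values by eye:

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Extract the "major.minor" release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def compare_cuda_versions():
    # Imported lazily; TensorFlow is only needed for the build info.
    import tensorflow as tf

    # CPU-only builds have no "cuda_version" key, so .get() returns None.
    built = tf.sysconfig.get_build_info().get("cuda_version")
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    installed = parse_nvcc_release(result.stdout)
    return built, installed


if __name__ == "__main__":
    built, installed = compare_cuda_versions()
    print("TensorFlow built against CUDA:", built)
    print("nvcc reports CUDA:", installed)
    if built != installed:
        print("Version mismatch: TensorFlow may silently fall back to CPU.")
```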

More info

When TensorFlow imports cleanly (without any warnings) but detects only the CPU on a GPU-equipped machine with CUDA libraries installed, you may also have a CUDA version mismatch between the precompiled TensorFlow wheel and the system- or container-installed CUDA.

In the example above, the CUDA version mismatch (v11.8 used when compiling TensorFlow vs. the v11.2 CUDA compiler installed in the container) left TF without GPU access, even though nvidia-smi ran correctly.

See also: TensorFlow CUDA compatibility table (tested build configurations).

Skantze answered 3/6, 2023 at 11:50 Comment(0)

For me the following worked.

I used a conda environment, since a plain Python environment meant setting LD_LIBRARY_PATH and installing CUDA manually, which is another mess.

In the blog I mentioned, the author installed cudatoolkit and cudnn inside conda and then installed tensorflow-gpu, which fixed the problem.

P.S. As far as I have read, cudatoolkit and cudnn play a huge role in getting your code running on tensorflow-gpu.

Woodchuck answered 13/1, 2021 at 19:17 Comment(0)

I ran into a similar problem. I had the following versions of the TensorFlow libraries:

tensorboard               2.4.1              pyhd8ed1ab_1    conda-forge
tensorboard-plugin-wit    1.8.0              pyh44b312d_0    conda-forge
tensorflow                2.4.1            py39hf3d152e_0    conda-forge
tensorflow-base           2.4.1            py39h23a8cbf_0    conda-forge
tensorflow-estimator      2.4.0              pyh9656e83_0    conda-forge
tensorflow-gpu            2.4.1                h30adc30_0

The same versions of the libraries were installed on another machine, where TensorFlow was able to use the GPU. The CUDA toolkit and driver versions were the same on both machines (the one where it worked and the one where it didn't).

It turns out the reason was that tensorflow-gpu=2.4.1 is compatible with Python 3.8.10. Changing my Python version to 3.8.10 and keeping everything else unchanged worked for me!
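When several Python versions are installed, it is worth confirming which interpreter the script or notebook is actually running. A minimal sketch; the expected (3, 8) pair mirrors this answer and is not a general TensorFlow requirement, and the helper name python_matches is hypothetical:

```python
import sys


def python_matches(expected_major_minor, version_info=None):
    """True if the running (or given) interpreter has the expected major.minor."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected_major_minor)


if __name__ == "__main__":
    # sys.executable shows which installation is in use, e.g. inside Jupyter.
    print(sys.executable, sys.version.split()[0])
    print("Python 3.8:", python_matches((3, 8)))
```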

Zloty answered 13/11, 2021 at 22:41 Comment(0)
