Is there a way to set all my GPUs to NOT be XLA so I can train with multiple gpus rather than just one?
I would like to train Keras models using multiple GPUs. My understanding is that you currently cannot train on multiple GPUs with XLA. The problem is that I can't figure out how to turn XLA off: every GPU is listed as an XLA GPU.

For reference, I am using 3 RTX 2070s on the latest Ubuntu desktop, and nvidia-smi does indeed show all 3 GPUs.

I have tried uninstalling and reinstalling tensorflow-gpu. That does not help.

    from keras.utils.training_utils import multi_gpu_model
    model = multi_gpu_model(model, gpus=3)

ValueError: To call `multi_gpu_model` with `gpus=3`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2']. However this machine only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2']. Try reducing `gpus`.

EDIT: I am using tensorflow-gpu, and I've just confirmed it isn't even using one GPU. Cranking the batch size up to 10,000 produced no change in nvidia-smi, but I did see changes to the CPU/memory usage via htop.
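A quick way to narrow this down is to check whether the installed TensorFlow build actually has CUDA support and whether an environment variable is hiding the GPUs. This is only a diagnostic sketch using the TF 1.x-era API from the question; it assumes nothing about the cause:

```python
import os

# An env var like CUDA_VISIBLE_DEVICES="" would also hide every GPU.
cuda_visible = os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")
print("CUDA_VISIBLE_DEVICES:", cuda_visible)

try:
    import tensorflow as tf
    # False here means a CPU-only build got installed despite the
    # tensorflow-gpu package name.
    built_with_cuda = tf.test.is_built_with_cuda()
    gpu_name = tf.test.gpu_device_name() or "<none found>"
except ImportError:
    built_with_cuda = None  # tensorflow not installed in this environment
    gpu_name = "<tensorflow not installed>"

print("built with CUDA:", built_with_cuda)
print("GPU device:", gpu_name)
```

If `built with CUDA` is True but no GPU device is found, the usual suspect is a CUDA/cuDNN runtime that doesn't match the TensorFlow build, in which case only the XLA placeholder devices get registered.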

EDIT2:

    tf.test.gpu_device_name()

returns just an empty string,

whereas

    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())

prints all of my devices...
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7781250607362587360
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 12317810384332135154
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1761593194774305176
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 11323027499711415341
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 3573490477127930095
physical_device_desc: "device: XLA_GPU device"
]
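The listing above can be summarized programmatically. This sketch separates real GPU devices from the XLA placeholders; the helper works on plain (name, type) pairs so it can be shown without a GPU, but in practice you would feed it the entries from `device_lib.list_local_devices()`:

```python
def split_devices(devices):
    """Split an iterable of (name, device_type) pairs into real GPUs
    and XLA placeholder GPUs."""
    real = [name for name, dtype in devices if dtype == "GPU"]
    xla = [name for name, dtype in devices if dtype == "XLA_GPU"]
    return real, xla

# The device types from the listing in the question: no plain GPU entries.
listing = [
    ("/device:CPU:0", "CPU"),
    ("/device:XLA_CPU:0", "XLA_CPU"),
    ("/device:XLA_GPU:0", "XLA_GPU"),
    ("/device:XLA_GPU:1", "XLA_GPU"),
    ("/device:XLA_GPU:2", "XLA_GPU"),
]
real, xla = split_devices(listing)
print(real)  # [] -- multi_gpu_model needs these, hence the ValueError
print(xla)   # the three XLA_GPU placeholders
```

The empty `real` list is exactly what `multi_gpu_model` complains about: it requires `/gpu:N` devices, and XLA_GPU entries don't count.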
Ponderous answered 9/8, 2019 at 23:39 Comment(4)
I have the same issue! – Outrank
Is there any solution to this issue? I am facing the same problem. – Zendavesta
Facing the same problem! – Sublimation
I'm facing the same issue. Somehow XLA got turned on and I don't know how to turn it off, so my TensorFlow programs that used to run on the GPU now use only the CPU. Did anyone ever find the solution? – Walloon

I faced this problem too.

Sometimes I fixed it by reinstalling the tensorflow-gpu package.

    pip uninstall tensorflow-gpu
    pip install tensorflow-gpu

However, sometimes those commands didn't work, so I tried the following one instead, and surprisingly it worked:

    conda install -c anaconda tensorflow-gpu
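After reinstalling, it may be worth verifying that plain `/gpu:N` devices are registered before retrying the multi-GPU call. A hedged sketch, using the TF 1.x / standalone Keras API from the question (the `build_model` constructor is hypothetical):

```python
# Count plain GPU devices (not XLA_GPU placeholders) the way
# multi_gpu_model sees them.
try:
    from tensorflow.python.client import device_lib
    gpu_count = sum(d.device_type == "GPU"
                    for d in device_lib.list_local_devices())
except ImportError:
    gpu_count = 0  # tensorflow not installed in this environment

print("usable GPUs:", gpu_count)

# Only parallelize when at least two real GPUs are registered:
# model = build_model()                       # hypothetical constructor
# if gpu_count >= 2:
#     from keras.utils import multi_gpu_model
#     model = multi_gpu_model(model, gpus=gpu_count)
```

If `gpu_count` is still 0 after the reinstall, the XLA-only symptom persists and the CUDA/cuDNN runtime versions are worth checking against the TensorFlow release notes.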
Marguerita answered 6/3, 2022 at 7:24 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Niple
