I would like to train Keras models on multiple GPUs. My understanding is that multi-GPU training does not currently work through XLA. The problem is that I can't figure out how to turn XLA off: every GPU on my machine is listed as an XLA GPU.
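The closest thing I've found to an "off switch" is disabling JIT compilation at the session level. A sketch of what I mean (TF 1.x API; I honestly don't know whether this is even supposed to remove the XLA_* device entries):

import tensorflow as tf
from keras import backend as K

# Attempt to disable XLA JIT at the session level (TF 1.x).
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.OFF
K.set_session(tf.Session(config=config))

As far as I can tell this only controls JIT compilation of ops, not whether the XLA_* devices exist, so it may be the wrong knob entirely.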
For reference, I am running three RTX 2070s on the latest Ubuntu desktop release. nvidia-smi does indeed show all three GPUs.
I have tried uninstalling and reinstalling tensorflow-gpu; that did not help.
from keras.utils.training_utils import multi_gpu_model
model = multi_gpu_model(model, gpus=3)
ValueError:
To call `multi_gpu_model` with `gpus=3`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2']. However this machine only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2']. Try reducing `gpus`.
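For completeness, here is a minimal script that reproduces the error. The model itself is a throwaway placeholder; any model should trigger it, since the check is on available devices, not on the model:

from keras.models import Sequential
from keras.layers import Dense
from keras.utils.training_utils import multi_gpu_model

# Placeholder two-layer model; the wrapping call below is what raises.
model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(1),
])
parallel_model = multi_gpu_model(model, gpus=3)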
EDIT: I am using tensorflow-gpu, and I've just confirmed that training isn't using even one GPU. I checked by cranking the batch size up to 10,000: nvidia-smi showed no change, but CPU and memory usage in htop clearly did.
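In case I'm misreading nvidia-smi, here is the placement-logging check I plan to run as well (TF 1.x session config); it makes TensorFlow print the device each op is assigned to:

import tensorflow as tf
from keras import backend as K

# Log the device every op lands on, before building/training the model.
K.set_session(tf.Session(config=tf.ConfigProto(log_device_placement=True)))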
EDIT2:
tf.test.gpu_device_name()
returns an empty string, whereas
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
prints all of my devices:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7781250607362587360
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 12317810384332135154
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1761593194774305176
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 11323027499711415341
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 3573490477127930095
physical_device_desc: "device: XLA_GPU device"
]
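As one more sanity check, I can try pinning a single op to one of the XLA_GPU devices from that list, with soft placement disabled so a placement failure raises instead of silently falling back to the CPU (I haven't confirmed yet whether this succeeds on my setup):

import tensorflow as tf

# Pin one matmul to the first XLA GPU; fail loudly if it can't be placed.
with tf.device('/device:XLA_GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=False)) as sess:
    print(sess.run(b))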