How to get the ID of the GPU allocated to a SLURM job on a multi-GPU node?

When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU that is allocated to the job? Is there an environment variable for this purpose? The GPUs I'm using are all NVIDIA GPUs. Thanks.

Andreandrea answered 14/5, 2017 at 18:22 Comment(0)

You can get the GPU ID from the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU IDs assigned to the job.
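For example, a minimal batch script (the --gres request below is only illustrative) can simply print the variable; a sketch:

#!/bin/bash
#SBATCH --gres=gpu:1
# Print the GPU ID(s) SLURM assigned to this job (comma-separated if more than one).
echo "Allocated GPU(s): $CUDA_VISIBLE_DEVICES"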

Usn answered 14/5, 2017 at 19:37 Comment(2)
It works. Thanks. It seems that the environment variable GPU_DEVICE_ORDINAL also works. – Andreandrea
This doesn't identify the GPU uniquely when using cgroups. With cgroups, CUDA_VISIBLE_DEVICES would be 0 for all GPUs because each process only sees a single GPU (others are hidden by the cgroup). – Bantustan
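When a node-unique identifier is needed under cgroups, one possible workaround is to ask nvidia-smi for the UUID or PCI bus ID of the device(s) the job can see; a sketch (not part of the answers here):

# Under cgroups only the allocated device(s) are visible; their UUID and PCI
# bus ID still identify them uniquely on the node.
nvidia-smi --query-gpu=index,uuid,pci.bus_id --format=csv,noheader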

You can check the environment variables SLURM_STEP_GPUS or SLURM_JOB_GPUS for a given node:

echo ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}

Note that CUDA_VISIBLE_DEVICES may not correspond to the physical GPU ID (see the comment about cgroups on the first answer).

Also note that this should work for non-NVIDIA GPUs as well.
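A quick way to compare the two inside a job script (a sketch; the variables are the ones discussed above, printed with a fallback when unset):

# Physical GPU index/indices recorded by Slurm for this job/step:
echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS:-unset}  SLURM_STEP_GPUS=${SLURM_STEP_GPUS:-unset}"
# Logical index/indices seen by CUDA applications (may be renumbered, e.g. under cgroups):
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"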

Glovsky answered 13/1, 2021 at 20:12 Comment(0)

Slurm stores this information in an environment variable, either SLURM_JOB_GPUS or SLURM_STEP_GPUS.

One way to keep track of such information is to log all SLURM-related variables when running a job, for example (following Kaldi's slurm.pl, a handy script for wrapping Slurm jobs) by including the following command in the script run by sbatch:

set | grep SLURM | while read -r line; do echo "# $line"; done
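For instance, dropped into a minimal sbatch script (the job options and log path below are just placeholders), the line prefixes the job log with every SLURM variable:

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --output=job_%j.log
# Dump all SLURM-related variables as comments at the top of the log.
set | grep SLURM | while read -r line; do echo "# $line"; done
# ... the actual job commands follow ...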
Whitethorn answered 21/7, 2019 at 1:54 Comment(0)
