I access high-performance computing (HPC) nodes remotely. I am not sure whether the NVIDIA Collective Communications Library (NCCL) is installed in my directory. Is there any way to check whether NCCL is installed?
How to check the version of NCCL
You can try
locate nccl| grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'
or if you use PyTorch:
python -c "import torch;print(torch.cuda.nccl.version())"
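The one-liner above can fail if PyTorch is missing or was built without NCCL support. A minimal sketch of a more defensive check follows; the helper names `describe_nccl_version` and `pytorch_nccl_version` are made up for illustration. It assumes that recent PyTorch releases return a tuple such as `(2, 18, 3)` and that older ones return a single integer, which is printed as-is here.

```python
def describe_nccl_version(raw):
    """Turn the value from torch.cuda.nccl.version() into a printable string."""
    if isinstance(raw, tuple):
        # Recent PyTorch: (major, minor, patch), sometimes with a suffix.
        return ".".join(str(part) for part in raw)
    # Older PyTorch: a single integer version code; shown unchanged.
    return str(raw)


def pytorch_nccl_version():
    """Return the NCCL version PyTorch reports, or None if it is unavailable."""
    try:
        from torch.cuda import nccl
        return describe_nccl_version(nccl.version())
    except (ImportError, AttributeError, OSError):
        # ImportError/OSError: PyTorch is missing or its libraries failed to load.
        # AttributeError: this build probably has no NCCL bindings.
        return None


if __name__ == "__main__":
    print(pytorch_nccl_version() or "NCCL version not available through PyTorch")
```

Note that this reports the NCCL version PyTorch uses, which may differ from a system-wide NCCL installation.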
See also this link: Command Cheatsheet: Checking Versions of Installed Software / Libraries / Tools for Deep Learning on Ubuntu
In containers, where locate is sometimes not available, you can replace it with ldconfig -v:
ldconfig -v | grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'
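If neither `locate` nor PyTorch is an option, another possibility is to ask the library itself. The sketch below loads the shared library with `ctypes` and calls NCCL's `ncclGetVersion`. The fallback name `libnccl.so.2` and the helper names are assumptions for illustration; the library must be on the loader's search path for this to work.

```python
import ctypes
import ctypes.util


def nccl_version_code(name="nccl"):
    """Return the integer version code from libnccl, or None if unavailable."""
    # find_library consults the system linker cache; the fallback soname
    # "libnccl.so.2" is an assumption and may differ on your system.
    path = ctypes.util.find_library(name) or "libnccl.so.2"
    try:
        lib = ctypes.CDLL(path)
        get_version = lib.ncclGetVersion
    except (OSError, AttributeError):
        return None  # library not found, not loadable, or symbol missing
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    get_version.restype = ctypes.c_int
    version = ctypes.c_int()
    # A return value of 0 is ncclSuccess.
    if get_version(ctypes.byref(version)) != 0:
        return None
    return version.value


def decode_nccl_version(code):
    """Split an NCCL version code into (major, minor, patch)."""
    if code >= 10000:
        # Encoding used from NCCL 2.9 on: major*10000 + minor*100 + patch.
        return code // 10000, (code % 10000) // 100, code % 100
    # Older encoding: major*1000 + minor*100 + patch.
    return code // 1000, (code % 1000) // 100, code % 100


if __name__ == "__main__":
    code = nccl_version_code()
    if code is None:
        print("libnccl could not be loaded")
    else:
        print(".".join(str(part) for part in decode_nccl_version(code)))
```

A `None` result only means the loader could not find or open the library; NCCL may still exist in a directory that is not on the search path (for example inside a Python package).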
When I enter `locate nccl| grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'`, it shows nothing. –
Amorphous
torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister –
Pale
You can usually do this in the command line:
nvcc --version
you might have to run:
sudo apt install nvidia-cuda-toolkit
too.
As the other answerer mentioned, you can do:
torch.cuda.nccl.version()
in PyTorch. Copy and paste this into your terminal:
python -c "import torch;print(torch.cuda.nccl.version())"
I am sure there is something like that in TensorFlow.
NVCC is a general CUDA C++ compiler. It doesn't report NCCL (communications library) version. The first part of the answer is wrong. –
Gillmore
I tried `print(torch.cuda.nccl.version())`, but it throws this error: AttributeError: module 'torch._C' has no attribute '_nccl_version' –
Measly
`nvcc --version`? – Stithy
`python -c "import torch;print(torch.cuda.nccl.version())"` with PyTorch. I wish I knew the terminal command without PyTorch. – Stithy