nvidia-smi returns nothing

nvidia-smi returns nothing on our Condo GPU nodes in the qcbgpu partition. What am I missing?

manna@b23-15(~)
(0)> nvidia-smi
No devices were found
manna@b23-15(~)
(6)> which nvidia-smi
/usr/bin/nvidia-smi
manna@b23-15(~)
(0)>

I loaded all the cuda and nvidia kit modules. Thanks,

Luigi–

It looks like a GPU was not allocated to the job. Slurm requires an extra option for that, something like:

salloc -p qcbgpu --gpus=l40s:1

Then nvidia-smi should find the GPU.
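A quick way to tell whether the current session was actually given a GPU is to look at the environment Slurm sets up. This is only a sketch: it assumes the cluster's Slurm configuration exports CUDA_VISIBLE_DEVICES and/or SLURM_JOB_GPUS for GPU jobs (common, but site-dependent), and gpu_allocated is a made-up helper name.

```shell
# Hypothetical helper: report whether this shell appears to have a GPU allocated.
# Assumes Slurm exports CUDA_VISIBLE_DEVICES and/or SLURM_JOB_GPUS for GPU jobs.
gpu_allocated() {
    # Prefer CUDA_VISIBLE_DEVICES, fall back to SLURM_JOB_GPUS, else empty.
    ids="${CUDA_VISIBLE_DEVICES:-${SLURM_JOB_GPUS:-}}"
    # Some Slurm versions set the placeholder "NoDevFiles" when no GPU is granted.
    if [ -z "$ids" ] || [ "$ids" = "NoDevFiles" ]; then
        echo "No GPU appears to be allocated to this job"
        return 1
    fi
    echo "GPU(s) visible to this job: $ids"
}
```

Calling gpu_allocated on a node reached without a --gpus request should print the "No GPU" message, which matches the "No devices were found" output from nvidia-smi.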

Thanks … I was actually looking for the specs of the GPUs we purchased. For the A100, the memory is listed; for the L40S, it is not. CUDA cores are not listed either. Which command would list these parameters?

Thanks,

Luigi–

A command like the following will return all nvidia-smi query elements:

nvidia-smi -i 0 -q

That queries the GPU with ID 0; change the number after -i to query a different GPU.
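If the full -q dump is more than you need, nvidia-smi can also print selected fields as CSV. Below is a rough sketch using the --query-gpu and --format options; the field list is just one possible choice, and gpu_summary and the NVSMI override variable are hypothetical names added for illustration. As far as I know, nvidia-smi does not report the CUDA core count, so that still has to come from elsewhere.

```shell
# Hypothetical helper: print one CSV line per GPU (name, total memory, driver version).
# NVSMI is an illustrative override; by default the nvidia-smi on PATH is used.
gpu_summary() {
    nvsmi="${NVSMI:-nvidia-smi}"
    # Fail early with a clear message if the tool cannot be found.
    if ! command -v "$nvsmi" >/dev/null 2>&1; then
        echo "$nvsmi not found" >&2
        return 127
    fi
    "$nvsmi" --query-gpu=name,memory.total,driver_version --format=csv,noheader
}
```

Run it inside a job that has a GPU allocated; the list of available field names can be printed with nvidia-smi --help-query-gpu.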

There’s also a program called deviceQuery that comes with the CUDA toolkit. They don’t put it on PATH for some reason though, so you may have to search for it. Here’s one example:

/spack/2308/apps/linux-centos7-x86_64_v3/gcc-12.3.0/cuda-12.2.1-tymqzrw/extras/demo_suite/deviceQuery
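Since the location varies by CUDA version, one option is to search for the binary with find. This is a sketch only: the /spack default is taken from the example path above and may not be the right search root on every system, and find_device_query is a made-up helper name.

```shell
# Hypothetical helper: list every file named deviceQuery under a directory tree.
# The default root (/spack) is an assumption based on the example path above.
find_device_query() {
    root="${1:-/spack}"
    # Suppress permission-denied noise from directories we cannot read.
    find "$root" -type f -name deviceQuery 2>/dev/null
}
```

Searching a large software tree can take a while, so passing a narrower root (for example, the directory of a specific CUDA install) is usually faster.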

Thanks Derek. This is very helpful.

Luigi–