CUDA error (no kernel image)

junw · March 29, 2021, 6:44pm

Hello,
I have a CUDA problem when running Python. Below are the commands and the error message. Although torch.cuda.is_available() returns True, the error message says ‘no kernel image is available for execution on the device’.

Any advice would be appreciated! thank you.

$ salloc --time=5:00:00 --mem=1Gb --gres=gpu:k40:1
$ module load gcc
$ module load cuda/11.1-1
$ python

>>> import torch
>>> import sys
>>> print('A', sys.version)
A 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
>>> print('B', torch.__version__)
B 1.8.1
>>> print('C', torch.cuda.is_available())
C True
>>> print('D', torch.backends.cudnn.enabled)
D True
>>> device = torch.device('cuda')
>>> print('E', torch.cuda.get_device_properties(device))
E _CudaDeviceProperties(name='Tesla K40m', major=3, minor=5, total_memory=11441MB, multi_processor_count=15)
>>> print('F', torch.tensor([1.0, 2.0]).cuda())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device

dstrong · March 31, 2021, 12:39am

@junw Recent versions of PyTorch distributed as prebuilt binaries do not support older GPU models such as the K40 by default. You could use a P100 or V100 GPU instead, or alternatively install PyTorch from source in order to use the K40 nodes. Try the following:

pip3 install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html -U
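
To see whether a given PyTorch build can run on a particular GPU, one option is to compare the device's compute capability with the architectures the build was compiled for. The sketch below is only an illustration: the helper names (parse_arch, is_capability_covered, report_current_device) are made up for this example, and the matching rule is the usual CUDA one (an sm_XY binary runs on devices with the same major version and an equal or higher minor version; a compute_XY PTX entry can be JIT-compiled for any equal or newer device). It assumes a PyTorch version that provides torch.cuda.get_arch_list().

```python
def parse_arch(entry):
    """Split an entry such as 'sm_37' or 'compute_80' into (kind, major, minor)."""
    kind, _, digits = entry.partition("_")
    if not digits.isdigit() or len(digits) < 2:
        return None  # skip anything that does not look like sm_XY / compute_XY
    return kind, int(digits[:-1]), int(digits[-1])


def is_capability_covered(capability, arch_list):
    """Return True if some compiled architecture can serve this (major, minor) device."""
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        # Binary (SASS) code: same major version, compiled minor <= device minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX code: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report_current_device():
    """Print the check for GPU 0; needs PyTorch and a visible CUDA device."""
    import torch  # imported here so the helpers above work without PyTorch

    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("compiled architectures:", arch_list)
    print("covered:", is_capability_covered(capability, arch_list))
```

Calling report_current_device() inside the GPU allocation prints the comparison. For the K40m above, which reports major=3, minor=5, a build whose list starts at sm_37 would come out as not covered, which is consistent with the "no kernel image" error.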

junw · March 31, 2021, 5:02pm

Thank you very much for your reply!

I actually tried to use other types of GPU earlier but the job got ‘killed’. Below is the error message.

$ salloc --time=5:00:00 --mem=1Gb --gres=gpu:p100:1
$ module load gcc
$ module load cuda/11.1-1
$ python

>>> import torch
>>> import sys
>>> print('A', sys.version)
A 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
>>> print('B', torch.__version__)
B 1.8.1
>>> print('C', torch.cuda.is_available())
C True
>>> print('D', torch.backends.cudnn.enabled)
D True
>>> device = torch.device('cuda')
>>> print('E', torch.cuda.get_device_properties(device))
E _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16280MB, multi_processor_count=56)
>>> print('F', torch.tensor([1.0, 2.0]).cuda())
Killed

I just tried installing PyTorch from source using the command you provided, but I’m still getting the same error message when requesting the K40 GPU nodes: “RuntimeError: CUDA error: no kernel image is available for execution on the device”.

Any suggestion? Thanks much for the help.

dstrong · March 31, 2021, 6:36pm

It looks like your job was killed because it ran out of memory. Try requesting more memory when using the p100 node. Also, which Python are you using? It looks like you’re using a conda environment (we have no gcc/7.3.0). With the latest python/3.9.2 module, the install from source worked for me.
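
To get a rough idea of how much host memory to request with --mem, one approach is to print the peak resident memory of the Python process after the steps that got killed. This is a minimal sketch using only the standard library; the helper name peak_rss_mib is made up for this example, and it has to run in an allocation large enough for the process to survive.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak host memory so far: {peak_rss_mib():.0f} MiB")
```

Calling peak_rss_mib() again after import torch and after the first .cuda() call shows how much those steps add. The number covers host memory only, not GPU memory.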

junw · March 31, 2021, 7:51pm

After increasing the memory to 4 GB when requesting the p100 node, it works!
Thanks much for your help.

And yes, I’m using a conda environment for ‘tensorQTL’, and its Python is an older version, 3.8.5.