Hello,
I have a CUDA problem when running Python. Below are the script and the error message. Although torch.cuda.is_available() returns True, the error message says ‘no kernel image is available for execution on the device’.
import torch
import sys
print('A', sys.version)
A 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
print('B', torch.__version__)
B 1.8.1
print('C', torch.cuda.is_available())
C True
print('D', torch.backends.cudnn.enabled)
D True
device = torch.device('cuda')
print('E', torch.cuda.get_device_properties(device))
E _CudaDeviceProperties(name='Tesla K40m', major=3, minor=5, total_memory=11441MB, multi_processor_count=15)
print('F', torch.tensor([1.0, 2.0]).cuda())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home1/junw/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device
@junw Recent PyTorch binary releases do not include kernels for older GPU models such as the K40 (compute capability 3.5), which is why CUDA is detected but the first kernel launch fails. You could use a P100 or V100 GPU instead, or alternatively build PyTorch from source in order to use the K40 nodes. Try the following:
import torch
import sys
print('A', sys.version)
A 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
print('B', torch.__version__)
B 1.8.1
print('C', torch.cuda.is_available())
C True
print('D', torch.backends.cudnn.enabled)
D True
device = torch.device('cuda')
print('E', torch.cuda.get_device_properties(device))
E _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16280MB, multi_processor_count=56)
print('F', torch.tensor([1.0, 2.0]).cuda())
Killed
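For anyone who wants to check up front whether a given PyTorch build can run kernels on a particular GPU, here is a minimal sketch. It compares the device's compute capability with the architectures the build was compiled for. The helper names parse_arch and build_can_run_on are made up for illustration. The matching rule is the usual CUDA one as I understand it: compiled sm_XY kernels run on devices with the same major version and an equal or higher minor version, while compute_XY (PTX) can be JIT-compiled for any device at or above that capability. It assumes your PyTorch version provides torch.cuda.get_arch_list().

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_37' or 'compute_70' into (kind, major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d{2,3})[a-z]?", tag)
    if match is None:
        raise ValueError(f"unrecognised architecture tag: {tag!r}")
    kind, digits = match.groups()
    # The last digit is the minor version; everything before it is the major.
    return kind, int(digits[:-1]), int(digits[-1])


def build_can_run_on(capability, arch_list):
    """Return True if a build with `arch_list` has usable kernels for `capability`.

    Hypothetical helper: `capability` is a (major, minor) tuple and
    `arch_list` is a list of tags like the one torch.cuda.get_arch_list() returns.
    """
    major, minor = capability
    for tag in arch_list:
        kind, a_major, a_minor = parse_arch(tag)
        if kind == "sm":
            # Compiled kernels: same major version, device minor >= build minor.
            if a_major == major and a_minor <= minor:
                return True
        else:
            # PTX: can be JIT-compiled for any device at or above this capability.
            if (a_major, a_minor) <= (major, minor):
                return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("build architectures:", archs)
        print("kernels available:", build_can_run_on(cap, archs))
```

As an illustration only (the list here is invented, not the actual list shipped with 1.8.1), build_can_run_on((3, 5), ["sm_37", "sm_50", "sm_60", "sm_70"]) returns False, which matches the situation where torch.cuda.is_available() is True but the first kernel launch fails.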
I just tried installing PyTorch from source using the scripts you provided, but I’m still getting the same error message when requesting K40 GPU nodes: “RuntimeError: CUDA error: no kernel image is available for execution on the device”.
It looks like your job was killed because it ran out of memory. Try requesting more memory when using the P100 node. Also, which Python are you using? It looks like you’re using a conda environment (we have no gcc/7.3.0). If I use the latest python/3.9.2 module, the install from source works for me.
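To answer the "which Python" question quickly, something like the following sketch could be run in the same session that produces the error. The function name describe_interpreter is hypothetical. The conda check is only a heuristic: it assumes a conda environment keeps a conda-meta directory under sys.prefix.

```python
import os
import sys


def describe_interpreter():
    """Collect basic facts about the running Python interpreter."""
    prefix = sys.prefix
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": prefix,
        # Heuristic: conda environments keep package metadata in conda-meta.
        "looks_like_conda": os.path.isdir(os.path.join(prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the executable path points into a miniconda3 directory rather than the python/3.9.2 module, the source build may have been installed into a different interpreter than the one running the script.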