Hello,
I’m trying to train a model on the HPC’s GPUs using PyTorch, but when I do I get the following error:
  File "/home1/sommerer/.conda/envs/rocus/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home1/sommerer/.conda/envs/rocus/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home1/sommerer/.conda/envs/rocus/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home1/sommerer/.conda/envs/rocus/lib/python3.7/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device
One post (https://github.com/pytorch/pytorch/issues/31285) suggests I may have to build PyTorch from source. If that is indeed the solution, would I have to rebuild from source every time I submit a job to the GPUs? That doesn’t seem ideal. If anyone knows how to fix this error, I’d greatly appreciate it.
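In case it helps with diagnosis: my understanding is that this error usually means the installed PyTorch wheel wasn’t compiled with kernels for the GPU’s compute capability. A minimal check I could run on a GPU node might look like this (a sketch using only the standard torch API; the specific capability values printed will depend on the node I land on):

```python
import torch

# CUDA toolkit version the installed PyTorch wheel was built against
# (None would indicate a CPU-only build)
print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)

# Only meaningful when running on a GPU node
if torch.cuda.is_available():
    # Compute capability of GPU 0, e.g. (3, 5) for an older Kepler card;
    # the wheel must ship kernels compiled for this architecture
    print("device:", torch.cuda.get_device_name(0),
          "| compute capability:", torch.cuda.get_device_capability(0))
```

If the GPU’s compute capability turns out to be older than what the prebuilt wheels support, that would at least confirm the mismatch.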