I am a CARC user with intensive GPU requirements. At the moment, my jobs wait in the queue for so long that even squeue --start -j
cannot show an expected start time.
After checking the current job queue, I found that many jobs that do not request GPUs are running on the gpu
partition. Most of them request a large number of CPU cores on a single node, so even though GPUs are idle, our jobs cannot be allocated because of the shortage of CPU cores.
Here are the screenshots of those non-GPU jobs on the gpu
partition. I would greatly appreciate it if someone could help me understand the situation.
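
For anyone who wants to reproduce the queue check described above, here is a minimal sketch of how such jobs could be listed. It is only an illustration under some assumptions: the partition is named gpu, squeue is on the PATH, and the %b format field (the job's generic resource request) contains the string "gpu" whenever a GPU was requested. The helper names non_gpu_jobs and query_partition are made up for this example. Jobs that request GPUs some other way (for example with --gpus) may not show up in %b, so the result should be read as a rough indication only.

```python
import subprocess


def non_gpu_jobs(squeue_lines):
    """Return (job id, user, state, CPUs, GRES) rows whose GRES field has no GPU.

    Each input line is expected in the form "jobid|user|state|cpus|gres",
    which matches the -o format string used in query_partition below.
    """
    rows = []
    for line in squeue_lines:
        fields = line.strip().split("|")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        job_id, user, state, cpus, gres = fields
        # Assumption: a GPU request always shows up as "gpu" in the GRES field.
        if "gpu" not in gres.lower():
            rows.append((job_id, user, state, int(cpus), gres))
    return rows


def query_partition(partition="gpu"):
    """Run squeue for one partition and keep only jobs without a GPU request."""
    result = subprocess.run(
        ["squeue", "-h", "-p", partition, "-o", "%i|%u|%T|%C|%b"],
        check=True,
        capture_output=True,
        text=True,
    )
    return non_gpu_jobs(result.stdout.splitlines())
```

Calling query_partition() on a login node and printing the returned rows would give a text version of what the screenshots show, with the CPU count of each job in the fourth column.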