Max Job running time

Dear HPC staff,

I’d like to ask if it is possible to have more than 48 hours of running time for Slurm jobs.

Many thanks!

There is a oneweek partition that allows jobs up to 7 days of run time. To use it, add the following option to your job script:

#SBATCH --partition=oneweek

Note that there are no GPUs on this partition yet.
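For reference, a minimal batch script using the oneweek partition might look like the sketch below. The job name, resource amounts, and program name are placeholders for illustration, not recommendations for your specific workload:

```shell
#!/bin/bash
#SBATCH --job-name=long_job
#SBATCH --partition=oneweek     # partition allowing up to 7 days of run time
#SBATCH --time=5-00:00:00       # requested wall time: 5 days (days-HH:MM:SS)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB

# Replace with your actual workload
srun ./my_long_running_program
```

Submit it with `sbatch myscript.job` as usual; only the `--partition` and `--time` lines differ from a normal job.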

Thank you for your reply!
But when I add this option and submit the job with sbatch, this error message comes:
“Requested node configuration is not available”

Could you share your job script? What SBATCH options are you using?

[screenshot: Screen Shot 2020-11-04 at 12.16.15]
Hi, this is my Slurm script. (I had a problem with copy/paste, so I uploaded a screenshot.)

The compute nodes on the oneweek partition only have 16 cores, so change the second option to:

#SBATCH --cpus-per-task=16

And then your job will submit successfully.
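If you want to check the hardware limits yourself before submitting, `sinfo` can list the per-node CPU and memory configuration of a partition (the exact node names and numbers will of course depend on the cluster):

```shell
# List node name, CPU count, and memory (in MB) for the oneweek partition
sinfo -p oneweek --format="%n %c %m"
```

Requesting more CPUs or memory than any node in the partition has is what triggers the “Requested node configuration is not available” error.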

Thank you very much! It was submitted successfully. But now the job is pending with this reason:
“ReqNodeNotAvail, reserved for maintenance”.
I wonder if there is ongoing maintenance or if there is still some problem in my script.

Yeah, there’s maintenance on Friday, so it will run once that is completed.
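You can see scheduled maintenance windows and check why a job is pending with standard Slurm commands, for example:

```shell
# Show active or upcoming reservations (maintenance windows appear here)
scontrol show reservation

# Show your jobs with their state and pending reason;
# %R prints the reason, e.g. "ReqNodeNotAvail, Reserved for maintenance"
squeue -u $USER --format="%i %j %T %R"
```

Once the reservation ends, pending jobs start automatically; no resubmission is needed.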

Thank you very much!

Hi Derek,

I have run into another problem: 9 out of 10 of my jobs were killed with the following message:
“terminate called after throwing an instance of ‘std::bad_alloc’
what(): std::bad_alloc”
And the only one without that error message also stopped running and produced no output.

I wonder if that problem is also due to the maintenance? But my other jobs, which did not run on the oneweek partition, completed well.

That looks like the jobs did not have enough memory. You are using --mem=0 to request all the available memory on a node. Most of the nodes in the oneweek partition have 64 GB, but two of them have 256 GB. Change that line to --mem=248GB and keep --cpus-per-task=16 to target those nodes (some memory is reserved for Slurm and the operating system, so 248 GB is actually the maximum you can request).

Is there any potential to modify your R script to reduce memory usage? Please submit a ticket and share your R script there if needed.

Also, the node-local /tmp directories on compute nodes are small, which could be part of the issue, because R writes temporary files to /tmp by default. Try changing your TMPDIR with the shell command export TMPDIR=/scratch/$USER. This will redirect temporary files to your scratch directory instead.
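Putting the suggestions above together, a job script for the large-memory nodes could look roughly like this (the R script name is a placeholder, and the exact scratch path depends on the cluster's layout):

```shell
#!/bin/bash
#SBATCH --partition=oneweek
#SBATCH --cpus-per-task=16      # nodes in this partition have 16 cores
#SBATCH --mem=248GB             # targets the 256 GB nodes; ~8 GB is reserved

# Redirect temporary files away from the small node-local /tmp
export TMPDIR=/scratch/$USER
mkdir -p "$TMPDIR"

# Placeholder for your actual R workload
Rscript my_analysis.R
```

The `mkdir -p` guards against the scratch subdirectory not existing yet; R (and most other tools) honor the TMPDIR environment variable automatically.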