OOM on /scratch2

Hi,

My job keeps getting killed because it runs out of memory. The specific error message is: slurmstepd: error: Detected 1 oom-kill event(s) in step 8977427.batch cgroup. To deal with this, I kept increasing --mem until I hit the maximum, but I’m still getting the error. Strangely, the job doesn’t appear to get through much of the script before it is killed, and judging by the seff output the memory doesn’t appear to be exhausted either. How do I proceed with this? I attached the seff output.

Thanks

Can you share the command/script you’re using?

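btw, make sure you leave no space between the hash sign (#) and the SBATCH keyword.
#SBATCH is correct whereas # SBATCH isn’t. sbatch treats a # SBATCH line as an ordinary comment, so any option on it (such as --mem) is silently ignored.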
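
If it helps, here is a rough sketch of how you could scan a batch script for that mistake. The function name find_bad_sbatch is made up for illustration and job.sh below is just a placeholder for your own script; it only uses grep, nothing Slurm-specific.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): print every line of the given
# script where whitespace separates '#' from 'SBATCH', with its line number.
# Exit status is 0 if at least one such line is found, 1 if none are.
find_bad_sbatch() {
    grep -nE '^#[[:space:]]+SBATCH' "$1"
}
```

Running find_bad_sbatch job.sh prints nothing if every directive is written as #SBATCH.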


It somehow resolved on its own.