Hi,
This is a very late response, but I was going to do a blog post on this topic and was reminded of this question while doing research. Here’s a hint from the SchedMD mailing list: we have to launch python with srun. The explanation seems to be that using srun puts your python script into a job step, while launching it directly from the job script creates a child process. Apparently only the batch script or job steps can be sent signals.
Anyway, your job script should look like this:
#!/bin/bash
#SBATCH --time=00:01:30
#SBATCH --partition=debug
#SBATCH --signal=USR1
module load gcc/11.3.0
module load python
echo 'job started!'
srun python3 ascript.py
echo 'job ended!'
Note that bash will try to interpret the ! character if it's in " quotes but not in ' quotes.
ascript.py should look the same:
import signal
import sys
import time
def handler(signum, frame):
    print('Signal handler called with signal', signum)
    sys.exit(0)
signal.signal(signal.SIGUSR1, handler)
signal.signal(signal.SIGTERM, handler)
signal.signal(signal.SIGINT, handler)
# do nothing
print("Going to sleep...")
time.sleep(100000)
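If you want to check the handler before submitting anything, one option is to run the script locally and send it the signal yourself. Here is a rough sketch of that idea, assuming a Linux or macOS machine. The child script is an inlined, shortened copy of the handler logic above, and the names CHILD and send_usr1_and_collect are made up for this example.

```python
import signal
import subprocess
import sys
import textwrap

# Shortened stand-in for ascript.py (hypothetical, inlined so the sketch is self-contained).
CHILD = textwrap.dedent('''
    import signal, sys, time
    def handler(signum, frame):
        print("Signal handler called with signal", signum, flush=True)
        sys.exit(0)
    signal.signal(signal.SIGUSR1, handler)
    print("Going to sleep...", flush=True)
    time.sleep(60)
''')


def send_usr1_and_collect():
    """Start the child, send it SIGUSR1, and return (output, exit code)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    # The child prints this line only after its handler is installed,
    # so waiting for it avoids sending the signal too early.
    ready = proc.stdout.readline()
    proc.send_signal(signal.SIGUSR1)
    rest = proc.stdout.read()  # returns once the child exits
    proc.wait(timeout=30)
    return ready + rest, proc.returncode


if __name__ == "__main__":
    out, code = send_usr1_and_collect()
    print(out, end="")
    print("exit code:", code)
```

This only tells you the Python side works. Whether Slurm actually delivers the signal to the script still depends on launching it with srun as described above.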
Finally, for anyone wondering how this is useful: one thing you can do is put functionality in your signal handler to save the state of your program before it gets terminated, so you can pick up where you left off in the next job.
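To make the save-and-resume idea a bit more concrete, here is a minimal sketch. It is a slight variation on what I described: instead of saving inside the handler itself, the handler only sets a flag and the main loop writes the checkpoint once the current step finishes, which avoids saving half-updated state. The file name checkpoint.json, the "step" counter and all function names are assumptions for illustration, not anything Slurm requires.

```python
import json
import os
import signal
import time

CHECKPOINT = "checkpoint.json"  # hypothetical file name
stop_requested = False


def handler(signum, frame):
    """Only record that a signal arrived; the main loop does the saving."""
    global stop_requested
    stop_requested = True


def install_handlers():
    signal.signal(signal.SIGUSR1, handler)
    signal.signal(signal.SIGTERM, handler)


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def save_state(state, path=CHECKPOINT):
    """Write to a temporary file first, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(total_steps=1000, path=CHECKPOINT, work=lambda: time.sleep(1)):
    """Resume from the checkpoint and work until done or until a signal arrives."""
    state = load_state(path)
    while state["step"] < total_steps and not stop_requested:
        work()  # placeholder for one unit of real work
        state["step"] += 1
    save_state(state, path)
    return state


def main():
    install_handlers()
    run()
```

Calling main() at the bottom of the script starts the loop. With --signal=USR1 the signal arrives about 60 seconds before the time limit by default, so one step plus the save has to fit into that window. If your steps take longer, --signal=USR1@300 (or similar) asks for more warning.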