Launcher now available

dstrong · July 26, 2021, 11:52pm

Launcher is a utility for performing simple, data parallel, high-throughput computing (HTC) workflows on clusters, massively parallel processor (MPP) systems, workgroups of computers, and personal machines. It is designed for running large collections of serial or multi-threaded applications in a single batch job. Launcher can be used as an alternative to job arrays and to pack many (short-running) jobs into one batch job.

With Launcher, you can run a sequence of defined jobs within a single batch job even when you have more jobs to run than the requested number of processors. The number of available processors simply determines the upper limit on the number of jobs that can be run at the same time.

See the following links for more information:

https://www.tacc.utexas.edu/research-development/tacc-software/the-launcher
https://github.com/uschpc/launcher/tree/uschpc

Example

You can load the module with module load launcher.

Running a Launcher job requires at least two things:

A job script that requests resources and configures Launcher
A launcher job file that contains jobs to run (one job per line)

The $LAUNCHER_ROOT/examples/slurm directory contains an example for running a Launcher job on a Slurm cluster: the job script launcher-serial.slurm and the job file helloworld-output.txt. You can copy these files to one of your directories to run a test Launcher job. An example Slurm job script is copied below:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --mem=0
#SBATCH --time=00:10:00
#SBATCH --partition=debug

module purge
module load launcher
module load usc hwloc

export LAUNCHER_DIR=$LAUNCHER_ROOT
export LAUNCHER_RMI=SLURM
export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins
export LAUNCHER_SCHED=interleaved
export LAUNCHER_BIND=1
export LAUNCHER_WORKDIR=$PWD
export LAUNCHER_JOB_FILE=helloworld-output.txt

$LAUNCHER_DIR/paramrun

This example uses one compute node with 16 CPUs, but you can also use multiple nodes and increase the number of tasks if you have a very large number of jobs. Apart from the Slurm options, the main line to edit is LAUNCHER_JOB_FILE, which points to the file that contains the list of commands (i.e., jobs) to run. In this example, the job file is located in the current working directory ($PWD).
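As a rough illustration of scaling beyond one node, the resource request might be changed along these lines. This is only a sketch: the node and task counts are assumed values, --ntasks-per-node is used in place of --ntasks so that tasks are spread evenly across nodes, and the partition line is left out because the right partition depends on your cluster and job size.

```shell
#!/bin/bash

# Assumed values for illustration: 2 nodes x 16 tasks = 32 concurrent jobs.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --mem=0
#SBATCH --time=00:10:00

# The module and LAUNCHER_* lines from the single-node script follow unchanged.
```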

In this simple example, the file helloworld-output.txt contains many duplicate lines like the following:

echo "Hello world from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID" >& job-$LAUNCHER_JID.log
echo "Hello world from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID" >& job-$LAUNCHER_JID.log
echo "Hello world from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID" >& job-$LAUNCHER_JID.log
echo "Hello world from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID" >& job-$LAUNCHER_JID.log
echo "Hello world from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID" >& job-$LAUNCHER_JID.log
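For larger workloads, a job file like this is usually generated rather than typed by hand. The following is a minimal sketch of one way to do that in bash; the file name helloworld-jobs.txt and the job count of 32 are assumptions chosen for illustration, not part of the shipped example.

```shell
#!/bin/bash
# Hypothetical file name and job count; adjust both to your workflow.
job_file=helloworld-jobs.txt
num_jobs=32

# Start from an empty file so that reruns do not append duplicate lines.
: > "$job_file"

for ((i = 0; i < num_jobs; i++)); do
    # Single quotes keep $LAUNCHER_JID and $LAUNCHER_TSK_ID literal here,
    # so they are expanded when the job runs, not while writing the file.
    echo 'echo "Hello world from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID" >& job-$LAUNCHER_JID.log' >> "$job_file"
done

# Report how many jobs were written (one job per line).
wc -l < "$job_file"
```

To use the generated file, LAUNCHER_JOB_FILE in the job script would point to it instead of helloworld-output.txt.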

Launcher will schedule each line as a job on one of the tasks (processors) requested. In this serial example, the number of tasks is equal to the number of processors, 16, so up to 16 jobs run at the same time until all jobs are completed.
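To get a feel for how long a batch might take, you can estimate how many jobs each task has to work through. The sketch below is plain shell arithmetic, not a Launcher feature; the job count of 100 is an assumed value, and the estimate only holds if jobs are spread evenly across tasks.

```shell
#!/bin/bash
# Assumed values for illustration.
num_jobs=100    # lines in the job file
num_tasks=16    # matches --ntasks in the example job script

# Ceiling division: the most jobs any one task runs if work is spread evenly.
max_jobs_per_task=$(( (num_jobs + num_tasks - 1) / num_tasks ))

echo "Each task runs at most $max_jobs_per_task of the $num_jobs jobs"
```

Multiplying this number by the typical run time of one job gives a rough lower bound for the --time request.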

In this example, the output of each job is also saved to a unique log file. For example, the job-1.log file would contain the output Hello world from job 1 running on task 0.
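After the batch job finishes, a quick check is to compare the number of log files with the number of jobs in the job file. The helper below is a hypothetical sketch (check_logs is not part of Launcher) and assumes the job-<id>.log naming used in this example.

```shell
#!/bin/bash
# check_logs JOB_FILE [LOG_DIR]
# Hypothetical helper: compares the number of job-*.log files in LOG_DIR
# (default: current directory) with the number of jobs in JOB_FILE.
check_logs() {
    local job_file=$1
    local log_dir=${2:-.}
    local expected found

    # One job per line; count non-empty lines only.
    expected=$(grep -c . "$job_file")
    # Count per-job log files; tr strips padding that some wc versions add.
    found=$(find "$log_dir" -maxdepth 1 -name 'job-*.log' | wc -l | tr -d ' ')

    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found of $expected log files found"
    else
        echo "MISMATCH: $found of $expected log files found"
        return 1
    fi
}
```

For the example above, this would be called as check_logs helloworld-output.txt from the directory that holds the log files.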