Node allocation for Lumerical jobs

raudhkha · June 29, 2021, 5:06pm

Hi,

I am trying to run Lumerical jobs on Discovery on more than one node. Even if I specify the number of nodes as 6, the scheduler only allocates a single node to my job. This significantly increases the time it takes for the job to complete.

Is there a way to run on more than one node? I am specifying 8 tasks per node and 1 CPU per task.

Thanks,
Romil

dstrong · June 29, 2021, 10:46pm

@raudhkha According to Slurm accounting, the job requested only 1 node and was allocated 1 node. Maybe there’s an error in the job script? Could you share it?
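
For reference, the requested and allocated node counts can be read from Slurm accounting with sacct. This is a minimal sketch: the function name show_job_nodes is hypothetical, and the job ID you pass is the one printed by sbatch when the job was submitted.

```shell
# Hypothetical helper: print requested vs. allocated node counts for one job.
# ReqNodes, AllocNodes, NTasks and State are standard sacct format fields.
show_job_nodes() {
    local jobid="$1"
    sacct -j "$jobid" --format=JobID,ReqNodes,AllocNodes,NTasks,State
}
```

Call it as show_job_nodes followed by the job ID. If ReqNodes shows 1 for a script that asks for 6 nodes, the --nodes directive was most likely never parsed.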

raudhkha · June 30, 2021, 2:13am

Sure, here is the text in the script (I cannot upload the file):

#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=3G
#SBATCH --partition=main

    module purge
    module load usc
    module load intel/19.0.4
    module load mesa-glu
    module load libpng/1.2.57
    module load libxslt/1.1.33
    module load libxdmcp
    module load lumerical

srun --mpi=pmix_v2 -n $SLURM_NTASKS fdtd-engine-ompi-lcl /project/povinell_98/raudhkha/Focussed_emission/Graphene_design/Graphene_Si_Au/GMA_a_0p2_EF_0p5_L_vary/L_vary_1.fsp

dstrong · June 30, 2021, 8:09pm

Thanks. I’m not able to reproduce the issue. That job script should work, and Slurm should allocate 6 nodes. The licensed software does not affect node allocation. Could you double-check the formatting of the job script? Maybe try retyping it from scratch in a new file. It seems Slurm is not reading the --nodes line for some reason.

raudhkha · June 30, 2021, 11:28pm

Thanks @dstrong. There was an issue with the formatting. The indentation of a few lines was off, which was probably causing Slurm to ignore them. I fixed that, and the nodes are now allocated correctly.
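
For anyone hitting the same thing, here is a minimal sketch of a check for indented directives. It assumes the problem is whitespace in front of #SBATCH, as it appeared to be here. The function name check_sbatch_indent is hypothetical, not a Slurm tool.

```shell
# Hypothetical helper: report #SBATCH lines preceded by spaces or tabs.
# Prints each offending line with its line number and returns 1 if any exist.
check_sbatch_indent() {
    local script="$1"
    # ^[[:space:]]+ matches one or more leading whitespace characters.
    if grep -nE '^[[:space:]]+#SBATCH' "$script"; then
        echo "Indented #SBATCH lines found in $script" >&2
        return 1
    fi
    echo "No indented #SBATCH lines in $script"
}
```

Run it as check_sbatch_indent followed by the path to the job script before submitting. Any line it prints should be moved so that #SBATCH starts in the first column.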