I am trying to run Lumerical jobs on Discovery across more than one node. Even when I request 6 nodes, the scheduler allocates only a single node to my job, which significantly increases the time the job takes to complete.
Is there a way to run on more than one node? I am specifying 8 tasks per node and 1 CPU per task.
@raudhkha According to Slurm accounting, the job requested only 1 node and was allocated 1 node. Maybe there's an error in the job script? Could you share the job script?
Sure, here is the text in the script (I cannot upload the file):
module load usc
module load intel/19.0.4
module load mesa-glu
module load libpng/1.2.57
module load libxslt/1.1.33
module load libxdmcp
module load lumerical
srun --mpi=pmix_v2 -n $SLURM_NTASKS fdtd-engine-ompi-lcl /project/povinell_98/raudhkha/Focussed_emission/Graphene_design/Graphene_Si_Au/GMA_a_0p2_EF_0p5_L_vary/L_vary_1.fsp
Thanks. I’m not able to reproduce the issue: that job script should work, and Slurm should allocate 6 nodes. The licensed software does not affect that part. Could you double-check the formatting of the job script? Maybe try re-typing it from scratch in a new file. It seems Slurm is not reading the --nodes line for some reason.
Thanks @dstrong. There was an issue with the formatting. The indentation of a few lines was off, which was probably causing Slurm to ignore them. I fixed that, and now the job allocates the nodes correctly.
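For anyone hitting the same problem: #SBATCH directives must start in column 1 (no leading whitespace) and appear before the first executable command, or Slurm silently ignores them. Below is a minimal sketch of a batch script matching the resources described in this thread (6 nodes, 8 tasks per node, 1 CPU per task); the walltime and the shortened module/path lists are placeholders I added, not values from the original post.

```shell
#!/bin/bash
# All #SBATCH lines are flush-left and precede any commands;
# an indented #SBATCH line is treated as an ordinary comment.
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00   # placeholder walltime, not from the original post

module load usc
module load lumerical

# SLURM_NTASKS expands to nodes x ntasks-per-node (48 tasks here),
# so srun launches the FDTD engine across all 6 allocated nodes.
srun --mpi=pmix_v2 -n $SLURM_NTASKS fdtd-engine-ompi-lcl /path/to/simulation.fsp
```

You can confirm the allocation after submission with `squeue -j <jobid> -o "%N"` or `sacct -j <jobid> --format=NNodes,AllocNodes`.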