Slurm commands for monitoring jobs now available

Following up on our last post, we have made three new commands available for monitoring your jobs on CARC systems. You can run them on the login nodes.

myqueue

Print queue information for your jobs.

$ myqueue
------------------------------------------------------------------------------
      Job ID     Job Name  Partition    State     Elapsed     Nodelist(Reason)
------------ ------------ ---------- -------- ----------- --------------------
487418            sim2.sl       main  PENDING        0:00         (Dependency)
486794             sim.sl       main  RUNNING  1-12:13:20               d05-06

jobhist

Print a compact history of your jobs with basic job information.

$ jobhist
-----------------------------------------------------------------------------------------
 Startdate        Job ID     Job Name  Partition      State    Elapsed Nodes CPUs  Memory
---------- ------------- ------------ ---------- ---------- ---------- ----- ---- -------
2021-06-25 484933         debugsim.sl      debug     FAILED   00:00:05     1    4    30Gn
2021-06-25 484934         debugsim.sl      debug  COMPLETED   00:00:19     1    4    30Gn
2021-06-26 486288         interactive       main  COMPLETED   00:20:53     1   16     2Gc
2021-06-27 486290         interactive      debug    TIMEOUT   00:30:02     1   16     2Gc
2021-06-28 486624              sim.sl    oneweek  COMPLETED 3-00:30:22     1   16   120Gn
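The Memory column appears to use the requested-memory notation from older Slurm releases. A trailing n means memory per node, and a trailing c means memory per CPU. For example, 2Gc on a 16-CPU job requests 32G in total. The sketch below converts such values to a total in Slurm's G units. It assumes only the K, M, G, and T unit letters occur. total_memory is a hypothetical helper, not part of jobhist.

```python
import re

# Assumed format: a number, a unit letter (K, M, G, T), then "n" (per node)
# or "c" (per CPU), as in the jobhist Memory column above.
_REQMEM = re.compile(r"^(\d+(?:\.\d+)?)([KMGT])([nc])$")
# Slurm memory units are 1024-based; express everything in G.
_TO_G = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def total_memory(reqmem: str, nodes: int, cpus: int) -> float:
    """Return a job's total requested memory in G."""
    match = _REQMEM.match(reqmem)
    if match is None:
        raise ValueError(f"unrecognized memory value: {reqmem!r}")
    amount, unit, scope = match.groups()
    per_unit = float(amount) * _TO_G[unit]
    # "n" scales by the node count, "c" scales by the CPU count.
    return per_unit * (nodes if scope == "n" else cpus)


if __name__ == "__main__":
    for mem, nodes, cpus in [("30Gn", 1, 4), ("2Gc", 1, 16), ("120Gn", 1, 16)]:
        print(f"{mem:>6} -> {total_memory(mem, nodes, cpus):g}G total")
```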

To show the job history for a different number of days in the past, pass an integer argument (for example, jobhist 30). The default is 7 days.
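The day window can also be approximated with Slurm's sacct. The hypothetical helper sacct_command below builds an sacct call whose start date (-S) is the given number of days before today, defaulting to 7 to match jobhist. The -X flag limits output to one line per job allocation. The field list is an assumed match for the jobhist columns, not necessarily what jobhist runs internally.

```python
import datetime
import shlex
from typing import Optional

# Assumed field list mirroring the jobhist columns above.
FIELDS = "Start,JobID,JobName,Partition,State,Elapsed,NNodes,NCPUS,ReqMem"


def sacct_command(user: str, days: int = 7,
                  today: Optional[datetime.date] = None) -> list[str]:
    """Build an sacct call covering the last `days` days for one user."""
    if days < 0:
        raise ValueError("days must be non-negative")
    if today is None:
        today = datetime.date.today()
    start = today - datetime.timedelta(days=days)
    # -X: one line per job allocation (job steps are omitted)
    # -S: start of the reporting window, as YYYY-MM-DD
    # -o: columns to print
    return ["sacct", "-X", "-u", user, "-S", start.isoformat(), "-o", FIELDS]


if __name__ == "__main__":
    # Print the command for a 30-day window instead of running it.
    print(shlex.join(sacct_command("ttrojan", days=30)))
```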

jobinfo

Print detailed information for a pending, running, or completed job, given its job ID.

$ jobinfo 483699
Job ID               : 483699
Job name             : sim.sl
User                 : ttrojan
Account              : ttrojan_123
Cluster              : discovery
Partition            : oneweek
Nodes                : 1
Nodelist             : e01-76
CPUs                 : 16
GPUs                 : 0
State                : FAILED
Exit code            : 1:0
Submit time          : 2021-06-22T12:38:06
Start time           : 2021-06-22T12:38:15
End time             : 2021-06-25T12:40:53
Wait time            :   00:00:09
Reserved walltime    : 3-07:00:00
Used walltime        : 3-00:02:38
Used CPU time        : 3-14:00:01
% User (computation) : 93.82%
% System (I/O)       :  6.18%
Memory reserved      : 248G/node
Max memory used      : 218.60G (e01-76)
Max disk write       : 143.41G (e01-76)
Max disk read        : 352.60G (e01-76)
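Two of these fields follow from the timestamps. Assuming the wait time is the start time minus the submit time, and the used walltime is the end time minus the start time, the example values can be checked with a short sketch. derived_times and slurm_duration are hypothetical helpers.

```python
from datetime import datetime, timedelta


def slurm_duration(delta: timedelta) -> str:
    """Format a non-negative timedelta as Slurm-style [D-]HH:MM:SS."""
    total = int(delta.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{days}-{hms}" if days else hms


def derived_times(submit: str, start: str, end: str) -> tuple[str, str]:
    """Return (wait time, used walltime) from ISO-format timestamps."""
    t_submit, t_start, t_end = (datetime.fromisoformat(t) for t in (submit, start, end))
    return slurm_duration(t_start - t_submit), slurm_duration(t_end - t_start)


if __name__ == "__main__":
    wait, used = derived_times("2021-06-22T12:38:06",
                               "2021-06-22T12:38:15",
                               "2021-06-25T12:40:53")
    print("Wait time     :", wait)
    print("Used walltime :", used)
```

For job 483699 this gives 00:00:09 and 3-00:02:38, matching the jobinfo output.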

We will also add job efficiency information to this output at a later date.
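Until then, you can estimate CPU efficiency from fields jobinfo already shows. One common definition, also used by Slurm's contributed seff tool, is CPU time divided by the product of used walltime and allocated CPUs. Whether jobinfo will use this exact definition is an assumption. The sketch applies it to the example job above, and its helper names are hypothetical.

```python
def to_seconds(value: str) -> int:
    """Convert a Slurm duration ([D-]HH:MM:SS, or MM:SS) to seconds."""
    days = 0
    has_days = "-" in value
    if has_days:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 2 and not has_days:  # MM:SS, e.g. "0:00"
        parts.insert(0, 0)
    if len(parts) != 3:
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time: str, walltime: str, cpus: int) -> float:
    """Percent of allocated CPU time actually used (seff-style definition)."""
    core_walltime = to_seconds(walltime) * cpus
    if core_walltime == 0:
        raise ValueError("walltime and cpus must be positive")
    return 100.0 * to_seconds(cpu_time) / core_walltime


if __name__ == "__main__":
    # Values from the jobinfo example: Used CPU time, Used walltime, CPUs.
    print(f"CPU efficiency: {cpu_efficiency('3-14:00:01', '3-00:02:38', 16):.2f}%")
```

For job 483699 this is about 7.46%, which suggests the job kept only about 1.2 of its 16 CPUs busy on average.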
