Slurm

Slurm is a job scheduling and workload management package used on larger clusters.

Handy Commands

Listing jobs that have run on specific nodes

[jakers@adm2 hipergator]$ sacct --nodelist=c39a-s39 --starttime=2024-02-15T13:30
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
23813188_7      helixer        gpu jeremybra+          4  NODE_FAIL      0:0
23813188_7.+      batch            jeremybra+          4  CANCELLED
23813188_8      helixer        gpu jeremybra+          4  NODE_FAIL      0:0
23813188_8.+      batch            jeremybra+          4  CANCELLED
23813188_9      helixer        gpu jeremybra+          4  NODE_FAIL      0:0
23813188_9.+      batch            jeremybra+          4  CANCELLED
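
To narrow a listing like the one above down to just the jobs that were lost to a node failure, the output can be piped through a small filter. This is a minimal sketch, not a definitive tool: it assumes pipe-delimited input as produced by sacct's --parsable2 and --noheader options, and the function name failed_jobs is hypothetical.

```shell
# failed_jobs (hypothetical helper): read "JobID|State" lines on stdin and
# print the IDs of top-level jobs whose state is NODE_FAIL.
# Job steps such as "23813188_7.batch" contain a dot and are skipped.
failed_jobs() {
    awk -F'|' '$2 == "NODE_FAIL" && $1 !~ /\./ { print $1 }'
}

# Example usage on a cluster (node name and start time taken from the listing above):
#   sacct --nodelist=c39a-s39 --starttime=2024-02-15T13:30 \
#         --parsable2 --noheader --format=jobid,state | failed_jobs
```

The filter only inspects the two columns requested with --format, so any extra fields would need to be added to both the sacct call and the awk expression.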

Looking up a job by its job ID

[jakers@adm2 comsol]$ sacct -j 23170593 --format=jobid,jobname,nodelist
JobID           JobName        NodeList
------------ ---------- ---------------
23170593        t5_roct      c1007a-s29
23170593.ba+      batch      c1007a-s29
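
If only the node list is needed, for example to feed into another command, the lookup above can be wrapped in a function. This is a sketch under stated assumptions: the name job_nodes is hypothetical, and it relies on --parsable2 and --noheader to produce unpadded, pipe-delimited output.

```shell
# job_nodes (hypothetical helper): print the node list for a job ID.
# Step records such as "23170593.batch" are dropped so that only the
# top-level job record is reported.
job_nodes() {
    sacct -j "$1" --noheader --parsable2 --format=jobid,nodelist |
        awk -F'|' '$1 !~ /\./ { print $2 }'
}

# Example usage on a cluster:
#   job_nodes 23170593
```

Using the parsable format avoids the truncated columns seen in the default output, where long values end in a "+".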

Fields Available

The following fields can be passed to sacct --format (sacct --helpformat prints this list):

                 Account             AdminComment        AllocCPUS           AllocNodes
                 AllocTRES           AssocID             AveCPU              AveCPUFreq
                 AveDiskRead         AveDiskWrite        AvePages            AveRSS
                 AveVMSize           BlockID             Cluster             Comment
                 Constraints         Container           ConsumedEnergy      ConsumedEnergyRaw
                 CPUTime             CPUTimeRAW          DBIndex             DerivedExitCode
                 Elapsed             ElapsedRaw          Eligible            End
                 ExitCode            Flags               GID                 Group
                 JobID               JobIDRaw            JobName             Layout
                 MaxDiskRead         MaxDiskReadNode     MaxDiskReadTask     MaxDiskWrite
                 MaxDiskWriteNode    MaxDiskWriteTask    MaxPages            MaxPagesNode
                 MaxPagesTask        MaxRSS              MaxRSSNode          MaxRSSTask
                 MaxVMSize           MaxVMSizeNode       MaxVMSizeTask       McsLabel
                 MinCPU              MinCPUNode          MinCPUTask          NCPUS
                 NNodes              NodeList            NTasks              Priority
                 Partition           QOS                 QOSRAW              Reason
                 ReqCPUFreq          ReqCPUFreqMin       ReqCPUFreqMax       ReqCPUFreqGov
                 ReqCPUS             ReqMem              ReqNodes            ReqTRES
                 Reservation         ReservationId       Reserved            ResvCPU
                 ResvCPURAW          Start               State               Submit
                 SubmitLine          Suspended           SystemCPU           SystemComment
                 Timelimit           TimelimitRaw        TotalCPU            TRESUsageInAve
                 TRESUsageInMax      TRESUsageInMaxNode  TRESUsageInMaxTask  TRESUsageInMin
                 TRESUsageInMinNode  TRESUsageInMinTask  TRESUsageInTot      TRESUsageOutAve
                 TRESUsageOutMax     TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin
                 TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot     UID
                 User                UserCPU             WCKey               WCKeyID
                 WorkDir
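
The set of fields differs between Slurm versions, so a script may want to confirm that a field exists before requesting it. The following is a minimal sketch, assuming that sacct --helpformat prints the field names separated by whitespace as in the list above; the function name has_field is hypothetical.

```shell
# has_field (hypothetical helper): succeed if the given name appears in the
# field list printed by "sacct --helpformat". The comparison ignores case,
# matching the lowercase field names used with --format earlier on this page.
has_field() {
    sacct --helpformat |
        awk '{ for (i = 1; i <= NF; i++) print $i }' |
        grep -qixF -- "$1"
}

# Example usage on a cluster:
#   if has_field SubmitLine; then
#       sacct -j 23170593 --format=jobid,submitline
#   fi
```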