Slurm
Latest revision as of 14:06, 21 February 2024
Slurm is a job scheduling and workload management package used on larger clusters.
Handy Commands
Listing jobs that have run on specific nodes
[jakers@adm2 hipergator]$ sacct --nodelist=c39a-s39 --starttime=2024-02-15T13:30
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
23813188_7      helixer        gpu jeremybra+          4  NODE_FAIL      0:0
23813188_7.+      batch            jeremybra+          4  CANCELLED
23813188_8      helixer        gpu jeremybra+          4  NODE_FAIL      0:0
23813188_8.+      batch            jeremybra+          4  CANCELLED
23813188_9      helixer        gpu jeremybra+          4  NODE_FAIL      0:0
23813188_9.+      batch            jeremybra+          4  CANCELLED
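The default column output truncates long values (note `jeremybra+` and `23813188_7.+` above), which makes it awkward to post-process. As a rough sketch, the same query can be run with the standard `--parsable2` and `--noheader` options and piped into a small tally of job states. The helper name `count_states` is made up for this example, and `--allusers` is only needed when the account running `sacct` does not already see other users' jobs.

```shell
#!/bin/sh
# Hypothetical helper: tally job states from pipe-delimited sacct output.
# Reads lines of the form "JobID|State" on stdin, as produced by
#   sacct --parsable2 --noheader --format=jobid,state
count_states() {
    awk -F'|' '
        $1 !~ /\./ { count[$2]++ }   # ignore job steps such as 123.batch
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Usage on a cluster (shown as a comment, not executed here):
# sacct --allusers --nodelist=c39a-s39 --starttime=2024-02-15T13:30 \
#       --parsable2 --noheader --format=jobid,state | count_states
```

Adding `--allocations` to the `sacct` call would drop the job steps at the source, in which case the step filter in `awk` becomes a harmless no-op.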
Looking up a job when you have the jobid
[jakers@adm2 comsol]$ sacct -j 23170593 --format=jobid,jobname,nodelist
JobID           JobName        NodeList
------------ ---------- ---------------
23170593        t5_roct      c1007a-s29
23170593.ba+      batch      c1007a-s29
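When only the node list is needed, for example to feed it into another command, a small wrapper can return it without the header or the job-step rows. This is a minimal sketch; `job_nodes` is a hypothetical name, and the options used (`--jobs`, `--allocations`, `--parsable2`, `--noheader`) are the long forms of standard `sacct` flags.

```shell
#!/bin/sh
# Hypothetical helper: print the node list of a job's allocation.
# Usage: job_nodes JOBID
job_nodes() {
    # --allocations hides the job steps (e.g. the .batch row above);
    # --parsable2 avoids the column truncation of the default output.
    sacct --jobs="$1" --allocations --parsable2 --noheader --format=nodelist
}

# Usage on a cluster (shown as a comment, not executed here):
# job_nodes 23170593
```

To keep the human-readable table but avoid truncated values such as `23170593.ba+`, a width can be appended to a field name instead, for example `--format=jobid%20,jobname%30,nodelist%40`.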
Fields Available
These are the field names accepted by sacct --format, as printed by sacct --helpformat. Field names are not case-sensitive.
Fields available:
Account AdminComment AllocCPUS AllocNodes
AllocTRES AssocID AveCPU AveCPUFreq
AveDiskRead AveDiskWrite AvePages AveRSS
AveVMSize BlockID Cluster Comment
Constraints Container ConsumedEnergy ConsumedEnergyRaw
CPUTime CPUTimeRAW DBIndex DerivedExitCode
Elapsed ElapsedRaw Eligible End
ExitCode Flags GID Group
JobID JobIDRaw JobName Layout
MaxDiskRead MaxDiskReadNode MaxDiskReadTask MaxDiskWrite
MaxDiskWriteNode MaxDiskWriteTask MaxPages MaxPagesNode
MaxPagesTask MaxRSS MaxRSSNode MaxRSSTask
MaxVMSize MaxVMSizeNode MaxVMSizeTask McsLabel
MinCPU MinCPUNode MinCPUTask NCPUS
NNodes NodeList NTasks Priority
Partition QOS QOSRAW Reason
ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov
ReqCPUS ReqMem ReqNodes ReqTRES
Reservation ReservationId Reserved ResvCPU
ResvCPURAW Start State Submit
SubmitLine Suspended SystemCPU SystemComment
Timelimit TimelimitRaw TotalCPU TRESUsageInAve
TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin
TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve
TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin
TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot UID
User UserCPU WCKey WCKeyID
WorkDir
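The available fields differ between Slurm versions, so a script that builds a `--format` string may want to check a field before using it. The following is a sketch only: `has_field` is a hypothetical helper that splits the `sacct --helpformat` listing into one name per line and looks for a case-insensitive exact match.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FIELD is listed by "sacct --helpformat".
# Usage: has_field FIELD
has_field() {
    # Put each whitespace-separated name on its own line, then look for a
    # case-insensitive, whole-line, fixed-string match.
    sacct --helpformat | tr -s ' \t' '\n' | grep -qixF -- "$1"
}

# Usage on a cluster (shown as a comment, not executed here):
# if has_field SubmitLine; then
#     sacct -j 23170593 --format=jobid,submitline%80
# fi
```

A preferred set of fields can also be made the default for a shell session by exporting the `SACCT_FORMAT` environment variable, for example `export SACCT_FORMAT=jobid,jobname,state,elapsed,maxrss,nodelist`.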