Scavenge Partition
A scavenge partition is available on all of our clusters. It allows you to (a) run jobs outside of your normal limits (e.g. QOSMaxCpuPerUserLimit) and (b) use unutilized cores, if available, in any private partition on the cluster. You can also use the scavenge partition to get access to unused cores in special purpose partitions, such as the "gpu" or "mpi" partitions, and unused GPUs in private partitions.
However, any job running in the scavenge partition is subject to preemption if any node in use by the job is required for a job in the node's normal partition. This means that your job may be killed without advance notice, so you should only run jobs in the scavenge partition that either have checkpoint capabilities or that can otherwise be restarted with minimal loss of progress.
Warning
Not all jobs are a good fit for the scavenge partition; jobs with long startup times or jobs that run for a long time between checkpoints are poor candidates.
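If your job is a good candidate, submitting to scavenge only requires pointing your submission script at the partition. A minimal sketch follows; the resource requests and my_program are placeholders, not recommendations:
#!/bin/bash
#SBATCH --partition=scavenge     # run on scavenged resources
#SBATCH --cpus-per-task=4        # example CPU request
#SBATCH --time=12:00:00          # example time limit

./my_program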
Automatically Requeue Preempted Jobs
If you would like your job to be automatically added back to the queue if preempted, you can add the --requeue flag to your submission script.
#SBATCH --requeue
Be aware that your job, when started from a requeue, will still re-run the entire original submission script. It will only resume progress if your program has its own ability to checkpoint and restart from previous progress.
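As a minimal sketch, a requeue-friendly script can check for a previous checkpoint before launching the program; checkpoint.dat and the --restart-from flag below are hypothetical and depend entirely on your program's own checkpointing support:
#SBATCH --partition=scavenge
#SBATCH --requeue

# Resume from a checkpoint written by a previous run, if one exists.
if [ -f checkpoint.dat ]; then
    ./my_program --restart-from checkpoint.dat
else
    ./my_program
fi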
Track History of a Requeued Job
When a scavenge job is requeued after preemption, it retains the same job id. However, this can make it difficult to track the history of the job (how many times it was requeued, how long it ran for each time). To view the full history of your job, use the --duplicates flag for the sacct command.
sacct -j <jobid> --duplicates
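For example, to see each run of a requeued job along with its state, start and end times, and elapsed time, you can combine --duplicates with sacct's --format option:
sacct -j <jobid> --duplicates --format=JobID,State,Start,End,Elapsed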
Scavenge GPUs
On Grace and McCleary, we also have a scavenge_gpu partition, which contains all scavenge-able GPU-enabled nodes and has higher priority for those nodes than normal scavenge. In all other ways (e.g. preemption, time limit), scavenge_gpu behaves the same as the normal scavenge partition. You can see the full count of GPU nodes in the Partition tables on the respective cluster pages.
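A sketch of a GPU request in this partition follows; the single-GPU request is only an example, and the GPU types actually available depend on the cluster:
#SBATCH --partition=scavenge_gpu
#SBATCH --gpus=1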
Scavenge MPI Nodes
On Grace, we have a scavenge_mpi partition, which contains all scavenge-able nodes similar to the mpi partition and has higher priority for those nodes than normal scavenge. scavenge_mpi is subject to the same preemption model as scavenge and the same use case restrictions as the regular mpi partition (multi-node, tightly coupled parallel codes). You can see the full count of MPI nodes in the Partition tables on the respective cluster pages.
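A sketch of a multi-node job in this partition might look like the following; the node and task counts are placeholders and my_mpi_program is hypothetical:
#SBATCH --partition=scavenge_mpi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --requeue

srun ./my_mpi_program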
Research Available Nodes
If you are interested in specific hardware and its availability, you can use the sinfo command to query how many nodes of each type are available and what features they list. For example:
sinfo -e -o "%.6D|%c|%G|%b" | column -ts "|"
will show you the kinds of nodes available, and
sinfo -e -o "%.6D|%T|%c|%G|%b" | column -ts "|"
will break out how many nodes are in each state (e.g. allocated, mixed, idle). For more options, see the official sinfo documentation.
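If you are only interested in nodes reachable through scavenge, you can restrict the query to that partition with the -p flag, for example:
sinfo -p scavenge -e -o "%.6D|%T|%c|%G|%b" | column -ts "|"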