
Run Jobs with Slurm

Performing computational work at scale in a shared environment involves organizing everyone's work into jobs and scheduling them. We use Slurm to schedule and manage jobs on the YCRC clusters.

Submitting a job involves specifying a resource request, then running one or more commands or applications. These requests take the form of options to the command-line programs srun and sbatch, or those same options as directives inside submission scripts. Requests are made of groups of compute nodes (servers) called partitions. Partitions, their defaults, limits, and purposes are listed on each cluster page. Once submitted, jobs wait in a queue and are subject to several factors affecting scheduling priority. When your scheduled job begins, the commands or applications you specify are run on compute nodes the scheduler found to satisfy your resource request. If the job was submitted as a batch job, output normally printed to the screen will be saved to a file.

Please be a good cluster citizen.

  • Do not run heavy computation on login nodes (e.g. grace1, farnam2). Doing so negatively impacts everyone's ability to interact with the cluster.
  • Make resource requests for your jobs that reflect what they will use. Wasteful job allocations slow down everyone's work on the clusters. See our documentation on Monitoring CPU and Memory Usage for how to measure job resource usage.
  • If you plan to run many similar jobs, use our Dead Simple Queue tool or job arrays. We enforce limits on job submission rates on all clusters.

If you find yourself wondering how best to schedule a job, please contact us for some help.

Common Slurm Commands

For an exhaustive list of commands and their official manuals, see the SchedMD Man Pages. Below are some of the most common commands used to interact with the scheduler.

Submit a script called my_job.sh as a job (see below for details):

sbatch my_job.sh

List your queued and running jobs:

squeue -u$USER

Cancel a queued job or kill a running job, e.g. a job with ID 12345:

scancel 12345

Check status of a job, e.g. a job with ID 12345:

sacct -j 12345

Check how efficiently a job ran, e.g. a job with ID 12345:

seff 12345

See our Monitor CPU and Memory page for more on tracking the resources your job actually uses.
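The commands above can be chained in a script so that the job ID does not have to be copied by hand. The following is a minimal sketch, assuming sbatch's --parsable option, which prints the job ID (optionally followed by ";" and a cluster name) instead of the usual message. The helper name parse_job_id is made up for this illustration, and my_job.sh is the same placeholder script name used above.

```shell
# Extract the numeric job ID from `sbatch --parsable` output,
# which has the form "jobid" or "jobid;cluster".
# (parse_job_id is a hypothetical helper, not part of Slurm.)
parse_job_id() {
    local parsable_output=$1
    echo "${parsable_output%%;*}"
}

# Submit only where Slurm is available and the placeholder script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f my_job.sh ]; then
    job_id=$(parse_job_id "$(sbatch --parsable my_job.sh)")
    echo "Submitted job ${job_id}"

    # Show the job in the queue, then its accounting record.
    squeue -j "$job_id"
    sacct -j "$job_id" --format=JobID,JobName,State,Elapsed
fi
```

Once the job has finished, seff can be run with the same ID to see how efficiently it used its allocation.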

Common Job Request Options

These options modify the size, length and behavior of jobs you submit. They can be specified when calling srun or sbatch, or saved to a batch script. Options specified on the command line to sbatch will override those in a batch script. See our Request Compute Resources page for a discussion of the differences between --ntasks and --cpus-per-task, constraints, GPUs, etc. If options are left unspecified, defaults are used.
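To illustrate the override rule, the sketch below writes a small batch script whose directive asks for 10 minutes, then builds a submission command that asks for 30 minutes instead. The file name override_demo.sh and the time values are hypothetical choices for this example. The command is only executed where sbatch is installed.

```shell
# Write a small batch script whose directive requests 10 minutes.
# (override_demo.sh is a made-up file name for this example.)
cat > override_demo.sh <<'EOF'
#!/bin/bash
#SBATCH --time=10:00
echo "Running on $(hostname)"
EOF

# The command-line value (30 minutes) takes precedence over the directive.
time_override="--time=30:00"
echo "Would run: sbatch ${time_override} override_demo.sh"

# Only submit where Slurm is actually available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$time_override" override_demo.sh
fi
```

This pattern can be handy for reusing one script with different limits while leaving sensible defaults in the file itself.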

Long Option | Short Option | Default | Description
--job-name | -J | Name of script | Custom job name.
--output | -o | "slurm-%j.out" | Where to save stdout and stderr from the job. See filename patterns for more formatting options.
--partition | -p | Varies by cluster | Partition to run on. See individual cluster pages for details.
--account | -A | Your group name | Specify if you have access to multiple private partitions.
--time | -t | Varies by partition | Time limit for the job in D-HH:MM:SS, e.g. -t 1- is one day, -t 4:00:00 is 4 hours.
--nodes | -N | 1 | Total number of nodes.
--ntasks | -n | 1 | Number of tasks (MPI workers).
--ntasks-per-node | | Scheduler decides | Number of tasks per node.
--cpus-per-task | -c | 1 | Number of CPUs for each task. Use this for threads/cores in single-node jobs.
--mem-per-cpu | | 5G | Memory requested per CPU in MiB. Add G to specify GiB (e.g. 10G).
--mem | | | Memory requested per node in MiB. Add G to specify GiB (e.g. 10G).
--gres | | | Used to request GPUs
--constraint | -C | | Constraints on node features. To limit kinds of nodes to run on.
--mail-user | | Your Yale email | Mail address (alternatively, put your email address in ~/.forward).
--mail-type | | None | Send email when jobs change state. Use ALL to receive email notifications at the beginning and end of the job.
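The --time format can be easy to misread: a value with two fields, such as 01:00, is minutes and seconds, not hours and minutes. The bash sketch below converts a time string to seconds as a way of checking what a request means. It assumes the forms Slurm documents (minutes, MM:SS, HH:MM:SS, and D-HH with optional :MM and :SS). The function name slurm_time_to_seconds is made up for this example and is not a Slurm tool.

```shell
# Convert a Slurm-style time limit to seconds (hypothetical helper).
slurm_time_to_seconds() {
    local spec=$1 days=0 rest h=0 m=0 s=0
    local -a parts
    if [[ $spec == *-* ]]; then
        # "D-..." form: the part after the dash is hours[:minutes[:seconds]].
        days=${spec%%-*}
        rest=${spec#*-}
        IFS=: read -r h m s <<< "$rest"
    else
        # Without a day count, the meaning depends on the number of fields.
        IFS=: read -r -a parts <<< "$spec"
        case ${#parts[@]} in
            1) m=${parts[0]} ;;
            2) m=${parts[0]}; s=${parts[1]} ;;
            3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
            *) echo "unrecognized time: $spec" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10 so values like "08" are not read as octal.
    echo $(( 10#${days:-0} * 86400 + 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
}

slurm_time_to_seconds 1-        # one day     -> 86400
slurm_time_to_seconds 4:00:00   # four hours  -> 14400
slurm_time_to_seconds 01:00     # one minute  -> 60
```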

Interactive Jobs

Interactive jobs can be used for testing and troubleshooting code. Requesting an interactive job will allocate resources and log you into a shell on a compute node. For example:

srun --pty -t 2:00:00 --mem=8G -p interactive bash

This will assign one CPU and 8 GiB of RAM to you for two hours. You can run commands in this shell as needed. To exit, type exit or press Ctrl+d.
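Because an interactive job places you in an ordinary shell, it can be easy to lose track of whether you are still on a login node. The following is a small sketch of a check, relying on the SLURM_JOB_ID environment variable that Slurm sets inside a job. The function name in_slurm_job is made up for this example.

```shell
# Succeeds only when the shell is running inside a Slurm job.
# (in_slurm_job is a hypothetical helper, not part of Slurm.)
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Inside job ${SLURM_JOB_ID} on $(hostname)"
else
    echo "Not inside a Slurm job (probably a login node)"
fi
```

A check like this could be placed at the top of a heavy script to avoid accidentally running it on a login node.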

Use tmux with Interactive Sessions

Remote sessions are vulnerable to being killed if you lose your network connection. We recommend using tmux to alleviate this. When using tmux with interactive jobs, please take extra care to stop jobs that are no longer needed.
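One common pattern, sketched here as an assumption rather than a site-specific recommendation, is to start tmux on the login node first and request the interactive job from inside it. The helper below creates a named session or reattaches to it if it already exists, using tmux's new-session -A option. The function name start_or_attach and the session name in the usage note are made up for this example.

```shell
# Create a named tmux session, or reattach if it already exists.
# (start_or_attach is a hypothetical helper for this illustration.)
start_or_attach() {
    local name=$1
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux not found" >&2
        return 1
    fi
    # -A attaches to the session if one with this name is already running.
    tmux new-session -A -s "$name"
}
```

For example, run start_or_attach slurm_work on a login node, then run the srun command inside the tmux session. After a dropped connection, log in to the same login node and call start_or_attach slurm_work again to pick up where you left off. When the work is done, type exit in the job's shell so the allocation is released.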

Graphical applications

Many graphical applications are well served with the Open OnDemand Remote Desktop app. If you would like to use X11 forwarding, first make sure it is installed and configured. Then, add the --x11 flag to an interactive job request:

srun --pty --x11 -p interactive bash

Batch Jobs

You can submit a script as a batch job, i.e. one that can be run non-interactively in batches. These submission scripts are composed of three parts:

  1. A hashbang line specifying the program that runs the script. This is normally #!/bin/bash.
  2. Directives that list job request options. These lines must appear before any other commands or definitions, otherwise they will be ignored.
  3. The commands or applications you want executed during your job.

See our page of Submission Script Examples for a few more, or the example scripts repo for more in-depth examples. Here is an example submission script that prints some job information and exits:

#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --output="slurm-%j.out"
# Time limit of one minute (MM:SS)
#SBATCH --time=01:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=2
#SBATCH --mem-per-cpu=5G
#SBATCH --mail-type=ALL

# Read this job's memory limit from its cgroup (this path assumes cgroup v1), then convert bytes to GiB
mem_bytes=$(</sys/fs/cgroup/memory/slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes)
mem_gbytes=$(( mem_bytes / 1024**3 ))

# Print a short summary of the allocation
echo "Starting at $(date)"
echo "Job submitted to the ${SLURM_JOB_PARTITION} partition, the default partition on ${SLURM_CLUSTER_NAME}"
echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
echo "  I have ${SLURM_CPUS_ON_NODE} CPUs and ${mem_gbytes}GiB of RAM on compute node $(hostname)"

Save this file as example_job.sh, then submit it with:

sbatch example_job.sh

When the job finishes, the output should be stored in a file called slurm-jobid.out, where jobid is the submitted job's ID. If you find yourself writing loops to submit jobs, instead use our Dead Simple Queue tool or job arrays.
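As a rough illustration of the job array alternative to a submission loop, the sketch below uses Slurm's --array option, the SLURM_ARRAY_TASK_ID environment variable, and the %A and %a filename patterns (array job ID and task index). The job name, the array range, and the input_N.txt naming scheme are hypothetical. The task ID falls back to 1 so the script can also be tried outside of Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --array=1-10
#SBATCH --output="slurm-%A_%a.out"
#SBATCH --time=10:00

# Each array task gets its own index; default to 1 when run outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Hypothetical naming scheme: one input file per task.
input_file="input_${task_id}.txt"

echo "Task ${task_id} would process ${input_file}"
```

Submitted once with sbatch, a script like this is scheduled as ten tasks, each writing to its own output file, which avoids the submission-rate limits that a loop of sbatch calls can run into.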


Last update: November 18, 2020