Broadly speaking, a high performance computing (HPC) cluster is a collection of networked computers and data storage. We refer to individual servers in this network as nodes. Our clusters are only accessible to researchers remotely; your gateways to the cluster are the login nodes. From these nodes, you view files and dispatch jobs to other nodes across the cluster configured for computation, called compute nodes. The tool we use to manage these jobs is called a job scheduler. All compute nodes on a cluster mount a shared filesystem; a file server or set of servers store files on a large array of disks. This allows your jobs to access and edit your data from any compute node. See our summary of the compute and storage hardware we maintain, from which you can navigate to a detailed description of each cluster.
Request an Account
The first step in gaining access to our clusters is to request an account. There are several HPC clusters available at Yale. There is no charge for using these clusters. To understand which cluster is appropriate for you and to request an account, visit the account request page.
Once you have an account, go to our Log on to the Clusters page login information and configuration.
If you want to access the clusters from outside Yale's network, you must use the Yale VPN.
Schedule a Job
On our clusters, you control your jobs using a job scheduling system called Slurm that allocates and manages compute resources for you. You can submit your jobs in one of two ways. For testing and small jobs you may want to run a job interactively. This way you can directly interact with the compute node(s) in real time. The other way, which is the preferred way for multiple jobs or long-running jobs, involves writing your job commands in a script and submitting that to the job scheduler. Please see our Slurm documentation or attend the HPC bootcamp for more details.
A basically familiarity with Linux commands is required for interacting with the clusters. We periodically run an Intro to Linux Bootcamp to get you started. There are also many excellent beginner tutorials available for free online, including the following:
Transfer Your Files
You will likely want to copy files between your computer and the clusters. There are a couple methods available to you, and the best for each situation usually depends on the size and number of files you would like to transfer. See our transferring data page for more information.
To best serve the diverse needs of all our researchers, we use software modules to make multiple versions of popular software available. Modules allow you to swap between different applications and versions of those applications with relative ease.
We also provide assistance for installing less commonly used packages. See our Applications & Software documentation for more details.
Be a Good Cluster Citizen
While using HPC resources, here are some important things to remember:
- Do not run jobs or computation on a login node, instead submit jobs.
- Never give your password or ssh key to anyone else.
- Do not store any high risk data on the clusters, except Milgram.
Use of the clusters is also governed by our official guidelines.
Hands on Training
We offer several courses that will assist you with your work on our clusters. They range from orientation for absolute beginners to advanced topics on application-specific optimization. Please peruse our catalog of training to see what is available.
If you have additional questions/comments, please contact us. Where applicable, please include the following information:
- Your netid
- Partition name
- Job ID(s)
- Error messages
- Command used to submit the job(s)
- Path(s) to scripts called by the submission command
- Path(s) to output files from your jobs