Broadly speaking, a compute cluster is a collection of networked computers, which we call nodes. Our clusters are only accessible to researchers remotely; your gateway to the cluster is the login node. From this node, you can view your files and dispatch jobs to one or more compute nodes, the nodes on the cluster that are configured for computation. The tool we use to submit these jobs is called a job scheduler. All compute nodes on a cluster mount a shared filesystem: a file server or set of servers that keeps track of all files on a large array of disks, so that you can access and edit your data from any compute node. Detailed information about each of our clusters is available here.
Request an Account
The first step in gaining access to our clusters is to request an account. Several HPC clusters are available at Yale, and there is no charge for using them. To understand which cluster is appropriate for you and to request an account, visit the account request page.
All of Yale's clusters are accessed via a protocol called secure shell (ssh). Once you have an account, look at our SSH instructions to log on to the system.
If you want to access the clusters from outside Yale, you must use the Yale VPN.
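As a rough sketch of what logging in looks like from a terminal, the snippet below wraps the `ssh` command in a small helper. The NetID `netid`, the hostname `cluster.example.edu`, and the function name `login_to_cluster` are placeholders chosen for illustration; the real hostname for each cluster is given in the SSH instructions.

```shell
# Placeholders (assumptions, not real values): substitute your own NetID and
# the hostname listed in the SSH instructions for your cluster.
NETID="netid"
CLUSTER_HOST="cluster.example.edu"

# Wrap the login in a function so that nothing connects until you call it.
login_to_cluster() {
    ssh "${NETID}@${CLUSTER_HOST}"
}

# Usage, from a terminal on campus or connected to the Yale VPN:
#   login_to_cluster
```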
Schedule a Job
On our clusters, you control your jobs using a job scheduling system called Slurm that dedicates and manages compute resources for you. Schedulers are usually used in one of two ways. For testing and small jobs you may want to run a job interactively. This way you can directly interact with the compute node(s) in real time to make sure your jobs will behave as expected. The other way, which is the preferred way for long-running jobs, involves writing your job commands in a script and submitting that to the job scheduler. Please see our Slurm documentation or attend the HPC bootcamp for more details.
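The batch workflow described above can be sketched as follows. This is a minimal example under stated assumptions: the file name `example_job.sh` and every resource value (run time, CPU count, memory) are illustrative only, and the right settings depend on the cluster and on your job.

```shell
# Write a minimal batch script. All resource values are illustrative
# assumptions; check the Slurm documentation for suitable settings.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=example_job-%j.out

# Commands below run on the allocated compute node, not the login node.
hostname
EOF

# Submit only where Slurm is installed, so the sketch is harmless elsewhere.
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

For interactive testing, Slurm's `salloc` command requests an allocation in a similar way; the options recommended on each cluster are covered in the Slurm documentation.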
A basic familiarity with Linux commands is required for interacting with the clusters. We periodically run an Intro to Linux Bootcamp to get you started. There are also many excellent beginner tutorials available for free online.
Move Your Files
You will likely find it necessary to copy files between your local machines and the clusters. Just as with logging in, there are different ways to do this, depending on your local operating system. See the documentation on transferring data for more information.
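On Linux or macOS, one common command-line option is `scp`, which copies files over the same SSH connection used for logging in. The sketch below assumes a placeholder account `netid@cluster.example.edu`; the helper names `upload` and `download` are made up for this example, and the transfer documentation describes the methods recommended for each operating system.

```shell
# Placeholder (assumption): replace with your NetID and the cluster hostname.
REMOTE="netid@cluster.example.edu"

# Copy a local file to your home directory on the cluster.
upload() {
    scp "$1" "${REMOTE}:"
}

# Copy a file from the cluster (path relative to your home directory there)
# into the current local directory.
download() {
    scp "${REMOTE}:$1" .
}

# Usage, from your local machine (file names are hypothetical):
#   upload results.csv
#   download project/output.log
```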
To best serve the diverse needs of all our researchers, we use a module system to manage the most commonly used software. This allows you to swap between different applications and versions of those applications with relative ease and focus on getting your work done. See the Modules documentation in our User Guide for more information.
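A typical module session might look like the sketch below. The helper name `try_module` and the module name `ExampleApp` are hypothetical; `module avail` shows which names actually exist on a given cluster.

```shell
# List available software, load one module, then show what is loaded.
# The module name is passed as the first argument.
try_module() {
    module avail        # list software available on this cluster
    module load "$1"    # add the named application to your environment
    module list         # confirm what is currently loaded
}

# Usage on a cluster ("ExampleApp" is a hypothetical module name):
#   try_module ExampleApp
```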
We also provide assistance for installing less commonly used packages. See our Applications & Software documentation for more details.
Rules of the Road
Before you begin using the cluster, here are some important things to remember:
- Do not run jobs or do real work on the login node. Always allocate a compute node and run programs there.
- Never give your password or ssh key to anyone else.
- Do not store any protected or regulated data on the cluster (e.g., PHI data).
Use of the clusters is also governed by our official guidelines.
Hands-on Training
We offer several courses that will assist you with your work on our clusters. They range from orientation for absolute beginners to advanced topics on application-specific optimization. Please peruse our catalog of training to see what is available.
Get Additional Help
If you have additional questions or comments, please contact the YCRC team. If possible, please include the following information:
- Your NetID
- Queue/partition name
- Job ID(s)
- Error messages
- Command used to submit the job(s)
- Path(s) to scripts called by the submission command
- Path(s) to output files from your jobs