Ruddle
Ruddle is intended for use only on projects related to the Yale Center for Genome Analysis; please do not use this cluster for other projects. If you have any questions about this policy, please contact us.
Ruddle is named for Frank Ruddle, a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics.
Upcoming Ruddle Retirement
After more than seven years in service, we will be retiring the Ruddle HPC cluster on July 1st. Ruddle is being replaced with the new HPC cluster, McCleary. For more information and updates see the McCleary announcement page.
Access the Cluster
Once you have an account, the cluster can be accessed via ssh or through the Open OnDemand web portal.
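For example, a minimal ssh connection might look like the following (a sketch: the login hostname ruddle.hpc.yale.edu is assumed here, and netid is a placeholder for your own Yale NetID):
$ ssh netid@ruddle.hpc.yale.edu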
System Status and Monitoring
For system status messages and the schedule for upcoming maintenance, please see the system status page. For a current node-level view of job activity, see the cluster monitor page (VPN only).
Partitions and Hardware
Ruddle is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on.
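For instance, to restrict a batch job to the Cascade Lake nodes available through the scavenge partition, you could combine the two options like this (the feature name comes from the node tables below; myjob.sh is a placeholder for your batch script):
$ sbatch --partition=scavenge --constraint=cascadelake myjob.sh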
Job Submission Rate Limits
Job submissions are limited to 200 jobs per hour. See the Rate Limits section in the Common Job Failures page for more info.
Public Partitions
See each tab below for more information about the available common use partitions.
Use the general partition for most batch jobs. This is the default if you don't specify one with --partition.
Request Defaults
Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.
--time=7-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Job Limits
Jobs submitted to the general partition are subject to the following limits:
Limit | Value |
---|---|
Maximum job time limit | 30-00:00:00 |
Maximum CPUs per user | 300 |
Maximum memory per user | 1800G |
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
138 | E5-2660_v3 | 20 | 119 | haswell, E5-2660_v3, nogpu, standard, common, oldest |
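As an illustrative sketch, a batch script that overrides the defaults above for the general partition might look like this (example_job and my_analysis.sh are placeholders; adjust resources to stay within the limits and node sizes listed above):
#!/bin/bash
#SBATCH --partition=general
#SBATCH --job-name=example_job
#SBATCH --time=2-00:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5120
./my_analysis.sh   # placeholder for your actual commands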
Use the interactive partition for jobs with which you need ongoing interaction, for example exploratory analyses or debugging compilations.
Request Defaults
Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.
--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Job Limits
Jobs submitted to the interactive partition are subject to the following limits:
Limit | Value |
---|---|
Maximum job time limit | 2-00:00:00 |
Maximum CPUs per user | 20 |
Maximum memory per user | 256G |
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
146 | E5-2660_v3 | 20 | 119 | haswell, E5-2660_v3, nogpu, standard, common, oldest |
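For instance, you could request an interactive session with salloc like this (resource values are only an illustration within the limits above):
$ salloc --partition=interactive --time=6:00:00 --cpus-per-task=4 --mem-per-cpu=5120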
Use the bigmem partition for jobs that have memory requirements other partitions can't handle.
Request Defaults
Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.
--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Job Limits
Jobs submitted to the bigmem partition are subject to the following limits:
Limit | Value |
---|---|
Maximum job time limit | 7-00:00:00 |
Maximum CPUs per user | 32 |
Maximum memory per user | 1505G |
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
2 | E7-4809_v3 | 32 | 1505 | haswell, E7-4809_v3, nogpu, common |
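As a sketch, a large-memory job could be submitted like this (values are illustrative only and must fit within the limits above; myjob.sh is a placeholder):
$ sbatch --partition=bigmem --cpus-per-task=16 --mem=1000G --time=2-00:00:00 myjob.sh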
Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation.
Request Defaults
Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.
--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Job Limits
Jobs submitted to the scavenge partition are subject to the following limits:
Limit | Value |
---|---|
Maximum job time limit | 7-00:00:00 |
Maximum CPUs per user | 300 |
Maximum memory per user | 1800G |
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
40 | 6240 | 36 | 181 | cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp |
2 | 6240 | 36 | 179 | cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp |
2 | 6240 | 36 | 1505 | cascadelake, avx512, 6240, nogpu, pi, bigtmp |
146 | E5-2660_v3 | 20 | 119 | haswell, E5-2660_v3, nogpu, standard, common, oldest |
2 | E7-4809_v3 | 32 | 1505 | haswell, E7-4809_v3, nogpu, common |
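Because scavenge jobs can be preempted, it often helps to make them requeueable; a minimal sketch (myjob.sh is a placeholder for your batch script):
$ sbatch --partition=scavenge --requeue --time=1-00:00:00 --cpus-per-task=8 myjob.sh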
Private Partitions
With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare. Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire the compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us.
PI Partitions
Request Defaults
Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.
--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Job Limits
Jobs submitted to the pi_hall partition are subject to the following limits:
Limit | Value |
---|---|
Maximum job time limit | 14-00:00:00 |
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
40 | 6240 | 36 | 181 | cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp |
Request Defaults
Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.
--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Job Limits
Jobs submitted to the pi_hall_bigmem partition are subject to the following limits:
Limit | Value |
---|---|
Maximum job time limit | 14-00:00:00 |
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
2 | 6240 | 36 | 1505 | cascadelake, avx512, 6240, nogpu, pi, bigtmp |
Request Defaults
Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.
--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Job Limits
Jobs submitted to the pi_townsend partition are subject to the following limits:
Limit | Value |
---|---|
Maximum job time limit | 14-00:00:00 |
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
2 | 6240 | 36 | 179 | cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp |
YCGA Data Retention Policy
Illumina sequence data is initially written to YCGA's main storage system, which is located in the main HPC datacenter at Yale's West Campus. Data stored there is protected against loss by software RAID. Raw basecall data (bcl files) is immediately transformed into DNA sequences (fastq files).
- 45 days after sequencing, the raw bcl files are deleted.
- 60 days after sequencing, the fastq files are written to a tape archive. Two tape libraries store identical copies of the data, located in two datacenters in separate buildings on West Campus.
- 365 days after sequencing, all data is deleted from main storage. Users continue to have access to the data via the tape archive. Data is retained on the tape archive indefinitely. Instructions for retrieving archived data.
All compression of sequence data is lossless. Gzip is used for data stored on the main storage, and quip is used for data stored on the tape archive. Disaster recovery is provided by the data stored on the tape library.
Access Sequencing Data
To avoid duplication of data and to save space that counts against your quotas, we suggest that you make soft links to your sequencing data rather than copying them.
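To illustrate, a soft link can be created by hand with ln -s (the source path below is the example data location used later on this page, and the destination under ~/project is a placeholder; in practice the ycgaFastq tool described below creates these links for you):
$ ln -s /ycga-gpfs/project/fas/lsprog/tools/external/data/randomstring/sample_dir_001 ~/project/sample_dir_001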
Normally, YCGA will send you an email informing you that your data is ready, and will include a URL that looks like: http://fcb.ycga.yale.edu:3010/randomstring/sample_dir_001
You can use that link to download your data in a browser, but if you plan to process the data on Ruddle, it is better to make a soft link to the data, rather than copying it. You can use the ycgaFastq tool to do that:
$ /home/bioinfo/software/knightlab/bin/ycgaFastq fcb.ycga.yale.edu:3010/randomstring/sample_dir_001
ycgaFastq can also be used to retrieve data that has been archived to tape. The simplest way to do that is to provide the sample submitter's netid and the flowcell (run) name:
$ ycgaFastq rdb9 AHFH66DSXX
If you have a path to the original location of the sequencing data, ycgaFastq can retrieve the data using that path, even if the run has been archived and deleted:
$ ycgaFastq /ycga-gpfs/sequencers/illumina/sequencerD/runs/190607_A00124_0104_AHLF3MMSXX/Data/Intensities/BaseCalls/Unaligned-2/Project_Lz438
ycgaFastq can be used in a variety of other ways to retrieve data. For more information, see the documentation or contact us.
If you would like to know the true location of the data on Ruddle, do this:
$ readlink -f /ycga-gpfs/project/fas/lsprog/tools/external/data/randomstring/sample_dir_001
Tip
Original sequence data are archived pursuant to the YCGA retention policy. For long-running projects we recommend you keep a personal backup of your sequence files. If you need to retrieve archived sequencing data, please see our guide on how to do so.
If you have a very old link from YCGA that doesn't use the random string, you can find the location by decoding the URL as shown below:
fullPath Starts With | Root Path on Ruddle |
---|---|
gpfs_illumina/sequencer | /gpfs/ycga/illumina/sequencer |
ba_sequencers | /ycga-ba/ba_sequencers |
sequencers | /gpfs/ycga/sequencers/panfs/sequencers |
For example, if the sample link you received is:
http://sysg1.cs.yale.edu:2011/gen?fullPath=sequencers2/sequencerV/runs/131107_D00306_0096... etc
The path on the cluster to the data is:
/gpfs/ycga/sequencers/panfs/sequencers2/sequencerV/runs/131107_D00306_0096... etc
Public Datasets
We host datasets of general interest in a loosely organized directory tree in /gpfs/gibbs/data:
├── alphafold-2.3
├── alphafold-2.2 (deprecated)
├── alphafold-2.0 (deprecated)
├── annovar
│ └── humandb
├── db
│ └── blast
├── genomes
│ ├── Aedes_aegypti
│ ├── Bos_taurus
│ ├── Chelonoidis_nigra
│ ├── Danio_rerio
│ ├── Gallus_gallus
│ ├── hisat2
│ ├── Homo_sapiens
│ ├── Macaca_mulatta
│ ├── Monodelphis_domestica
│ ├── Mus_musculus
│ ├── PhiX
│ └── tmp
└── hisat2
└── mouse
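For example, a job script or interactive session can reference these datasets directly by path (the subdirectory below is taken from the listing above; its exact contents may vary):
$ ls /gpfs/gibbs/data/genomes/Homo_sapiens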
If you would like us to host a dataset or have questions about what is currently available, please contact us.
Storage
Ruddle's filesystem, /gpfs/ycga, is where home, project, and scratch60 directories are located. For more details on the different storage spaces, see our Cluster Storage documentation. Ruddle's old ycga-ba filesystem has been retired.
You can check your current storage usage & limits by running the getquota command. Your ~/project and ~/scratch60 directories are shortcuts. Get a list of the absolute paths to your directories with the mydirectories command. If you want to share data in your Project or Scratch directory, see the permissions page.
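For example, run these from a login node (both are Ruddle-specific utilities named above; their output formats are not reproduced here):
$ getquota
$ mydirectories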
For information on data recovery, see the Backups and Snapshots documentation.
Warning
Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase storage if you need additional longer-term storage.
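To check which of your scratch files are approaching the purge window, a find command like the following can help (a sketch only: -mtime is used here as an approximation, and the +53 threshold simply leaves about a week of lead time; the actual purge policy may key off a different timestamp):
$ find ~/scratch60/ -type f -mtime +53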
Partition | Root Directory | Quota | File Count | Backups | Snapshots |
---|---|---|---|---|---|
home | /gpfs/ycga/home | 125GiB/user | 500,000 | Yes | >=2 days |
project | /gpfs/ycga/project | 1TiB/group, increase to 4TiB on request | 5,000,000 | No | >=2 days |
scratch60 | /gpfs/ycga/scratch60 | 20TiB/group | 15,000,000 | No | >=2 days |