
Ruddle

Ruddle is intended for use only on projects related to the Yale Center for Genome Analysis (YCGA); please do not use this cluster for other projects. If you have any questions about this policy, please contact us.

Ruddle is named for Frank Ruddle, a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics.

Upcoming Ruddle Retirement

After more than seven years in service, we will be retiring the Ruddle HPC cluster on July 1st. Ruddle is being replaced with the new HPC cluster, McCleary. For more information and updates see the McCleary announcement page.


Access the Cluster

Once you have an account, the cluster can be accessed via ssh or through the Open OnDemand web portal.
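
For example, to log in from a terminal, substitute your own Yale NetID for netid (the login host name shown here is the conventional one for Ruddle and is given as an assumption):

$ ssh netid@ruddle.hpc.yale.edu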

System Status and Monitoring

For system status messages and the schedule for upcoming maintenance, please see the system status page. For a current node-level view of job activity, see the cluster monitor page (VPN only).

Partitions and Hardware

Ruddle is made up of several kinds of compute nodes. We group them into (sometimes overlapping) Slurm partitions meant to serve different purposes. By combining the --partition and --constraint Slurm options you can more finely control what nodes your jobs can run on.
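
For example, a sketch of combining the two options to restrict a scavenge job to Cascade Lake nodes (the constraint name is taken from the Node Features column in the tables below; my_job.sh is a placeholder for your batch script):

$ sbatch --partition=scavenge --constraint=cascadelake my_job.sh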

Job Submission Rate Limits

Job submissions are limited to 200 jobs per hour. See the Rate Limits section in the Common Job Failures page for more info.
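
If you need to run many similar jobs, a Slurm job array queues all of them with a single sbatch call, which helps keep you under the submission limit (the array range and script name here are illustrative):

$ sbatch --array=1-500 batch_job.sh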

Public Partitions

See the sections below for more information about the available common-use partitions.

Use the general partition for most batch jobs. This is the default if you don't specify one with --partition.

Request Defaults

Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.

--time=7-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
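
A minimal batch script for this partition, overriding a few of the defaults above, might look like the following sketch (the resource numbers, module, and command are illustrative placeholders):

#!/bin/bash
#SBATCH --partition=general
#SBATCH --time=2-00:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5G
#SBATCH --job-name=example

module load MyModule    # load whatever software your analysis needs (placeholder name)
./run_analysis.sh       # replace with your actual command

Submit it with sbatch (assuming you saved it as example.sh):

$ sbatch example.sh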

Job Limits

Jobs submitted to the general partition are subject to the following limits:

Limit Value
Maximum job time limit 30-00:00:00
Maximum CPUs per user 300
Maximum memory per user 1800G

Available Compute Nodes

Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.

Count CPU Type CPUs/Node Memory/Node (GiB) Node Features
138 E5-2660_v3 20 119 haswell, E5-2660_v3, nogpu, standard, common, oldest

Use the interactive partition for jobs that require ongoing interaction, such as exploratory analyses or debugging compilations.
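
For example, a hypothetical interactive session request (the time and resource amounts are illustrative):

$ salloc --partition=interactive --time=6:00:00 --cpus-per-task=2 --mem-per-cpu=5G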

Request Defaults

Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.

--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120

Job Limits

Jobs submitted to the interactive partition are subject to the following limits:

Limit Value
Maximum job time limit 2-00:00:00
Maximum CPUs per user 20
Maximum memory per user 256G

Available Compute Nodes

Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.

Count CPU Type CPUs/Node Memory/Node (GiB) Node Features
146 E5-2660_v3 20 119 haswell, E5-2660_v3, nogpu, standard, common, oldest

Use the bigmem partition for jobs that have memory requirements other partitions can't handle.
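
For example, a hypothetical submission requesting most of the memory on one of the bigmem nodes listed below (the script name and amounts are illustrative):

$ sbatch --partition=bigmem --cpus-per-task=16 --mem=1000G big_mem_job.sh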

Request Defaults

Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.

--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120

Job Limits

Jobs submitted to the bigmem partition are subject to the following limits:

Limit Value
Maximum job time limit 7-00:00:00
Maximum CPUs per user 32
Maximum memory per user 1505G

Available Compute Nodes

Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.

Count CPU Type CPUs/Node Memory/Node (GiB) Node Features
2 E7-4809_v3 32 1505 haswell, E7-4809_v3, nogpu, common

Use the scavenge partition to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation.
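
Because scavenge jobs can be preempted, one common pattern is to ask Slurm to put a preempted job back in the queue; a sketch of a batch script doing so (the resource numbers and command are illustrative):

#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --requeue             # re-queue this job if it is preempted
#SBATCH --time=1-00:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5G

./run_analysis.sh             # replace with your actual command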

Request Defaults

Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.

--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120

Job Limits

Jobs submitted to the scavenge partition are subject to the following limits:

Limit Value
Maximum job time limit 7-00:00:00
Maximum CPUs per user 300
Maximum memory per user 1800G

Available Compute Nodes

Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.

Count CPU Type CPUs/Node Memory/Node (GiB) Node Features
40 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp
2 6240 36 179 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp
2 6240 36 1505 cascadelake, avx512, 6240, nogpu, pi, bigtmp
146 E5-2660_v3 20 119 haswell, E5-2660_v3, nogpu, standard, common, oldest
2 E7-4809_v3 32 1505 haswell, E7-4809_v3, nogpu, common

Private Partitions

With few exceptions, jobs submitted to private partitions are not considered when calculating your group's Fairshare. Your group can purchase additional hardware for private use, which we will make available as a pi_groupname partition. These nodes are purchased by you, but supported and administered by us. After vendor support expires, we retire compute nodes. Compute nodes can range from $10K to upwards of $50K depending on your requirements. If you are interested in purchasing nodes for your group, please contact us.
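
Members of a group with a private partition submit to it with the --partition option, for example (using the pi_hall partition listed below; my_job.sh is a placeholder):

$ sbatch --partition=pi_hall my_job.sh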

PI Partitions

Request Defaults

Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.

--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120

Job Limits

Jobs submitted to the pi_hall partition are subject to the following limits:

Limit Value
Maximum job time limit 14-00:00:00

Available Compute Nodes

Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.

Count CPU Type CPUs/Node Memory/Node (GiB) Node Features
40 6240 36 181 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp

Request Defaults

Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.

--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120

Job Limits

Jobs submitted to the pi_hall_bigmem partition are subject to the following limits:

Limit Value
Maximum job time limit 14-00:00:00

Available Compute Nodes

Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.

Count CPU Type CPUs/Node Memory/Node (GiB) Node Features
2 6240 36 1505 cascadelake, avx512, 6240, nogpu, pi, bigtmp

Request Defaults

Unless otherwise specified, your jobs will run with the following salloc and sbatch options for this partition.

--time=1-00:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120

Job Limits

Jobs submitted to the pi_townsend partition are subject to the following limits:

Limit Value
Maximum job time limit 14-00:00:00

Available Compute Nodes

Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.

Count CPU Type CPUs/Node Memory/Node (GiB) Node Features
2 6240 36 179 cascadelake, avx512, 6240, nogpu, standard, pi, bigtmp

YCGA Data Retention Policy

Illumina sequence data is initially written to YCGA's main storage system, which is located in the main HPC datacenter at Yale's West Campus. Data stored there is protected against loss by software RAID. Raw basecall data (bcl files) is immediately transformed into DNA sequences (fastq files).

  • 45 days after sequencing, the raw bcl files are deleted.
  • 60 days after sequencing, the fastq files are written to a tape archive. Two tape libraries store identical copies of the data, located in two datacenters in separate buildings on West Campus.
  • 365 days after sequencing, all data is deleted from main storage. Users continue to have access to the data via the tape archive, where it is retained indefinitely; see the Access Sequencing Data section below for instructions on retrieving archived data.

All compression of sequence data is lossless. Gzip is used for data stored on the main storage, and quip is used for data stored on the tape archive. Disaster recovery is provided by the data stored on the tape library.

Access Sequencing Data

To avoid duplication of data and to save space that counts against your quotas, we suggest that you make soft links to your sequencing data rather than copying them.

Normally, YCGA will send you an email informing you that your data is ready; it will include a URL that looks like: http://fcb.ycga.yale.edu:3010/randomstring/sample_dir_001

You can use that link to download your data in a browser, but if you plan to process the data on Ruddle, it is better to make a soft link to the data, rather than copying it. You can use the ycgaFastq tool to do that:

$ /home/bioinfo/software/knightlab/bin/ycgaFastq  fcb.ycga.yale.edu:3010/randomstring/sample_dir_001

ycgaFastq can also be used to retrieve data that has been archived to tape. The simplest way to do that is to provide the sample submitter's netid and the flowcell (run) name:

$ ycgaFastq rdb9 AHFH66DSXX

If you have a path to the original location of the sequencing data, ycgaFastq can retrieve the data using that path, even if the run has been archived and deleted:

$ ycgaFastq /ycga-gpfs/sequencers/illumina/sequencerD/runs/190607_A00124_0104_AHLF3MMSXX/Data/Intensities/BaseCalls/Unaligned-2/Project_Lz438

ycgaFastq can be used in a variety of other ways to retrieve data. For more information, see the documentation or contact us.

If you would like to know the true location of the data on Ruddle, do this:

$ readlink -f /ycga-gpfs/project/fas/lsprog/tools/external/data/randomstring/sample_dir_001

Tip

Original sequence data are archived pursuant to the YCGA retention policy. For long-running projects we recommend you keep a personal backup of your sequence files. If you need to retrieve archived sequencing data, please see our guide on how to do so.

If you have a very old link from YCGA that doesn't use the random string, you can find the location by decoding the URL as shown below:

fullPath Starts With Root Path on Ruddle
gpfs_illumina/sequencer /gpfs/ycga/illumina/sequencer
ba_sequencers /ycga-ba/ba_sequencers
sequencers /gpfs/ycga/sequencers/panfs/sequencers

For example, if the sample link you received is:

http://sysg1.cs.yale.edu:2011/gen?fullPath=sequencers2/sequencerV/runs/131107_D00306_0096... etc

The path on the cluster to the data is:

/gpfs/ycga/sequencers/panfs/sequencers2/sequencerV/runs/131107_D00306_0096... etc

Public Datasets

We host datasets of general interest in a loosely organized directory tree in /gpfs/gibbs/data:

├── alphafold-2.3
├── alphafold-2.2 (deprecated)
├── alphafold-2.0 (deprecated)
├── annovar
│   └── humandb
├── db
│   └── blast
├── genomes
│   ├── Aedes_aegypti
│   ├── Bos_taurus
│   ├── Chelonoidis_nigra
│   ├── Danio_rerio
│   ├── Gallus_gallus
│   ├── hisat2
│   ├── Homo_sapiens
│   ├── Macaca_mulatta
│   ├── Monodelphis_domestica
│   ├── Mus_musculus
│   ├── PhiX
│   └── tmp
└── hisat2
    └── mouse
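
For example, to list the reference genomes currently available:

$ ls /gpfs/gibbs/data/genomes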

If you would like us to host a dataset or have questions about what is currently available, please contact us.

Storage

Ruddle's filesystem, /gpfs/ycga, is where home, project, and scratch60 directories are located. For more details on the different storage spaces, see our Cluster Storage documentation. Ruddle's old ycga-ba filesystem has been retired.

You can check your current storage usage and limits by running the getquota command. Your ~/project and ~/scratch60 directories are symbolic links (shortcuts) to their real locations; run the mydirectories command to get a list of the absolute paths to your directories. If you want to share data in your Project or Scratch directory, see the permissions page.
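
For example, to check your usage against your quotas and to list the real paths behind those shortcuts:

$ getquota
$ mydirectories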

For information on data recovery, see the Backups and Snapshots documentation.

Warning

Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificially extending the expiration of scratch files is forbidden without explicit approval from the YCRC. If you need additional longer-term storage, please purchase more storage.
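
To see which of your scratch files are nearing the purge window, a find command along these lines can help (the age threshold is illustrative):

$ find ~/scratch60 -type f -mtime +50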

Partition Root Directory Quota File Count Backups Snapshots
home /gpfs/ycga/home 125GiB/user 500,000 Yes >=2 days
project /gpfs/ycga/project 1TiB/group, increase to 4TiB on request 5,000,000 No >=2 days
scratch60 /gpfs/ycga/scratch60 20TiB/group 15,000,000 No >=2 days

Last update: April 12, 2023