Ruddle

NOTICE: We plan to permanently decommission the ycga-ba filesystem on Sept 16, 2019. See here for details.

Ruddle is named for Frank Ruddle, a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics.

Ruddle is intended for use only on projects related to the Yale Center for Genome Analysis; please do not use this cluster for other projects. If you have any questions about this policy, please contact us.


Hardware

Ruddle is made up of several kinds of compute nodes. The Features column below lists the features that can be used to request different node types using the --constraint flag (see our Slurm documentation for more details, and the example batch script after the table). The RAM listed below is the amount of memory available for jobs.

Warning

Care should be taken when scheduling your job if you are running programs/libraries optimized for specific hardware. See the guide on how to compile software for specific guidance.

Compute Node Configurations

Count  CPU                  CPU Cores  RAM    Features
12     4x AMD Opteron 6276  32         499G   bulldozer, opteron-6276
155    2x E5-2660 v3        20         121G   haswell, avx2, E5-2660_v3, oldest
2      4x E7-4809 v3        32         1507G  haswell, avx2, E7-4809_v3
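
For example, a batch job that depends on AVX2 instructions could pin itself to the Haswell nodes with a constraint directive. The sketch below is illustrative only; the program name is a placeholder for your own:

#!/bin/bash
#SBATCH --constraint=avx2     # run only on nodes tagged with the avx2 feature above
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5G

./my_program                  # placeholder for your own executable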

Slurm Partitions

Nodes on the clusters are organized into partitions, to which you submit your jobs with Slurm. The default resource request for all jobs is 1 core and 5GB of memory per core. The general partition is where most batch jobs should run, and is the default if you don't specify a partition. The interactive partition is dedicated to jobs with which you need ongoing interaction. The bigmem partition contains our largest-memory nodes; only jobs that cannot be satisfied by general should run here. The scavenge partition allows you to run preemptable jobs on more resources than normally allowed. For more information about scavenge, see the Scavenge documentation. Example submissions follow the table below.

The limits listed below are for all running jobs combined. Per-node limits are bound by the node types, as described in the hardware table.

Partition    User Limits           Walltime Default/Max  Node Type (count)
general*     300 CPUs, 1800 G RAM  7d/30d                E5-2660_v3 (155)
interactive  20 CPUs, 256 G RAM    1d/2d                 E5-2660_v3 (155)
bigmem       32 CPUs, 1507 G RAM   1d/7d                 opteron-6276 (12), E7-4809_v3 (2, 1507G)
scavenge     800 CPUs, 5120 G RAM  1d/7d                 all

* default
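
As a sketch of the two common cases, the batch script below targets general and the srun line starts a session on interactive; the resource and time values are only illustrative:

#!/bin/bash
#SBATCH --partition=general   # the default partition, named here for clarity
#SBATCH --time=2-00:00:00     # 2 days, within the 7d default / 30d max walltime
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=5G

./my_program                  # placeholder for your own executable

For an interactive session you could instead run:

srun --pty -p interactive -c 4 --mem=20G bash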

Access Sequencing Data

To avoid duplication of data and to save space that counts against your quotas, we suggest that you make soft links to your sequencing data rather than copying them:

ln -s /path/to/sequence_data /path/to/your_link

Tip

Original sequence data are archived pursuant to the YCGA retention policy. For long-running projects we recommend you keep a personal backup of your sequence files. If you need to retrieve archived sequencing data, please see our guide on how to do so.

To find the location of the sequence files on the storage, look at the URL that you were sent from YCGA.

fullPath Starts With     Root Path on Ruddle
gpfs_illumina/sequencer  /gpfs/ycga/illumina/sequencer
ba_sequencers            /ycga-ba/ba_sequencers
sequencers               /gpfs/ycga/sequencers/panfs/sequencers

For example, if the sample link you received is:

http://sysg1.cs.yale.edu:2011/gen?fullPath=sequencers2/sequencerV/runs/131107_D00306_0096... etc

The path on the cluster to the data is:

/gpfs/ycga/sequencers/panfs/sequencers2/sequencerV/runs/131107_D00306_0096... etc
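
Putting the two steps together, a soft link to that run could be made roughly as follows; <run_directory> and the destination path stand in for your own values:

ln -s /gpfs/ycga/sequencers/panfs/sequencers2/sequencerV/runs/<run_directory> \
      /gpfs/ycga/project/<your_group>/<your_netid>/<run_directory>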

Public Datasets

We host datasets of general interest in a loosely organized directory tree in /gpfs/ycga/datasets:

├── annovar
│   └── humandb
├── db
│   └── blast
├── genomes
│   ├── Aedes_aegypti
│   ├── Bos_taurus
│   ├── Chelonoidis_nigra
│   ├── Danio_rerio
│   ├── Gallus_gallus
│   ├── hisat2
│   ├── Homo_sapiens
│   ├── Macaca_mulatta
│   ├── Monodelphis_domestica
│   ├── Mus_musculus
│   ├── PhiX
│   └── tmp
└── hisat2
    └── mouse
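
If you want to see what is available before pointing a job at one of these directories, browsing the tree is usually enough; the Homo_sapiens path below is only an illustration, since the exact layout varies by dataset:

ls /gpfs/ycga/datasets/genomes
ls /gpfs/ycga/datasets/genomes/Homo_sapiens   # check the layout before hard-coding paths in a script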

If you would like us to host a dataset or have questions about what is currently available, please email hpc@yale.edu.

Storage

Ruddle has access to two filesystems. /gpfs/ycga is Ruddle's filesystem where home, project, and scratch60 directories are located. /ycga-ba stores legacy data. For more details on the different storage spaces, see our Cluster Storage documentation.

You can check your current storage usage and limits by running the getquota command. Note that the per-user usage breakdown only updates once daily.
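
For example, from a login node:

getquota   # the per-user breakdown refreshes once daily, so recent changes may not show immediately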

Warning

Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted.

Partition  Root Directory        Storage    File Count  Backups
home       /gpfs/ycga/home       125G/user  500,000     Yes
project    /gpfs/ycga/project    4T/group   5,000,000   No
scratch60  /gpfs/ycga/scratch60  10T/group  5,000,000   No