All of Omega's scratch space is read-and-delete-only as of Dec 3, 2018. The scratch space will be purged and all data permanently deleted on Feb 1, 2019.
Omega has now served Yale’s research community well for more than 2 years past the normal end-of-life for similar clusters. Most of its components are no longer under vendor warranty, and parts are sometimes difficult to obtain, so we are forced to support it on a best-effort basis. Last year, we developed a multi-year plan to replace Omega, which began with moving Omega’s shared resources to our Grace cluster, for which we acquired new commons nodes.
We plan to continue to support groups with dedicated node allocations and other users running tightly coupled parallel jobs on Omega until mid-2019. The mpi partition on Grace contains the replacement nodes for the remainder of Omega. Please test your workloads on those nodes at your convenience. We will provide ample warning before the final Omega decommission.
Clean Out Omega Data
All Omega files are now stored solely on the Loomis GPFS system. For groups that have migrated their workloads entirely to Grace or Farnam, their Omega data is now available from Grace and Farnam for copying and clean-up until Feb 2019. See Cleaning Out Omega Data for instructions on retrieving your data.
The cluster is made up of several kinds of compute nodes. The Features column below lists the features that can be used to request different node types with the --constraint flag (see our Slurm documentation for more details). The RAM listed below is the amount of memory available for jobs.
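For example, a batch script can pin a job to a particular node type by passing a feature name to --constraint. This is a sketch only; the partition and feature follow the tables below, while the memory request and the job command (`my_program`) are hypothetical placeholders:

```shell
#!/bin/bash
#SBATCH --partition=day
#SBATCH --constraint=X5560    # node feature from the Features column
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G             # illustrative memory request

# ./my_program               # hypothetical job executable
```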
Compute Node Configurations
Nodes on the clusters are organized into partitions, to which you submit your jobs with Slurm. The default resource request for all jobs is 1 core and 4GB of memory.
The day partition is where most batch jobs should run, and is the default if you don't specify a partition. The week partition is smaller, but allows for longer jobs. The interactive partition should only be used for testing or compiling software. The bigmem partition contains our largest memory node; only jobs that cannot be satisfied by day should run here. For more information about scavenge, see the Scavenge documentation.
The limits listed below are for all running jobs combined. Per-node limits are bound by the node types, as described in the hardware table.
| Partition | User Limits | Walltime default/max | Node Type (count) |
|---|---|---|---|
| day* | 128 nodes | 1h/1d | X5560 (218) |
| week | 64 nodes | 1h/7d | X5560 (46), X5560 44G (16) |
| interactive | 1 job, 8 CPUs, 1 node | 1h/4h | X5560 (2) |

\* default partition
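To override the default request of 1 core and 4GB, specify the partition, walltime, and resources explicitly. The sketch below requests the week partition with a walltime inside its 7-day maximum; the specific core count, memory, and job command (`long_job`) are illustrative assumptions, not recommendations:

```shell
#!/bin/bash
#SBATCH --partition=week      # longer walltime limit than day
#SBATCH --time=3-00:00:00     # 3 days, within week's 7d maximum
#SBATCH --cpus-per-task=8     # illustrative core request
#SBATCH --mem=32G             # illustrative memory request

# ./long_job                  # hypothetical job executable
```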
Private partitions contain nodes acquired by specific research groups. Full access to these partitions is granted at the discretion of the owner. Contact us if your group would like to purchase nodes.
| Partition | Walltime default/max | Node Type (count) |
|---|---|---|
| astro | 1h/28d | X5560 (112), X5560 44G (16) |
/gpfs/loomis is Omega's primary filesystem, where the home and scratch60 directories are located. You can also access Grace's project space (if you have a Grace account) from Omega. For more details on the different storage spaces, see our Cluster Storage documentation.
You can check your current storage usage and limits by running the getquota command. Note that the per-user usage breakdown only updates once daily.
Files stored in scratch60 are purged if they are older than 60 days. You will receive an email alert one week before they are deleted.
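You can preview which files are past the 60-day threshold with find's -mtime test. The sketch below uses a temporary stand-in directory so it runs anywhere; on the cluster you would point SCRATCH at your own scratch60 directory instead:

```shell
# SCRATCH is a temporary stand-in for your scratch60 directory.
SCRATCH=$(mktemp -d)
touch -d "90 days ago" "$SCRATCH/old.dat"   # simulate a file past the purge age
touch "$SCRATCH/new.dat"                    # a fresh file, not at risk

# -mtime +60 matches files modified more than 60 days ago,
# i.e. those eligible for the scratch60 purge.
find "$SCRATCH" -type f -mtime +60          # prints only old.dat
```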
| Partition | Root Directory | Storage | File Count | Backups |
|---|---|---|---|---|