
News

Grace Maintenance

December 6-8, 2022

Software Updates

  • Slurm updated to 22.05.6
  • NVIDIA drivers updated to 520.61.05
  • Apptainer updated to 1.1.3
  • Open OnDemand updated to 2.0.28

Hardware Updates

  • Roughly 2 racks worth of equipment were moved to upgrade the effective InfiniBand connection speeds of several compute nodes (from 56 to 100 Gbps)
  • The InfiniBand network was modified to increase capacity and allow for additional growth
  • Some parts of the regular network were improved to shorten network paths and increase shared-uplink bandwidth

Loomis Decommission

The Loomis GPFS filesystem has been retired and unmounted from Grace, Farnam, and Ruddle. For additional information please see the Loomis Decommission page.

Published: December 08, 2022

December 2022

Announcements

Grace & Gibbs Maintenance

The biannual scheduled maintenance for the Grace cluster will be occurring December 6-8. During this time, the cluster will be unavailable. Additionally, the Gibbs filesystem will be unavailable on Farnam and Ruddle on Tuesday, December 6th to deploy a critical firmware upgrade. See the maintenance email announcements for more details.

Loomis Decommission

The Loomis GPFS filesystem will be retired and unmounted from Grace and Farnam during the Grace December maintenance, starting December 6th. All data except a few remaining private filesets have already been transferred to other systems (current software, home, and scratch to Palmer; project to Gibbs). The remaining private filesets are being transferred to Gibbs ahead of the maintenance, and their owners have been contacted. The only anticipated user impact is on anyone still using the older, deprecated software trees; otherwise the retirement should be transparent. Please reach out if you have any concerns or believe you are still using data located on Loomis. See the Loomis Decommission documentation for more information.

Apptainer Upgrade on Grace and Ruddle

The newest version of Apptainer (v1.1, available now on Ruddle and, after the December maintenance, on Grace) comes with the ability to build containers without needing elevated privileges (i.e., sudo access). This greatly simplifies the container workflow: you no longer need a separate system to build a container from a definition file. You can simply create a definition file and run the build command.

For example, to create a simple toy container from this def file (lolcow.def):

BootStrap: docker
From: ubuntu:20.04

%post
    apt-get -y update
    apt-get -y install cowsay lolcat

%environment
    export LC_ALL=C
    export PATH=/usr/games:$PATH

%runscript
    date | cowsay | lolcat

You can run:

salloc -p interactive -c 4 
apptainer build lolcow.sif lolcow.def
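
Once the build finishes, the resulting image can be executed directly; by default this invokes the %runscript from the definition file above:

```shell
# run the container's %runscript (pipes the date through cowsay and lolcat)
apptainer run lolcow.sif

# the .sif file is also directly executable
./lolcow.sif
```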

This upgrade is live on Ruddle and will be applied on Grace during the December maintenance. For more information, please see the Apptainer documentation site and our docs page on containers.

Software Highlights

  • RELION/4.0.0-fosscuda-2020b for cryo-EM/cryo-tomography data processing is now available on Farnam. RELION/3.1 will no longer be updated by the RELION developer. Note that data processed with RELION 4 are not backwards compatible with RELION 3.
Published: December 01, 2022

Ruddle Maintenance

November 1, 2022

Software Updates

  • Security updates
  • Slurm updated to 22.05.5
  • Apptainer updated to 1.1.2
  • Open OnDemand updated to 2.0.28

Hardware Updates

  • No hardware changes during this maintenance.
Published: November 01, 2022

November 2022

Announcements

Ruddle Maintenance

The biannual scheduled maintenance for the Ruddle cluster will be occurring Nov 1-3. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.

Grace and Milgram Maintenance Schedule Change

We will be adjusting the timing of Grace and Milgram's scheduled maintenance periods. Starting this December, Grace's maintenance periods will occur in December and June, with the next scheduled for December 6-8, 2022. Milgram's next maintenance will instead be performed in February and August, with the next scheduled for February 7-9, 2023. Please refer to previously sent communications for more information and see the full maintenance schedule for next year on our status page.

Requeue after Timeout

The YCRC clusters all have maximum time limits that are sometimes shorter than a job needs to finish. This can be frustrating for researchers trying to get a simulation or a project finished. However, a number of workflows have the ability to periodically save the status of a process to a file and restart from where it left off. This is often referred to as "checkpointing" and is built into many standard software tools, like Gaussian and Gromacs.

Slurm is able to send a signal to your job just before it runs out of time. Upon receiving this signal, you can have your job save its current status and automatically submit a new version of the job which picks up where it left off. Here is an example of a simple script that resubmits a job after receiving the TIMEOUT signal:

#!/bin/bash
#SBATCH -p day
#SBATCH -t 24:00:00
#SBATCH -c 1
#SBATCH --signal=B:10@30 # send the signal `10` at 30s before job finishes
#SBATCH --requeue        # mark this job eligible for requeueing

# define a `trap` that catches the signal and requeues the job
trap "echo -n 'TIMEOUT @ '; date; echo 'Resubmitting...'; scontrol requeue ${SLURM_JOBID}  " 10

# run the main code, with the `&` to “background” the task
./my_code.exe &

# wait for either the main code to finish or the signal to arrive
wait

This tells Slurm to send signal 10 (SIGUSR1) roughly 30 seconds before the job's time limit. We then define an action (or trap) for this signal that requeues the job. Don't forget to add the & to the end of the main executable and the wait command; without them the shell cannot catch the signal while the code is running.
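
For long campaigns it can also help to cap how many times a job requeues itself. The sketch below is a hypothetical variant of the script above that uses Slurm's SLURM_RESTART_COUNT environment variable (set on requeued jobs) to stop after a fixed number of restarts; the cap of 5 is illustrative:

```shell
#!/bin/bash
#SBATCH -p day
#SBATCH -t 24:00:00
#SBATCH --signal=B:10@30 # send signal 10 at 30s before the job finishes
#SBATCH --requeue        # mark this job eligible for requeueing

MAX_RESTARTS=5                       # hypothetical cap on automatic requeues
restarts="${SLURM_RESTART_COUNT:-0}" # 0 (unset) on the first run

# requeue only while we are under the cap
requeue_if_allowed() {
    if [ "$restarts" -lt "$MAX_RESTARTS" ]; then
        echo "TIMEOUT: requeueing (restart #$restarts)"
        scontrol requeue "$SLURM_JOBID"
    else
        echo "TIMEOUT: reached $MAX_RESTARTS restarts; not requeueing"
    fi
}
trap requeue_if_allowed 10

./my_code.exe &
wait
```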

Software Highlights

  • MATLAB/2022b is now available on all clusters.
Published: November 01, 2022

Farnam Maintenance

October 4-5, 2022

Software Updates

  • Security updates
  • Slurm updated to 22.05.3
  • NVIDIA drivers updated to 515.65.01
  • Lmod updated to 8.7
  • Apptainer updated to 1.0.3
  • Open OnDemand updated to 2.0.28

Hardware Updates

  • No hardware changes during this maintenance.
Published: October 05, 2022

October 2022

Announcements

Farnam Maintenance

The biannual scheduled maintenance for the Farnam cluster will be occurring Oct 4-6. During this time, the cluster will be unavailable. See the Farnam maintenance email announcements for more details.

Gibbs Maintenance

Additionally, the Gibbs storage system will be unavailable on Grace and Ruddle on Oct 4 to deploy an urgent firmware fix. All jobs on those clusters will be held, and no new jobs will be able to start during the upgrade to avoid job failures.

New Command for Interactive Jobs

The new version of Slurm (the scheduler) has improved the process of launching an interactive compute job: the clunky srun --pty bash syntax from previous versions has been replaced with salloc. In addition, the interactive partition is now the default partition for jobs launched using salloc. Thus a simple (1 core, 1 hour) interactive job can be requested like this:

salloc

which will submit the job and then move your shell to the allocated compute node.

For MPI users, this allows multi-node parallel jobs to be properly launched inside an interactive compute job, which did not work as expected previously. For example, here is a two-node job, launched with salloc and then a parallel job-step launched with srun:

[user@grace1 ~]$ salloc --nodes 2 --ntasks 2 --cpus-per-task 1
salloc: Nodes p09r07n[24,28] are ready for job

[user@p09r07n24 ~]$ srun hostname
p09r07n24.grace.hpc.yale.internal
p09r07n28.grace.hpc.yale.internal

For more information on salloc, please refer to Slurm’s documentation.

Software Highlights

  • cellranger/7.0.1 is now available on Farnam.
  • LAMMPS/23Jun2022-foss-2020b-kokkos is now available on Grace.
Published: October 01, 2022

September 2022

Announcements

Software Module Extensions

Our software module utility (Lmod) has been enhanced to enable searching for Python and R (among other software) extensions. This is a very helpful way to know which software modules contain a specific library or package. For example, to see what versions of ggplot2 are available, use the module spider command.

$ module spider ggplot2
--------------------------------------------------------
  ggplot2:
--------------------------------------------------------
     Versions:
        ggplot2/3.3.2 (E)
        ggplot2/3.3.3 (E)
        ggplot2/3.3.5 (E)
$ module spider ggplot2/3.3.5
-----------------------------------------------------------
  ggplot2: ggplot2/3.3.5 (E)
-----------------------------------------------------------
    This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy.

       R/4.2.0-foss-2020b

This indicates that by loading the R/4.2.0-foss-2020b module you will gain access to ggplot2/3.3.5.

Software Highlights

  • topaz/0.2.5-fosscuda-2020b for use with RELION (fosscuda-2020b toolchain) is now available as a module on Farnam.
Published: September 01, 2022

Grace Maintenance

August 2-4, 2022

Software Updates

  • Security updates
  • Slurm updated to 22.05.2
  • NVIDIA drivers updated to 515.48.07 (except for nodes with K80 GPUs, which are stranded at 470.129.06)
  • Singularity replaced by Apptainer version 1.0.3 (note: the "singularity" command will still work as expected)
  • Lmod updated to 8.7
  • Open OnDemand updated to 2.0.26

Hardware Updates

  • Core components of the ethernet network were upgraded to improve performance and increase overall capacity.

Loomis Decommission and Project Data Migration

After over eight years in service, the primary storage system on Grace, Loomis (/gpfs/loomis), will be retired later this year.

Project. We have migrated all of the Loomis project space (/gpfs/loomis/project) to the Gibbs storage system at /gpfs/gibbs/project during the maintenance. You will need to update your scripts and workflows to point to the new location (/gpfs/gibbs/project/<group>/<netid>). The "project" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you have jobs in a pending state going into the maintenance that used the absolute Loomis path, we recommend canceling, updating and then re-submitting those jobs so they do not fail.
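
For scripts that used the absolute Loomis path, a simple search-and-replace is usually enough; the sketch below rewrites the old prefix to the new Gibbs location (the script name and group/netid placeholders are illustrative):

```shell
# hypothetical job script with a hard-coded Loomis project path
cat > my_job.sh <<'EOF'
#!/bin/bash
cd /gpfs/loomis/project/mygroup/mynetid/simulations
EOF

# rewrite the old prefix in place (keeps a .bak backup of the original)
sed -i.bak 's|/gpfs/loomis/project|/gpfs/gibbs/project|g' my_job.sh
cat my_job.sh
```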

If you had a project space that exceeded the no-cost allocation (4 TiB), you have received a separate communication from us with details about your data migration. In these instances, your group has been granted a new, empty "project" space with the default no-cost quota. Any scripts will need to be updated accordingly.

Conda. By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide documentation and a helper script.
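
If an environment does break after the move, one generic way to rebuild it is to export its package list and recreate it (our helper script may automate parts of this; the environment name myenv is illustrative):

```shell
module load miniconda
conda env export -n myenv > myenv.yml   # capture the package list
conda env remove -n myenv               # delete the broken environment
conda env create -f myenv.yml           # rebuild it from the exported list
```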

R. Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/<version>) and rerunning install.packages.

Custom Software Installation. If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Contact us if you need assistance recompiling.

Scratch60. The Loomis scratch space (/gpfs/loomis/scratch60) is now read-only. All data in that directory will be purged in 60 days on October 3, 2022. Any data in /gpfs/loomis/scratch60 you wish to retain needs to be copied into another location by that date (such as your Gibbs project or Palmer scratch).
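
For example, retained data could be copied out with rsync (the group, netid, and directory names below are placeholders):

```shell
# copy data you want to keep from read-only scratch60 to Gibbs project space
rsync -av /gpfs/loomis/scratch60/mygroup/mynetid/keep_this/ \
      /gpfs/gibbs/project/mygroup/mynetid/keep_this/
```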

Changes to Non-Interactive Sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: August 04, 2022

August 2022

Announcements

Grace Maintenance & Storage Changes

The biannual scheduled maintenance for the Grace cluster will be occurring August 2-4. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.

During the maintenance, significant changes will be made to the project and scratch60 directories on Grace. See our website for more information and updates.

SpinUp Researcher Image & Containers

Yale offers a simple portal for creating cloud-based compute resources called SpinUp. These cloud instances are hosted on Amazon Web Services, but have access to Yale services like Active Directory, DNS, and Storage at Yale. SpinUp offers a range of services including virtual machines, web servers, remote storage, and databases.

Part of this service is a Researcher Image, an Ubuntu-based system with a suite of commonly used software utilities pre-installed, including:

  • PyTorch, TensorFlow, Keras, and other GPU-accelerated deep learning frameworks
  • GCC, CMake, Go, and other development tools
  • Singularity/Apptainer and Docker for container development

We recommend that researchers who want to develop containers for use on YCRC HPC resources use SpinUp to build them; the finished containers can then be copied to the clusters.

If there are software utilities or commonly used tools that you would like added to the Researcher Image, let us know and we can work with the Cloud Team to get them integrated.

Software Highlights

  • AFNI/2022.1.14 is now available on Farnam and Milgram.
  • cellranger/7.0.0 is now available on Grace.
Published: August 01, 2022

July 2022

Announcements

Loomis Decommission

After almost a decade in service, the primary storage system on Grace, Loomis (/gpfs/loomis), will be retired later this year. The usage and capacity on Loomis will be replaced by two existing YCRC storage systems, Palmer and Gibbs, which are already available on Grace. Data in Loomis project storage will be migrated to /gpfs/gibbs/project during the upcoming August Grace maintenance. See the Loomis Decommission documentation for more information and updates.

Updates to OOD Jupyter App

The OOD Jupyter App has been updated to handle conda environments more intelligently. Instead of listing all the conda environments in your account, the app now lists only those with Jupyter installed. If you do not see your desired environment in the dropdown, check that you have installed Jupyter in that environment. In addition, the "jupyterlab" checkbox will only be visible if the selected environment has JupyterLab installed.

YCRC conda environment

ycrc_conda_env.list has been replaced by ycrc_conda_env.sh. To update your conda environments in OOD for the Jupyter App and RStudio Desktop (with Conda R), please run ycrc_conda_env.sh update.
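
For example, to make an existing environment (here the hypothetical name myenv) appear in the Jupyter App dropdown:

```shell
module load miniconda
conda install -n myenv jupyter   # the app lists only envs with Jupyter installed
ycrc_conda_env.sh update         # refresh the environment list used by OOD
```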

Software Highlights

  • miniconda/4.12.0 is now available on all clusters
  • RStudio/2022.02.3-492 is now available on all clusters. This is currently the only version that is compatible with the graphic engine used by R/4.2.0-foss-2020b.
  • fmriprep/21.0.2 is now available on Milgram.
  • cellranger/7.0.0 is now available on Farnam.
Published: July 01, 2022

Milgram Maintenance

June 7-8, 2022

Software Updates

  • Security updates
  • Slurm updated to 21.08.8-2
  • NVIDIA drivers updated to 515.43.04
  • Singularity replaced by Apptainer version 1.0.2 (note: the "singularity" command will still work as expected)
  • Lmod updated to 8.7
  • Open OnDemand updated to 2.0.23

Hardware Updates

  • The hostnames of the compute nodes on Milgram were changed to bring them in line with YCRC naming conventions.

Changes to non-interactive sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: June 08, 2022

June 2022

Announcements

Farnam Decommission & McCleary Announcement

After more than six years in service, we will be retiring the Farnam HPC cluster later this year. Farnam will be replaced with a new HPC cluster, McCleary. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. For more information about the decommission process and the launch of McCleary, see our website.

RStudio (with module R) has been retired from Open OnDemand as of June 1st

Please switch to RStudio Server which provides a better user experience. For users using a conda environment with RStudio, RStudio (with Conda R) will continue to be served on Open OnDemand.

Milgram Maintenance

The biannual scheduled maintenance for the Milgram cluster will be occurring June 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.

Software Highlights

  • QTLtools/1.3.1-foss-2020b is now available on Farnam.
  • R/4.2.0-foss-2020b is available on all clusters.
  • Seurat for R/4.2.0 is now available on all clusters through the R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 module along with many other packages. Please check to see if any packages you need are available in these modules before running install.packages.
Published: June 01, 2022

Ruddle Maintenance

May 2, 2022

Software Updates

  • Security updates
  • Slurm updated to 21.08.7
  • Singularity replaced by Apptainer version 1.0.1 (note: the "singularity" command will still work as expected)
  • Lmod updated to 8.7

Changes to non-interactive sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: May 01, 2022

May 2022

Announcements

Ruddle Maintenance

The biannual scheduled maintenance for the Ruddle cluster will be occurring May 3-5. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.

Remote Visualization with Hardware Acceleration

VirtualGL is installed on all GPU nodes on Grace, Farnam, and Milgram to provide hardware accelerated 3D rendering. Instructions on how to use VirtualGL to accelerate your 3D applications can be found at https://docs.ycrc.yale.edu/clusters-at-yale/guides/virtualgl/.

Software Highlights

  • Singularity is now called "Apptainer". Singularity has been officially renamed "Apptainer" as part of its move to the Linux Foundation. The new command apptainer works as a drop-in replacement for singularity; the previous singularity command will also continue to work for the foreseeable future, so no change is needed. Apptainer is now on Grace, Farnam, and Ruddle (as of the maintenance completion). Milgram will be upgraded to Apptainer during the June maintenance.
  • Slurm has been upgraded to version 21.08.6 on Grace
  • MATLAB/2022a is available on all clusters
Published: May 01, 2022

Farnam Maintenance

April 4-7, 2022

Software Updates

  • Security updates
  • Slurm updated to 21.08.6
  • NVIDIA drivers updated to 510.47.03 (note: driver for NVIDIA K80 GPUs was upgraded to 470.103.01)
  • Singularity replaced by Apptainer version 1.0.1 (note: the "singularity" command will still work as expected)
  • Open OnDemand updated to 2.0.20

Hardware Updates

  • Four new nodes, each with four NVIDIA RTX 3090 GPUs, have been added

Changes to the bigmem Partition

Jobs requesting less than 120G of memory are no longer allowed in the "bigmem" partition. Please submit these jobs to the general or scavenge partitions instead.

Changes to non-interactive sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: April 07, 2022

April 2022

Announcements

Updates to R on Open OnDemand

RStudio Server is out of beta! With the deprecation of R 3.x (see below), we will be removing RStudio Desktop with module R from Open OnDemand on June 1st.

Improvements to R install.packages Paths

Starting with the R 4.1.0 software module, we automatically set an environment variable (R_LIBS_USER) that directs these packages to be stored in your project space. This helps ensure that packages are not limited by home-directory quotas and that packages installed for different versions of R are properly separated from each other. Previously installed packages should still be available and there should be no disruption from the change.
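
You can confirm where packages will be installed from within any R 4.x module; a quick check (the exact module version and output paths will vary):

```shell
module load R/4.1.0-foss-2020b
# print the user library location and the full library search path
Rscript -e 'Sys.getenv("R_LIBS_USER")' -e '.libPaths()'
```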

Instructions for Running a MySQL Server on the Clusters

Occasionally it can be useful to run your own MySQL database server on one of the clusters. Until now that has not been possible, but we recently found a way using Singularity. Instructions may be found in our new MySQL guide.

Software Highlights

  • R 3.x modules have been deprecated on all clusters and are no longer supported. If you need to continue to use an older version of R, look at our R conda documentation.
  • R/4.1.0-foss-2020b is now available on all clusters.
  • Seurat/4.1.0-foss-2020b-R-4.1.0 (for using the Seurat R package) is now available on all clusters.
Published: April 01, 2022

March 2022

Announcements

Snapshots

Snapshots are now available on all clusters for home and project spaces. Snapshots enable self-service restoration of modified or deleted files for at least 2 days in the past. See our User Documentation for more details on availability and instructions.

OOD File Browser Tip: Shortcuts

You can add shortcuts to your favorite paths in the OOD File Browser. See our OOD documentation for instructions on setting up shortcuts.

Software Highlights

  • R/4.1.0-foss-2020b is now on Grace.
  • GCC/11.2.0 is now on Grace.
Published: March 01, 2022

Grace Maintenance

February 3-6, 2022

Software Updates

  • Latest security patches applied
  • Slurm updated to version 21.08.5
  • NVIDIA driver updated to version 510.39.01 (except for nodes with K80 GPUs which are stranded at 470.82.01)
  • Singularity updated to version 3.8.5
  • Open OnDemand updated to version 2.0.20

Hardware Updates

  • Changes have been made to networking to improve performance of certain older compute nodes

Changes to Grace Home Directories

During the maintenance, all home directories on Grace have been moved to our new all-flash storage filesystem, Palmer. The move is in anticipation of the decommissioning of Loomis at the end of the year and will provide a robust login experience by protecting home directory interactions from data intensive compute jobs.

Due to this migration, your home directory path has changed from /gpfs/loomis/home.grace/<netid> to /vast/palmer/home.grace/<netid>. Your home directory can always be referenced in bash and submission scripts and from the command line with the $HOME variable. Please update any scripts and workflows accordingly.

Interactive Jobs

We have added an additional way to request an interactive job. The Slurm command salloc can be used to start an interactive job similar to srun --pty bash. In addition to being a simpler command (no --pty bash is needed), salloc jobs can be used to interactively test mpirun executables.

Palmer scratch

Palmer is out of beta! We have fixed the issue with Plink on Palmer, so now you can use Palmer scratch for any workloads. See https://docs.ycrc.yale.edu/data/hpc-storage#60-day-scratch for more information on Palmer scratch.

Published: February 06, 2022

February 2022

Announcements

Grace Maintenance

The biannual scheduled maintenance for the Grace cluster will be occurring February 1-3. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.

Data Transfers

For non-Milgram users, data transfers should not be performed on the login nodes. We offer a few alternatives that provide better networking and reduce the impact on the clusters' login nodes:

  1. Dedicated transfer node. Each cluster has a dedicated transfer node, transfer-<cluster>.hpc.yale.edu. You can ssh directly to this node and run commands.
  2. “transfer” Slurm partition. This is a small partition managed by the scheduler for doing data transfer. You can submit jobs to it using srun/sbatch -p transfer. For recurring or periodic data transfers (such as using cron), please use Slurm’s scrontab to schedule jobs that run on the transfer partition instead.
  3. Globus. For robust transfers of larger amounts of data, see our Globus documentation.
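
For example, a one-off copy could be run from the dedicated transfer node, while a scripted copy could be submitted to the transfer partition (the netid, cluster, and paths below are placeholders following the patterns above):

```shell
# one-off: log in to the transfer node and copy interactively
ssh netid@transfer-grace.hpc.yale.edu

# scripted: run the copy as a job in the transfer partition
sbatch -p transfer --wrap="rsync -av /path/to/src/ /gpfs/gibbs/project/mygroup/mynetid/dst/"
```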

More info about data transfers can be found in our Data Transfer documentation.

Software Highlights

  • Rclone is now installed on all nodes and loading the module is no longer necessary.
  • MATLAB/2021b is now on all clusters.
  • Julia/1.7.1-linux-x86_64 is now on all clusters.
  • Mathematica/13.0.0 is now on Grace.
  • QuantumESPRESSO/6.8-intel-2020b and QuantumESPRESSO/7.0-intel-2020b are now on Grace.
  • Mathematica documentation has been updated with regards to configuring parallel jobs.
Published: February 01, 2022



Last update: June 9, 2022