
News

Grace Maintenance

December 6-8, 2022

Software Updates

  • Slurm updated to 22.05.6
  • NVIDIA drivers updated to 520.61.05
  • Apptainer updated to 1.1.3
  • Open OnDemand updated to 2.0.28

Hardware Updates

  • Roughly 2 racks worth of equipment were moved to upgrade the effective InfiniBand connection speeds of several compute nodes (from 56 to 100 Gbps)
  • The InfiniBand network was modified to increase capacity and allow for additional growth
  • Some parts of the regular network were improved to shorten network paths and increase shared-uplink bandwidth

Loomis Decommission

The Loomis GPFS filesystem has been retired and unmounted from Grace, Farnam, and Ruddle. For additional information please see the Loomis Decommission page.

Published: December 08, 2022

December 2022

Announcements

Grace & Gibbs Maintenance

The biannual scheduled maintenance for the Grace cluster will be occurring December 6-8. During this time, the cluster will be unavailable. Additionally, the Gibbs filesystem will be unavailable on Farnam and Ruddle on Tuesday, December 6th to deploy a critical firmware upgrade. See the maintenance email announcements for more details.

Loomis Decommission

The Loomis GPFS filesystem will be retired and unmounted from Grace and Farnam during the Grace December maintenance, starting December 6th. All data except a few remaining private filesets have already been transferred to other systems (current software, home, and scratch to Palmer; project to Gibbs). The remaining private filesets are being transferred to Gibbs ahead of the maintenance, and their owners have been contacted. The only anticipated user impact is on anyone still using the older, deprecated software trees; otherwise the retirement should be transparent. Please reach out if you have any concerns or believe you are still using data located on Loomis. See the Loomis Decommission documentation for more information.

Apptainer Upgrade on Grace and Ruddle

The newest version of Apptainer (v1.1, available now on Ruddle and, after the December maintenance, on Grace) comes with the ability to build containers without needing elevated privileges (i.e., sudo access). This greatly simplifies the container workflow: you no longer need a separate system to build a container from a definition file. You can simply create a definition file and run the build command.

For example, to create a simple toy container from this def file (lolcow.def):

BootStrap: docker
From: ubuntu:20.04

%post
    apt-get -y update
    apt-get -y install cowsay lolcat

%environment
    export LC_ALL=C
    export PATH=/usr/games:$PATH

%runscript
    date | cowsay | lolcat

You can run:

salloc -p interactive -c 4 
apptainer build lolcow.sif lolcow.def
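
Once the build finishes, the resulting image can be executed directly; by default this invokes the %runscript from the definition file above:

```shell
# run the container's %runscript (pipes the date through cowsay and lolcat)
apptainer run lolcow.sif

# the .sif file is also directly executable
./lolcow.sif
```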

This upgrade is live on Ruddle and will be applied on Grace during the December maintenance. For more information, please see the Apptainer documentation site and our docs page on containers.

Software Highlights

  • RELION/4.0.0-fosscuda-2020b for cryo-EM/cryo-tomography data processing is now available on Farnam. RELION/3.1 will no longer be updated by the RELION developer. Note that data processed with RELION 4 are not backwards compatible with RELION 3.
Published: December 01, 2022

Ruddle Maintenance

November 1, 2022

Software Updates

  • Security updates
  • Slurm updated to 22.05.5
  • Apptainer updated to 1.1.2
  • Open OnDemand updated to 2.0.28

Hardware Updates

  • No hardware changes during this maintenance.
Published: November 01, 2022

November 2022

Announcements

Ruddle Maintenance

The biannual scheduled maintenance for the Ruddle cluster will be occurring Nov 1-3. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.

Grace and Milgram Maintenance Schedule Change

We will be adjusting the timing of Grace and Milgram's scheduled maintenance periods. Starting this December, Grace's maintenance periods will occur in December and June, with the next scheduled for December 6-8, 2022. Milgram's next maintenance will instead be performed in February and August, with the next scheduled for February 7-9, 2023. Please refer to previously sent communications for more information and see the full maintenance schedule for next year on our status page.

Requeue after Timeout

The YCRC clusters all have maximum time limits that are sometimes shorter than a job needs to finish. This can be frustrating for researchers trying to get a simulation or a project finished. However, a number of workflows have the ability to periodically save the status of a process to a file and restart from where it left off. This is often referred to as "checkpointing" and is built into many standard software tools, like Gaussian and Gromacs.

Slurm is able to send a signal to your job just before it runs out of time. Upon receiving this signal, you can have your job save its current status and automatically submit a new version of the job which picks up where it left off. Here is an example of a simple script that resubmits a job after receiving the TIMEOUT signal:

#!/bin/bash
#SBATCH -p day
#SBATCH -t 24:00:00
#SBATCH -c 1
#SBATCH --signal=B:10@30 # send the signal `10` at 30s before job finishes
#SBATCH --requeue        # mark this job eligible for requeueing

# define a `trap` that catches the signal and requeues the job
trap "echo -n 'TIMEOUT @ '; date; echo 'Resubmitting...'; scontrol requeue ${SLURM_JOBID}  " 10

# run the main code, with the `&` to “background” the task
./my_code.exe &

# wait for either the main code to finish or the signal to arrive
wait

This tells Slurm to send signal 10 (SIGUSR1) roughly 30 seconds before the job's time limit. We then define an action (or trap) for this signal that requeues the job. Don't forget to add the & to the end of the main executable and the wait command; without them the shell cannot catch the signal while the code is running.
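
For long campaigns it can also help to cap how many times a job requeues itself. The sketch below is a hypothetical variant of the script above that uses Slurm's SLURM_RESTART_COUNT environment variable (set on requeued jobs) to stop after a fixed number of restarts; the cap of 5 is illustrative:

```shell
#!/bin/bash
#SBATCH -p day
#SBATCH -t 24:00:00
#SBATCH --signal=B:10@30 # send signal 10 at 30s before the job finishes
#SBATCH --requeue        # mark this job eligible for requeueing

MAX_RESTARTS=5                       # hypothetical cap on automatic requeues
restarts="${SLURM_RESTART_COUNT:-0}" # 0 (unset) on the first run

# requeue only while we are under the cap
requeue_if_allowed() {
    if [ "$restarts" -lt "$MAX_RESTARTS" ]; then
        echo "TIMEOUT: requeueing (restart #$restarts)"
        scontrol requeue "$SLURM_JOBID"
    else
        echo "TIMEOUT: reached $MAX_RESTARTS restarts; not requeueing"
    fi
}
trap requeue_if_allowed 10

./my_code.exe &
wait
```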

Software Highlights

  • MATLAB/2022b is now available on all clusters.
Published: November 01, 2022

Farnam Maintenance

October 4-5, 2022

Software Updates

  • Security updates
  • Slurm updated to 22.05.3
  • NVIDIA drivers updated to 515.65.01
  • Lmod updated to 8.7
  • Apptainer updated to 1.0.3
  • Open OnDemand updated to 2.0.28

Hardware Updates

  • No hardware changes during this maintenance.
Published: October 05, 2022

October 2022

Announcements

Farnam Maintenance

The biannual scheduled maintenance for the Farnam cluster will be occurring Oct 4-6. During this time, the cluster will be unavailable. See the Farnam maintenance email announcements for more details.

Gibbs Maintenance

Additionally, the Gibbs storage system will be unavailable on Grace and Ruddle on Oct 4 to deploy an urgent firmware fix. All jobs on those clusters will be held, and no new jobs will be able to start during the upgrade to avoid job failures.

New Command for Interactive Jobs

The new version of Slurm (the scheduler) has improved the process of launching an interactive compute job: the clunky srun --pty bash syntax from previous versions has been replaced with salloc. In addition, the interactive partition is now the default partition for jobs launched using salloc. Thus a simple (1 core, 1 hour) interactive job can be requested like this:

salloc

which will submit the job and then move your shell to the allocated compute node.

For MPI users, this allows multi-node parallel jobs to be properly launched inside an interactive compute job, which did not work as expected previously. For example, here is a two-node job, launched with salloc and then a parallel job-step launched with srun:

[user@grace1 ~]$ salloc --nodes 2 --ntasks 2 --cpus-per-task 1
salloc: Nodes p09r07n[24,28] are ready for job

[user@p09r07n24 ~]$ srun hostname
p09r07n24.grace.hpc.yale.internal
p09r07n28.grace.hpc.yale.internal

For more information on salloc, please refer to Slurm’s documentation.

Software Highlights

  • cellranger/7.0.1 is now available on Farnam.
  • LAMMPS/23Jun2022-foss-2020b-kokkos is now available on Grace.
Published: October 01, 2022

September 2022

Announcements

Software Module Extensions

Our software module utility (Lmod) has been enhanced to enable searching for Python and R (among other software) extensions. This is a very helpful way to know which software modules contain a specific library or package. For example, to see what versions of ggplot2 are available, use the module spider command.

$ module spider ggplot2
--------------------------------------------------------
  ggplot2:
--------------------------------------------------------
     Versions:
        ggplot2/3.3.2 (E)
        ggplot2/3.3.3 (E)
        ggplot2/3.3.5 (E)
$ module spider ggplot2/3.3.5
-----------------------------------------------------------
  ggplot2: ggplot2/3.3.5 (E)
-----------------------------------------------------------
    This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy.

       R/4.2.0-foss-2020b

This indicates that by loading the R/4.2.0-foss-2020b module you will gain access to ggplot2/3.3.5.

Software Highlights

  • topaz/0.2.5-fosscuda-2020b for use with RELION (fosscuda-2020b toolchain) is now available as a module on Farnam.
Published: September 01, 2022

Grace Maintenance

August 2-4, 2022

Software Updates

  • Security updates
  • Slurm updated to 22.05.2
  • NVIDIA drivers updated to 515.48.07 (except for nodes with K80 GPUs, which are stranded at 470.129.06)
  • Singularity replaced by Apptainer version 1.0.3 (note: the "singularity" command will still work as expected)
  • Lmod updated to 8.7
  • Open OnDemand updated to 2.0.26

Hardware Updates

  • Core components of the ethernet network were upgraded to improve performance and increase overall capacity.

Loomis Decommission and Project Data Migration

After over eight years in service, the primary storage system on Grace, Loomis (/gpfs/loomis), will be retired later this year.

Project. We have migrated all of the Loomis project space (/gpfs/loomis/project) to the Gibbs storage system at /gpfs/gibbs/project during the maintenance. You will need to update your scripts and workflows to point to the new location (/gpfs/gibbs/project/<group>/<netid>). The "project" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you have jobs in a pending state going into the maintenance that used the absolute Loomis path, we recommend canceling, updating and then re-submitting those jobs so they do not fail.
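
For scripts that used the absolute Loomis path, a simple search-and-replace is usually enough; the sketch below rewrites the old prefix to the new Gibbs location (the script name and group/netid placeholders are illustrative):

```shell
# hypothetical job script with a hard-coded Loomis project path
cat > my_job.sh <<'EOF'
#!/bin/bash
cd /gpfs/loomis/project/mygroup/mynetid/simulations
EOF

# rewrite the old prefix in place (keeps a .bak backup of the original)
sed -i.bak 's|/gpfs/loomis/project|/gpfs/gibbs/project|g' my_job.sh
cat my_job.sh
```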

If you had a project space that exceeded the no-cost allocation (4 TiB), you have received a separate communication from us with details about your data migration. In these instances, your group has been granted a new, empty "project" space with the default no-cost quota. Any scripts will need to be updated accordingly.

Conda. By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide documentation and a helper script.
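
If an environment does break after the move, one generic way to rebuild it is to export its package list and recreate it (our helper script may automate parts of this; the environment name myenv is illustrative):

```shell
module load miniconda
conda env export -n myenv > myenv.yml   # capture the package list
conda env remove -n myenv               # delete the broken environment
conda env create -f myenv.yml           # rebuild it from the exported list
```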

R. Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in ~/project/R/<version>) and rerunning install.packages.

Custom Software Installation. If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Contact us if you need assistance recompiling.

Scratch60. The Loomis scratch space (/gpfs/loomis/scratch60) is now read-only. All data in that directory will be purged in 60 days on October 3, 2022. Any data in /gpfs/loomis/scratch60 you wish to retain needs to be copied into another location by that date (such as your Gibbs project or Palmer scratch).
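
For example, retained data could be copied out with rsync (the group, netid, and directory names below are placeholders):

```shell
# copy data you want to keep from read-only scratch60 to Gibbs project space
rsync -av /gpfs/loomis/scratch60/mygroup/mynetid/keep_this/ \
      /gpfs/gibbs/project/mygroup/mynetid/keep_this/
```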

Changes to Non-Interactive Sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: August 04, 2022

August 2022

Announcements

Grace Maintenance & Storage Changes

The biannual scheduled maintenance for the Grace cluster will be occurring August 2-4. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.

During the maintenance, significant changes will be made to the project and scratch60 directories on Grace. See our website for more information and updates.

SpinUp Researcher Image & Containers

Yale offers a simple portal for creating cloud-based compute resources called SpinUp. These cloud instances are hosted on Amazon Web Services, but have access to Yale services like Active Directory, DNS, and Storage at Yale. SpinUp offers a range of services including virtual machines, web servers, remote storage, and databases.

Part of this service is a Researcher Image, an Ubuntu-based system with a suite of commonly used software utilities pre-installed, including:

  • PyTorch, TensorFlow, Keras, and other GPU-accelerated deep learning frameworks
  • GCC, CMake, Go, and other development tools
  • Singularity/Apptainer and Docker for container development

We recommend that researchers who want to develop containers for use on YCRC HPC resources use SpinUp to build them; the finished containers can then be copied to the clusters.

If there are software utilities or commonly used tools that you would like added to the Researcher Image, let us know and we can work with the Cloud Team to get them integrated.

Software Highlights

  • AFNI/2022.1.14 is now available on Farnam and Milgram.
  • cellranger/7.0.0 is now available on Grace.
Published: August 01, 2022

July 2022

Announcements

Loomis Decommission

After almost a decade in service, the primary storage system on Grace, Loomis (/gpfs/loomis), will be retired later this year. The usage and capacity on Loomis will be replaced by two existing YCRC storage systems, Palmer and Gibbs, which are already available on Grace. Data in Loomis project storage will be migrated to /gpfs/gibbs/project during the upcoming August Grace maintenance. See the Loomis Decommission documentation for more information and updates.

Updates to OOD Jupyter App

The OOD Jupyter App has been updated to handle conda environments more intelligently. Instead of listing all the conda environments in your account, the app now lists only those with Jupyter installed. If you do not see your desired environment in the dropdown, check that you have installed Jupyter in that environment. In addition, the "jupyterlab" checkbox will only be visible if the selected environment has JupyterLab installed.

YCRC conda environment

ycrc_conda_env.list has been replaced by ycrc_conda_env.sh. To update your conda environments in OOD for the Jupyter App and RStudio Desktop (with Conda R), please run ycrc_conda_env.sh update.
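
For example, to make an existing environment (here the hypothetical name myenv) appear in the Jupyter App dropdown:

```shell
module load miniconda
conda install -n myenv jupyter   # the app lists only envs with Jupyter installed
ycrc_conda_env.sh update         # refresh the environment list used by OOD
```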

Software Highlights

  • miniconda/4.12.0 is now available on all clusters
  • RStudio/2022.02.3-492 is now available on all clusters. This is currently the only version that is compatible with the graphic engine used by R/4.2.0-foss-2020b.
  • fmriprep/21.0.2 is now available on Milgram.
  • cellranger/7.0.0 is now available on Farnam.
Published: July 01, 2022

Milgram Maintenance

June 7-8, 2022

Software Updates

  • Security updates
  • Slurm updated to 21.08.8-2
  • NVIDIA drivers updated to 515.43.04
  • Singularity replaced by Apptainer version 1.0.2 (note: the "singularity" command will still work as expected)
  • Lmod updated to 8.7
  • Open OnDemand updated to 2.0.23

Hardware Updates

  • The hostnames of the compute nodes on Milgram were changed to bring them in line with YCRC naming conventions.

Changes to non-interactive sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: June 08, 2022

June 2022

Announcements

Farnam Decommission & McCleary Announcement

After more than six years in service, we will be retiring the Farnam HPC cluster later this year. Farnam will be replaced with a new HPC cluster, McCleary. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. For more information about the decommission process and the launch of McCleary, see our website.

RStudio (with module R) has been retired from Open OnDemand as of June 1st

Please switch to RStudio Server which provides a better user experience. For users using a conda environment with RStudio, RStudio (with Conda R) will continue to be served on Open OnDemand.

Milgram Maintenance

The biannual scheduled maintenance for the Milgram cluster will be occurring June 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.

Software Highlights

  • QTLtools/1.3.1-foss-2020b is now available on Farnam.
  • R/4.2.0-foss-2020b is available on all clusters.
  • Seurat for R/4.2.0 is now available on all clusters through the R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 module along with many other packages. Please check to see if any packages you need are available in these modules before running install.packages.
Published: June 01, 2022

Ruddle Maintenance

May 2, 2022

Software Updates

  • Security updates
  • Slurm updated to 21.08.7
  • Singularity replaced by Apptainer version 1.0.1 (note: the "singularity" command will still work as expected)
  • Lmod updated to 8.7

Changes to non-interactive sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: May 01, 2022

May 2022

Announcements

Ruddle Maintenance

The biannual scheduled maintenance for the Ruddle cluster will be occurring May 3-5. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.

Remote Visualization with Hardware Acceleration

VirtualGL is installed on all GPU nodes on Grace, Farnam, and Milgram to provide hardware accelerated 3D rendering. Instructions on how to use VirtualGL to accelerate your 3D applications can be found at https://docs.ycrc.yale.edu/clusters-at-yale/guides/virtualgl/.

Software Highlights

  • Singularity is now called "Apptainer". Singularity has been officially renamed "Apptainer" as part of its move to the Linux Foundation. The new command apptainer works as a drop-in replacement for singularity; the previous singularity command will also continue to work for the foreseeable future, so no change is needed. Apptainer is now on Grace, Farnam, and Ruddle (as of the maintenance completion). Milgram will be upgraded to Apptainer during the June maintenance.
  • Slurm has been upgraded to version 21.08.6 on Grace
  • MATLAB/2022a is available on all clusters
Published: May 01, 2022

Farnam Maintenance

April 4-7, 2022

Software Updates

  • Security updates
  • Slurm updated to 21.08.6
  • NVIDIA drivers updated to 510.47.03 (note: driver for NVIDIA K80 GPUs was upgraded to 470.103.01)
  • Singularity replaced by Apptainer version 1.0.1 (note: the "singularity" command will still work as expected)
  • Open OnDemand updated to 2.0.20

Hardware Updates

  • Four new nodes, each with four NVIDIA RTX 3090 GPUs, have been added

Changes to the bigmem Partition

Jobs requesting less than 120G of memory are no longer allowed in the "bigmem" partition. Please submit these jobs to the general or scavenge partitions instead.

Changes to non-interactive sessions

Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.

Published: April 07, 2022

April 2022

Announcements

Updates to R on Open OnDemand

RStudio Server is out of beta! With the deprecation of R 3.x (see below), we will be removing RStudio Desktop with module R from Open OnDemand on June 1st.

Improvements to R install.packages Paths

Starting with the R 4.1.0 software module, we automatically set an environment variable (R_LIBS_USER) that directs these packages to be stored in your project space. This helps ensure that packages are not limited by home-directory quotas and that packages installed for different versions of R are properly separated from each other. Previously installed packages should still be available and there should be no disruption from the change.
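
You can confirm where packages will be installed from within any R 4.x module; a quick check (the exact module version and output paths will vary):

```shell
module load R/4.1.0-foss-2020b
# print the user library location and the full library search path
Rscript -e 'Sys.getenv("R_LIBS_USER")' -e '.libPaths()'
```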

Instructions for Running a MySQL Server on the Clusters

Occasionally it can be useful to run your own MySQL database server on one of the clusters. Until now that has not been possible, but we recently found a way using Singularity. Instructions may be found in our new MySQL guide.

Software Highlights

  • R 3.x modules have been deprecated on all clusters and are no longer supported. If you need to continue to use an older version of R, look at our R conda documentation.
  • R/4.1.0-foss-2020b is now available on all clusters.
  • Seurat/4.1.0-foss-2020b-R-4.1.0 (for using the Seurat R package) is now available on all clusters.
Published: April 01, 2022

March 2022

Announcements

Snapshots

Snapshots are now available on all clusters for home and project spaces. Snapshots enable self-service restoration of modified or deleted files for at least 2 days in the past. See our User Documentation for more details on availability and instructions.

OOD File Browser Tip: Shortcuts

You can add shortcuts to your favorite paths in the OOD File Browser. See our OOD documentation for instructions on setting up shortcuts.

Software Highlights

  • R/4.1.0-foss-2020b is now on Grace.
  • GCC/11.2.0 is now on Grace.
Published: March 01, 2022

Grace Maintenance

February 3-6, 2022

Software Updates

  • Latest security patches applied
  • Slurm updated to version 21.08.5
  • NVIDIA driver updated to version 510.39.01 (except for nodes with K80 GPUs which are stranded at 470.82.01)
  • Singularity updated to version 3.8.5
  • Open OnDemand updated to version 2.0.20

Hardware Updates

  • Changes have been made to networking to improve performance of certain older compute nodes

Changes to Grace Home Directories

During the maintenance, all home directories on Grace have been moved to our new all-flash storage filesystem, Palmer. The move is in anticipation of the decommissioning of Loomis at the end of the year and will provide a robust login experience by protecting home directory interactions from data intensive compute jobs.

Due to this migration, your home directory path has changed from /gpfs/loomis/home.grace/<netid> to /vast/palmer/home.grace/<netid>. Your home directory can always be referenced in bash and submission scripts and from the command line with the $HOME variable. Please update any scripts and workflows accordingly.

Interactive Jobs

We have added an additional way to request an interactive job. The Slurm command salloc can be used to start an interactive job similar to srun --pty bash. In addition to being a simpler command (no --pty bash is needed), salloc jobs can be used to interactively test mpirun executables.

Palmer scratch

Palmer is out of beta! We have fixed the issue with Plink on Palmer, so now you can use Palmer scratch for any workloads. See https://docs.ycrc.yale.edu/data/hpc-storage#60-day-scratch for more information on Palmer scratch.

Published: February 06, 2022

February 2022

Announcements

Grace Maintenance

The biannual scheduled maintenance for the Grace cluster will be occurring February 1-3. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.

Data Transfers

For non-Milgram users, data transfers should not be performed on the login nodes. We offer a few alternatives that provide better networking and reduce the impact on the clusters' login nodes:

  1. Dedicated transfer node. Each cluster has a dedicated transfer node, transfer-<cluster>.hpc.yale.edu. You can ssh directly to this node and run commands.
  2. “transfer” Slurm partition. This is a small partition managed by the scheduler for doing data transfer. You can submit jobs to it using srun/sbatch -p transfer. For recurring or periodic data transfers (such as using cron), please use Slurm’s scrontab to schedule jobs that run on the transfer partition instead.
  3. Globus. For robust transfers of larger amounts of data, see our Globus documentation.
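
For example, a one-off copy could be run from the dedicated transfer node, while a scripted copy could be submitted to the transfer partition (the netid, cluster, and paths below are placeholders following the patterns above):

```shell
# one-off: log in to the transfer node and copy interactively
ssh netid@transfer-grace.hpc.yale.edu

# scripted: run the copy as a job in the transfer partition
sbatch -p transfer --wrap="rsync -av /path/to/src/ /gpfs/gibbs/project/mygroup/mynetid/dst/"
```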

More info about data transfers can be found in our Data Transfer documentation.

Software Highlights

  • Rclone is now installed on all nodes and loading the module is no longer necessary.
  • MATLAB/2021b is now on all clusters.
  • Julia/1.7.1-linux-x86_64 is now on all clusters.
  • Mathematica/13.0.0 is now on Grace.
  • QuantumESPRESSO/6.8-intel-2020b and QuantumESPRESSO/7.0-intel-2020b are now on Grace.
  • Mathematica documentation has been updated with regards to configuring parallel jobs.
Published: February 01, 2022



Last update: June 9, 2022