News
November 2024
Announcements
YCRC Welcomes Ruth Marinshaw as the New Executive Director
The YCRC team is delighted to welcome Ruth Marinshaw as our new Executive Director. Ruth joined Yale this week to serve as the university's primary technologist supporting the computing needs of Yale's research community. She will work with colleagues across campus to implement and operate computational technologies that support Yale faculty, students, and staff in conducting cutting-edge research.
Ruth brings to Yale over twenty years of experience leading technology and research services in higher education. She previously held multiple positions at the University of North Carolina at Chapel Hill, overseeing research computing services, staff, and systems and creating new service partnerships across the university. Over the last twelve years, Ruth served as chief technology officer for research computing at Stanford University. Under her leadership, Stanford's research computing and cyberinfrastructure services, systems, facilities, and support grew significantly. She developed a team of research computing professionals and forged critical partnerships across the institution. Ruth was pivotal in establishing an NVIDIA SuperPOD, a data science and AI-focused research instrument envisioned by Stanford's faculty.
Throughout her career, Ruth has contributed expertise to national conversations on research computing and currently serves as co-chair of the National Science Foundation's Advisory Committee for Cyberinfrastructure.
With exciting initiatives on the horizon, we look forward to Ruth's leadership and guidance in supporting the current and emerging needs of Yale's research community. Welcome aboard, Ruth!
Bouchet Beta Testing for Tightly Coupled Parallel Workflows
The YCRC’s first installation at the Massachusetts Green High Performance Computing Center (MGHPCC) will be the HPC cluster Bouchet*.
The first installation of nodes, approximately 4,000 direct-liquid-cooled cores, will be dedicated to tightly coupled parallel workflows, such as those run in the “mpi” partition on the Grace cluster.
We would like to invite you to participate in Bouchet beta testing. We are seeking only tightly coupled, parallel workloads for this phase of development. If you have a suitable parallel workload and would like to participate in testing, please complete the Bouchet Beta Request Form and we will contact you with additional information about accessing and using Bouchet.
Following the beta testing (early 2025), we will be acquiring and installing thousands of general purpose compute cores as well as GPU-enabled compute nodes. At that time Bouchet will become available to all Yale researchers for computational work involving low-risk data. Stay tuned for more information on availability of the cluster.
*The cluster is named after Edward Bouchet (1852-1918), who earned a PhD in physics at Yale University in 1876, making him the first self-identified African American to earn a doctorate from an American university.
October 2024
Announcements
Announcing Bouchet
The Bouchet HPC cluster, YCRC's first installation at MGHPCC, will be in beta in Fall 2024. The first installation of nodes, approximately 4,000 direct-liquid-cooled cores, will be dedicated to tightly coupled parallel workflows, such as those run in the “mpi” partition on the Grace cluster. Later this year, we will acquire and install a large number of general-purpose compute nodes and GPU-enabled compute nodes. At that point, Bouchet will be available to all Yale researchers for computational work involving low-risk data. Visit the Bouchet page for more information and updates.
Jobstats on the Web
In our quest to provide detailed information about job performance and efficiency, we have recently enhanced the web-based jobstats portal to show summary statistics and plots of CPU, Memory, GPU, and GPU memory usage over time.
These plots are helpful diagnostics for understanding why jobs fail or how to request resources more efficiently. The plots and statistics are also available for in-progress jobs, making them a great way to keep track of performance while jobs are still running. This tool is part of the User Portal, which can be accessed via Open OnDemand on each cluster.
Software Highlights
- GROMACS/2023.3-foss-2022b-CUDA-12.1.1-PLUMED-2.9.2 is now available on Grace and McCleary
September 2024
Announcements
The YCRC is Hiring!
The YCRC is looking to add permanent members to our Research Support team. If helping others use the clusters and learning about other work done at the YCRC interests you, consider joining the YCRC! If you have any questions about the position, contact Kaylea Nelson (kaylea.nelson@yale.edu).
https://research.computing.yale.edu/about/careers
Clarity Access
Yale is launching the Clarity platform. In its initial phase, Clarity offers an AI chatbot powered by OpenAI’s ChatGPT-4o. Importantly, Clarity provides a “walled-off” environment; its use is limited to Yale faculty, students, and staff, and information entered into its chatbot is not saved or used to train external AI models. Clarity is appropriate for use with all data types, including [high-risk data](https://your.yale.edu/policies-procedures/policies/1604-data-classification-policy), provided that all security standards are observed. Its chatbot is capable of content creation, coding assistance, data and image analysis, text-to-speech, and more. Over time, the platform may expand to incorporate additional AI tools, including other large language models. Clarity is designed to evolve as generative AI develops and the community offers feedback.
Before using the Clarity AI chatbot, please review training resources and guidance on appropriate use.
Job Performance Monitoring
We have recently deployed a new tool for measuring and monitoring job performance called `jobstats`. Available on all clusters, `jobstats` provides a report of the utilization of CPU, Memory, and GPU resources for in-progress and recently completed jobs. To generate the report, simply run the following (replacing the ID number with that of the job in question):
[ab123@grace ~]$ jobstats 123456789
======================================================================
Slurm Job Statistics
======================================================================
Job ID: 123456789
NetID/Account: ab123/agroup
Job Name: my_job
State: RUNNING
Nodes: 1
CPU Cores: 4
CPU Memory: 256GB (64GB per CPU-core)
QOS/Partition: normal/week
Cluster: grace
Start Time: Thu Sep 5, 2024 at 10:58 AM
Run Time: 1-06:43:41 (in progress)
Time Limit: 4-04:00:00
Overall Utilization
======================================================================
CPU utilization [||||||||||||| 26%]
CPU memory usage [|||| 8%]
Detailed Utilization
======================================================================
CPU utilization per node (CPU time used/run time)
r816u29n04: 1-07:48:36/5-02:54:45 (efficiency=25.9%)
CPU memory usage per node - used/allocated
r816u29n04: 19.9GB/256.0GB (5.0GB/64.0GB per core of 4)
Software Highlights
- R/4.4.1-foss-2022b is now available on all clusters
- R-bundle-Bioconductor/3.19-foss-2022b-R-4.4.1 (INCLUDES SEURAT) is now available on all clusters
Milgram Maintenance
August 20-22, 2024
Software Updates
- Slurm updated to 23.11.9
- NVIDIA drivers updated to 555.42.06
- Apptainer updated to 1.3.3
Hardware Updates
- We proactively replaced two core file servers, as part of our best practices for GPFS (our high-performance parallel filesystem). This does not change our storage capacity, but allows us to maintain existing services and access future upgrades.
August 2024
Announcements
Milgram Maintenance
The biannual scheduled maintenance for the Milgram cluster will be occurring August 20-22. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.
YCRC HPC User Portal
We have recently deployed a web-based User Portal for the Grace, McCleary, and Milgram clusters to help researchers view information about their activity on our clusters. Accessible under the “Utilities” tab on Open OnDemand, the portal currently features five pages with personalized data about your cluster usage and guidance on navigating and operating the clusters. Users can easily track their jobs and visualize their utilization through charts, tables, and graphs, helping to optimize their cluster use. To try out the User Portal, please visit the “Utilities” tab in Open OnDemand on your cluster.
If you have any suggestions for useful pages for the User Portal, please email hpc@yale.edu.
Yale Task Force on Artificial Intelligence
As recommended by the Yale Task Force on Artificial Intelligence, the YCRC will be adding a large number of new high-end GPUs to the clusters to meet growing demand for AI compute resources. Stay tuned for details and updates on availability in the coming months!
Research Support at PEARC24
During the week of July 22nd, the YCRC Research Support team attended the annual PEARC conference, a conference specifically focused on the research computing community. The team had the opportunity to meet with many of our peer institutions to discuss our common challenges and opportunities and attend sessions to learn about new solutions. We look forward to bringing some new ideas to the YCRC this year!
Software Highlights
- ORCA/6.0.0-gompi-2022b is now available on Grace and McCleary
July 2024
Announcements
Compute Charges Rate Freeze
The compute charging model for the YCRC clusters is currently under review. As a result, we are freezing the per-CPU-hour charge at its current value of $0.0025, effective immediately. For more information on the compute charging model, please see the Billing for HPC services page on the YCRC website.
MATLAB Proxy Server
"MATLAB (Web)" is now available as an Open OnDemand app. A MATLAB session is connected directly to your web browser tab, rather than launched via a Remote Desktop session as with the traditional MATLAB app. This allows more of the requested resources to be dedicated to MATLAB itself. Page through the full App list in Open OnDemand to launch. (Note that this is a work in progress that might not yet have all the functionality of a regular MATLAB session.)
FairShare Weights Adjustment
Periodically we adjust the relative impact of resource allocations on a group’s FairShare (the way that the scheduler determines which job gets scheduled next). We have adjusted the “service unit” weights for memory and GPUs to better match their cost to acquire and maintain:
- CPU: 1 SU
- Memory: 0.067 (15G/SU)
- A100 GPU: 100 SU
- non-A100 GPU: 15 SU
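As an illustrative calculation based on the weights above, a job using 4 CPUs, 60G of memory, and one A100 GPU would accrue roughly 4 + 60/15 + 100 = 108 SUs for each hour it runs.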
For more information about FairShare and how we use it to ensure equity in scheduling, please visit our docs page.
Software Highlights
- SBGrid is available on McCleary. Contact us for more information on access.
Grace Maintenance
June 4-6, 2024
Software Updates
- Slurm updated to 23.11.7
- NVIDIA drivers updated to 555.42.02
- Apptainer updated to 1.3.1
Hardware Updates
- The remaining Broadwell generation nodes have been decommissioned.
- The `oldest` node constraint now returns Cascade Lake generation nodes.
- The `devel` partition is now composed of 5 Cascade Lake generation 6240 nodes and 1 Skylake generation node (same as the mpi partition).
- The FDR InfiniBand fabric has been fully decommissioned, and networking has been updated across the Grace cluster.
- The Slayman storage system is no longer available from Grace (but remains accessible from McCleary).
June 2024
Announcements
Grace Maintenance
The biannual scheduled maintenance for the Grace cluster will be occurring Jun 4-6. During this time, the cluster will be unavailable. See the Grace maintenance email announcements for more details.
Compute Usage Monitoring in Web Portal
We have developed a suite of tools to enable research groups to monitor their combined utilization of cluster resources. We perform nightly queries of Slurm's database to aggregate cluster usage (in cpu_hours) broken down by user, account, and partition. These data are available both as a command-line utility (getusage) and a recently deployed web-application built into Open OnDemand. This can be accessed directly via:
- Grace: ood-grace.ycrc.yale.edu/pun/sys/ycrc_getusage
- McCleary: ood-mccleary.ycrc.yale.edu/pun/sys/ycrc_getusage
- Milgram: ood-milgram.ycrc.yale.edu/pun/sys/ycrc_getusage
Additionally, the Getusage web-app can be accessed via the “Utilities” pull-down menu after logging into Open OnDemand.
NAIRR Resources for AI
Looking for compute resources for your AI or AI-enabled research? In the NAIRR Pilot, the US National Science Foundation (NSF), the US Department of Energy (DOE), and numerous other partners are providing access to a set of computing, model, platform, and educational resources for projects related to advancing AI research. Applications for resources from the NAIRR Pilot are lightweight, and we are happy to assist with any questions you may have.
May 2024
Announcements
Yale Joins MGHPCC
We are excited to share that Yale University has recently become a member of the Massachusetts Green High Performance Computing Center (MGHPCC), a not-for-profit data center designed for computationally intensive research. The construction of a dedicated space for Yale within the facility and the installation of high-speed networking between Yale's campus and MGHPCC are currently underway. The first High-Performance Computing (HPC) hardware installations are expected to take place later this year. As more information becomes available, we will keep our users updated.
OneIT Conference
Earlier this year Yale ITS hosted the first in-person One IT conference, [“Advancing Collaborations: One IT as a Catalyst”](https://your.yale.edu/news/2024/04/conference-edition-capturing-spirit-one-it). IT and IT-adjacent personnel from across campus came together to discuss topics that impact research and university operations. YCRC team members participated in a variety of sessions ranging from Research Storage and Software to the role of AI in higher education.
Additionally, YCRC team members presented two posters. The first, [A Graphical Interface for Research and Education](https://image.s10.sfmc-content.com/lib/fe4515707564047b751572/m/1/83cc1a22-d9d3-498d-8a17-8088de496674.pdf), highlighted Open OnDemand and its barrier-reducing impact on courses and research alike. The second, [Globus: a platform for secure, efficient file transfer](https://image.s10.sfmc-content.com/lib/fe4515707564047b751572/m/1/52559eed-25ab-49c9-80be-6ce99fb95b25.pdf), demonstrated our successful deployment of Globus to improve data management and cross-institutional sharing of research materials.
Software Highlights
- Spark is now available on Grace, McCleary and Milgram
- Nextflow/22.10.6 is now available on Grace and McCleary
April 2024
Announcements
New Grace Nodes
We are pleased to announce the addition of 84 new direct-liquid-cooled compute nodes to the commons partitions (day and week) on Grace. These new nodes are of the Intel Icelake generation and have 48 cores each. These nodes also have increased RAM compared to other nodes on Grace, with 10GB per core. The day partition is now close to 11,000 cores and the week partition is now entirely composed of these nodes. A significant number of purchased nodes of similar design have also been added to respective private partitions and are available to all users via the scavenge partition.
Limited McCleary Maintenance - April 2nd
Due to the limited updates needed on McCleary at this time, the upcoming April maintenance will not be a full 3-day downtime, but rather a one-day maintenance with limited disruption. The McCleary cluster and storage will remain online and available throughout the maintenance period and there will be no disruption to running or pending batch jobs. However, certain services will be unavailable for short periods throughout the day. See maintenance announcement email for full details.
Cluster Node Status in Open OnDemand
A Cluster Node Status app is now available in the Open OnDemand web portal on all clusters. This new app presents information about CPU, GPU, and memory utilization for each compute node within the cluster. The app can be found under ‘Utilities’ -> ‘Cluster Node Status’.
Retirement of Grace RHEL7 Apps Tree
As part of our routine deprecation of older software, we removed Grace's old application tree (from before the RedHat 8 upgrade) from the default Standard Environment on March 6th. After March 6th, the older module tree will no longer appear when `module avail` is run and will fail to load. If you have concerns about missing software, please contact us at hpc@yale.edu.
Software Highlights
- R/4.3.2-foss-2022b is now available on Grace and McCleary
- Corresponding Bioconductor and CRAN bundles are also now available
- PyTorch/2.1.2-foss-2022b-CUDA-12.0.0 with CUDA is available on Grace and McCleary
March 2024
Announcements
CPU Usage Reporting with getusage
Researchers frequently wish to get a breakdown of their group's cluster usage. While Slurm provides tooling for querying the database, it is not particularly user-friendly. We have developed a tool, `getusage`, which allows researchers to quickly get insight into their group's usage, broken down by date and user, including a monthly summary report. Please try this tool and let us know if there are any enhancement requests or ideas.
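As a minimal sketch (the exact flags and output format may differ; this assumes the default, argument-free invocation), you can run it directly from a login node:
# print a breakdown of your group's recent cluster usage
getusage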
VSCode on the Web Portal
Visual Studio Code (VSCode) is a popular development tool that is widely used by our researchers. While there are several extensions that allow users to connect to remote servers over SSH, these are imperfect and often drop connections. Additionally, these remote sessions connect to the clusters' login nodes, where resources are limited.
To meet the growing demand for this particular development tool, we have deployed an application for Open OnDemand that launches VS Code Server directly on a compute node, which can then be accessed via web browser. This OOD application is called `code_server` and is available on all clusters. For more information see [our OOD docs page](https://docs.ycrc.yale.edu/clusters-at-yale/access/ood/).
Retirement of Grace RHEL7 Apps Tree
As part of our routine deprecation of older software we removed Grace's old application tree (from before the RedHat 8 upgrade) from the default Standard Environment on March 1st. After March 1st, the older module tree will no longer appear when `module avail` is run and will fail to load going forward. If you have concerns about any missing software, please contact us at hpc@yale.edu.
Software Highlights
- FSL/6.0.5.2-centos7_64 is now available on Milgram
- Nextflow/23.10.1 is now available on Grace & McCleary
Milgram Maintenance
February 6-8, 2024
Software Updates
- Slurm updated to 23.02.7
- NVIDIA drivers updated to 545.23.08
- Apptainer updated to 1.2.5
- Lmod updated to 8.7.32
Upgrade to Red Hat 8
As part of this maintenance, the operating system on Milgram has been upgraded to Red Hat 8.
Jobs submitted prior to maintenance that were held and now released will run under RHEL8 (instead of RHEL7). This may cause some jobs to not run properly, so we encourage you to check on your job output. Our docs page provides information on the RHEL8 upgrade, including fixes for common problems. Please notify hpc@yale.edu if you require assistance.
Changes to Interactive Partitions and Jobs
We have made the following changes to interactive jobs during the maintenance.
The 'interactive' and 'psych_interactive' partitions have been renamed to 'devel' and 'psych_devel', respectively, to bring Milgram into alignment with the other clusters. This change was made on other clusters in recognition that interactive-style jobs (such as OnDemand and 'salloc' jobs) are commonly run outside of the 'interactive' partition. Please adjust your workflows accordingly after the maintenance.
Additionally, all users are limited to 4 interactive app instances (of any type) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the "Delete" button in your "My Interactive Apps" page in the web portal.
Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
February 2024
Announcements
Milgram Maintenance
The biannual scheduled maintenance for the Milgram cluster will be occurring Feb 6-8. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.
Changes to RStudio on the Web Portal
The “RStudio Server” app on the Open OnDemand web portal has been upgraded to support both the R software modules and R installed via conda. As such the “RStudio Desktop” app has been retired and removed from the web portal. If you still require RStudio Desktop, we provide instructions for running under the “Remote Desktop” app (please note that this is not a recommended practice for most users).
Software Highlights
- ChimeraX is now available as an app on the McCleary Open OnDemand web portal
- FSL/6.0.5.2-centos7_64 is now available on McCleary
January 2024
Announcements
Upcoming Milgram RHEL8 Upgrade
As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Milgram cluster from RHEL7 to RHEL8 during the February maintenance window. This will bring Milgram in line with our other clusters and provide a number of key benefits:
- continued security patches and support beyond 2024
- updated system libraries to better support modern software
- improved node management system
We have set aside rhel8_devel and rhel8_day partitions for use in debugging and testing of workflows before the February maintenance. For more information on testing your workflows see our explainer.
Grace Maintenance
December 5-7, 2023
Software Updates
- Slurm updated to 23.02.6
- NVIDIA drivers updated to 545.23.08
- Lmod updated to 8.7.32
- Apptainer updated to 1.2.4
Multifactor Authentication (MFA)
Multi-Factor authentication via Duo is now required for ssh for all users on Grace after the maintenance. For most usage, this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation.
Transfer Node Host Key Change
The ssh host key for Grace's transfer node was changed during the maintenance, which will result in a "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" error when you attempt to login. To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line):
ssh-keygen -R transfer-grace.ycrc.yale.edu
If you are using a GUI, such as MobaXterm, you will need to manually edit your known hosts file and remove the entries related to Grace.
For MobaXterm, this file is located (by default) in `Documents/MobaXterm/home/.ssh`.
Then attempt a new login and accept the new host key.
December 2023
Announcements
Grace Maintenance - Multi-Factor Authentication
The biannual scheduled maintenance for the Grace cluster will be occurring Dec 5-7. During this time, the cluster will be unavailable. See the Grace maintenance email announcements for more details.
Multi-Factor authentication via Duo will be required for ssh for all users on Grace after the maintenance. For most usage, this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation.
`scavenge_gpu` and `scavenge_mpi`
In addition to the general-purpose scavenge partition, we also have two resource-specific scavenge partitions: `scavenge_gpu` (Grace, McCleary) and `scavenge_mpi` (Grace only). The `scavenge_gpu` partition contains all GPU-enabled nodes, both commons and privately owned. Similarly, the `scavenge_mpi` partition contains all nodes similar to those in the `mpi` partition. Both partitions have higher priority for their respective nodes than normal scavenge, meaning jobs submitted to `scavenge_gpu` or `scavenge_mpi` will preempt normal scavenge jobs. All scavenge partitions are exempt from CPU charges.
Software Highlights
- IMOD/4.12.56_RHEL7-64_CUDA12.1 is now available on McCleary and Grace
November 2023
Announcements
Globus Available on Milgram
Globus is now available to move data in and out from Milgram. For increased security, Globus only has access to a staging directory (`/gpfs/milgram/globus/$NETID`) where you can temporarily store data. Please see our documentation page for more information and reach out to hpc@yale.edu if you have any questions.
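For example, a minimal sketch of staging data for an outbound transfer (the dataset path is hypothetical) looks like this; the transfer itself is then started from the Globus web interface:
# copy data into the Globus staging area on Milgram ($NETID is your Yale NetID)
cp -r ~/project/my_dataset /gpfs/milgram/globus/$NETID/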
RStudio Server Updates
RStudio Server on the Open OnDemand web portal for all clusters now starts an R session in a clean environment and will not save the session when you finish. If you want to save your session and reuse it next time, please select the checkbox "Start R from your last saved session".
McCleary Maintenance
October 3-5, 2023
Software Updates
- Slurm updated to 23.02.5
- NVIDIA drivers updated to 535.104.12
- Lmod updated to 8.7.30
- Apptainer updated to 1.2.3
- System Python updated to 3.11
October 2023
Announcements
McCleary Maintenance
The biannual scheduled maintenance for the McCleary cluster will be occurring Oct 3-5. During this time, the cluster will be unavailable. See the McCleary maintenance email announcements for more details.
Interactive jobs on `day` on McCleary
Interactive jobs are now allowed to run on the `day` partition on McCleary. Note that you are still limited to 4 interactive-style jobs of any kind (salloc or Open OnDemand) at one time. Additional instances will be rejected until you delete older open instances. For OnDemand jobs, closing the window does not terminate the interactive app job. To terminate the job, click the "Delete" button in your "My Interactive Apps" page in the web portal.
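For example, a simple interactive session on `day` could be requested with `salloc` (the resource values here are only illustrative):
# request a 2-core, 2-hour interactive shell on the day partition
salloc -p day -c 2 -t 2:00:00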
"Papermill" for Jupyter Command-Line Execution
Many scientific workflows start as interactive Jupyter notebooks, and our Open OnDemand portal has dramatically simplified deploying these notebooks on cluster resources. However, the step from running notebooks interactively to running jobs as a batch script can be challenging and is often a barrier to migrating to `sbatch` for running workflows non-interactively.
To help solve this problem, there are a handful of utilities that can execute a notebook as if you were manually hitting "shift-Enter" for each cell. Of note is Papermill which provides a powerful set of tools to bridge between interactive and batch-mode computing.
To get started, install papermill into your Conda environments:
module load miniconda
conda install papermill
Then you can simply evaluate a notebook, preserving figures and output inside the resulting output notebook, like this:
papermill /path/to/notebook.ipynb /path/to/output.ipynb
This can be run inside a batch job that might look like this:
#!/bin/bash
#SBATCH -p day
#SBATCH -c 1
#SBATCH -t 6:00:00
module purge
module load miniconda
conda activate my_env
papermill /path/to/notebook.ipynb /path/to/output.ipynb
Variables can also be parameterized and passed in as command-line options so that you can run multiple copies simultaneously with different input variables. For more information see the [Papermill docs pages](https://papermill.readthedocs.io/).
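As a sketch of that pattern (the parameter name, values, and paths are only illustrative), a notebook with a cell tagged "parameters" defining a variable named alpha could be run several times from the shell:
# execute the same notebook with different parameter values,
# writing each result to its own output notebook
for a in 0.1 0.5 1.0; do
    papermill /path/to/notebook.ipynb /path/to/output_alpha_${a}.ipynb -p alpha ${a}
done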
September 2023
Announcements
Grace RHEL8 Upgrade
As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we upgraded the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This brings Grace in line with McCleary and provides a number of key benefits:
- continued security patches and support beyond 2023
- updated system libraries to better support modern software
- improved node management system to facilitate the growing number of nodes on Grace
- shared application tree between McCleary and Grace, which brings software parity between clusters
There are a small number of compute nodes in the `legacy` partition with the old RHEL7 operating system installed for workloads that still need to be migrated. We expect to retire this partition during the Grace December 2023 maintenance. Please contact us if you need help upgrading to RHEL8 in the coming months.
Grace Old Software Deprecation
The RHEL7 application module tree (`/gpfs/loomis/apps/avx`) is now deprecated and will be removed from the default module environment during the Grace December maintenance. The software will still be available on Grace, but YCRC will no longer provide support for those old packages after December. If you are using a software package in that tree that is not yet installed into the new shared module tree, please let us know as soon as possible so we can help avoid any disruptions.
Software Highlights
- intel/2022b toolchain is now available on Grace and McCleary
    - MKL 2022.2.1
    - Intel MPI 2022.2.1
    - Intel Compilers 2022.2.1
- foss/2022b toolchain is now available on Grace and McCleary
    - FFTW 3.3.10
    - ScaLAPACK 2.2.0
    - OpenMPI 4.1.4
    - GCC 12.2.0
Milgram Maintenance
August 22, 2023
Software Updates
- Slurm updated to 22.05.9
- NVIDIA drivers updated to 535.86.10
- Apptainer updated to 1.2.42
- Open OnDemand updated to 2.0.32
Multi-Factor Authentication
Multi-factor authentication is now required for ssh for all users on Milgram. For most usage, this additional step is minimally invasive and makes our clusters much more secure. However, for users who use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation.
Grace Maintenance
August 15-17, 2023
Software Updates
- Red Hat Enterprise Linux (RHEL) updated to 8.8
- Slurm updated to 22.05.9
- NVIDIA drivers updated to 535.86.10
- Apptainer updated to 1.2.2
- Open OnDemand updated to 2.0.32
Upgrade to Red Hat 8
As part of this maintenance, the operating system on Grace has been upgraded to Red Hat 8. A new unified software tree that is shared with the McCleary cluster has been created.
The ssh host keys for Grace's login nodes were changed during the maintenance, which will result in a "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" error when you attempt to login. To access the cluster again, first remove the old host keys with the following command (if accessing the cluster via command line):
ssh-keygen -R grace.hpc.yale.edu
If you are using a GUI, such as MobaXterm, you will need to manually edit your known hosts file and remove the entries related to Grace.
For MobaXterm, this file is located (by default) in `Documents/MobaXterm/home/.ssh`.
Then attempt a new login and accept the new host key.
New Open OnDemand (Web Portal) URL
The new URL for the Grace Open OnDemand web portal is https://ood-grace.ycrc.yale.edu.
August 2023
Announcements
Ruddle Farewell: July 24, 2023
On the occasion of decommissioning the Ruddle cluster on July 24, the Yale Center for Genome Analysis (YCGA) and the Yale Center for Research Computing (YCRC) would like to acknowledge the profound impact Ruddle has had on computing at Yale. Ruddle provided the compute resources for YCGA's high throughput sequencing and supported genomic computing for hundreds of research groups at YSM and across the University. In February 2016, Ruddle replaced the previous biomedical cluster BulldogN. Since then, it has run more than 24 million user jobs comprising more than 73 million compute hours.
Funding for Ruddle came from NIH grant 1S10OD018521-01, with Shrikant Mane as PI. Ruddle is replaced by a dedicated partition and storage on the new McCleary cluster, which were funded by NIH grant 1S10OD030363-01A1, also awarded to Dr. Mane.
Upcoming Grace Maintenance: August 15-17, 2023
Scheduled maintenance will be performed on the Grace cluster starting on Tuesday, August 15, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 17, 2023.
Upcoming Milgram Maintenance: August 22-24, 2023
Scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, August 22, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 24, 2023.
Grace Operating System Upgrade
As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster from RHEL7 to RHEL8 during the August maintenance window. This will bring Grace in line with McCleary and provide a number of key benefits:
- continued security patches and support beyond 2023
- updated system libraries to better support modern software
- improved node management system to facilitate the growing number of nodes on Grace
- shared application tree between McCleary and Grace, which brings software parity between clusters
Three test partitions are available (`rhel8_day`, `rhel8_gpu`, and `rhel8_mpi`) for use in debugging workflows before the upgrade. These partitions should be accessed from the `rhel8_login` node.
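For example (the batch script name is hypothetical), an existing workflow can be resubmitted to one of the test partitions to verify that it runs under RHEL8:
# submit an existing batch script to a RHEL8 test partition
sbatch -p rhel8_day my_test_job.sh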
Software Highlights
- Julia/1.9.2-linux-x86_64 available on Grace
- Kraken2/2.1.3-gompi-2020b available on McCleary
- QuantumESPRESSO/7.0-intel-2020b available on Grace
July 2023
Announcements
Red Hat 8 Test partitions on Grace
As Red Hat Enterprise Linux (RHEL) 7 approaches its end of life, we will be upgrading the Grace cluster to RHEL8 during the August 15th-17th maintenance. This will bring Grace in line with McCleary and provide a number of key benefits:
- continued security patches and support beyond 2023
- updated system libraries to better support modern software
- improved node management system to facilitate the growing number of nodes on Grace
- shared application tree between McCleary and Grace, which brings software parity between clusters
While we have performed extensive testing, both internally and with the new McCleary cluster, we recognize that there are large numbers of custom workflows on Grace that may need to be modified to work with the new operating system.
Please note: To enable debugging and testing of workflows ahead of the scheduled maintenance, we have set aside `rhel8_day`, `rhel8_gpu`, and `rhel8_mpi` partitions. You should access them from the `rhel8_login` node.
Two-factor Authentication for McCleary
To ensure the security of the cluster and associated services, we have implemented two-factor authentication on the McCleary cluster. To simplify the transition, we have collected a set of best practices and configurations for many of the commonly used access tools, including CyberDuck, MobaXterm, and WinSCP, which you can access on our docs page. If you are using other tools and experiencing issues, please contact us for assistance.
New GPU Nodes on McCleary and Grace
We have installed new GPU nodes for McCleary and Grace, dramatically increasing the number of GPUs available on both clusters. McCleary has 14 new nodes (56 GPUs) added to the gpu partition and six nodes (24 GPUs) added to `pi_cryoem`. Grace has 12 new nodes, available in the `rhel8_gpu` partition. Each of the new nodes contains 4 NVIDIA A5000 GPUs, with 24GB of on-board VRAM and a PCIe4 connection to improve data-transport time.
Software Highlights
- MATLAB/2023a available on all clusters
- Beast/2.7.4-GCC-12.2.0 available on McCleary
- AFNI/2023.1.07-foss-2020b available on McCleary
- FSL 6.0.5.1 (CPU-only and GPU-enabled versions) available on McCleary
June 2023
Announcements
McCleary Officially Launches
Today marks the official beginning of the McCleary cluster’s service. In addition to compute nodes migrated from Farnam and Ruddle, McCleary features our first set of direct-to-chip liquid cooled (DLC) nodes, moving YCRC into a more environmentally friendly future. McCleary is significantly larger than the Farnam and Ruddle clusters combined. The new DLC compute nodes are able to run faster and with higher CPU density due to their superior cooling system.
McCleary is named for Beatrix McCleary Hamburg, who received her medical degree in 1948 and was the first female African American graduate of Yale School of Medicine.
Farnam Farewell: June 1, 2023
On the occasion of decommissioning the Farnam cluster on June 1, YCRC would like to acknowledge the profound impact Farnam has had on computing at Yale. Farnam supported biomedical computing at YSM and across the University, providing compute resources to hundreds of research groups. Farnam replaced the previous biomedical cluster Louise and began production in October 2016. Since then, it has run user jobs comprising more than 139 million compute hours. Farnam is replaced by the new cluster McCleary.
Please note: Read-only access to Farnam’s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. For more information see McCleary transfer documentation.
Ruddle Decommission: July 1, 2023
The Ruddle cluster will be decommissioned and access will be disabled July 1, 2023. We will be migrating project and sequencing directories from Ruddle to McCleary.
Please note: Users are responsible for moving home and scratch data to McCleary prior to July 1, 2023. For more information and instructions, see our McCleary transfer documentation.
Software Highlights
- R/4.3.0-foss-2020b+ available on all clusters. The newest version of R is now available on Grace, McCleary, and Milgram. This updates nearly 1000 packages and can be used in batch jobs and in RStudio sessions via Open OnDemand.
- AlphaFold/2.3.2-foss-2020b-CUDA-11.3.1 The latest version of AlphaFold (2.3.2, released in April) has been installed on McCleary and is ready for use. This version fixes a number of bugs and should improve GPU memory usage enabling longer proteins to be studied.
- LAMMPS/23Jun2022-foss-2020b-kokkos available on McCleary
- RevBayes/1.2.1-GCC-10.2.0 available on McCleary
- Spark 3.1.1 (CPU-only and GPU-enabled versions) available on McCleary
Upcoming Maintenances
- The McCleary cluster will be unavailable from 9am-1pm on Tuesday May 30 while maintenance is performed on the YCGA storage.
- The Milgram, Grace and McCleary clusters will not be available from 2pm on Monday June 19 until 10am on Wednesday June 21, due to electrical work being performed in the HPC data center. No changes will be made that impact users of the clusters.
- The regular Grace maintenance that had been scheduled for June 6-8 will be performed on August 15-17. This change is being made in preparation for the upgrade to RHEL 8 on Grace.
May 2023
Announcements
Farnam Decommission: June 1, 2023
After many years of supporting productive science, the Farnam cluster will be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled June 1, 2023, which will mark the official end of Farnam’s service. Read-only access to Farnam’s storage system (/gpfs/ysm) will be available on McCleary until July 13, 2023. Any data on YSM that you want to keep will need to be transferred off YSM by you, either to non-HPC storage or to McCleary project space, prior to YSM’s retirement.
Ruddle Decommission: July 1, 2023
After many years of serving YCGA, the Ruddle cluster will also be decommissioned this summer as we transition to the newly deployed McCleary cluster. Logins will be disabled July 1, 2023, which will mark the official end of Ruddle’s service. We will be migrating project and sequencing directories from Ruddle to McCleary. However, you are responsible for moving home and scratch data to McCleary before July 1, 2023.
Please begin to migrate your data and workloads to McCleary at your earliest convenience and reach out with any questions.
McCleary Transition Reminder
With our McCleary cluster now in a stable production state, we ask all Farnam users to ensure that all home, project, and scratch data the group wishes to keep is migrated to the new cluster ahead of the June 1st decommission. As June 1st is the formal retirement of Farnam, compute service charges on McCleary commons partitions will begin at this time. Ruddle users will have until July 1st to access Ruddle and migrate their home and scratch data as needed. Ruddle users will NOT need to migrate their project directories; those will be automatically transferred to McCleary. As previously established on Ruddle, all jobs in the YCGA partitions will be exempt from compute service charges on the new cluster. For more information visit our McCleary Transition documentation.
Software Highlights
- Libmamba solver for conda 23.1.0+ available on all clusters. Conda installations 23.1.0 and newer are now configured to use the faster environment-solving algorithm developed by `mamba` by default. You can simply use `conda install` and enjoy the significantly faster solve times.
- GSEA available in McCleary and Ruddle OOD. Gene Set Enrichment Analysis (GSEA) is now available in McCleary OOD and Ruddle OOD for all users. You can access it by clicking “Interactive Apps” and then selecting “GSEA”. GSEA is a popular computational method for functional analysis of multi-omics data. Data files for GSEA are not centrally stored on the clusters, so you will need to download them from the GSEA website yourself.
- NAG/29-GCCcore-11.2.0 available on Grace
- AFNI/2023.1.01-foss-2020b-Python-3.8.6 on McCleary
April 2023
Announcements
McCleary in Production Status
During March, we added nodes to McCleary, including large-memory nodes (4 TiB) and GPU nodes, and migrated most of the commons nodes from Farnam that are not being retired. Moreover, we have finalized the setup of McCleary and the system is now production stable. Please feel free to migrate your data and workloads from Farnam and Ruddle to McCleary at your earliest convenience.
New YCGA Nodes Online on McCleary
McCleary now has over 3000 new cores dedicated to YCGA work! We encourage you to test your workloads and prepare to migrate from Ruddle to McCleary at your earliest convenience. More information can be found here.
Software Highlights
- QuantumESPRESSO/7.1-intel-2020b available on Grace
- RELION/4.0.1 available on McCleary
- miniconda/23.1.0 available on all clusters
- scikit-learn/0.23.2-foss-2020b on Grace and McCleary
- seff-array updated to 0.4 on Grace, McCleary and Milgram
March 2023
Announcements
McCleary Now Available
The new McCleary HPC cluster is now available for active Farnam and Ruddle users; all other researchers who conduct life sciences research can request an account using our Account Request form. Farnam and Ruddle will be retired in mid-2023, so we encourage all users on those clusters to transition their work to McCleary at their earliest convenience. If you see any issues on the new cluster or have any questions, please let us know at hpc@yale.edu.
Open OnDemand VSCode Available Everywhere
A new OOD app, code-server, is now available on all YCRC clusters, including Milgram and McCleary. The new code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server at their earliest convenience. Unlike VSCode on the login node, the new app also enables you to use GPUs, to allocate large-memory nodes, and to specify a private partition (if applicable). The app is still in beta and your feedback is much appreciated.
Software Highlights
- GPU-enabled LAMMPS (LAMMPS/23Jun2022-foss-2020b-kokkos-CUDA-11.3.1) is now available on Grace.
- AlphaFold/2.3.1-fosscuda-2020b is now available on Farnam and McCleary.
Milgram Maintenance
February 7, 2023
Software Updates
- Slurm updated to 22.05.7
- NVIDIA drivers updated to 525.60.13
- Apptainer updated to 1.1.4
- Open OnDemand updated to 2.0.29
Hardware Updates
- Milgram’s network was restructured to reduce latency and improve resiliency.
February 2023
Announcements
Milgram Maintenance
The biannual scheduled maintenance for the Milgram cluster will be occurring Feb 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.
McCleary Launch
The YCRC is pleased to announce the launch of the new McCleary HPC cluster. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. McCleary will be available in a “beta” phase to Farnam and Ruddle users later this month. Keep an eye on your email for further announcements about McCleary’s availability.
January 2023
Announcements
Open OnDemand VSCode
A new OOD app, code-server, is now available on all clusters except Milgram (coming in February). Code-server allows you to run VSCode in a browser on a compute node. All users who have been running VSCode on a login node via the ssh extension should switch to code-server immediately. The app allows you to use GPUs, allocate large amounts of memory, and specify a private partition (if you have access), none of which is possible when running VSCode on a login node. The app is still in beta and your feedback is much appreciated.
Milgram Transfer Node
Milgram now has a node dedicated to data transfers to and from the cluster. To access the node from within Milgram, run `ssh transfer` from the login node. To upload or download data from Milgram via the transfer node, use the hostname `transfer-milgram.hpc.yale.edu` (must be on VPN). More information can be found in our Transfer Data documentation.
With the addition of the new transfer node, we ask that the login nodes no longer be used for data transfers, to limit the impact on regular login activities.
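For example, a minimal sketch of pulling results down to your own machine via the transfer node (the remote and local paths are hypothetical) would be:
# run from your local machine while on the Yale VPN
rsync -av netid@transfer-milgram.hpc.yale.edu:project/results/ ./results/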
Grace Maintenance
December 6-8, 2022
Software Updates
- Slurm updated to 22.05.6
- NVIDIA drivers updated to 520.61.05
- Apptainer updated to 1.1.3
- Open OnDemand updated to 2.0.28
Hardware Updates
- Roughly two racks' worth of equipment was moved to upgrade the effective InfiniBand connection speeds of several compute nodes (from 56 to 100 Gbps)
- The InfiniBand network was modified to increase capacity and allow for additional growth
- Some parts of the regular network were improved to shorten network paths and increase shared-uplink bandwidth
Loomis Decommission
The Loomis GPFS filesystem has been retired and unmounted from Grace, Farnam, and Ruddle. For additional information please see the Loomis Decommission page.
December 2022
Announcements
Grace & Gibbs Maintenance
The biannual scheduled maintenance for the Grace cluster will be occurring December 6-8. During this time, the cluster will be unavailable. Additionally, the Gibbs filesystem will be unavailable on Farnam and Ruddle on Tuesday, December 6th to deploy a critical firmware upgrade. See the maintenance email announcements for more details.
Loomis Decommission
The Loomis GPFS filesystem will be retired and unmounted from Grace and Farnam during the Grace December maintenance starting on December 6th. All data except for a few remaining private filesets have already been transferred to other systems (e.g., current software, home, scratch to Palmer and project to Gibbs). The remaining private filesets are being transferred to Gibbs in advance of the maintenance and owners should have received communications accordingly. The only potential user impact of the retirement is on anyone using the older, deprecated software trees. Otherwise, the Loomis retirement should have no user impact but please reach out if you have any concerns or believe you are still using data located on Loomis. See the Loomis Decommission documentation for more information.
Apptainer Upgrade on Grace and Ruddle
The newest version of Apptainer (v1.1, available now on Ruddle and, after the December maintenance, on Grace) comes with the ability to create containers without needing elevated privileges (i.e. `sudo` access). This greatly simplifies the container workflow as you no longer need a separate system to build a container from a definition file. You can simply create a definition file and run the build command.
For example, to create a simple toy container from this def file (`lolcow.def`):
BootStrap: docker
From: ubuntu:20.04
%post
apt-get -y update
apt-get -y install cowsay lolcat
%environment
export LC_ALL=C
export PATH=/usr/games:$PATH
%runscript
date | cowsay | lolcat
You can run:
salloc -p interactive -c 4
apptainer build lolcow.sif lolcow.def
This upgrade is live on Ruddle and will be applied on Grace during the December maintenance. For more information, please see the Apptainer documentation site and our docs page on containers.
Software Highlights
- RELION/4.0.0-fosscuda-2020b for cryo-EM/cryo-tomography data processing is now available on Farnam. RELION/3.1 will no longer be updated by the RELION developer. Note that data processed with RELION 4 are not backwards compatible with RELION 3.
Ruddle Maintenance
November 1, 2022
Software Updates
- Security updates
- Slurm updated to 22.05.5
- Apptainer updated to 1.1.2
- Open OnDemand updated to 2.0.28
Hardware Updates
- No hardware changes during this maintenance.
November 2022
Announcements
Ruddle Maintenance
The biannual scheduled maintenance for the Ruddle cluster will be occurring Nov 1-3. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.
Grace and Milgram Maintenance Schedule Change
We will be adjusting the timing of Grace and Milgram's scheduled maintenance periods. Starting this December, Grace's maintenance periods will occur in December and June, with the next scheduled for December 6-8, 2022. Milgram's next maintenance will instead be performed in February and August, with the next scheduled for February 7-9, 2023. Please refer to previously sent communications for more information and see the full maintenance schedule for next year on our status page.
Requeue after Timeout
The YCRC clusters all have maximum time-limits that sometimes are shorter than a job needs to finish. This can be a frustration for researchers trying to get a simulation or a project finished. However, a number of workflows have the ability to periodically save the status of a process to a file and restart from where it left off. This is often referred to as "checkpointing" and is built into many standard software tools, like Gaussian and Gromacs.
Slurm is able to send a signal to your job just before it runs out of time. Upon receiving this signal, you can have your job save its current status and automatically submit a new version of the job which picks up where it left off. Here is an example of a simple script that resubmits a job after receiving the `TIMEOUT` signal:
#!/bin/bash
#SBATCH -p day
#SBATCH -t 24:00:00
#SBATCH -c 1
#SBATCH --signal=B:10@30 # send the signal `10` at 30s before job finishes
#SBATCH --requeue # mark this job eligible for requeueing
# define a `trap` that catches the signal and requeues the job
trap "echo -n 'TIMEOUT @ '; date; echo 'Resubmitting...'; scontrol requeue ${SLURM_JOBID} " 10
# run the main code, with the `&` to “background” the task
./my_code.exe &
# wait for either the main code to finish or for the signal to be received
wait
This tells Slurm to send signal `10` at ~30s before the job finishes. Then we define an action (or `trap`) based on this signal which requeues the job. Don’t forget to add the `&` to the end of the main executable and the `wait` command so that the trap is able to catch the signal.
Software Highlights
- MATLAB/2022b is now available on all clusters.
Farnam Maintenance
October 4-5, 2022
Software Updates
- Security updates
- Slurm updated to 22.05.3
- NVIDIA drivers updated to 515.65.01
- Lmod updated to 8.7
- Apptainer updated to 1.0.3
- Open OnDemand updated to 2.0.28
Hardware Updates
- No hardware changes during this maintenance.
October 2022
Announcements
Farnam Maintenance
The biannual scheduled maintenance for the Farnam cluster will be occurring Oct 4-6. During this time, the cluster will be unavailable. See the Farnam maintenance email announcements for more details.
Gibbs Maintenance
Additionally, the Gibbs storage system will be unavailable on Grace and Ruddle on Oct 4 to deploy an urgent firmware fix. All jobs on those clusters will be held, and no new jobs will be able to start during the upgrade to avoid job failures.
New Command for Interactive Jobs
The new version of Slurm (the scheduler) has improved the process of launching an interactive compute job. Instead of the clunky `srun --pty bash` syntax from previous versions, this is now replaced with `salloc`. In addition, the interactive partition is now the default partition for jobs launched using `salloc`. Thus a simple (1 core, 1 hour) interactive job can be requested like this:
salloc
which will submit the job and then move your shell to the allocated compute node.
For MPI users, this allows multi-node parallel jobs to be properly launched inside an interactive compute job, which did not work as expected previously. For example, here is a two-node job, launched with `salloc` and then a parallel job-step launched with `srun`:
[user@grace1 ~]$ salloc --nodes 2 --ntasks 2 --cpus-per-task 1
salloc: Nodes p09r07n[24,28] are ready for job
[user@p09r07n24 ~]$ srun hostname
p09r07n24.grace.hpc.yale.internal
p09r07n28.grace.hpc.yale.internal
For more information on `salloc`, please refer to Slurm’s documentation.
Software Highlights
- cellranger/7.0.1 is now available on Farnam.
- LAMMPS/23Jun2022-foss-2020b-kokkos is now available on Grace.
September 2022
Announcements
Software Module Extensions
Our software module utility (Lmod) has been enhanced to enable searching for Python and R (among other software) extensions. This is a very helpful way to know which software modules contain a specific library or package. For example, to see what versions of `ggplot2` are available, use the `module spider` command.
$ module spider ggplot2
--------------------------------------------------------
ggplot2:
--------------------------------------------------------
Versions:
ggplot2/3.3.2 (E)
ggplot2/3.3.3 (E)
ggplot2/3.3.5 (E)
$ module spider ggplot2/3.3.5
-----------------------------------------------------------
ggplot2: ggplot2/3.3.5 (E)
-----------------------------------------------------------
This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy.
R/4.2.0-foss-2020b
This indicates that by loading the `R/4.2.0-foss-2020b` module you will gain access to `ggplot2/3.3.5`.
Software Highlights
- topaz/0.2.5-fosscuda-2020b for use with RELION (fosscuda-2020b toolchain) is now available as a module on Farnam.
Grace Maintenance
August 2-4, 2022
Software Updates
- Security updates
- Slurm updated to 22.05.2
- NVIDIA drivers updated to 515.48.07 (except for nodes with K80 GPUs, which are stranded at 470.129.06)
- Singularity replaced by Apptainer version 1.0.3 (note: the "singularity" command will still work as expected)
- Lmod updated to 8.7
- Open OnDemand updated to 2.0.26
Hardware Updates
- Core components of the ethernet network were upgraded to improve performance and increase overall capacity.
Loomis Decommission and Project Data Migration
After over eight years in service, the primary storage system on Grace, Loomis (`/gpfs/loomis`), will be retired later this year.
Project. We have migrated all of the Loomis project space (`/gpfs/loomis/project`) to the Gibbs storage system at `/gpfs/gibbs/project` during the maintenance. You will need to update your scripts and workflows to point to the new location (`/gpfs/gibbs/project/<group>/<netid>`). The "project" symlink in your home directory has been updated to point to your new space (with a few exceptions described below), so scripts using the symlinked path will not need to be updated. If you have jobs in a pending state going into the maintenance that used the absolute Loomis path, we recommend canceling, updating and then re-submitting those jobs so they do not fail.
If you had a project space that exceeds the no-cost allocation (4 TiB), you have received a separate communication from us with details about your data migration. In these instances, your group has been granted a new, empty "project" space with the default no-cost quota. Any scripts will need to be updated accordingly.
Conda. By default, all conda environments are installed into your project directory. However, most conda environments do not survive being moved from one location to another, so you may need to regenerate your conda environment(s). To assist with this, we provide conda-export documentation.
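A minimal sketch of that export-and-rebuild workflow (the environment name is hypothetical; see the conda-export documentation for the recommended YCRC procedure) is:
module load miniconda
# capture the package list from the environment
conda env export --name my_env > my_env.yml
# recreate the environment fresh in the new project location
conda env create --file my_env.yml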
R. Similarly, in 2022 we started redirecting user R packages to your project space to conserve home directory usage. If your R environment is not working as expected, we recommend deleting the old installation (found in `~/project/R/<version>`) and rerunning install.packages.
Custom Software Installation. If you or your group had any self-installed software in the project directory, it is possible that the migration will have broken the software and it will need to be recompiled. Contact us if you need assistance recompiling.
Scratch60. The Loomis scratch space (/gpfs/loomis/scratch60) is now read-only. All data in that directory will be purged in 60 days, on October 3, 2022. Any data in /gpfs/loomis/scratch60 that you wish to retain needs to be copied to another location by that date (such as your Gibbs project or Palmer scratch).
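For example, to copy a directory from the old scratch space into your Gibbs project space (paths and the directory name my_data are placeholders; adjust to your own layout):
$ rsync -avh /gpfs/loomis/scratch60/<group>/<netid>/my_data /gpfs/gibbs/project/<group>/<netid>/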
Changes to Non-Interactive Sessions
Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.
August 2022
Announcements
Grace Maintenance & Storage Changes
The biannual scheduled maintenance for the Grace cluster will be occurring August 2-4. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.
During the maintenance, significant changes will be made to the project and scratch60 directories on Grace. See our website for more information and updates.
SpinUp Researcher Image & Containers
Yale offers a simple portal for creating cloud-based compute resources called SpinUp. These cloud instances are hosted on Amazon Web Services, but have access to Yale services like Active Directory, DNS, and Storage at Yale. SpinUp offers a range of services including virtual machines, web servers, remote storage, and databases.
Part of this service is a Researcher Image, an Ubuntu-based system which comes with a suite of commonly used software utilities pre-installed, including:
- PyTorch, TensorFlow, Keras, and other GPU-accelerated deep learning frameworks
- GCC, CMake, Go, and other development tools
- Singularity/Apptainer and Docker for container development
We recommend that researchers who want to develop containers for use on YCRC HPC resources use SpinUp to build them and then copy the finished containers to the clusters.
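As a rough sketch of that workflow (the image, definition file, and destination are placeholders; the exact build steps depend on your container definition):
# on the SpinUp Researcher Image: build the container from a definition file
$ apptainer build my_container.sif my_container.def
# copy the finished image to a cluster, for example via its dedicated transfer node
$ scp my_container.sif <netid>@transfer-grace.hpc.yale.edu:~/project/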
If there are software utilities or commonly used tools that you would like added to the Researcher Image, let us know and we can work with the Cloud Team to get them integrated.
Software Highlights
- AFNI/2022.1.14 is now available on Farnam and Milgram.
- cellranger/7.0.0 is now available on Grace.
July 2022
Announcements
Loomis Decommission
After almost a decade in service, the primary storage system on Grace, Loomis (/gpfs/loomis), will be retired later this year. The usage and capacity on Loomis will be replaced by two existing YCRC storage systems, Palmer and Gibbs, which are already available on Grace. Data in Loomis project storage will be migrated to /gpfs/gibbs/project during the upcoming August Grace maintenance. See the Loomis Decommission documentation for more information and updates.
Updates to OOD Jupyter App
OOD Jupyter App has been updated to handle conda environments more intelligently. Instead of listing all the conda envs in your account, the app now lists only the conda environments with Jupyter installed. If you do not see your desired environment listed in the dropdown, check that you have installed Jupyter in that environment. In addition, the “jupyterlab” checkbox in the app will only be visible if the environment selected has jupyterlab installed.
YCRC conda environment
ycrc_conda_env.list has been replaced by ycrc_conda_env.sh. To update your conda environments in OOD for the Jupyter App and RStudio Desktop (with Conda R), please run ycrc_conda_env.sh update.
Software Highlights
- miniconda/4.12.0 is now available on all clusters
- RStudio/2022.02.3-492 is now available on all clusters. This is currently the only version that is compatible with the graphics engine used by R/4.2.0-foss-2020b.
- fmriprep/21.0.2 is now available on Milgram.
- cellranger/7.0.0 is now available on Farnam.
Milgram Maintenance
June 7-8, 2022
Software Updates
- Security updates
- Slurm updated to 21.08.8-2
- NVIDIA drivers updated to 515.43.04
- Singularity replaced by Apptainer version 1.0.2 (note: the "singularity" command will still work as expected)
- Lmod updated to 8.7
- Open OnDemand updated to 2.0.23
Hardware Updates
- The hostnames of the compute nodes on Milgram were changed to bring them in line with YCRC naming conventions.
Changes to non-interactive sessions
Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.
June 2022
Announcements
Farnam Decommission & McCleary Announcement
After more than six years in service, we will be retiring the Farnam HPC cluster later this year. Farnam will be replaced with a new HPC cluster, McCleary. The McCleary HPC cluster will be Yale's first direct-to-chip liquid cooled cluster, moving the YCRC and the Yale research computing community into a more environmentally friendly future. For more information about the decommission process and the launch of McCleary, see our website.
RStudio (with module R) has been retired from Open OnDemand as of June 1st
Please switch to RStudio Server, which provides a better user experience. If you use a conda environment with RStudio, RStudio (with Conda R) will continue to be available on Open OnDemand.
Milgram Maintenance
The biannual scheduled maintenance for the Milgram cluster will be occurring June 7-9. During this time, the cluster will be unavailable. See the Milgram maintenance email announcements for more details.
Software Highlights
- QTLtools/1.3.1-foss-2020b is now available on Farnam.
- R/4.2.0-foss-2020b is available on all clusters.
- Seurat for R/4.2.0 is now available on all clusters through the R-bundle-Bioconductor/3.15-foss-2020b-R-4.2.0 module, along with many other packages. Please check whether the packages you need are available in these modules before running install.packages (see the example below).
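One way to check (a sketch using Seurat as the example) is to search for the package as a module extension before installing it yourself:
$ module spider Seurat   # lists which module(s) provide the Seurat extension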
Ruddle Maintenance
May 2, 2022
Software Updates
- Security updates
- Slurm updated to 21.08.7
- Singularity replaced by Apptainer version 1.0.1 (note: the "singularity" command will still work as expected)
- Lmod updated to 8.7
Changes to non-interactive sessions
Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.
May 2022
Announcements
Ruddle Maintenance
The biannual scheduled maintenance for the Ruddle cluster will be occurring May 3-5. During this time, the cluster will be unavailable. See the Ruddle maintenance email announcements for more details.
Remote Visualization with Hardware Acceleration
VirtualGL is installed on all GPU nodes on Grace, Farnam, and Milgram to provide hardware accelerated 3D rendering. Instructions on how to use VirtualGL to accelerate your 3D applications can be found at https://docs.ycrc.yale.edu/clusters-at-yale/guides/virtualgl/.
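As a minimal illustration (glxgears is just a stock OpenGL demo; see the linked guide for the full, authoritative instructions), 3D applications are launched through the vglrun wrapper:
# within a graphical session on a GPU node (for example an Open OnDemand remote desktop), prefix the application with vglrun
$ vglrun glxgears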
Software Highlights
- Singularity is now called "Apptainer". Singularity has officially been renamed "Apptainer" as part of its move to the Linux Foundation. The new apptainer command works as a drop-in replacement for singularity. However, the previous singularity command will also continue to work for the foreseeable future, so no change is needed. The upgrade to Apptainer is in place on Grace, Farnam, and Ruddle (as of the maintenance completion). Milgram will be upgraded to Apptainer during the June maintenance. A minimal example of the drop-in equivalence follows this list.
- Slurm has been upgraded to version 21.08.6 on Grace
- MATLAB/2022a is available on all clusters
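As referenced above, the two commands below are interchangeable (the image name my_image.sif is a placeholder):
$ singularity exec my_image.sif python --version
$ apptainer exec my_image.sif python --version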
Farnam Maintenance
April 4-7, 2022
Software Updates
- Security updates
- Slurm updated to 21.08.6
- NVIDIA drivers updated to 510.47.03 (note: driver for NVIDIA K80 GPUs was upgraded to 470.103.01)
- Singularity replaced by Apptainer version 1.0.1 (note: the "singularity" command will still work as expected)
- Open OnDemand updated to 2.0.20
Hardware Updates
- Four new nodes with 4 NVIDIA GTX3090 GPUs each have been added
Changes to the bigmem Partition
Jobs requesting less than 120G of memory are no longer allowed in the "bigmem" partition. Please submit these jobs to the general or scavenge partitions instead.
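For example, a job that needs less than 120G of memory could be submitted along these lines (the resource values and command are illustrative):
#!/bin/bash
#SBATCH --partition=general   # use general (or scavenge) for jobs under 120G
#SBATCH --mem=64G             # memory request below the bigmem threshold
#SBATCH --time=1:00:00

./my_analysis                 # placeholder for your actual command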
Changes to non-interactive sessions
Non-interactive sessions (e.g. file transfers, commands sent over ssh) no longer load the standard cluster environment to alleviate performance issues due to unnecessary module loads. Please contact us if this change affects your workflow so we can resolve the issue or provide a workaround.
April 2022
Announcements
Updates to R on Open OnDemand
RStudio Server is out of beta! With the deprecation of R 3.x (see below), we will be removing RStudio Desktop with module R from Open OnDemand on June 1st.
Improvements to R install.packages Paths
Starting with the R 4.1.0 software module, we now automatically set an environment variable (R_LIBS_USER) which directs these packages to be stored in your project space. This helps ensure that packages are not limited by home-space quotas and that packages installed for different versions of R are properly separated from each other. Previously installed packages should still be available and there should be no disruption from the change.
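You can verify where user packages will be installed after loading an R module (the exact path depends on your group and the R version):
$ module load R/4.1.0-foss-2020b
$ echo $R_LIBS_USER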
Instructions for Running a MySQL Server on the Clusters
Occasionally it can be useful to run your own MySQL database server on one of the clusters. Until now that has not been possible, but we recently found a way to do it using Singularity. Instructions may be found in our new MySQL guide.
Software Highlights
- R 3.x modules have been deprecated on all clusters and are no longer supported. If you need to continue to use an older version of R, look at our R conda documentation.
- R/4.1.0-foss-2020b is now available on all clusters.
- Seurat/4.1.0-foss-2020b-R-4.1.0 (for using the Seurat R package) is now available on all clusters.
March 2022
Announcements
Snapshots
Snapshots are now available on all clusters for home and project spaces. Snapshots enable self-service restoration of modified or deleted files for at least 2 days in the past. See our User Documentation for more details on availability and instructions.
OOD File Browser Tip: Shortcuts
You can add shortcuts to your favorite paths in the OOD File Browser. See our OOD documentation for instructions on setting up shortcuts.
Software Highlights
- R/4.1.0-foss-2020b is now on Grace.
- GCC/11.2.0 is now on Grace.
Grace Maintenance
February 3-6, 2022
Software Updates
- Latest security patches applied
- Slurm updated to version 21.08.5
- NVIDIA driver updated to version 510.39.01 (except for nodes with K80 GPUs which are stranded at 470.82.01)
- Singularity updated to version 3.8.5
- Open OnDemand updated to version 2.0.20
Hardware Updates
- Changes have been made to networking to improve performance of certain older compute nodes
Changes to Grace Home Directories
During the maintenance, all home directories on Grace have been moved to our new all-flash storage filesystem, Palmer. The move is in anticipation of the decommissioning of Loomis at the end of the year and will provide a robust login experience by protecting home directory interactions from data intensive compute jobs.
Due to this migration, your home directory path has changed from /gpfs/loomis/home.grace/<netid> to /vast/palmer/home.grace/<netid>.
Your home directory can always be referenced in bash and submission scripts and from the command line with the $HOME variable. Please update any scripts and workflows accordingly.
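For example, a submission script that previously hard-coded the old Loomis home path can refer to $HOME instead (the directory and command below are placeholders):
#!/bin/bash
#SBATCH --job-name=example

# $HOME now resolves to /vast/palmer/home.grace/<netid>
cd "$HOME/my_project"
./run_analysis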
Interactive Jobs
We have added an additional way to request an interactive job. The Slurm command salloc can be used to start an interactive job, similar to srun --pty bash. In addition to being a simpler command (no --pty bash is needed), salloc jobs can be used to interactively test mpirun executables.
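A brief illustration (the resource flags and program name are placeholders):
# request an interactive allocation with 4 CPUs for 2 hours
$ salloc -c 4 -t 2:00:00
# once the allocation starts (and the appropriate MPI module is loaded), MPI executables can be tested directly
$ mpirun ./my_mpi_program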
Palmer scratch
Palmer is out of beta! We have fixed the issue with Plink on Palmer, so now you can use Palmer scratch for any workloads. See https://docs.ycrc.yale.edu/data/hpc-storage#60-day-scratch for more information on Palmer scratch.
February 2022
Announcements
Grace Maintenance
The biannual scheduled maintenance for the Grace cluster will be occurring February 1-3. During this time, the cluster will be unavailable. See the Grace maintenance email announcement for more details.
Data Transfers
For users on all clusters other than Milgram, data transfers should not be performed on the login nodes. We have a few alternative ways to get better networking and reduce the impact on the clusters’ login nodes:
- Dedicated transfer node. Each cluster has a dedicated transfer node, transfer-<cluster>.hpc.yale.edu. You can ssh directly to this node and run commands.
- “transfer” Slurm partition. This is a small partition managed by the scheduler for doing data transfer. You can submit jobs to it using srun/sbatch -p transfer … For recurring or periodic data transfers (such as those driven by cron), please use Slurm’s scrontab to schedule jobs that run on the transfer partition instead.
- Globus. For robust transfers of larger amounts of data, see our Globus documentation.
More info about data transfers can be found in our Data Transfer documentation.
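For instance, either of the following avoids the login nodes (paths, file names, and the cluster name are placeholders):
# copy data through the dedicated transfer node instead of a login node
$ scp my_data.tar.gz <netid>@transfer-grace.hpc.yale.edu:/gpfs/gibbs/project/<group>/<netid>/
# or, from within the cluster, run the copy as a job on the "transfer" partition
$ srun -p transfer rsync -avh /path/to/source /path/to/destination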
Software Highlights
- Rclone is now installed on all nodes and loading the module is no longer necessary.
- MATLAB/2021b is now on all clusters.
- Julia/1.7.1-linux-x86_64 is now on all clusters.
- Mathematica/13.0.0 is now on Grace.
- QuantumESPRESSO/6.8-intel-2020b and QuantumESPRESSO/7.0-intel-2020b are now on Grace.
- Mathematica documentation has been updated with regard to configuring parallel jobs.