Hopper
Hopper is a shared-use resource for all researchers at Yale University for high-performance computation on ePHI, NIH Controlled-Access Data, CUI and certain other types of sensitive data. Hopper consists of a variety of standard compute and GPU-enabled nodes and mounts an encrypted shared filesystem. The Hopper cluster is named for the computer scientist and United States Navy Rear Admiral Grace Murray Hopper, who received her Ph.D. in Mathematics from Yale in 1934.
Hopper is jointly supported by the YCRC and HSIT to ensure a high level of service and facilitate secure computational research. Hopper is one of a number of secure computing environments, such as SpinUp+ and CHP, developed to support a variety of secure computing needs. Support staff are available to assist researchers with identifying and accessing the most suitable compute resources for their research projects.
Access the Cluster
Projects
Unlike other YCRC HPC systems, access to Hopper is granted on a per-project basis, so a single PI may have multiple projects.
Projects are named <pi_netid>_<project_code>.
A request for a specific project will need to be submitted by a PI and approved before user accounts can be created.
Once a project is approved, the project's PI is responsible for the approval of additional user accounts. PIs are also required to conduct a quarterly review of user accounts on their projects. Failure to complete the review by the due date will result in deactivation of the project (inability to submit jobs or access data for the project) until the review is complete.
Email us at research.computing@yale.edu to inquire about using Hopper.
Accounts
All accounts must be associated with an active project and approved by that project's PI. Accounts will be deactivated when any of the following occurs:
- Account is no longer associated with any active projects
- Account is inactive for more than one year
- Owner leaves the university
- Owner fails to renew required training
Before using Hopper, all users must successfully complete a training program which includes NIST 800-171 and HIPAA training. Once the associated PI approves the account, the user will receive an email prompting them to complete the training. Users can take the training in advance if desired via the button below. This training must be renewed annually.
Log In
Info
Connections to Hopper can only be made from the Yale VPN (access.yale.edu), even if you are already on campus (YaleSecure or ethernet). See our VPN page for setup instructions.
Once you have an account, the cluster can be accessed through the Virtual Desktop Infrastructure (VDI). The VDI functions as the ‘login node’ and isolates the user from their host computer. The VDI provides a virtual desktop with the standard YCRC cluster interfaces, such as the Open OnDemand Web Portal (coming soon!) and command line terminal access, to access files, run commands, and launch jobs.
To access the VDI, navigate to hopper1.ycrc.yale.edu in a web browser.
Security Restrictions
To protect the security of the data on Hopper and to comply with NIST 800-171 and HIPAA regulations, Hopper has a number of additional restrictions beyond other YCRC systems.
- The VDI prevents copy/pasting to the host computer, prevents file transfers (see below for how to transfer files) and enforces idle session timeouts.
- Screenshots, screen recording and screen sharing (e.g. via Zoom) are strictly prohibited (see below for how to record and report issues).
- If you know you will be away from your computer for more than 10 minutes, you must disconnect from the VDI. This can be done by simply closing the browser tab.
- You must access Hopper from a private location, such as your home or office. Access from public locations such as coffee shops, transportation hubs or libraries is not allowed.
- Do not put sensitive data (e.g. patient information, personal identifiers) in directory names or job names, which might inadvertently expose this information.
Report an Issue
If you run into an issue on Hopper and think it would be helpful to take a picture of your session (e.g. to record an error message), click the "Report an Issue" icon on your VDI desktop. This will place a capture of your screen in a folder where it can be reviewed by YCRC staff. Please notify YCRC staff in your help request if you have recorded your issue in this way.
Transfer Data
Data may only be transferred to and from Hopper using an approved method as described below. By default, all internet access is blocked, with only certain approved remote sites whitelisted. All transfers of any type will be logged, and users remain responsible for following the restrictions that apply to their data.
High-Risk Data
All high-risk data transfers, either onto or out of the cluster, require approval.
Submit High-Risk Incoming Data Transfer Request
For outgoing high-risk data transfers, contact us at research.computing@yale.edu.
Low-Risk Data
Low-risk files, such as scripts or other low-risk data, can be uploaded to Hopper using Globus via the "Yale CRC Hopper Low Risk" collection into a user-specific staging directory. For details on using Globus, please read our Globus documentation. The user must then submit a request to the YCRC to have the transfer approved; once approved, YCRC staff will transfer the data to the desired location on Hopper. If your data is large (>200 GB), please submit your request prior to uploading the data so we can facilitate the larger transfer.
Downloading low-risk files from the cluster follows the same process in reverse: submit a request to the YCRC to export your data, and once approved, staff will transfer the data to a user-specific directory on the Globus server. You can then retrieve your data using Globus at your convenience. Data will be purged from the staging area after a TBD amount of time.
Submit Low-Risk Transfer Request
Software
All software must be approved and installed by YCRC staff, typically as software modules. No software may be installed by users. A researcher's own analysis scripts, such as Python, R, MATLAB or bash scripts, do not qualify as software and may be uploaded and run on the cluster without approval. If you are unclear whether your workflow qualifies as software, please contact us for clarification.
R and Python Packages
We have set up a monitored proxy to PyPI and CRAN to allow you to install your own Python and R packages using the standard methods (i.e. pip, install.packages). From the hopper1 login node (where you go when you connect via the ThinLinc VDI), you can use conda with the default Conda repo (not conda-forge or bioconductor) and pip to create your own environments.
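For example, a minimal sketch of building a personal environment (the environment and package names below are only illustrations; if conda is not already on your path, check module avail for a conda/miniconda module or contact us):

conda create --name myenv python=3.11   # uses the default Conda repo
conda activate myenv
pip install numpy pandas                # installed through the monitored PyPI proxy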
LLM Models
LLM models, such as Llama, qualify as software and therefore must be approved and installed by YCRC staff.
We have made commonly requested LLM models available as software modules for easier offline use.
To use an offline LLM model, run module load <module name>. Run module display <module name> to determine the environment variable for the model path.
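For instance (the module name below is hypothetical; run module avail or contact us to see which LLM modules are actually installed on Hopper):

module load Llama       # hypothetical LLM module name
module display Llama    # shows the environment variable (e.g. LLM_LLAMA) that holds the model path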
Reference the environment variable (e.g. LLM_LLAMA) for the model path in your Python commands. For example:
import os
from transformers import LlamaForCausalLM
model_path = os.environ["LLM_LLAMA"]
model = LlamaForCausalLM.from_pretrained(model_path, local_files_only=True)
OR
import os, torch
import transformers
model_path = os.environ["LLM_LLAMA"]
pipeline = transformers.pipeline("text-generation", model=model_path, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")
If you need additional LLM models that are not yet installed, contact us to request that we add them. Please be selective when requesting very large models (e.g. >100B parameters); due to their large size, we can only host a limited number of these models.
Submit Jobs
Jobs are run using Slurm in the usual way, either using interactive or batch allocations.
All jobs must specify a 'project' account using the -A flag.
By default, nodes are shared with multiple jobs and users.
If the security of your project requires isolation, you must submit your jobs with the -X flag to ensure an exclusive allocation.
Such projects will be clearly identified during the approval and onboarding process.
At the moment, Slurm will not send job status emails, so please log in to check the status of your jobs.
While the VDI will lock sessions after 20 minutes of idle time, jobs submitted to the scheduler and sessions in the Web Portal will continue to run until they either complete (in the case of batch jobs), reach the limit of the requested wall time or are terminated by the user.
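As a sketch, a minimal batch script might look like the following (the account, module and script names are placeholders; substitute your own):

#!/bin/bash
#SBATCH -A <pi_netid>_<project_code>   # required: your project account
#SBATCH --partition=day
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=5120

module load <module name>   # any approved software modules your analysis needs
python my_analysis.py

Submit the script with sbatch, e.g. sbatch my_job.sh.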
Partitions and Hardware
See each tab below for more information about the available common use partitions.
Use the day partition for most batch jobs. This is the default if you don't specify one with --partition.
Request Defaults
Unless specified, your jobs will run with the following options to salloc and sbatch for this partition.
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
52 | cpugen:emeraldrapids | 64 | 976 | cpugen:emeraldrapids, cpumodel:8562Y+, common:yes |
Use the devel partition for jobs that require ongoing interaction, for example exploratory analyses or debugging compilations.
Request Defaults
Unless specified, your jobs will run with the following options to salloc
and sbatch
options for this partition.
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
2 | cpugen:emeraldrapids | 64 | 976 | cpugen:emeraldrapids, cpumodel:8562Y+, common:yes |
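For example, a minimal interactive request on this partition might look like the following (the account name is a placeholder):

salloc -A <pi_netid>_<project_code> --partition=devel --time=01:00:00 --cpus-per-task=2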
Use the gpu partition for jobs that make use of GPUs. You must request GPUs explicitly with the --gpus option in order to use them. For example, --gpus=a5000:2 would request 2 NVIDIA RTX A5000 GPUs per node.
Request Defaults
Unless specified, your jobs will run with the following options to salloc
and sbatch
options for this partition.
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
GPU jobs need GPUs!
Jobs submitted to this partition do not request a GPU by default. You must request one with the --gpus option.
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | GPU Type | GPUs/Node | vRAM/GPU (GB) | Node Features |
---|---|---|---|---|---|---|---|
9 | cpugen:emeraldrapids | 48 | 976 | a5000 | 4 | 24 | cpugen:emeraldrapids, cpumodel:6542Y, gpumodel:a5000, common:yes |
9 | cpugen:sapphirerapids | 48 | 976 | l40s | 4 | 48 | cpugen:sapphirerapids, cpumodel:6442Y, gpumodel:l40s, common:yes |
10 | cpugen:sapphirerapids | 48 | 976 | a40 | 4 | 48 | cpugen:sapphirerapids, cpumodel:6442Y, gpumodel:a40, common:yes |
15 | cpugen:sapphirerapids | 48 | 976 | h100 | 4 | 80 | cpugen:sapphirerapids, cpumodel:6442Y, gpumodel:h100, common:yes |
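For example, a single-GPU batch submission might look like the following sketch (the account and script names are placeholders):

sbatch -A <pi_netid>_<project_code> --partition=gpu --gpus=l40s:1 --cpus-per-task=8 --time=04:00:00 my_gpu_job.sh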
Use the bigmem partition for jobs that have memory requirements other partitions can't handle.
Request Defaults
Unless specified, your jobs will run with the following options to salloc
and sbatch
options for this partition.
--time=01:00:00 --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=5120
Available Compute Nodes
Requests for --cpus-per-task and --mem can't exceed what is available on a single compute node.
Count | CPU Type | CPUs/Node | Memory/Node (GiB) | Node Features |
---|---|---|---|---|
4 | cpugen:emeraldrapids | 64 | 1953 | cpugen:emeraldrapids, cpumodel:8562Y+, common:yes |
2 | cpugen:sapphirerapids | 64 | 3906 | cpugen:sapphirerapids, cpumodel:8462Y+, common:yes |
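For example, a large-memory request might look like the following sketch (the account and script names are placeholders; adjust --mem to your needs, up to what a single node provides):

sbatch -A <pi_netid>_<project_code> --partition=bigmem --mem=1500G --time=12:00:00 my_bigmem_job.sh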
Storage
Hopper has access to one filesystem called weston. Weston is a VAST filesystem similar to the palmer filesystem on Grace and McCleary, with the addition of encryption-at-rest for all user data.
For more details on the different storage spaces, see our Cluster Storage documentation.
You can check your current storage usage and limits by running the getquota command. You have shortcuts in your home directory to each project's storage spaces that follow the form ~/work_<project> and ~/scratch_<project>. You can also get a list of the absolute paths to your directories with the mydirectories command.
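For example:

getquota              # show your current storage usage and limits
mydirectories         # list the absolute paths to your directories
cd ~/work_<project>   # jump to a project's work space via the home-directory shortcut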
Top-level folder permissions are managed by YCRC and cannot be modified by users.
Only users in a specific project will be able to access that project's storage spaces.
For information on data recovery, see the Backups and Snapshots documentation.
Warning
Files stored in scratch are purged if they are older than 60 days. You will receive an email alert one week before they are deleted. Artificial extension of scratch file expiration is forbidden without explicit approval from the YCRC. Please purchase additional storage if you need longer-term space.
Fileset | Root Directory | Storage | File Count | Backups | Snapshots | Notes |
---|---|---|---|---|---|---|
home | /home | 125GiB/user | 500,000 | No | 7 days | |
work | /nfs/weston/work_<project> | 1TiB/project | 5,000,000 | No | 7 days | |
scratch | /nfs/weston/scratch_<project> | 10TiB/project | 15,000,000 | No | No | |