Each research group is provided with storage space for research data on the GPFS parallel filesystems on the clusters. The storage is separated into three tiers: home, project, and scratch. You can monitor your storage usage by running the getquota command on a cluster.
The only storage backed up on every cluster is home.
HPC Storage Locations
Home storage is a small amount of space to store your scripts, notes, final products (e.g. figures), etc. Home storage is backed up daily.
In general, project storage is intended to be the primary storage location for HPC research data in active use.
60-Day Scratch (
Use this space to keep intermediate files that can be regenerated or reconstituted if necessary. Files older than 60 days are deleted automatically. This space is not backed up, and you may be asked to delete files newer than 60 days if the space fills up.
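To see which files are approaching the purge window, a check along the following lines may help. This is a minimal sketch, not an official tool: the helper name list_old_files is hypothetical, and it assumes file age is measured by modification time, which may not match how the automatic deletion measures it.

```shell
# Hypothetical helper: list regular files under a directory that were last
# modified more than a given number of days ago (default: 60).
#   $1: directory to scan (for example, your scratch directory)
#   $2: age threshold in days
# Note: find counts age in whole 24-hour periods, so "-mtime +60" matches
# files that are at least 61 days old.
list_old_files() {
    find "$1" -type f -mtime +"${2:-60}" -print
}
```

For example, running list_old_files /path/to/your/scratch 45 lists files that have gone more than 45 days without modification, which leaves time to move anything you still need.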
HPC Storage Best Practices
Prevent Large Numbers of Small Files
Parallel filesystems, like the ones attached to our clusters, perform poorly with very large numbers of small files. For this reason, there are file count limits on all accounts to provide a safety net against excessive file creation. However, we expect users to manage their own file counts by altering workflows to reduce file creation, deleting unneeded files, and compressing (using tar) collections of files no longer in use.
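The file-count advice above can be sketched with standard tools. The helper names count_files and archive_dir are hypothetical, not commands installed on the clusters; the sketch assumes GNU or BSD find and tar are available.

```shell
# Hypothetical helper: count regular files under a directory tree.
count_files() {
    find "$1" -type f | wc -l
}

# Hypothetical helper: pack a directory into one gzip-compressed tarball
# placed next to it, then read the archive back to confirm it is intact.
# The original directory is left in place; remove it yourself once you
# are satisfied with the archive.
archive_dir() {
    dir="${1%/}"                      # drop a trailing slash, if any
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    tar -czf "$dir.tar.gz" -C "$parent" "$name" \
        && tar -tzf "$dir.tar.gz" > /dev/null
}
```

For example, count_files old_results shows how many files a directory contributes to your quota, and archive_dir old_results produces old_results.tar.gz, a single file that replaces them all once the original directory is deleted.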
Backups and Snapshots
Retrieve Data from Home Backups
Contact us at firstname.lastname@example.org with your netid and the list of files/directories you would like restored.
Retrieve Data from Snapshots (Farnam & Milgram)
Farnam and Milgram both run snapshots nightly on portions of their filesystems so that you can retrieve mistakenly modified or removed files yourself. As long as your files existed in the form you want before the most recent midnight, they can probably be recovered. The snapshot directory structure mirrors the files being tracked, under a prefix listed in the table below.
|File set||Snapshot Prefix|
For example, if you wanted to recover the most recent snapshot of the file
/gpfs/ysm/home/rdb9/scripts/doit.sh, its path would be
/gpfs/ysm/.snapshots/$(date +%Y%m%d-0000)/home/rdb9/scripts/doit.sh. Similarly, the file
/gpfs/milgram/project/bjornson/rdb9/doit.sh (a file in the bjornson group's project directory owned by rdb9) could be recovered at
/gpfs/milgram/project/bjornson/.snapshots/$(date +%Y%m%d-0000)/rdb9/doit.sh .
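The path construction in these examples can be wrapped in a small helper. This is a sketch under the assumption that snapshots are named with the date +%Y%m%d-0000 pattern shown above; the function names snapshot_path and restore_from_snapshot are hypothetical.

```shell
# Hypothetical helper: print the path of a file inside the snapshot taken
# at the most recent midnight.
#   $1: snapshot prefix, e.g. /gpfs/ysm/.snapshots
#   $2: path of the file relative to the tracked file set
snapshot_path() {
    printf '%s/%s/%s\n' "$1" "$(date +%Y%m%d-0000)" "$2"
}

# Hypothetical helper: copy a file out of that snapshot, preserving its
# timestamps and permissions.
#   $3: destination path
restore_from_snapshot() {
    cp -p "$(snapshot_path "$1" "$2")" "$3"
}
```

For the first example above, restore_from_snapshot /gpfs/ysm/.snapshots home/rdb9/scripts/doit.sh ~/doit.sh.recovered would copy last night's version of the script into the home directory under a new name, leaving the current file untouched.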
Because of the way snapshots are stored, sizes will not be correctly reported until you copy your files/directories back out of the