GPUs and CUDA
Access the GPU Nodes
To access the GPU nodes you must request them with the scheduler.
An example Slurm command to request an interactive job on the gpu partition with X forwarding and 1/2 of a GPU node (10 cores and 1 K80):
srun --pty --x11 -p gpu -c 10 -t 24:00:00 --gres=gpu:2 --gres-flags=enforce-binding bash
The --gres=gpu:2 option asks for two GPUs, and the --gres-flags=enforce-binding option ensures that you get two GPUs on the same card and that the CPUs you are allocated are on the same bus as your GPUs. Note that the --gres count is per node, not per task or core.
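Because the --gres count applies per node, the total number of GPUs in a multi-node job is the node count multiplied by the --gres count. A minimal sketch of that arithmetic follows; total_gpus is a hypothetical helper written for illustration, not a Slurm command.

```shell
# total_gpus is a hypothetical helper, not part of Slurm.
# Usage: total_gpus NODES GPUS_PER_NODE
total_gpus() {
    local nodes="$1" per_node="$2"
    echo $(( nodes * per_node ))
}

# Two nodes (-N 2), each with --gres=gpu:2
total_gpus 2 2   # prints 4
```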
To submit a batch job, include the following directives (in addition to your core, time, etc requests):
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
Please do not use nodes with GPUs unless your application or job can make use of them. Any jobs found running in a GPU partition without a GPU will be terminated without warning.
You can check the available GPUs and their current usage with the command nvidia-smi.
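The batch directives and the "no GPU, no GPU partition" rule can be combined into one script that refuses to continue when no GPU is visible. The sketch below is only an illustration under a few assumptions: the core count and time limit are placeholders, visible_gpu_count and require_gpu are hypothetical helper names, and the cluster's Slurm configuration is assumed to export CUDA_VISIBLE_DEVICES for allocated GPUs.

```shell
#!/usr/bin/env bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
# Placeholder core and time requests (assumptions; adjust for your job).
#SBATCH -c 5
#SBATCH -t 1:00:00

# Count the GPUs exposed to this job through CUDA_VISIBLE_DEVICES.
# Assumption: Slurm sets this variable when GPUs are allocated via --gres.
visible_gpu_count() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [[ -z "$devices" ]]; then
        echo 0
        return 0
    fi
    local -a ids
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

# Return non-zero (with a message on stderr) if no GPU is visible.
require_gpu() {
    if (( $(visible_gpu_count) == 0 )); then
        echo "No GPU visible to this job; check the --gres request." >&2
        return 1
    fi
}

# Only enforce the check inside a Slurm job, where SLURM_JOB_ID is set.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    require_gpu || exit 1
    nvidia-smi   # record the allocated GPU(s) in the job log
fi
```

Outside of a Slurm job the final block does nothing, so the helper functions can be tried on a login node without side effects.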
Request Specific GPUs
To request a specific type of GPU (e.g. a P100) for each node in your job, you specify the GPU type in the
Some codes require double-precision capable GPUs. If applicable, you can request any node with a compatible GPU (e.g. K80, P100 or V100) by using the doubleprecision feature:
#SBATCH -C doubleprecision
Conversely, you can use the singleprecision feature to request nodes that have single-precision-only GPUs (e.g. GTX 1080, RTX 2080).
CUDA and cuDNN are available as modules where applicable. On your cluster of choice, the toolkit is installed on the GPU nodes. To see what is available, type:
module avail cuda
Loading a CUDA module lets you compile code against its libraries. To run CUDA-enabled code, you must also be on a node with a GPU allocated and a compatible driver installed. The minimum driver versions are as follows (taken from the NVIDIA developer site):
CUDA 10.0: 410.48
CUDA 9.2: 396.xx
CUDA 9.1: 390.xx
CUDA 9.0: 384.xx
CUDA 8.0: 375.xx
CUDA 7.5: 352.xx
CUDA 7.0: 346.xx
CUDA 6.5: 340.xx
CUDA 6.0: 331.xx
To check the installed version of the NVIDIA drivers, you can also use nvidia-smi:
[user@gpu01 ~]$ nvidia-smi
Tue Jan 16 10:31:45 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
...
Here we see that the node gpu01 is running driver version 375.66. It can therefore run CUDA 8.0 (minimum driver 375.xx) but not yet CUDA 9.0 (minimum driver 384.xx).
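The comparison against the minimum-driver table can also be scripted. The following is a rough sketch rather than an official tool: max_cuda_for_driver is a hypothetical function name, it knows only the CUDA releases listed in the table above, and it treats the "xx" minor versions as 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest CUDA release (from the table above)
# whose minimum driver version is satisfied by the given driver version.
# Returns 1 if the driver is older than every entry, 2 on malformed input.
max_cuda_for_driver() {
    local driver="$1"
    [[ "$driver" =~ ^[0-9]+(\.[0-9]+)*$ ]] || return 2

    local major="${driver%%.*}"      # e.g. 375 from 375.66
    local minor=0
    if [[ "$driver" == *.* ]]; then
        minor="${driver#*.}"         # e.g. 66 from 375.66
        minor="${minor%%.*}"         # drop any third component
    fi

    # Rows: "cuda_version min_major min_minor", newest first.
    # Only CUDA 10.0 lists an exact minor (410.48); "xx" is treated as 0.
    local rows="10.0 410 48
9.2 396 0
9.1 390 0
9.0 384 0
8.0 375 0
7.5 352 0
7.0 346 0
6.5 340 0
6.0 331 0"

    local cuda req_major req_minor
    while read -r cuda req_major req_minor; do
        # 10# forces base 10 so a minor such as "08" is not read as octal.
        if (( 10#$major > req_major ||
              (10#$major == req_major && 10#$minor >= req_minor) )); then
            echo "$cuda"
            return 0
        fi
    done <<< "$rows"
    return 1
}

max_cuda_for_driver 375.66   # prints 8.0

# On a GPU node, the driver version can be read directly from nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
    driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    echo "Driver $driver meets the listed minimum for CUDA $(max_cuda_for_driver "$driver")"
fi
```

Because the table stops at CUDA 10.0, a newer driver is reported as meeting the minimum for 10.0; extend the rows if your cluster offers later CUDA modules.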