TensorFlow
Since the release of TensorFlow 2, tensorflow and tensorflow-gpu are the same package. Several TensorFlow modules are installed on the cluster for use with existing programs. To list the available versions, run:
module avail tensorflow
Installing TensorFlow
If you need a specific version of TensorFlow, or are working with specific Python packages in a miniconda environment, you will likely need to install TensorFlow yourself. Each TensorFlow version requires specific versions of CUDA and cuDNN. The tested build configurations in the TensorFlow installation documentation list the supported combinations.
The following blocks show how to install TensorFlow 2.16, 2.15, 2.14, 2.12, and 2.11, in that order:
# TensorFlow 2.16
module load miniconda
conda create -n tf16 python=3.11.*
conda activate tf16
pip install "tensorflow[and-cuda]==2.16.*"
# TensorFlow cannot find the pip-installed CUDA libraries on its own, so add them to LD_LIBRARY_PATH on activation
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'for dir in $NVIDIA_DIR/*; do if [ -d "$dir/lib" ]; then export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"; fi; done' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Deactivate and reactivate the environment so the new variables take effect
conda deactivate
conda activate tf16
# TensorFlow 2.15
module load miniconda
conda create -n tf15 python=3.11.*
conda activate tf15
pip install "tensorflow[and-cuda]==2.15.*"
# TensorFlow 2.14
module load miniconda
conda create -n tf14 python=3.11.*
conda activate tf14
pip install tensorflow==2.14.0 nvidia-cuda-runtime-cu11==11.8.89 nvidia-cublas-cu11==11.11.3.6 nvidia-cufft-cu11==10.9.0.58 nvidia-cudnn-cu11==8.7.0.84 nvidia-curand-cu11==10.3.0.86 nvidia-cusolver-cu11==11.4.1.48 nvidia-cusparse-cu11==11.7.5.86 nvidia-nccl-cu11==2.16.5 nvidia-cuda-cupti-cu11==11.8.87 nvidia-cuda-nvcc-cu11==11.8.89
# Store system paths to cuda libraries for gpu communication
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Deactivate and reactivate the environment so the library paths are loaded
conda deactivate
conda activate tf14
# TensorFlow 2.12
module load miniconda
conda create --name tf-condacuda python=3.11.* numpy pandas matplotlib jupyter cudatoolkit=11.8.0
conda activate tf-condacuda
pip install nvidia-cudnn-cu11==8.6.0.163
# Store system paths to cuda libraries for gpu communication
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
#install tensorflow
pip install tensorflow==2.12.*
# Deactivate and reactivate the environment so the library paths are loaded
conda deactivate
conda activate tf-condacuda
# TensorFlow 2.11
module load miniconda
conda create --name tf-condacuda python=3.10.* numpy pandas matplotlib jupyter cudatoolkit=11.3.1 cudnn=8.2.1
conda activate tf-condacuda
# Store system paths to cuda libraries for gpu communication
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
#install tensorflow
pip install tensorflow==2.11.*
# Deactivate and reactivate the environment so the library paths are loaded
conda deactivate
conda activate tf-condacuda
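The activation hooks above all serve the same purpose: they put the pip- or conda-installed CUDA libraries on LD_LIBRARY_PATH each time the environment is activated. As a rough way to see which directories the TensorFlow 2.16 hook would pick up, the sketch below collects every subdirectory of the pip "nvidia" package that contains a lib folder. The function names nvidia_lib_dirs and find_nvidia_root are hypothetical helpers, not part of any library, and the root is located the same way as in the hook (two levels above the nvidia.cudnn package file), which is assumed to match your installation.

```python
import importlib.util
from pathlib import Path


def nvidia_lib_dirs(nvidia_root):
    """Return every '<nvidia_root>/<package>/lib' directory, sorted by path."""
    root = Path(nvidia_root)
    return sorted(p / "lib" for p in root.iterdir() if (p / "lib").is_dir())


def find_nvidia_root():
    """Locate the pip 'nvidia' package directory via nvidia.cudnn, or return None."""
    try:
        spec = importlib.util.find_spec("nvidia.cudnn")
    except ImportError:
        # Raised when the parent 'nvidia' package is not installed at all.
        return None
    if spec is None or spec.origin is None:
        return None
    # .../site-packages/nvidia/cudnn/__init__.py -> .../site-packages/nvidia
    return Path(spec.origin).parent.parent


if __name__ == "__main__":
    root = find_nvidia_root()
    if root is None:
        print("nvidia.cudnn is not importable in this environment")
    else:
        for lib_dir in nvidia_lib_dirs(root):
            print(lib_dir)
```

Run it inside the activated environment. Each printed directory should also appear in the output of echo $LD_LIBRARY_PATH after the environment has been reactivated.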
Test TensorFlow GPU detection
If TensorFlow is installed correctly, it should detect any GPUs allocated to your job. You can test your installation with these steps:
# Request a compute allocation on a GPU node
salloc --partition=gpu_devel --cpus-per-task=1 --gpus=1 -t 4:00:00
# Load your TensorFlow miniconda environment
module load miniconda
conda activate YOUR_TF_ENVIRONMENT
# Run the TensorFlow GPU detection test
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
This should print a list with one entry per recognized GPU. An empty list ([]) means TensorFlow did not detect a GPU. If the test fails, please reach out to YCRC support for further assistance.
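For a slightly more readable report than the one-liner, a small script along the following lines may help. It is only a sketch: describe_gpus is a hypothetical helper name, and the script assumes it is run inside the activated environment on a GPU node. It uses tf.config.list_physical_devices, the same call as the test above, together with tf.test.is_built_with_cuda.

```python
def describe_gpus(devices):
    """Turn a list of device objects (each with a .name) into a one-line summary."""
    names = [d.name for d in devices]
    if not names:
        return "No GPU detected"
    return f"{len(names)} GPU(s) detected: " + ", ".join(names)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in the active environment")
    else:
        print("TensorFlow version:", tf.__version__)
        print("Built with CUDA:", tf.test.is_built_with_cuda())
        print(describe_gpus(tf.config.list_physical_devices("GPU")))
```

If the build reports CUDA support but no GPU is detected, the library paths set up during installation are a reasonable first thing to check.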
Additional TensorFlow packages
ptxas or nvvm
Use case: TensorFlow reports that ptxas is missing, or that it cannot find $CUDA/nvvm/libdevice:
module load miniconda
conda activate YOUR_TF_ENVIRONMENT
conda install -c nvidia cuda-nvcc
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
conda deactivate
conda activate YOUR_TF_ENVIRONMENT
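To confirm that the flag took effect after reactivating, one option is to check that the directory named in XLA_FLAGS actually contains an nvvm/libdevice folder. The sketch below does that. The helper names cuda_data_dir and has_libdevice are hypothetical, and the nvvm/libdevice layout is taken from the error message quoted above, so treat it as an assumption about how the cuda-nvcc package lays out its files.

```python
import os
from pathlib import Path

FLAG = "--xla_gpu_cuda_data_dir="


def cuda_data_dir(xla_flags):
    """Extract the directory passed to --xla_gpu_cuda_data_dir, or return None."""
    for token in xla_flags.split():
        if token.startswith(FLAG):
            return token[len(FLAG):]
    return None


def has_libdevice(data_dir):
    """Return True if '<data_dir>/nvvm/libdevice' exists as a directory."""
    return (Path(data_dir) / "nvvm" / "libdevice").is_dir()


if __name__ == "__main__":
    data_dir = cuda_data_dir(os.environ.get("XLA_FLAGS", ""))
    if data_dir is None:
        print("XLA_FLAGS does not set --xla_gpu_cuda_data_dir")
    elif has_libdevice(data_dir):
        print(f"libdevice directory found under {data_dir}")
    else:
        print(f"No nvvm/libdevice directory under {data_dir}")
```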
TensorBoard
TensorBoard is installed with every TensorFlow installation. It is a visualization tool for TensorFlow that provides graphical analysis of TensorFlow processes, and it is built to work in Google Colab and Jupyter notebooks.
On the cluster, however, it requires a web browser, so it must be launched from the Remote Desktop app on Open OnDemand (OOD) rather than from the OOD Jupyter apps.