Python with Conda

For researchers whose Python requirements (or R package requirements; see Conda for R at the bottom of this page) go beyond the most common packages (e.g. NumPy, SciPy, pandas), we recommend using Anaconda. With Anaconda's Conda package manager, you can create and manage packages and environments. These allow you to easily switch between versions of Python libraries and applications for different projects.

Many other software applications have also started to use Conda as a package manager. It has become a popular choice for managing pipelines that involve several tools, especially pipelines that span multiple languages.

The Miniconda Module

For your convenience, we provide a relatively recent version of Miniconda (a minimal set of Anaconda libraries) as a module. It serves to bootstrap your personal environments. By using this module, you do not need to download your own copy of Conda, which will prevent unnecessary file and storage usage in your directories.

Note: If you are on Milgram and run out of space in your home directory for Conda, you can either reinstall your environment in your project space (see below) or contact us at hpc@yale.edu for help with your home quota.

Set Up Your Environment

Load the Miniconda Module

module load miniconda

You can save this to your default module collection by using module save. See our module documentation for more details.

Default Install Locations

By default on all clusters, we set the CONDA_ENVS_PATH and CONDA_PKGS_DIRS environment variables to the conda_envs and conda_pkgs directories in your project directory, where there is more quota available. Conda installs to and searches these directories for environments and cached packages.
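To see which locations apply in your current shell, you can inspect the two variables directly. The following is a minimal sketch: show_conda_dirs is a hypothetical helper name, not part of conda or the module, and it assumes CONDA_ENVS_PATH may hold several colon-separated directories.

```shell
# Hypothetical helper (not part of conda): print where new environments
# and cached packages will go in the current shell.
show_conda_dirs() {
    if [ -z "${CONDA_ENVS_PATH:-}" ]; then
        echo "CONDA_ENVS_PATH is not set; is the miniconda module loaded?" >&2
        return 1
    fi
    # Assumption: CONDA_ENVS_PATH is a colon-separated list; keep the first entry.
    envs_path="${CONDA_ENVS_PATH}"
    echo "New environments: ${envs_path%%:*}"
    echo "Package cache: ${CONDA_PKGS_DIRS:-not set}"
}
```

After module load miniconda, running show_conda_dirs should print paths inside your project directory. If it reports that the variable is not set, the module is probably not loaded in that shell.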

Create a conda Environment

To create an environment (saved to the first location in $CONDA_ENVS_PATH or to ~/.conda/envs), use the conda create command. Give your environments names that are meaningful to you, so you can more easily keep track of which serves which project or purpose. You can also use environments to manage groups of packages that have conflicting prerequisites.

Because dependency resolution is hard and messy, we find that specifying as many packages as possible at environment creation time helps minimize broken dependencies. We also recommend against heavily mixing conda and pip to install applications, although some mixing is often unavoidable for Python. If you need both, install as much as you can with conda, then use pip for whatever remains.

Tip

For added reproducibility and control, specify the versions of packages to be installed with conda using the packagename=version syntax, e.g. numpy=1.14.

For example, if you have a legacy application that needs Python 2 and OpenBLAS:

conda create -n legacy_application python=2.7 openblas

If you want a good starting point for interactive development of scientific Python scripts:

conda create -n py37_dev python=3.7 numpy scipy pandas matplotlib ipython jupyter

Conda Channels

There are also community-led collections of unofficial packages, called channels, that you can use with conda. A few popular examples are Conda Forge and Bioconda. See the conda docs for more info about managing channels.

For example, you could use the Conda Forge channel to install Brian2:

conda create -n brian2 --channel conda-forge brian2

Bioconda provides recent versions of various bioinformatics tools, for example:

conda create -n bioinfo --channel conda-forge --channel bioconda biopython bedtools bowtie2 repeatmasker

Channel priority decreases from left to right: the first channel given on the command line has higher priority than the second.

Using Your Environment

To use the applications in your environment, make sure you have the miniconda module loaded then run the following:

source activate env_name

Warning

We do not recommend putting source activate commands in your .bashrc file. This can lead to issues in interactive or batch jobs. If you do have issues with an environment in an interactive or batch job, try re-entering the environment by calling source deactivate before rerunning source activate env_name.

Interactive

Your conda environments will not follow you into job allocations. Make sure to activate them after your interactive job begins.
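In either an interactive session or a job script, one way to confirm that activation took effect is to check that the python on your PATH lives inside the active environment. This is only a sketch: python_in_active_env is a hypothetical helper, and it relies on conda setting the CONDA_PREFIX variable to the environment's directory when an environment is activated.

```shell
# Hypothetical helper (not part of conda): succeed only if the python on
# PATH belongs to the currently active conda environment.
python_in_active_env() {
    # conda sets CONDA_PREFIX to the environment's directory on activation.
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no conda environment is active in this shell" >&2
        return 1
    fi
    if [ "$(command -v python)" != "${CONDA_PREFIX}/bin/python" ]; then
        echo "python on PATH is not from ${CONDA_PREFIX}" >&2
        return 1
    fi
    echo "python comes from ${CONDA_PREFIX}"
}
```

Calling python_in_active_env right after source activate env_name gives a quick sanity check before a long run. The check fails by design for environments that do not contain Python at all.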

In a Job Script

To make sure that your job runs in your project environment, include the following lines in your submission script before running any other commands or scripts (but after your Slurm directives):

#!/bin/bash
#SBATCH --partition=general
#SBATCH --job-name=my_conda_job
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=6000


module load miniconda

source activate env_name
python analyses.py

Find and Install Additional Packages

You can search Anaconda Cloud for any packages you would like to install. Once in your conda environment, you can install any additional packages using conda install:

Python

conda install numpy

Troubleshoot

"Permission Denied"

If you get a permission denied error while trying to conda or pip install a package, make sure you have created an environment and activated it or activated an existing one first.

"-bash: activate: No such file or directory"

If you get the above error, it is likely that you don't have the necessary module file loaded. Try loading the miniconda module and rerunning your source activate env_name command.

"could not find environment:"

This error means that the version of Anaconda/Miniconda you have loaded doesn't recognize the environment name you have supplied. Make sure you have the miniconda module loaded (and not a different Python module) and have previously created this environment. You can see a list of previously created environments by running:

conda info --envs
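The same listing can be checked from a script before activation is attempted. The sketch below is an illustration only: conda_env_exists is a hypothetical helper, and it assumes the environment name appears as the first column of each line printed by conda info --envs.

```shell
# Hypothetical helper (not part of conda): succeed only if conda is on
# PATH and lists an environment with the given name.
conda_env_exists() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; try 'module load miniconda' first" >&2
        return 2
    fi
    # Assumption: the first column of each listing line is the environment name.
    conda info --envs | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example, conda_env_exists env_name && source activate env_name activates the environment only if conda can see it, which separates a missing module from a missing environment.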

Additional Conda Commands

List Installed Packages

source activate env_name
conda list

Delete a Conda Environment

conda env remove --name env_name

Share Your Conda Environment

If you want to share or back up a conda environment, you can export it to a file. To do so, run the following, replacing env_name with the name of the desired environment.

source activate env_name
conda env export > env_name_environment.yml
# on another machine or account, run
conda env create -f env_name_environment.yml
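An environment file is plain YAML, so a short one can also be written by hand instead of exported. The sketch below writes one from the shell. The file name, environment name, and package list are illustrative assumptions; name and dependencies are the keys conda reads from an environment file.

```shell
# Write a minimal, hand-made environment file.
# The environment name and package list are illustrative only.
cat > py37_dev_environment.yml <<'EOF'
name: py37_dev
dependencies:
  - python=3.7
  - numpy
  - scipy
  - pandas
EOF
```

The resulting file can be passed to conda env create -f py37_dev_environment.yml, the same command used for an exported file.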

Conda for R

Conda can also be used under certain circumstances to install and manage R packages. The conda names of R packages are prefixed with r-.

Create a new environment:

conda create -n r_env r-essentials r-base

Install additional packages:

conda install r-ggplot2

A list of officially supported R packages can be found here, but many additional packages are also available (e.g. via the conda-forge channel) and can be found using search on Anaconda Cloud.


Last update: March 11, 2020