Shareable Reproducible HPC Containers

Ben Evans

October 22, 2019

Outline for Today

  • Containers Background
  • Docker & Singularity
  • Running Singularity
  • Development Workflow
    • Using Docker to build


echo "this is one line of code split \
      for your viewing pleasure"




  • Container Image: A self-contained, read-only file(s) used to run application(s)

  • Container: A running instance of an image

Three methods of control

  • Process isolation
  • Resource limits
  • Security


Process Isolation

Linux namespaces hide various aspects of the host system from the container.

  • Can’t see other processes
  • Can be used to modify networking
  • Chrome uses Namespaces for sandboxes
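You can see PID-namespace isolation from a plain shell (an illustration, assuming a Linux host with util-linux's unshare and unprivileged user namespaces enabled):

# start ps in new user+PID+mount namespaces; only processes in the
# new namespace are visible, not the rest of the host's
unshare --user --map-root-user --pid --fork --mount-proc ps -ef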

Resource Limits

Linux cgroups to limit RAM, CPU cores, etc.
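Docker exposes these cgroup limits as run flags; a sketch (the image name is just an example, and the cgroup path assumes cgroup v1):

# cap the container at 2 CPUs and 1 GiB of RAM
docker run --rm --cpus=2 --memory=1g ubuntu:bionic \
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes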


Security

When the user is trusted: SELinux, AppArmor

When the user is untrusted: run the container as that unprivileged user

Should I Use Containers?

Pro          | Con
-------------|----------------------------------
Light-weight | Linux-only*
Fast Startup | Another layer of abstraction
Shareable    | Additional development complexity
Reproducible | Licensed software can be tricky

Motivating Example

GPU-enabled IPython w/TensorFlow on a GPU node:

srun --pty -p gpu_devel -c 2 --gres gpu:1 --mem 16G \
  singularity exec --nv \
  docker://tensorflow/tensorflow:latest-gpu-jupyter ipython

Better Example

Saved container for viral-ngs pipeline:

# once
singularity build viral-ngs-1.25.0.sif \
# subsequently
singularity run viral-ngs-1.25.0.sif -h



Docker

  • A service (daemon) runs to orchestrate containers
  • Images are composed of separate files: layers
  • Designed to be run with elevated privileges



Singularity

  • No services needed to run
  • Images are single files
  • Designed to be run as unprivileged user


Why Singularity on HPC

  • Admins are happy
  • Existing scripts, paths can work
  • Data are where you left them

How To Singularity

build a container image

singularity build my_container.sif \
  • On the cluster, images can only be built from registries
  • Can also copy an image file from elsewhere

Run the Container

singularity run image.sif

singularity exec docker://org/image Rscript analyze.R

singularity shell -s /bin/bash shub://user/image

run (default behavior)

singularity run image.sif --arg=42
  • Default action specified by CMD or %runscript
  • Additional arguments are passed to default action
  • If the image file is executable (+x) and on your PATH, it can be run directly
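Putting those together (the image name and argument are illustrative):

singularity run image.sif --arg=42   # explicit run
chmod +x image.sif                   # mark the image executable
./image.sif --arg=42                 # same action, run directly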

exec a command

singularity exec docker://tensorflow/tensorflow:latest-gpu \
    python /home/ben/
  • Runs the first argument after the image name, inside the container
  • The command must exist/be on the PATH inside the container

shell session

singularity shell -s /bin/bash sl-linux.sif
  • Use a shell other than the default sh with -s/--shell

inspect an image

singularity inspect my_pipes.sif
singularity inspect -r my_pipes.sif
  • Use -r to show runscript

Singularity Runtime Config

  • Type: docker, shub or library
  • Registry: default
  • Namespace: username, org, etc
  • Repo: image name
  • Tag: name (latest) or hash (@sha256:1a6hz…)
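These pieces assemble into a full image URI; for example (the first URI is illustrative, the digest form pins an exact image):

#                type     registry     namespace  repo       tag
singularity pull docker://index.docker.io/tensorflow/tensorflow:latest-gpu
# pinning by hash instead of tag is more reproducible:
singularity pull docker://ubuntu@sha256:6d0e0c26489e33f5a6f0020edface2727db9489744ecc9b4f50c7fa671f23c49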

Environment Variables

Set before running to add to container:

# prefix new variables with SINGULARITYENV_
export SINGULARITYENV_BLASTDB=/data/db/blast
export SINGULARITYENV_PREPEND_PATH=/opt/important/bin
export SINGULARITYENV_APPEND_PATH=/opt/fallback/bin
export SINGULARITYENV_PATH=/only/path
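To check that a variable actually arrives inside the container (image name reused from the inspect example; the path is illustrative):

export SINGULARITYENV_BLASTDB=/data/db/blast
# the SINGULARITYENV_ prefix is stripped inside the container
singularity exec my_pipes.sif env | grep BLASTDB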

Cache Location

To change where image files are cached:

# default is ~/.singularity
export SINGULARITY_CACHEDIR=~/scratch60/.singularity
# or
export SINGULARITY_CACHEDIR=/tmp/${USER}/.singularity
  • Can get big fast

Change Container Filesystem View

Add host directory to the container with -B/--bind:

singularity run --bind /path/outside:/path/inside \
  • The container may expect files in a fixed location, e.g. /data
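Several binds can be combined in one comma-separated list, optionally read-only (paths are illustrative):

singularity exec \
  -B /home/ben/project:/data,/gpfs/scratch60:/scratch:ro \
  image.sif ls /data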

Private DockerHub Repos

To specify DockerHub credentials:

singularity build --docker-login pytorch-19.09.sif \

Where did this come from?

Quick way to determine which files are from image:

singularity run/exec/shell --containall ...
  • Only container image files are available


GPUs

Bind the host’s GPU drivers into the container when CUDA is installed inside it:

singularity run/exec/shell --nv ...


MPI

  • Use a recent MPI, and the same version in container & host
  • mpirun inside container needs more setup
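The usual hybrid pattern runs mpirun on the host and the container once per rank (a sketch; image and program names are assumptions):

# host mpirun starts 4 ranks, each wrapping the containerized solver;
# host and container MPI versions must match closely
mpirun -n 4 singularity exec openmpi-app.sif /opt/app/bin/solver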

RStudio Example

I want to run RStudio and Tidyverse.


Job file

#!/bin/bash
#SBATCH -c 4 -t 2-00:00:00
mkdir -p /tmp/${USER}
singularity run -B /tmp/${USER}:/tmp \

Reverse ssh tunnel:

ssh -NL 8787:cxxnxx:8787

Then connect to http://localhost:8787
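End to end, the tunnel looks like this (hostnames are placeholders for your compute node and login host):

# on your workstation: forward local port 8787 to the compute
# node where rserver is listening
ssh -N -L 8787:c01n02:8787 netid@cluster.example.edu
# then browse to http://localhost:8787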

Not ideal…

  • According to docs, we have to rebuild the container
  • Change /etc/rstudio/rserver.conf

Dev Workflow

When you have to configure your own container

That is:

  • Use docker to build everything
  • Use singularity when you can’t use docker


Why build with Docker?

  • The Docker/Docker Hub ecosystem is large and stable
  • Docker re-builds can be faster
  • Can auto-build git repos to docker*
  • More easily use docker on most platforms

Best Practices

  • Don’t install anything to root’s home, /root
  • Don’t put container valuables in $TMP or $HOME
  • Use CMD for default runtime behavior
  • Maybe call ldconfig at the end of your Dockerfile


Dockerfiles

Container recipes:

  • File always named Dockerfile
  • Text file with setup scripts
  • Split up into instructions
  • Each instruction is a layer

A half-fix for my RStudio issue

FROM rocker/geospatial:3.5.1
LABEL maintainer="" version=0.01

# assumes RSTUDIO_PORT was set earlier, e.g. by an ENV instruction
RUN echo "www-port=${RSTUDIO_PORT}" >> /etc/rstudio/rserver.conf

FROM a base image

FROM ubuntu:bionic
FROM ubuntu@sha256:6d0e0c26489e33f5a6f0020edface2727db9489744ecc9b4f50c7fa671f23c49
  • Required, usually first
  • Hashes are more reproducible

LABEL your image

LABEL maintainer="Ben Evans <>"
LABEL help="help message"
  • Good to at least specify a maintainer email

ENV variables

ENV PATH=/opt/my_app/bin:$PATH MY_DB=/opt/my_app/db ...
  • Available for subsequent layers, and at runtime

RUN commands

RUN apt-get update && \
    apt-get install -y openmpi-bin \
                       openmpi-common \
                       wget
  • Always chain update and install together
  • One package per line, alphabetical

COPY files

COPY <host_src>... <container_dest>
  • Prefer downloading files in a RUN instruction instead


CMD for a default action

Specify a default action.

CMD ["/opt/conda/bin/ipython", "notebook"]
  • Used for docker run and singularity run
  • See also ENTRYPOINT; the two interact (ENTRYPOINT sets the executable, CMD its default arguments)
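One common pattern (a sketch): ENTRYPOINT fixes the executable while CMD supplies default arguments that docker run can override:

ENTRYPOINT ["/opt/conda/bin/ipython"]
CMD ["notebook"]
# docker run img            runs: ipython notebook
# docker run img --version  runs: ipython --version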

Docker Image Development


  • Put most troublesome/changing parts at the end
  • Use git to version your Dockerfile
  • Keep build directory clean
  • Look into multi-stage builds
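A minimal multi-stage sketch (file and image names are illustrative): the compiler stays in the build stage, and only the binary reaches the final image:

FROM ubuntu:bionic AS build
RUN apt-get update && apt-get install -y gcc
COPY solver.c /src/solver.c
RUN gcc -O2 -o /src/solver /src/solver.c

FROM ubuntu:bionic
COPY --from=build /src/solver /usr/local/bin/solver
CMD ["solver"]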

inspect an image

docker inspect ubuntu:bionic
docker inspect --format='{{index .RepoDigests 0}}' \

docker inspect -f '{{.Config.Entrypoint}} {{.Config.Cmd}}' \

build locally

cd /path/to/Dockerfile_dir/
docker build -t custom_ubuntu:testing .
  • Notice the period at the end for current directory
  • Use -t to tag your builds

image ls

docker image ls
REPOSITORY      TAG     IMAGE ID      CREATED       SIZE
rocker/rstudio  latest  879f3fd2bee9  39 hours ago  1.12GB
ubuntu          bionic  93fd78260bd1  13 days ago   86.2MB

run locally

docker run --rm custom_ubuntu:testing
docker run --rm -ti custom_ubuntu:testing /bin/bash
  • Use --rm to clean up container after it exits
  • Use --volume to bind directories to container
  • Use -e to set environment variables
    • -e USERID=$UID can avoid permission woes

push to cloud

export DOCKER_USERNAME="username"
docker login
docker tag custom_ubuntu:testing ${DOCKER_USERNAME}/my_image:v0.1
docker push ${DOCKER_USERNAME}/my_image

prune unneeded things

Clean up every now and again.

docker system prune
WARNING! This will remove:
        - all stopped containers
        - all networks not used by at least one container
        - all dangling images
        - all dangling build cache
Are you sure you want to continue? [y/N]

Docker Documentation

Install Docker on MacOS, Windows, and Linux

Ubuntu and CentOS Docker Hub pages

Dockerfile reference & best practices

docker CLI reference

Singularity docs

Install Singularity

Container definition reference

YCRC abridged Singularity docs