Parallel Programming with Python

Thomas Langford, Ph.D.

Yale Center for Research Computing

Yale Wright Laboratory

November 11, 2020

Tools and Requirements

  • Language: Python 3.8
  • Modules: pandas, numpy, multiprocessing, PIL (for image processing), mpi4py, matplotlib, cupy (for GPU parallelism)
  • Jupyter notebook

Comment: Python 2 versus 3

How to Follow Along

Clone from GitHub

Navigate to the GitHub repository: https://github.com/ycrc/parallel_python


Clone or download the zip file that contains this notebook and required data.

MyBinder

Launches a live AWS container with the required tools installed and ready to go:

https://mybinder.org/v2/gh/ycrc/parallel_python/master

Outline and Overview

  • Introduction to parallel concepts
  • Classes of parallel problems
  • Python implementations of parallel processing
  • GPU Parallelism
  • Tools for further exploration

Introduction to parallel concepts

Serial Execution

Typical programs execute lines sequentially:

In [2]:
import numpy as np

# Define an array of numbers
foo = np.array([0, 1, 2, 3, 4, 5])

# Define a function that squares numbers
def bar(x):
    return x * x

# Loop over each element and perform an action on it
for element in foo:
    # Print the result of bar
    print(bar(element))
0
1
4
9
16
25

The map function

A key tool that we will utilize later is called map. This lets us apply a function to each element in a list or array:

In [3]:
# (Very) inefficient way to define a map function
def my_map(function, array):
    # create a container for the results
    output = []

    # loop over each element
    for element in array:
        
        # add the intermediate result to the container
        output.append(function(element))
    
    # return the now-filled container
    return output
In [4]:
my_map(bar, foo)
Out[4]:
[0, 1, 4, 9, 16, 25]

Python helpfully provides a map function in the standard library:

In [5]:
list(map(bar, foo))

# NB: in python3 `map` is a generator, so we need to cast it to a list for this comparison
Out[5]:
[0, 1, 4, 9, 16, 25]

The built-in map function is more flexible and full-featured than ours, so it's best to use it instead.
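As one illustration of that flexibility (not shown in the notebook itself): the built-in map can walk several iterables in lockstep, passing one element from each to the function, which our my_map sketch cannot do:

```python
# map can consume multiple iterables at once,
# passing one element from each to the function per step
sums = list(map(lambda a, b: a + b, [0, 1, 2], [10, 20, 30]))
print(sums)  # [10, 21, 32]
```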

Parallel Workers

In the example shown before, no step of the map call depends on the other steps.

Rather than waiting for the function to loop over each value, we could create multiple instances of the function bar and apply it to each value simultaneously.

This is achieved with the multiprocessing module and a pool of workers.

The Multiprocessing module

The multiprocessing module has a number of functions to help simplify parallel processing.

One such tool is the Pool class. It allows us to set up a group of processes to execute tasks in parallel. This is called a pool of worker processes.

First we will create the pool with a specified number of workers. We will then use our map utility to apply a function to our array.

In [6]:
import multiprocessing

# Create a pool of processes
with multiprocessing.Pool(processes=6) as pool:
    # map the `np.square` function on our `foo` array
    result = pool.map(np.square, foo)

# output the results
print(result)
[0, 1, 4, 9, 16, 25]

The difference here is that each element of this list is being handled by a different process.

To show how this is actually being handled, let's create a new function:

In [7]:
import os

def parallel_test(x):
    # print the input value and its process ID number
    print(f"x = {x}, PID = {os.getpid()}\n")

Now we can map this function on the foo array from before. First with the built-in map function:

In [8]:
list(map(parallel_test, foo));
x = 0, PID = 150565

x = 1, PID = 150565

x = 2, PID = 150565

x = 3, PID = 150565

x = 4, PID = 150565

x = 5, PID = 150565

We see that each step is handled by the same process and executed in order.

Now we try the same process using multiprocessing:

In [9]:
with multiprocessing.Pool(processes=6) as pool:
    result = pool.map(parallel_test, foo)
x = 2, PID = 150587
x = 3, PID = 150588
x = 1, PID = 150586
x = 4, PID = 150589
x = 5, PID = 150590
x = 0, PID = 150585
Two things are worth noting:

  1. Each element is processed by a different PID
  2. The tasks are not executed in order!
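One detail worth adding: although the tasks *execute* out of order, Pool.map collects the results back in the order of the input iterable. If you instead want results as soon as each worker finishes, imap_unordered is available. A minimal sketch (the square function here is illustrative):

```python
import multiprocessing

def square(x):
    return x * x

with multiprocessing.Pool(processes=4) as pool:
    # map returns results in input order, regardless of
    # which worker finished first
    ordered = pool.map(square, range(6))

    # imap_unordered yields results as workers finish,
    # so the order is not guaranteed
    unordered = list(pool.imap_unordered(square, range(6)))

print(ordered)  # [0, 1, 4, 9, 16, 25]
print(sorted(unordered))
```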

Key Take-aways

  1. The map function is designed to apply the same function to each item in an iterator
  2. In serial processing, this works like a for-loop
  3. Parallel execution sets up multiple worker processes that act separately and simultaneously

Embarrassingly parallel problems

Many problems can be simply converted to parallel execution with the multiprocessing module.

Example 1: Monte Carlo Pi Calculation

  • Run multiple instances of the same simulation with different random number generator seeds
  • Define a function to calculate pi that takes the random seed as input, then map it on an array of random seeds
In [10]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5, 5))
x = np.linspace(0, 1, 100)
plt.fill_between(x, np.sqrt(1 - x**2), 0, alpha=0.1)
plt.xlim(0, 1.03); plt.ylim(0, 1.03); plt.xlabel('X'); plt.ylabel('Y');

x = np.random.random(size=100)
y = np.random.random(size=100)

plt.plot(x,y,marker='.',linestyle='None');
[Figure: quarter-circle region with 100 random (x, y) points scattered over the unit square]
In [11]:
def pi_mc(seed):
    num_trials = 500000
    counter = 0
    np.random.seed(seed)
    
    for j in range(num_trials):
        x_val = np.random.random_sample()
        y_val = np.random.random_sample()

        radius = x_val**2 + y_val**2

        if radius < 1:
            counter += 1
            
    return 4*counter/num_trials

Serial vs Parallel

In [12]:
%timeit pi_mc(1)
1 loop, best of 5: 319 ms per loop
In [13]:
seed_array = list(range(4))
%timeit list(map(pi_mc, seed_array))
1 loop, best of 5: 1.27 s per loop
In [14]:
%%timeit

with multiprocessing.Pool(processes=4) as pool:
    result = pool.map(pi_mc, seed_array)
1 loop, best of 5: 422 ms per loop

While the serial execution scales roughly linearly (~4x longer than a single call), the parallel execution doesn't quite reach single-call performance. There is some overhead in setting up the worker processes that needs to be considered.
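One way to amortize that per-task overhead (a sketch, not part of the original notebook) is the chunksize argument to Pool.map, which dispatches tasks to workers in batches rather than one at a time:

```python
import multiprocessing

def square(x):
    return x * x

with multiprocessing.Pool(processes=4) as pool:
    # with chunksize=250, each worker receives 250 tasks per
    # dispatch, reducing inter-process communication round trips
    result = pool.map(square, range(1000), chunksize=250)

print(result[:5])  # [0, 1, 4, 9, 16]
```

For many tiny tasks, a larger chunksize usually helps; for a handful of long-running tasks like pi_mc, the default is fine.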

Example 2: Processing multiple input files

Say we have a number of input files, like .jpg images, that we want to perform the same actions on, like rotate by 180 degrees and convert to a different format.

We can define a function that takes a file as input and performs these actions, then map it on a list of files.
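A sketch of that pattern (the process_image helper and the ./data/*.jpg glob are illustrative assumptions, not part of the notebook cells below):

```python
import glob
import multiprocessing

from PIL import Image

def process_image(path):
    # rotate the image 180 degrees and save it as PNG
    # alongside the original file
    im = Image.open(path)
    out_path = path.rsplit('.', 1)[0] + '.png'
    im.rotate(angle=180).save(out_path)
    return out_path

# hypothetical list of input files; adjust the pattern to your data
files = glob.glob('./data/*.jpg')

with multiprocessing.Pool(processes=4) as pool:
    converted = pool.map(process_image, files)
```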

In [15]:
# import python image library functions
from PIL import Image

from matplotlib.pyplot import imshow
%matplotlib inline
In [16]:
# Read image
im = Image.open('./data/kings_cross.jpg')
# Display image
im
Out[16]: (image output)
In [17]:
im.rotate(angle=180)
Out[17]: (rotated image output)