April 22, 2020

Outline

What we’ll cover today

  • Intro & general tips
  • R performance tips: patterns to use and avoid
  • Vectors & Matrices
  • Memory management, large tables
  • Profiling and Benchmarking
  • Loops

General advice

  • If you don’t understand something, try some experiments
  • Browse the documentation, learn its jargon and quirks
  • Break your code into functions when appropriate
  • Use functions to reduce the need for global variables
  • Write tests for your functions
  • Use git to keep track of changes
  • Learn to distribute your code as a package

Is R slow?

Sometimes, but well written R programs are usually fast enough.

  • Designed to make programming easier
    • Speed was not the primary design criterion
  • Slow programs are often the result of poor programming practices or of not understanding how R works
  • There are various options for calling C or C++ functions from R

R performance before you start

Premature optimization is the root of all evil – Donald Knuth

  • Become familiar with R’s vector and apply functions
  • Consider specialized performance packages
    • E.g. data.table, bigmemory, dplyr, RSQLite, snow, multicore, parallel
  • Consider using external optimizations (OpenBLAS/MKL)
  • Don’t use an R GUI when performance is important

R tuning advice

  • Be methodical but don’t get carried away with micro-optimizations
  • Use monitoring tools such as top, Activity Monitor, Task Manager
  • Use vector functions
  • Avoid duplication of objects
  • Pre-allocate result vectors
  • Profile your code and run benchmarks
  • Byte-compile with cmpfun, or call a compiled language (e.g. C, C++)

Vectors & Matrices

Vectors are central to good R programming

  • Fast, since each vector operation is a single call into compiled C or Fortran code
  • Concise and easy to read
  • Can often replace for loops
  • However, heavy use can result in high memory usage
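As a small illustration (function names here are made up for the example), an element-wise loop can usually be replaced by a single vectorized expression:

```r
# Loop version: one interpreted R iteration per element
squares_loop <- function(x) {
  out <- double(length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i]^2
  }
  out
}

# Vectorized version: one call into compiled code
squares_vec <- function(x) x^2

identical(squares_loop(1:1e5), squares_vec(1:1e5))  # TRUE
```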

Useful vector functions

  • math operators: +, -, *, /, ^, %/%, %%
  • math functions: abs, sqrt, exp, log, log10, cos, sin, tan, sum, prod
  • logical operators: &, |, !
  • relational operators: ==, !=, <, >, <=, >=
  • string functions: nchar, tolower, toupper, grep, sub, gsub, strsplit
  • conditional function: ifelse (pure R code)
  • misc: which, which.min, which.max, pmax, pmin, is.na, any, all, rnorm, runif, sprintf, rev, paste, as.integer, as.character
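A few of these in action on a toy vector (values chosen just for illustration):

```r
x <- c(-2, 5, NA, 7, -1)

ifelse(x > 0, "pos", "nonpos")   # vectorized conditional: "nonpos" "pos" NA "pos" "nonpos"
which(!is.na(x) & x > 0)         # indices of positive values: 2 4
pmax(x, 0)                       # element-wise ("parallel") maximum: 0 5 NA 7 0
any(is.na(x))                    # TRUE
```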

Dynamic features of vectors

initialize x and fill it with zeros

n <- 10
x <- double(n)
x
 [1] 0 0 0 0 0 0 0 0 0 0

Extend x by assignment

x[15] <- 100
x
 [1]   0   0   0   0   0   0   0   0   0   0  NA  NA  NA  NA 100

Dynamic features of vectors

Resize/truncate x

length(x) <- 5
x
[1] 0 0 0 0 0

rnorm vector function

x <- rnorm(10)
x
 [1] -0.4056235  1.3891037  0.2174550  2.3246995  1.2380935 -1.4947794
 [7]  0.2385808 -0.5493807 -1.3513489 -0.2154774

Vector indexing

Extract subvector

x[3:6]
[1]  0.217455  2.324700  1.238094 -1.494779

Extract elements using result of vector relational operation

x[x > 0]
[1] 1.3891037 0.2174550 2.3246995 1.2380935 0.2385808

Vector indexing

You can also use an index to assign values

x[is.na(x)] <- 0

Matrix indexing

Make a new matrix

m <- matrix(rnorm(100), 10, 10)

Extract a 2x3 submatrix (non-consecutive columns)

m[3:4, c(5,7,9)]
           [,1]       [,2]      [,3]
[1,] -2.6358462 -0.1964796 -1.439384
[2,]  0.9059187 -0.8001697 -1.865229

Matrix indexing

Extract arbitrary elements as vector

m[cbind(3:6, c(2,4,6,9))]
[1]  0.2120675  0.5209789 -1.1947893  1.0693606

Extract elements using result of vector relational operation

head(m[m > 0])
[1] 0.6022579 0.3916312 1.9208050 0.1392003 0.2249831 0.6924349

Matrix indexing

You can also use a matrix index to assign values

m[is.na(m)] <- 0

Memory Considerations

Memory in R

  • Avoid duplicating objects, especially big ones or those in loops
  • Look into memory efficient libraries
  • Look into other formats to store data

Beware of object duplication


  • R uses pass by value semantics for function arguments
  • In general, this requires making copies of objects
    • Functions must return modified object
  • R tries to avoid copying unless necessary

Example of object duplication

tracemem reports when an object is duplicated, which is very useful for debugging performance problems.

In this example, object duplication is expected and helpful.

x <- double(10)
tracemem(x)
[1] "<0x55a6d0c827d8>"
y <- x
y[1] <- 10
tracemem[0x55a6d0c827d8 -> 0x55a6d0c81cd8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> 

Example of object duplication

.Internal(inspect(x))
@55a6d0c827d8 14 REALSXP g0c5 [NAM(7),TR] (len=10, tl=0) 0,0,0,0,0,...
.Internal(inspect(y))
@55a6d0c81cd8 14 REALSXP g0c5 [NAM(1),TR] (len=10, tl=0) 10,0,0,0,0,...
x[1] <- 50
tracemem[0x55a6d0c827d8 -> 0x55a6d04f5a78]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> 

Example of unexpected object duplication

Passing a matrix to a non-primitive function such as nrow marks it as shared, so the next modification triggers a copy

> m <- matrix(0, 3, 3)
> tracemem(m)
[1] "<0x7fc168d29df0>"
> m[1,1] <- 1
> nrow(m)
[1] 3
> m[1,1] <- 2
tracemem[0x7fc168d29df0 -> 0x7fc168d21f58]:

Splitting problem into smaller tasks

R makes it easy to read an entire data set in one operation, but reading it in parts can be much more efficient.

  • Splitting the problem into smaller tasks is compatible with parallel computing techniques
  • The foreach & iterators packages provide tools to split inputs into smaller pieces
  • Use Linux commands (split, awk, etc.) or other languages (e.g. Python) to preprocess
    • Split data files and remove unneeded fields
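A sketch of chunked reading in pure R, using read.csv on an open connection (the chunk size and the running sum are illustrative choices):

```r
# Process a large CSV in fixed-size chunks instead of reading it all at once.
sum_column_in_chunks <- function(file, column, chunk_size = 5000) {
  con <- file(file, open = "r")
  on.exit(close(con))
  # First read gets the header along with the first chunk
  first <- read.csv(con, nrows = chunk_size)
  cols <- names(first)
  total <- sum(first[[column]])
  repeat {
    # Reading from an open connection continues where the last read stopped;
    # an empty read at EOF raises an error, which we treat as "done"
    chunk <- tryCatch(
      read.csv(con, nrows = chunk_size, header = FALSE, col.names = cols),
      error = function(e) NULL)
    if (is.null(chunk) || nrow(chunk) == 0) break
    total <- total + sum(chunk[[column]])
  }
  total
}
```

Each chunk is small enough to fit comfortably in memory, and the per-chunk work could be handed to workers in a parallel setup.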

Beware of read.table

The read.table function is commonly used for reading data files, but it can be very slow on large files

  • Use of the colClasses argument can improve performance
  • colClasses can be used to skip a column, using less memory
  • It can be faster to read a file in smaller chunks using the nrows argument
  • The scan function can be faster
  • Consider alternative packages, such as readr, data.table, sqldf, and bigmemory

Example csv

head -n 3 sample_people.csv
seq,name/first,name/last,age,street,city,state,zip,dollar,pick,date
1,Leona,Burton,61,Afjoj Loop,Dibemato,VT,57577,273.47,YELLOW,05/26/1946
2,Nina,Patrick,48,Tana Square,Namumene,MT,55670,6774.89,GREEN,02/14/1936

colClasses Comparison

Read in 99,999 rows of sample data

system.time(read.csv('sample_people.csv'))
   user  system elapsed 
  1.300   0.021   1.320 
system.time(read.csv("sample_people.csv", header=TRUE, colClasses = 
    c("integer", "character", "character", "integer", "character", "character", 
      "character", "character", "numeric", "factor", "character")))
   user  system elapsed 
  0.248   0.000   0.248 

An alternate universe

readr

  • Part of tidyverse
  • Generally faster than read.table
library(readr)
options(readr.num_columns = 0)
system.time(read_csv("sample_people.csv"))
   user  system elapsed 
  0.203   0.008   0.211 

More alternative libraries

data.table

  • Like data.frame, but without row labels
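data.table's fread is typically much faster than read.csv; a minimal sketch (the sample data here is made up so the example is self-contained, but in practice you would point fread at a file like sample_people.csv):

```r
library(data.table)

# Write a tiny sample file just for this illustration
f <- tempfile(fileext = ".csv")
fwrite(data.frame(age = c(61, 48, 35),
                  dollar = c(273.47, 6774.89, 10.00),
                  state = c("VT", "MT", "VT")), f)

# fread auto-detects the separator and column types
dt <- fread(f)

# data.table syntax: dt[filter, compute, by]
dt[age > 40, .(mean_dollar = mean(dollar)), by = state]
```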

bigmemory

  • Defines mutable matrix objects that aren’t automatically duplicated

big.matrix (from bigmemory)

  • Can use a backing file that is memory mapped

biganalytics

  • apply, biglm, bigglm, bigkmeans, colmax

Save data in binary format

Saving data in a binary format can make it much faster to read the data later. There are a variety of functions available to do that:

  • save/load
  • writeBin/readBin
  • write.big.matrix/read.big.matrix (from bigmemory)
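A quick sketch of the round trip with save/load (temp files are used here just to keep the example self-contained):

```r
csv_file <- tempfile(fileext = ".csv")
bin_file <- tempfile(fileext = ".RData")

df <- data.frame(x = rnorm(1e5), y = runif(1e5))

write.csv(df, csv_file, row.names = FALSE)
save(df, file = bin_file)

system.time(read.csv(csv_file))  # parses every field from text
system.time(load(bin_file))      # reads the binary representation directly
```

The binary read skips parsing entirely, which is where most of the time goes for text formats.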

SQLite in R

Consider putting data into an SQLite database.

  • RSQLite package is easy to use
  • Easy to get subsets of the data into a data frame
  • Command line tool very useful for experimenting with queries
  • Database can be accessed from many different languages
  • The sqldf package may be useful also
  • Can be quite slow
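A minimal RSQLite session (the table and query are illustrative; ":memory:" keeps the example self-contained, and a file path would give a persistent database):

```r
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "people",
             data.frame(name = c("Leona", "Nina"), age = c(61, 48)))

# Pull just the subset you need into a data frame
adults <- dbGetQuery(con, "SELECT name, age FROM people WHERE age > 50")
dbDisconnect(con)

adults$name  # "Leona"
```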

Speed up your Basic Linear Algebra Subprograms

Vanilla vs OpenBLAS

Profiling and Benchmarking

Find your marble, then start chiseling

Profiling vs Benchmarking

Profiling

  • If you’ve decided you need your code to perform better, profile first
  • Profiling helps isolate hot spots
  • Time spent here will likely yield the best ROI

Benchmarking

  • With hot spots in hand, examine the code and propose alternatives
  • While ensuring the results are the same, ask which performs best

Profiling

R profiling tools

R has built-in support for profiling, but there are additional packages available:

  • proftools
  • profvis (RStudio integration)

Basic profiling with proftools

f <- function(a) { g1(a) + g2(2 * a) }

g1 <- function(a) { h(a) }

g2 <- function(a) { sqrt(a) }

h <- function(a) {
  b <- double(length(a))
  for (i in seq_along(a)) {
    b[i] <- sqrt(a[i])
  }
  b
}

Basic profiling with proftools

x <- 1:1000000
Rprof('prof.out')
for (i in 1:10) {
  y <- f(x)
}
Rprof(NULL)
summaryRprof("prof.out")$by.self
         self.time self.pct total.time total.pct
"h"           1.78       89       1.86        93
"g2"          0.10        5       0.10         5
"double"      0.08        4       0.08         4
"f"           0.02        1       1.98        99
"paste"       0.02        1       0.02         1

Basic profiling with profvis

Can also do this in RStudio, e.g. Profile -> Start Profiling

profvis({
  for (i in 1:10) {
    y <- f(x)
  }
})

Benchmarking

Once profiling has shown where the code is slow, use benchmarking tools to compare alternatives

  • Put problem code into functions
  • Benchmark different versions of code for comparison
  • system.time is useful for long running code
  • microbenchmark package is useful for analyzing short running code
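For example, comparing two equivalent implementations with microbenchmark (the slow_sum helper is made up for illustration):

```r
library(microbenchmark)

# Deliberately slow: sums element by element in interpreted R
slow_sum <- function(x) {
  s <- 0
  for (v in x) s <- s + v
  s
}

x <- rnorm(1e4)
res <- microbenchmark(loop = slow_sum(x), builtin = sum(x), times = 100)
summary(res)  # timing quantiles per expression
```

Each expression is run many times, so even microsecond-scale differences show up reliably.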

Loops

Wherefore art thou performant… or not?

Are for loops in R slow?

  • Not all for loops are bad
  • The most common performance mistakes involve for loops
  • The classic mistake is not preallocating the result vector

Example 1

Create a vector of length n where all values are x

Example 1: a bad for loop

bad.for <- function(n,x) {
  result <- NULL
  for (i in 1:n) {
    result[i] <- x
  }
  result
}
  • Large number of iterations
  • Tiny amount of computation per iteration
  • The result vector is reallocated and copied on each iteration
  • This triggers garbage collection periodically

Example 1: a better for loop

okay.for <- function(n,x) {
  result <- double(n)
  for (i in 1:n) {
    result[i] <- x
  }
  result
}

Improvement over the previous example, but it’s still slow because of the many tiny iterations.

Example 1: a puzzle loop

strange.for <- function(n, x) {
  result <- NULL
  for (i in n:1) {
    result[i] <- x
  }
  result
}

Is this loop faster or slower than the previous two?

Example 1: using a vector function

# use of vector assignment
vector.assn <- function(n, x) {
  result <- double(n)
  result[] <- x
  result
}

We can also use vector assignment

Example 1: using R built-in function
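A sketch of the built.in function used in the tests below, assuming it simply wraps R's rep, which repeats a value entirely in compiled code:

```r
# rep is the idiomatic way to create a vector of n copies of x
built.in <- function(n, x) {
  rep(x, n)
}

built.in(5, 7)  # 7 7 7 7 7
```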

Example 1: testing

Make sure functions produce identical output

n <- 10000
x <- 7
bad.result        <- bad.for(n, x)
okay.result       <- okay.for(n, x)
strange.result    <- strange.for(n, x)
vector.result     <- vector.assn(n, x)
built.result      <- built.in(n, x)
c(identical(bad.result, okay.result),
identical(bad.result, strange.result),
identical(bad.result, vector.result),
identical(bad.result, built.result))
[1] TRUE TRUE TRUE TRUE

Example 1: benchmark results

res <- microbenchmark(bad=bad.for(n,x), okay=okay.for(n,x), strange=strange.for(n,x),
                      vector=vector.assn(n,x), builtin=built.in(n,x))
kable(summary(res, unit="relative"))
expr           min         lq      mean     median         uq      max neval
bad     199.156045 196.233778 93.472189 186.788741 160.062897 6.475202   100
okay     95.191405  92.625413 43.237866  87.084876  74.758367 3.359628   100
strange  95.790237  92.912893 42.203484  87.622573  75.647121 1.327639   100
vector    2.993742   2.930285  2.186368   2.798131   2.424152 1.573075   100
builtin   1.000000   1.000000  1.000000   1.000000   1.000000 1.000000   100

Example 1: benchmark plot

autoplot(res)

Example 2

Create a matrix with n rows and x columns

Each value in the matrix is sampled from normal distribution, \(\mu=0 , \sigma=1\)

Example 2: another bad for loop

Generate a matrix of values sampled from the normal distribution, with n rows and x columns

bad.norm <- function(n,x) {
  m <- NULL
  for (i in 1:n) {
    m <- rbind(m, rnorm(x))
  }
  m
}

Example 2: preallocation of result vector

This time we preallocate the matrix, then populate it row by row with a for loop

ok.norm <- function(n,x) {
  m <- matrix(0, nrow=n, ncol=x)
  for (i in 1:n) {
  m[i,] <- rnorm(x)
  }
  m
}

Example 2: use lapply and rbind

No need to preallocate the matrix

lapply.norm <- function(n,x) {
  do.call('rbind', lapply(1:n, function(i) rnorm(x)))
}

Example 2: Compute all rows at once

best.norm <- function(n,x) {
  m <- rnorm(x * n)
  dim(m) <- c(x, n)
  t(m)
}

Example 2: Test data

n <- 600
x <- 100
# Verify correct results
set.seed(123); bad.result <- bad.norm(n,x)
set.seed(123); ok.result <- ok.norm(n,x)
set.seed(123); lapply.result <- lapply.norm(n,x)
set.seed(123); best.result <- best.norm(n,x)

c(identical(bad.result, ok.result),
identical(bad.result, lapply.result),
identical(bad.result, best.result))
[1] TRUE TRUE TRUE

Example 2: benchmarks

res <- microbenchmark(bad=bad.norm(n,x), ok=ok.norm(n,x),
                        lapply=lapply.norm(n,x), best=best.norm(n,x))
kable(summary(res, unit="relative"))
expr         min        lq      mean    median        uq       max neval
bad    12.046303 12.618224 12.047673 12.733923 12.155410 13.131735   100
ok      1.470018  1.501110  1.467030  1.489001  1.469674  1.136864   100
lapply  1.466219  1.471615  1.498865  1.486366  1.489676  1.103843   100
best    1.000000  1.000000  1.000000  1.000000  1.000000  1.000000   100

Example 2: benchmark plot

autoplot(res)

Just in Time (JIT) compiling your functions

cmpfun (from the compiler package) compiles your function to bytecode

library(compiler)
enableJIT(0)  # disable automatic JIT so fun.for stays interpreted
[1] 3
fun.for <- function(x, seed=1423) {
  set.seed(seed)
  y <- double(length(x))
  for (i in seq_along(x)) {
    y[i] <- rnorm(1) * x[i]
  }
  y
}
fun.for.compiled <- cmpfun(fun.for)

Benchmarking JIT

x <- 10000
res <- microbenchmark(fun.for=fun.for(x),
                      fun.for.compiled=fun.for.compiled(x))
kable(summary(res, unit="relative"))
expr                  min       lq     mean   median       uq     max neval
fun.for          1.089921 1.089702 1.212659 1.072328 1.069257 1.89812   100
fun.for.compiled 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000   100

Benchmarking JIT

autoplot(res)

END

Resources