Some time ago I discovered a very surprising performance issue while working with the Rcpp package, and thought I’d share the example that revealed it.
At the time, I was interested in computing the vector of leverages for a linear model (which you need for certain covariance matrices). This vector is constructed from the diagonal elements of the “hat” (or “influence”) matrix H, such that
$$ H = X (X^T X)^{-1} X^T$$
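Since we only ever need the diagonal of H, it is worth noting that the i-th leverage can be written directly in terms of the i-th row of X:
$$ h_{ii} = x_i^T (X^T X)^{-1} x_i $$
so in principle the full n-by-n matrix never has to be formed.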
In base R, you might write the following function to compute this for you:
leverage <- function(X) {
  diag(X %*% solve(t(X) %*% X) %*% t(X))
}
In other words, it’s a simple matrix calculation: the exact kind of thing you might want to optimize with Rcpp.
Computing Leverages using Rcpp
For fun, I decided to implement this same computation in Rcpp code, specifically using the RcppEigen package, since having matrix classes is really nice in this case. My first version looked like the following:
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

using namespace Rcpp;
using Eigen::Map;
using Eigen::VectorXd;
using Eigen::MatrixXd;

typedef Map<MatrixXd> MapMatd;

// [[Rcpp::export]]
VectorXd leverage_cpp(const NumericMatrix& XX) {
  // Map the R matrix into an Eigen matrix without copying.
  const MapMatd X(as<MapMatd>(XX));
  // Form the full hat matrix, then take its diagonal.
  const MatrixXd H = X * (X.adjoint() * X).inverse() * X.adjoint();
  return H.diagonal();
}
But later, feeling strongly about brevity, I opted for this instead:
// [[Rcpp::export]]
VectorXd leverage_cpp_short(const NumericMatrix& XX) {
  const MapMatd X(as<MapMatd>(XX));
  // Same computation, written as a single Eigen expression with no named intermediate.
  return (X * (X.adjoint() * X).inverse() * X.adjoint()).diagonal();
}
You can load these into R by using the sourceCpp function:
library(RcppEigen)
# sourceCpp("code above")
And we can confirm that they produce the same results as the pure R version by using one of the built-in data sets, called attitude:
data("attitude", package = "datasets")
model <- lm(rating ~ ., data = attitude)
X <- model.matrix(model)
all.equal(leverage(X), leverage_cpp(X))
## [1] "names for target but not for current"
all.equal(leverage(X), leverage_cpp_short(X))
## [1] "names for target but not for current"
Notice that RcppEigen does not set the names of these vectors in the same way that diag does, so the vectors are not precisely identical.
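If the missing names bother you, one fix is to copy the input’s row names onto the result, which is where diag’s names come from in this example. This is just a sketch (the name leverage_cpp_named is mine, and it assumes the code lives in the same file as the functions above, so that MapMatd is in scope):
// [[Rcpp::export]]
Rcpp::NumericVector leverage_cpp_named(const NumericMatrix& XX) {
  const MapMatd X(as<MapMatd>(XX));
  const VectorXd d = (X * (X.adjoint() * X).inverse() * X.adjoint()).diagonal();
  Rcpp::NumericVector out = Rcpp::wrap(d);
  // diag() picks up the row names of the matrix here, so copy those over.
  SEXP dimnames = Rf_getAttrib(XX, R_DimNamesSymbol);
  if (dimnames != R_NilValue) {
    out.attr("names") = VECTOR_ELT(dimnames, 0);
  }
  return out;
}
With the names attached, all.equal(leverage(X), leverage_cpp_named(X)) should simply return TRUE.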
Benchmarking R Against Rcpp Functions
The first thing that comes to mind when rewriting any R code using Rcpp is: how much faster is it? The brilliant microbenchmark package provides an easy way to test this:
library(microbenchmark)
library(ggplot2)
# Benchmark the results.
test <- microbenchmark(
  leverage_r = leverage(X),
  leverage_cpp = leverage_cpp(X),
  leverage_cpp_short = leverage_cpp_short(X),
  times = 1000L
)
test
## Unit: microseconds
##                expr    min     lq      mean median      uq      max neval
##          leverage_r 73.378 87.126 103.80609 91.880 100.933 3683.704  1000
##        leverage_cpp 10.542 13.503  18.59291 16.767  22.383   83.863  1000
##  leverage_cpp_short  7.857  9.182  12.03752 10.254  13.728  128.881  1000
The microbenchmark package also supports the ggplot2::autoplot API, if you prefer a plot:
library(ggplot2)
ggplot2::autoplot(test)
As expected, the Rcpp code is 5 to 10 times faster, even on this small data set. But there’s a small mystery here as well: why is the second piece of C++ code faster than the first?
If you’re operating in R world, it’s not all that surprising that two different pieces of code that do the same thing have different performance characteristics. This is, after all, the main reason that the microbenchmark package comes in handy.
But since C++ is a compiled language, I generally expect the compiler to figure out that two pieces of code that do the same thing ought to boil down to the same machine code. Why is that not the case here?
The answer probably has to do with how Eigen (the library that RcppEigen wraps) defers computation of an expression in intelligent ways. My guess is that Eigen figures out at compile time that there’s a really simple loop for computing this matrix expression, and pares your high-level description down to a simple statement. By breaking the computation into two parts (as in the first C++ function), you hamper the ability of the library (not the compiler!) to help you out.
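One way to lean even harder on that idea is to never form the n-by-n matrix H at all: since each leverage is the quantity h_ii from above, the whole vector is just a row-wise sum of an element-wise product. Here is a sketch of that variant (the name leverage_cpp_rowwise is mine, it assumes the same file and typedefs as the functions above, it swaps the explicit inverse for a Cholesky-style solve, and I haven’t benchmarked it):
// [[Rcpp::export]]
VectorXd leverage_cpp_rowwise(const NumericMatrix& XX) {
  const MapMatd X(as<MapMatd>(XX));
  // Solve (X'X) B = X' once instead of inverting; then
  // diag(X B)_i = sum_j X_ij * B_ji, a row-wise sum of an element-wise product.
  const MatrixXd B = (X.adjoint() * X).ldlt().solve(X.adjoint());
  return X.cwiseProduct(B.transpose()).rowwise().sum();
}
Whether that actually beats leverage_cpp_short is, of course, exactly the kind of question microbenchmark is for.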
This whole experience has made me much more cautious about the C++ code that I write for R, to the point that I’m now prone to writing the function a number of different ways, and then doing some tests with microbenchmark to see what ends up being faster. Of course, it’s arguable that I should have been taking that kind of empirical approach to begin with.
If anyone can shed some light on this by looking at the assembly output (which is beyond my own abilities), I would be quite grateful.