Some time ago I discovered a very surprising performance issue while working
with the `Rcpp`

package, and
thought I’d share the example I discovered.

At the time, I was interested in computing the vector of
leverages for a linear
model (which you need for certain covariance matrices). This vector is
constructed from the diagonal elements of the “hat” (or “influence”) matrix *H*,
such that

$$ H = X (X^T X)^{-1} X^T$$

In base R, you might write the following function to compute this for you:

```
leverage <- function(X) {
diag(X %*% solve(t(X) %*% X) %*% t(X))
}
```

In other words, it’s a simple matrix calculation: the exact kind of thing you
might want to optimize with `Rcpp`

.

## Computing Leverages using Rcpp

For fun, I decided to implement this same computation in `Rcpp`

code,
specifically using the `RcppEigen`

package, since having matrix classes is really nice in this case.

My first version looked like the following:

```
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace Rcpp;
using Eigen::Map;
using Eigen::VectorXd;
using Eigen::MatrixXd;
typedef Map<MatrixXd> MapMatd;
// [[Rcpp::export]]
VectorXd leverage_cpp(const NumericMatrix& XX) {
const MapMatd X(as<MapMatd>(XX));
const MatrixXd H = X * (X.adjoint() * X).inverse() * X.adjoint();
return H.diagonal();
}
```

But later I was feeling strongly about brevity, and opted for this instead:

```
// [[Rcpp::export]]
VectorXd leverage_cpp_short(const NumericMatrix& XX) {
const MapMatd X(as<MapMatd>(XX));
return (X * (X.adjoint() * X).inverse() * X.adjoint()).diagonal();
}
```

You can load these into R by using the `sourceCpp`

function

```
library(RcppEigen)
#sourceCpp("code above")
```

And we can confirm that they produce the same results as the pure R version by
using one of the built-in data sets, called `attitude`

:

```
data("attitude", package = "datasets")
model <- lm(rating ~ ., data = attitude)
X <- model.matrix(model)
all.equal(leverage(X), leverage_cpp(X))
```

```
## [1] "names for target but not for current"
```

`all.equal(leverage(X), leverage_cpp_short(X))`

```
## [1] "names for target but not for current"
```

Notice that `RcppEigen`

does not set the `names`

of these vectors in the same
way that `diag`

does, so the vectors are not precisely identical.

## Benchmarking R Against Rcpp Functions

The first thing that comes to mind when rewriting any R code using `Rcpp`

is:
how much faster is it? The brillaint `microbenchmark`

package provides an easy
way to test this:

```
library(microbenchmark)
library(ggplot2)
# Benchmark the results.
test <- microbenchmark(
leverage_r = leverage(X),
leverage_cpp = leverage_cpp(X),
leverage_cpp_short = leverage_cpp_short(X),
times = 1000L
)
test
```

```
## Unit: microseconds
## expr min lq mean median uq max neval
## leverage_r 73.378 87.126 103.80609 91.880 100.933 3683.704 1000
## leverage_cpp 10.542 13.503 18.59291 16.767 22.383 83.863 1000
## leverage_cpp_short 7.857 9.182 12.03752 10.254 13.728 128.881 1000
```

The `microbenchmark`

package also supports the `ggplot2::autoplot`

API, if you
prefer a plot:

```
library(ggplot2)
ggplot2::autoplot(test)
```

As expected, the `Rcpp`

code is 5 to 10 times faster, even on this small data
set. But there’s a small mystery here as well: why is the second piece of C++
code faster than the first?

If you’re operating in R world, it’s not all that surprising that two different
pieces of code to do the same thing have different performance characteristics.
This is, after all, the main reason that the `microbenchmark`

package comes in
handy.

But since C++ is a compiled language, I generally expect the compiler to figure out that two pieces of code that do the same thing ought to boil down to the same machine code. Why is that not the case here?

The answer is probably to do with how Eigen (the library that `RcppEigen`

wraps)
defers computation of an expression
in intelligent ways. My guess is that Eigen figures out at compile time that
there’s a really simple loop for computing this matrix expression, and pares
your high-level description down to a simple statement. By breaking the
computation in to two parts (as in the first C++ function), you hamper the
ability of the library (not the compiler!) to help you out.

This whole experience has made me much more cautious about the C++ code that I
write for R, to the point that I’m now prone to writing the function a number of
different ways, and then doing some tests with `microbenchmark`

to see what ends
up being faster. Of course, it’s arguable that I should have been taking that
kind of empirical approach to begin with.

If anyone can shed some light on this by looking at the assembly output (which is beyond my own abilities), I would be quite grateful.