Some time ago I discovered a very surprising performance issue while working with the Rcpp package, and thought I’d share the example that revealed it.
At the time, I was interested in computing the vector of leverages for a linear model (which you need for certain covariance matrices). This vector is constructed from the diagonal elements of the “hat” (or “influence”) matrix H, such that
$$ H = X (X^T X)^{-1} X^T$$
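Since we only ever need the diagonal of H, it is worth noting that the i-th leverage can be written directly in terms of the i-th row of X:
$$ h_{ii} = x_i^T (X^T X)^{-1} x_i $$
so in principle the full n-by-n matrix never has to be formed.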
In base R, you might write the following function to compute this for you:
leverage <- function(X) {
  diag(X %*% solve(t(X) %*% X) %*% t(X))
}
In other words, it’s a simple matrix calculation: the exact kind of thing you might want to optimize with Rcpp.
Computing Leverages using Rcpp
For fun, I decided to implement this same computation in Rcpp code, specifically using the RcppEigen package, since having matrix classes is really nice in this case. My first version looked like the following:
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

using namespace Rcpp;
using Eigen::Map;
using Eigen::VectorXd;
using Eigen::MatrixXd;

typedef Map<MatrixXd> MapMatd;

// [[Rcpp::export]]
VectorXd leverage_cpp(const NumericMatrix& XX) {
  // Map the R matrix into an Eigen matrix without copying.
  const MapMatd X(as<MapMatd>(XX));
  // Form the full hat matrix, then take its diagonal.
  const MatrixXd H = X * (X.adjoint() * X).inverse() * X.adjoint();
  return H.diagonal();
}
But later, feeling strongly about brevity, I opted for this instead:
// [[Rcpp::export]]
VectorXd leverage_cpp_short(const NumericMatrix& XX) {
  const MapMatd X(as<MapMatd>(XX));
  // Same computation, written as a single Eigen expression with no named intermediate.
  return (X * (X.adjoint() * X).inverse() * X.adjoint()).diagonal();
}
You can load these into R by using the sourceCpp function:
library(RcppEigen)
# sourceCpp("code above")
And we can confirm that they produce the same results as the pure R version by using one of the built-in data sets, called attitude:
data("attitude", package = "datasets")
model <- lm(rating ~ ., data = attitude)
X <- model.matrix(model)
all.equal(leverage(X), leverage_cpp(X))
## [1] "names for target but not for current"
all.equal(leverage(X), leverage_cpp_short(X))
## [1] "names for target but not for current"
Notice that RcppEigen does not set the names of these vectors in the same way that diag does, so the vectors are not precisely identical.
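If the missing names bother you, one fix is to copy the input’s row names onto the result, which is where diag’s names come from in this example. This is just a sketch (the name leverage_cpp_named is mine, and it assumes the code lives in the same file as the functions above, so that MapMatd is in scope):
// [[Rcpp::export]]
Rcpp::NumericVector leverage_cpp_named(const NumericMatrix& XX) {
  const MapMatd X(as<MapMatd>(XX));
  const VectorXd d = (X * (X.adjoint() * X).inverse() * X.adjoint()).diagonal();
  Rcpp::NumericVector out = Rcpp::wrap(d);
  // diag() picks up the row names of the matrix here, so copy those over.
  SEXP dimnames = Rf_getAttrib(XX, R_DimNamesSymbol);
  if (dimnames != R_NilValue) {
    out.attr("names") = VECTOR_ELT(dimnames, 0);
  }
  return out;
}
With the names attached, all.equal(leverage(X), leverage_cpp_named(X)) should simply return TRUE.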
Benchmarking R Against Rcpp Functions
The first thing that comes to mind when rewriting any R code using Rcpp is: how much faster is it? The brilliant microbenchmark package provides an easy way to test this:
library(microbenchmark)
library(ggplot2)
# Benchmark the results.
test <- microbenchmark(
  leverage_r = leverage(X),
  leverage_cpp = leverage_cpp(X),
  leverage_cpp_short = leverage_cpp_short(X),
  times = 1000L
)
test
## Unit: microseconds
##                expr    min     lq      mean median      uq      max neval
##          leverage_r 73.378 87.126 103.80609 91.880 100.933 3683.704  1000
##        leverage_cpp 10.542 13.503  18.59291 16.767  22.383   83.863  1000
##  leverage_cpp_short  7.857  9.182  12.03752 10.254  13.728  128.881  1000
The microbenchmark package also supports the ggplot2::autoplot API, if you prefer a plot:
library(ggplot2)
ggplot2::autoplot(test)
As expected, the Rcpp code is 5 to 10 times faster, even on this small data set. But there’s a small mystery here as well: why is the second piece of C++ code faster than the first?
If you’re operating in R world, it’s not all that surprising that two different pieces of code that do the same thing have different performance characteristics. This is, after all, the main reason that the microbenchmark package comes in handy.
But since C++ is a compiled language, I generally expect the compiler to figure out that two pieces of code that do the same thing ought to boil down to the same machine code. Why is that not the case here?
The answer probably has to do with how Eigen (the library that RcppEigen wraps) defers computation of an expression in intelligent ways. My guess is that Eigen figures out at compile time that there’s a really simple loop for computing this matrix expression, and pares your high-level description down to a simple statement. By breaking the computation into two parts (as in the first C++ function), you hamper the ability of the library (not the compiler!) to help you out.
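One way to lean even harder on that idea is to never form the n-by-n matrix H at all: since each leverage is the quantity h_ii from above, the whole vector is just a row-wise sum of an element-wise product. Here is a sketch of that variant (the name leverage_cpp_rowwise is mine, it assumes the same file and typedefs as the functions above, it swaps the explicit inverse for a Cholesky-style solve, and I haven’t benchmarked it):
// [[Rcpp::export]]
VectorXd leverage_cpp_rowwise(const NumericMatrix& XX) {
  const MapMatd X(as<MapMatd>(XX));
  // Solve (X'X) B = X' once instead of inverting; then
  // diag(X B)_i = sum_j X_ij * B_ji, a row-wise sum of an element-wise product.
  const MatrixXd B = (X.adjoint() * X).ldlt().solve(X.adjoint());
  return X.cwiseProduct(B.transpose()).rowwise().sum();
}
Whether that actually beats leverage_cpp_short is, of course, exactly the kind of question microbenchmark is for.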
This whole experience has made me much more cautious about the C++ code that I write for R, to the point that I’m now prone to writing the function a number of different ways, and then doing some tests with microbenchmark to see what ends up being faster. Of course, it’s arguable that I should have been taking that kind of empirical approach to begin with.
If anyone can shed some light on this by looking at the assembly output (which is beyond my own abilities), I would be quite grateful.