Measuring performance is a tricky business. Applying regression analysis (as used in most simple econometric approaches) to these kinds of problems has two especially annoying drawbacks. The first is the requirement to specify a functional form for the production function (which we can wave away by admitting that our results are only a “best linear approximation” that works near the mean). The second is the inability to measure the impact of inputs on several outputs simultaneously.
These are two problems that data envelopment analysis (DEA) tries to address. In essence, it looks at the relative ability of actors (or “decision-making units”) to turn inputs into outputs, and then measures how far away from the “frontier” of the most efficient actors the rest of the lot are. This avoids making many of the very restrictive assumptions (Cobb-Douglas, anyone?) we usually employ when discussing production functions.
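For reference, the idea of "distance from the frontier" can be written as a small linear program. The block below is the standard textbook envelopment form of the input-oriented, constant-returns model, not something taken from this post; the symbols are my own notation: $x_{ij}$ is input $i$ of unit $j$, $y_{rj}$ is output $r$ of unit $j$, $o$ is the unit being evaluated, and $\lambda_j$ are the weights used to build a comparison unit out of the others.

```latex
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, \quad i = 1,\dots,m \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1,\dots,s \\
& \lambda_j \ge 0, \quad j = 1,\dots,n
\end{aligned}
```

The optimal $\theta$ is the efficiency score: a value of 1 means no combination of the other units can produce at least the same outputs with proportionally fewer inputs, while a value of, say, 0.5 means the unit could in principle get by with half of every input.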
All of this sounded pretty interesting, so I thought I’d try out some of these models. The FEAR package for the R programming language seems like a popular tool for this kind of analysis; however, I also discovered the newer Benchmarking package, which has the added benefit of being free even for commercial use.
… but mostly it was because FEAR does not yet compile properly on OS X for R version 3.1.0+, and I didn’t want to try to patch it. So:
It wasn’t entirely clear how to actually check that I was computing efficiencies correctly, so I pulled some data out of a textbook¹ that also provided the “answer”:
dmu  input1  input2  output1  output2  ccr
1    0.7150  1.7996  0.7618   1.2239   0.432
2    1.0036  1.2616  1.5441   0.6716   0.709
3    0.0126  0.9276  1.3270   0.3433   1.000
4    0.4339  0.7792  0.3600   0.5075   0.419
5    1.7455  0.2783  1.0560   1.0000   1.000
6    1.5061  0.9276  0.7990   0.5821   0.448
7    0.5313  1.2059  1.1862   1.7612   0.952
8    1.8513  0.7978  1.0885   1.9104   0.865
9    1.9796  1.6698  1.6517   1.1493   0.570
10   0.2211  0.3525  0.2258   0.8507   1.000
And then ran a very simple script to see if I could reproduce the results of a DEA model with constant returns to scale (this is often called the CCR model, hence the name of the last column above):
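library(Benchmarking)

# Load the tab-separated textbook data (one row per DMU).
test1 = read.table("dea-test1.tsv", header = TRUE)

# Build the input and output matrices: 10 DMUs, 2 columns each.
x = matrix(c(test1$input1, test1$input2), nrow = 10, ncol = 2)
y = matrix(c(test1$output1, test1$output2), nrow = 10, ncol = 2)

# CCR model: constant returns to scale.
result = dea(x, y, RTS = "crs", ORIENTATION = 1)
print(result$eff)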
##  0.4319693 0.7091446 1.0000000 0.4194153 1.0000000 0.4485119 0.9519665 0.8653230 0.5701125 1.0000000
Which seems to match up with the final column in the example data nicely.
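To make "match up nicely" a little more concrete, here is a small base-R sketch that compares the two sets of numbers. Both vectors are simply copied from the output and the ccr column above; the 0.001 tolerance is my own choice, reflecting that the textbook reports three decimal places.

```r
# Efficiencies printed by the script above (copied from its output).
computed <- c(0.4319693, 0.7091446, 1.0000000, 0.4194153, 1.0000000,
              0.4485119, 0.9519665, 0.8653230, 0.5701125, 1.0000000)

# The ccr column from the textbook table.
textbook <- c(0.432, 0.709, 1.000, 0.419, 1.000,
              0.448, 0.952, 0.865, 0.570, 1.000)

# Absolute gap per DMU, and the DMU where the gap is largest.
abs_diff <- abs(computed - textbook)
worst <- which.max(abs_diff)

# Assumed tolerance: agreement to within one unit in the third decimal place.
matches <- all(abs_diff < 1e-3)

print(matches)
print(worst)
```

A tolerance is used rather than an exact comparison after rounding because DMU 6 (0.4485119) rounds to 0.449 while the textbook lists 0.448, so a strict `round(computed, 3) == textbook` check would flag a difference that is only a rounding artifact.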
A few minor details about the script: the dea function provided by the Benchmarking package defaults to more nuanced assumptions about returns to scale (variable returns, RTS = "vrs"), so you need to set the RTS argument explicitly for simpler models; the ORIENTATION = 1 argument requests input-based efficiencies, which is why every score is at most 1. I’ve saved the tab-separated data above as dea-test1.tsv.
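As a rough sketch of what these two arguments change, the snippet below reruns the model under a few settings. It assumes the Benchmarking package is installed, embeds the same table inline so it does not depend on the file, and uses the string forms "in" and "out" for ORIENTATION; the variable names are mine.

```r
library(Benchmarking)

# Same data as the table above, embedded so the snippet is self-contained.
test1 <- read.table(header = TRUE, text = "
dmu input1 input2 output1 output2 ccr
1  0.7150 1.7996 0.7618 1.2239 0.432
2  1.0036 1.2616 1.5441 0.6716 0.709
3  0.0126 0.9276 1.3270 0.3433 1.000
4  0.4339 0.7792 0.3600 0.5075 0.419
5  1.7455 0.2783 1.0560 1.0000 1.000
6  1.5061 0.9276 0.7990 0.5821 0.448
7  0.5313 1.2059 1.1862 1.7612 0.952
8  1.8513 0.7978 1.0885 1.9104 0.865
9  1.9796 1.6698 1.6517 1.1493 0.570
10 0.2211 0.3525 0.2258 0.8507 1.000
")

x <- as.matrix(test1[, c("input1", "input2")])
y <- as.matrix(test1[, c("output1", "output2")])

# Constant returns to scale, input orientation (the setting used above).
crs_in <- dea(x, y, RTS = "crs", ORIENTATION = "in")

# Same technology, output orientation: scores are expansion factors >= 1.
crs_out <- dea(x, y, RTS = "crs", ORIENTATION = "out")

# Variable returns to scale, which is what dea() uses if RTS is not set.
vrs_in <- dea(x, y, RTS = "vrs", ORIENTATION = "in")

# Side-by-side view; under CRS the inverted output score should equal the input score.
print(round(cbind(crs_in = crs_in$eff,
                  inv_crs_out = 1 / crs_out$eff,
                  vrs_in = vrs_in$eff), 4))
```

Under constant returns the two orientations carry the same information (one score is the reciprocal of the other), whereas the variable-returns scores can only be equal to or higher than the constant-returns ones, since the VRS frontier wraps the data more tightly.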
¹ Specifically, this is from Table 3 in Sarkis, Joseph. 2002. “Preparing Your Data for DEA” in Productivity Analysis in the Service Sector with Data Envelopment Analysis, 2nd Edition, ed. Necmi Avkiran.