Fiddling Around with DEA Models

Measuring performance is a tricky business. Two of the most annoying drawbacks of applying regression analysis (as used in most simple econometric approaches) to these kinds of problems are the requirement to specify a functional form for the production function (which we can wave away by admitting that our results are only a “best linear approximation” that works around the mean) and the inability to measure the impact of inputs on several outputs simultaneously.

These are two problems that data envelopment analysis (DEA) tries to address. In essence, it looks at the relative ability of actors (or “decision-making units”) to turn inputs into outputs, and then measures how far away from the “frontier” of the most efficient actors the rest of the lot are. This avoids making many of the very restrictive assumptions (Cobb-Douglas, anyone?) we usually employ when discussing production functions.
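To make the “distance from the frontier” idea concrete, here is one common way to write the constant-returns (CCR) model in its input-oriented envelopment form. The notation is my own choice for illustration: there are n decision-making units, unit j uses inputs x_{ij} (i = 1, …, m) to produce outputs y_{rj} (r = 1, …, s), and the program is solved once for each unit o under evaluation.

```latex
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{subject to}\quad
  & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, \qquad i = 1,\dots,m, \\
  & \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \qquad r = 1,\dots,s, \\
  & \lambda_j \ge 0, \qquad j = 1,\dots,n.
\end{aligned}
```

The optimal θ is the efficiency score of unit o: it is the smallest factor by which all of that unit's inputs could be scaled down while some nonnegative combination of observed units still produces at least its outputs. A score of 1 puts the unit on the frontier.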

All of this sounded pretty interesting, so I thought I’d try out some of these models. The FEAR package for the R programming language seems like a popular tool for this kind of analysis; however, I also discovered the newer Benchmarking package, which has the added benefit of being free even for commercial use.

… but mostly it was because FEAR does not yet compile properly on OS X for R version 3.1.0+, and I didn’t want to try and patch it. So:

install.packages("Benchmarking")

It wasn’t entirely clear how to actually check that I was computing efficiencies correctly, so I pulled some data out of a textbook¹ that also provided the “answer”:

dmu	input1	input2	output1	output2	ccr
1	0.7150	1.7996	0.7618	1.2239	0.432
2	1.0036	1.2616	1.5441	0.6716	0.709
3	0.0126	0.9276	1.3270	0.3433	1.000
4	0.4339	0.7792	0.3600	0.5075	0.419
5	1.7455	0.2783	1.0560	1.0000	1.000
6	1.5061	0.9276	0.7990	0.5821	0.448
7	0.5313	1.2059	1.1862	1.7612	0.952
8	1.8513	0.7978	1.0885	1.9104	0.865
9	1.9796	1.6698	1.6517	1.1493	0.570
10	0.2211	0.3525	0.2258	0.8507	1.000

And then ran a very simple script to see if I could reproduce the results of a DEA model with constant returns to scale (this is often called the CCR model, hence the name of the last column above):

library(Benchmarking)

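# Load the tab-separated textbook data (first row holds the column names).
test1 <- read.table("dea-test1.tsv", header = TRUE)

# Inputs and outputs as matrices, one row per DMU.
x <- cbind(test1$input1, test1$input2)
y <- cbind(test1$output1, test1$output2)

# CCR model: constant returns to scale.
result <- dea(x, y, RTS = "crs", ORIENTATION = 1)
print(result$eff)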

## [1] 0.4319693 0.7091446 1.0000000 0.4194153 1.0000000 0.4485119 0.9519665 0.8653230 0.5701125 1.0000000

Which seems to match up with the final column in the example data nicely.
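Eyeballing ten numbers is fine, but the match can also be checked mechanically. The sketch below hard-codes the two vectors shown in this post so that it runs on its own; the variable names (`computed`, `textbook`, `gap`) and the 0.001 tolerance are my own choices, the latter picked because the textbook reports three decimals.

```r
# Efficiencies printed by dea() above, and the textbook "ccr" column.
computed <- c(0.4319693, 0.7091446, 1.0000000, 0.4194153, 1.0000000,
              0.4485119, 0.9519665, 0.8653230, 0.5701125, 1.0000000)
textbook <- c(0.432, 0.709, 1.000, 0.419, 1.000,
              0.448, 0.952, 0.865, 0.570, 1.000)

# Absolute gap per DMU.
gap <- abs(computed - textbook)

worst_dmu <- which.max(gap)    # DMU with the largest discrepancy
matches   <- all(gap < 1e-3)   # TRUE if every DMU agrees to within 0.001

print(round(gap, 4))
print(worst_dmu)
print(matches)
```

In the actual script the same comparison would be `abs(result$eff - test1$ccr)`. A tolerance is safer than comparing rounded values exactly: DMU 6 comes out as 0.4485119, which rounds to 0.449 while the textbook lists 0.448.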

A few minor details about the script: the dea function provided by the Benchmarking package defaults to variable returns to scale (RTS = "vrs"), so you need to set the RTS argument explicitly for the constant-returns model; further, ORIENTATION = 1 (equivalently "in") requests input-oriented efficiencies, which is why every score above is at most 1. I’ve obviously saved the tab-separated data above as dea-test1.tsv.
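To see what these two arguments actually change, here is a small sketch that reruns the same data under different settings, using the string forms of RTS and ORIENTATION. The data is embedded with `read.table(text = ...)` only so that the snippet is self-contained, and the variable names are mine. As far as I understand it, the package reports Farrell efficiencies, so I would expect output-oriented scores to be at least 1 and, under constant returns, to be the reciprocals of the input-oriented ones; treat that as an assumption to verify rather than a guarantee.

```r
library(Benchmarking)

# Same textbook data as above, embedded so the snippet runs on its own.
test1 <- read.table(header = TRUE, text = "
dmu input1 input2 output1 output2 ccr
1  0.7150 1.7996 0.7618 1.2239 0.432
2  1.0036 1.2616 1.5441 0.6716 0.709
3  0.0126 0.9276 1.3270 0.3433 1.000
4  0.4339 0.7792 0.3600 0.5075 0.419
5  1.7455 0.2783 1.0560 1.0000 1.000
6  1.5061 0.9276 0.7990 0.5821 0.448
7  0.5313 1.2059 1.1862 1.7612 0.952
8  1.8513 0.7978 1.0885 1.9104 0.865
9  1.9796 1.6698 1.6517 1.1493 0.570
10 0.2211 0.3525 0.2258 0.8507 1.000
")

x <- as.matrix(test1[, c("input1", "input2")])
y <- as.matrix(test1[, c("output1", "output2")])

# Constant returns to scale, input- and output-oriented.
eff_in  <- dea(x, y, RTS = "crs", ORIENTATION = "in")$eff
eff_out <- dea(x, y, RTS = "crs", ORIENTATION = "out")$eff

# Variable returns to scale (the package default), input-oriented.
eff_vrs <- dea(x, y, RTS = "vrs", ORIENTATION = "in")$eff

# Side-by-side view; inv_out should track eff_in under constant returns.
print(round(cbind(eff_in, eff_out, inv_out = 1 / eff_out, eff_vrs), 4))
```

The variable-returns scores should never be lower than the constant-returns ones, since the variable-returns frontier wraps the data more tightly.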

¹ Specifically, this is from Table 3 in Sarkis, Joseph. 2002. “Preparing Your Data for DEA” in Productivity Analysis in the Service Sector with Data Envelopment Analysis, 2nd Edition, ed. Necmi Avkiran.