Choropleth Maps with R and ggplot2

5 September 2014 // tagged

data
r
visualization

This post is meant to be a short intro on how to create visualizations like the following using R and ggplot2:

![intro-map](/images/r-f380bbe4/final-map-1.png)

Update (February 6, 2017): I’ve updated the content of this post to be much more modern, taking advantage of developments in the spatial package ecosystem and in the capabilities of ggplot2.

Continue Reading →

Fiddling Around with DEA Models

12 May 2014 // tagged

data
econ
r

Measuring performance is a tricky business. Two of the most annoying drawbacks of applying regression analysis (as used in most simple econometric approaches) to these kinds of problems are the requirement to specify a functional form for the production function (which we can wave away by admitting that our results are only a “best linear approximation” that works around the mean) and the inability to measure the impact of inputs on several outputs simultaneously.

Continue Reading →

Managing Complex Research Workflows with Make

26 April 2014 // tagged

data
econ
reproducible research

If you’re doing any kind of empirical work in Economics, you probably have a huge, messy folder containing a mix of

- Data files (.csv, .dta, .xlsx, etc.) in various states of merge-ness and cleanliness.
- Scripts for creating graphs & figures, producing summary statistics, and computing models. Probably written for Stata, R, or the Pandas data stack¹.
- Files containing written work. These are usually .doc(x) files, but I’ve seen lots of LaTeX lately as well, and being a plain-text format, this is a huge boon to reproducible research.

A really simple research workflow (start with data, make some figures, make some summary statistics, and run some models) might look like the following:

![An Econ Workflow](./images/econ-workflow.png)

But of course that’s not clear when looking at the .zip file you send your coauthor.

Continue Reading →

Parsing tape data with Python

18 February 2014 // tagged

data
python

Recently I had to use an older dataset that hadn’t been nicely sanitized into something that Stata or Pandas could understand. In particular, I was working with the NHANES I Epidemiologic Follow-up Study¹, which has data available only in the original tape format from the 1980s and early 1990s. An individual record from one of the smaller files looks like this (scroll right for the full effect):

9220809 511 12 0112 996102410232091442411 1 2 9 486 21400520 230031142750215185031486 0 03 42750486 051850

where every number might be a single variable or part of a multi-column variable.

Continue Reading →

Moving to Pelican

12 February 2014 // tagged

meta
python

I recently moved the site over to Pelican. Although I liked Octopress, it wasn’t working exactly like I wanted it to, and I’m not comfortable enough working with Ruby to modify it. Since Python has become my go-to language for most things these days, it made sense to move to what seems to be the most popular static site generator written in that language. For posterity’s sake, the following is a guide based on the steps I took to get it working. I’ve also included most of my configuration files and a short program for creating new posts.

Continue Reading →