Choropleth Maps with R and ggplot2

This post is meant to be a short intro on how to create visualizations like the following using R and ggplot2:

intro-map

Update (February 6, 2017): I’ve updated the content of this post to be much more modern, taking advantage of developments in the spatial package ecosystem and in the capabilities of ggplot2.

To get through this tutorial, you’ll need the following packages:

suppressMessages({
    library(rgeos)
    library(maptools)
    library(ggplot2)
})

The first thing you need to get your hands on is some representation of the polygons on a map. There are many formats for these polygons, but one of the more common ones is the .shp format used by ArcGIS. The maptools package enables opening and manipulating objects created from this format, so you’ll need to get your hands on both that and its prerequisite, rgeos.

Update (February 6, 2017): In the past I recommended gpclib, but the rgeos library is now the preferred approach. It also has a more permissive license.

A nice scale to work at in Canada is the level of the forward sortation areas (FSAs) defined by the first three digits of a postal code. Until very recently the exact borders of these areas was data you had to pay for, but now Statistics Canada makes it available on the Census website, here.

Fair warning: it’s quite a large file, and will take some time to downlaod. Of course, all of the FSAs in Canada is a bit too much to deal with, so I’ll focus just on those that encompass the city of Toronto, which all begin with the letter “M”.

# fsas <- maptools::readShapeSpatial("path/to/gfsa000b11a_e.shp")

toronto.fsas <- fsas[substr(fsas$CFSAUID, 1, 1) == 'M',]

# The shapefile takes up quite a lot of memory, and we no longer need it all,
# so you can delete it if you like:
rm(fsas)

We have to turn the object that maptools produces into something that can be plotted in the ggplot2 paradigm, which reguires a data.frame object. To do this, we can use the ggplot2::fortify() function:

data <- ggplot2::fortify(toronto.fsas, region = "CFSAUID")

# Turn the generated "id" chr column into a correctly-labelled factor variable
data$fsa <- factor(data$id)
data$id <- NULL

Toronto is actually pretty great about making lots of interesting data available online. They provide the number of licensed dogs and cats in the city by FSA here, although it needs to be cleaned up a little before you can use it directly. It needs to be merged into the map data:

# pets <- read.csv("path/to/pets.csv")

# Merge the map data and the pet data into something we'll plot
plot.data <- merge(data, pets, by = "fsa")

Now we can use ggplot2 to plot the polygons, and fill them with a gradient based on the number of dogs. Note that you must group the polygons, otherwise they might not be drawn out in the correct order (try omitting it and see). Another key parameter is the coord_equal() coordinate modification: since we’re dealing with map data, if the scaling of the x and y axis are not the same the map will be arbitrarily distorted.

I’m glossing over the details of map projections here — in this case, the city is small enough that a latitude = longitude presentation will look just fine.

ggplot(plot.data, aes(x = long, y = lat, group = group, fill = dog)) +
    geom_polygon() +
    coord_equal()

plot of chunk first-map

It’s that easy! Of course, the default ggplot2 theme is really meant for presenting data, and might not look that great with a map. The built-in theme_void() will strip away most of this extraneous framing, although I tend to tweak it a little as well.

Update (February 6, 2017): Bin your spatial data, and use viridis.

I strongly believe that you usually want to bin data for choropleth maps, since it can be very difficult to judge fine colour differences. The ggplot2::cut_number() function will find bins roughly equal in size, which is a good place to start.

Unfortunately, the built-in colours for discrete scales are horrible not only for a final presentation, but also for your own exploratory work. You’re better off using one of the ColorBrewer palettes (available through the built-in scale_fill_brewer() function), and you should also check out the carefully tuned palettes from the viridis package.

Now let’s try this again, with these changes:

library(viridis)

ggplot(plot.data, aes(x = long, y = lat, group = group,
       fill = cut_number(dog, 5))) +
    geom_polygon(color = "grey10", size = 0.2) +
    coord_equal() +
    viridis::scale_fill_viridis(discrete = TRUE) +
    labs(title = "Registered Dogs (Toronto, 2013)",
         fill = NULL) +
    theme_void() +
    theme(legend.position = "bottom",
          panel.background = element_rect(fill = NA, colour = "#cccccc"))

plot of chunk second-map

Which looks much better. Normally we’d be done here, but the sage advice of Randall Munroe is worth keeping in mind:

xkcd Heatmap

So we can improve the meaningfulness of this map by adjusting for population. Luckily, Statistics Canada actually publishes population data at the FSA-level, and even more luckily they provide data on the number of private dwellings that are occupied by their usual owners. This last number is particularly interesting because it is the level at which you would expect pet ownership to occur. You can find the data here.

As with most StatsCan-provided CSV files, you will need to clean it up a bit before using it.

# pop <- read.csv("path/to/fsa_pop.csv")

# We don't really need the rest of this...
pop$population <- NULL
pop$indian <- NULL
pop$dwellings <- NULL

# Merge the data into the dataset we're already using
plot.data <- merge(plot.data, pop, by = "fsa")

# Calculate the density of dogs per private dwelling
plot.data$dog.density <- plot.data$dog / plot.data$dwellings.usual

And now we can plot our population-adjusted map using the theme from before:

dog.map <- ggplot(plot.data, aes(x = long, y = lat, group = group,
                      fill = cut_number(dog.density, 5))) +
    geom_polygon(color = "gray10", size = 0.2) +
    coord_equal() +
    viridis::scale_fill_viridis(discrete = TRUE) +
    labs(title = "Registered Dogs per Private Dwelling (Toronto, 2013)",
         fill = NULL) +
    theme_void() +
    theme(legend.position = "bottom",
          panel.background = element_rect(fill = NA, colour = "#cccccc"))

dog.map

plot of chunk third-map

Notice that the numbers are quite small. We might want to convert to per-1000 units later.

Style Points: Better Longitude & Latitude Labels

In case you need to explicitly show the coordinates of your spatial data, we can ensure that they are formatted nicely by passing a custom formatting function to the labels option on the x and y scale. These functions must take a vector of inputs and return a vector of labels, and we can write one to handle each of longitude and latitude quite easily:

# Converts a numeric array to an array of latitude labels
latitude <- function(x)
    paste0(format(x, digits = 3), "° ", ifelse(x > 0, "N", "S"))

# Converts a numeric array to an array of longitude labels
longitude <- function(x) 
    paste0(format(x, digits = 3), "° ", ifelse(x > 0, "E", "W"))

dog.map +
    scale_y_continuous(labels = latitude) +
    scale_x_continuous(labels = longitude) +
    theme(axis.text = element_text(size = 8, colour = "#333333"))

plot of chunk labelled-map

Style Points: Adding a Compass Rose

Update (February 6, 2017): Before ggsn existed, I used to do this manually by creating a grob from a vector image using the grImport package. This is no longer necessary.

The ggsn package makes it possible to add scale bars and compass rose symbols to ggplot objects.

library(ggsn)

dog.map +
    ggsn::north(plot.data, location = "bottomright", symbol = 10)

plot of chunk rose-map

Style Points: Adding a Drop Shadow

There’s a little trick for adding a drop shadow: we just draw the polygons filled with grey and shifted slightly southeast, and then draw the actual map polygons with the correct fill on top (which I learned here).

ggplot(plot.data, aes(x = long, y = lat, group = group,
                      fill = cut_number(dog.density, 5))) +
    geom_polygon(aes(x = long + 0.005, y = lat - 0.002),
                 color = "grey50", size = 0.2, fill = "grey50") +
    geom_polygon(color = "gray10", size = 0.2) +
    coord_equal() +
    viridis::scale_fill_viridis(discrete = TRUE) +
    labs(title = "Registered Dogs per Private Dwelling (Toronto, 2013)",
         fill = NULL) +
    theme_void() +
    theme(legend.position = "bottom",
          panel.background = element_rect(fill = NA, colour = "#cccccc"))

plot of chunk shadow-map

Finishing Touches: Better Fonts and Nicer Labels

Update (February 6, 2017): As of ggplot2 version 2.2, subtitles and captions are officially supported.

For standalone graphics, it is conventional to point to your data sources somewhere, often below. You can do this fairly easily with the caption label. I also like to add authorship and licensing information.

In order to get “nicer” labels, I’m also going to bin the data manually.

Adopting Arial Narrow as the font-of-the-day, the image at the top of this post can be generated as follows:

plot.data$dogs.per.1000 <- factor(
    cut(plot.data$dog.density * 1000, c(0, 30, 45, 60, 90, 120)),
    labels = c("Under 30", "30 to 44", "45 to 59", "60 to 89", "90 to 120")
)

ggplot(plot.data, aes(x = long, y = lat, group = group, fill = dogs.per.1000)) +
    geom_polygon(aes(x = long + 0.005, y = lat - 0.002),
                 color = "grey50", size = 0.2, fill = "grey50") +
    geom_polygon(color = "grey10", size = 0.2) +
    coord_equal() +
    # Add the compass rose.
    ggsn::north(plot.data, location = "bottomleft", symbol = 10) +
    viridis::scale_fill_viridis(option = "viridis", discrete = TRUE) +
    # Set the labels to NULL so that no space is allocated for them
    labs(x = NULL, y = NULL, fill = NULL,
         title = "Registered Dogs per Toronto Home",
         subtitle = "Density per 1000 Private Dwellings, 2013",
         caption = paste("A. Jacobs :: unconj.ca",
"Data: Toronto Animal Services (2014) & Statistics Canada (2013)",
"You may redistribute this graphic under the terms of the CC-by-SA license.",
            sep = "\n")) +
    theme_void() +
    theme(text = element_text(family = "Arial Narrow", size = 8),
          plot.title = element_text(size = 12, face = "bold"),
          plot.margin = unit(c(0, 0.25, 0.0, 0.25), "in"),
          panel.border = element_rect(fill = NA, colour = "#cccccc"),
          legend.text = element_text(size = 8),
          legend.position = c(0.9, 0.25))

plot of chunk final-map

Which looks pretty good to me!

comments powered by Disqus