Unconstant Conjunction A personal blog

Not All Population Maps Are Boring

As the venerable xkcd comic points out, unadjusted geographic data often ends up looking like a population map. Normally, this makes it kind of boring, since it doesn’t tell you anything new.

Except, of course, when visualizing the population is exactly what you had in mind. I recently discovered, for example, that the Australian government keeps track of every public toilet in the country. What better way is there to learn about Australian geography than through such an important public utility?

![Australia, In Public Toilets](/images/australia-in-public-toilets.png)

As usual, the remainder of this post is a technical discussion of how I created the graphic above. It was a neat opportunity to make use of hexbinning, and I’m quite fond of the end result. I haven’t yet figured out a way to add a “shadow” made of hexagons to indicate the overall shape of Australia, though I would like to. The fully reproducible code can be found in my visualization repository on GitHub.

Using Hex Bins in ggplot2

You can grab the data on Australian toilets here. Each entry in the .csv file contains quite a lot of other interesting information, but for now I’m most interested in the Longitude and Latitude fields. I’ve used the brilliant dplyr package to rearrange and select what I need, as follows:

require(dplyr, warn.conflicts = FALSE, quietly = TRUE)

toilets <-
    read.csv("~/Code/R/data/toiletmap.csv",
             header = TRUE, stringsAsFactors = FALSE) %>%
    mutate(long = Longitude, lat = Latitude) %>%
    select(ToiletID, long, lat)
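
As an aside, select() can rename columns as it picks them, so the mutate() step is optional. Here is a small sketch on a made-up data frame: the rows and the Name column are invented for illustration, and only the ToiletID, Longitude, and Latitude column names come from the code above.

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up rows standing in for the real .csv file.
raw <- data.frame(
    ToiletID  = c(1, 2, 3),
    Name      = c("A", "B", "C"),
    Longitude = c(115.86, 151.21, 144.96),
    Latitude  = c(-31.95, -33.86, -37.81),
    stringsAsFactors = FALSE
)

# select() accepts new_name = old_name, so it renames while selecting.
toilets_small <- raw %>%
    select(ToiletID, long = Longitude, lat = Latitude)

toilets_small
```

Either form should give the same three columns; which one reads better is mostly a matter of taste.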

The hex binning features of ggplot2 rely on the hexbin package, so you may need to run install.packages("hexbin") before the following code will work. But it’s actually quite simple to get them working:

require(ggplot2, quietly = TRUE)

p <-
    ggplot(toilets, aes(x = long, y = lat)) +
    geom_hex() +
    coord_equal()

print(p)
![plot of chunk toilet-hexbin](images/r-b27dc3c2/toilet-hexbin.png)
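
To get a feel for what the hex binning step computes, you can call the hexbin package directly. The sketch below uses simulated points rather than the toilet data, and the choice of xbins = 20 is arbitrary; it only illustrates that each occupied hexagon ends up with a centre and a count.

```r
library(hexbin)

# Simulated points standing in for longitude/latitude pairs (not real data).
set.seed(1)
x <- rnorm(1000, mean = 134, sd = 8)
y <- rnorm(1000, mean = -25, sd = 5)

# Bin the points into hexagonal cells; xbins sets the number of bins
# across the x range.
hb <- hexbin(x, y, xbins = 20)

# Each occupied cell has a centre and a count of the points inside it.
centres <- hcell2xy(hb)
cells <- data.frame(x = centres$x, y = centres$y, count = hb@count)

# Show the busiest cells first.
head(cells[order(-cells$count), ])
```

The count column here plays the same role as the count that the fill colour is mapped to in the plot.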

This looks pretty good. But it can be reasonably difficult to compare different places on a continuous scale, so I decided to break the bin counts into levels. A very useful base R function for me here was cut(), which does exactly this: it breaks a continuous variable into a factor. I chose seven levels, because I think that’s about the most one can digest easily. To do this, I made use of stat_binhex as opposed to geom_hex (much as you might use stat_density instead of geom_density when you want more control). It’s also easy to add a nicer colour scale using the RColorBrewer palettes.

p <-
    ggplot(toilets, aes(x = long, y = lat)) +
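    stat_binhex(
        colour = NA,
        # right = FALSE gives [0,5), [5,10), ... so that the levels
        # match the "<5", "5-9", ... labels below.
        aes(fill = cut(..count.., c(0, 5, 10, 50, 100,
                                    500, 1000, Inf),
                       right = FALSE))
    ) +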
    coord_equal() +
    labs(fill = NULL) +
    scale_fill_brewer(
        palette = "OrRd",
        labels = c("<5 ", "5-9 ", "10-49 ", "50-99 ",
                   "100-499 ", "500-999 ", "1000+  ")
    )

print(p)
![plot of chunk toilet-hexbin2](images/r-b27dc3c2/toilet-hexbin2.png)
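
To see what cut() does with the counts, here is a minimal base R sketch using made-up counts (not the real toilet data). One thing worth knowing is that cut() closes intervals on the right by default, so a count of exactly 5 lands in the first level; right = FALSE closes them on the left instead, which is what labels like "<5" and "5-9" imply.

```r
# Hypothetical bin counts, chosen to land on and around the break points.
counts <- c(1, 4, 5, 9, 10, 75, 250, 999, 1000, 4000)
breaks <- c(0, 5, 10, 50, 100, 500, 1000, Inf)

# Default: intervals are closed on the right, e.g. (0,5], (5,10], ...
default_bins <- cut(counts, breaks)

# right = FALSE closes intervals on the left instead: [0,5), [5,10), ...
left_closed_bins <- cut(counts, breaks, right = FALSE)

# Compare how the same counts are distributed across the seven levels.
table(default_bins)
table(left_closed_bins)
```

Only the counts that sit exactly on a break (5, 10, and 1000 here) move between levels, but on a map with a few hundred hexagons that can still change a handful of colours.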

Adding Points of Interest

Not surprisingly, the highest concentrations of toilets are found in the largest urban areas. I thought the map might be more interesting if I layered over the labels for these cities, so I looked up the list of the largest cities in Australia (via Wikipedia) and then extracted the longitude and latitude information from the site’s links to GeoHack. This gave me the following data frame:

poi <- data.frame(
    name = c("Perth", "Sydney", "Melbourne", "Brisbane", "Adelaide",
             "Alice\nSprings", "Darwin", "Hobart\n(Tasmania)"),
    long = c(115.858889, 151.209444, 144.963056, 153.027778, 138.601, 133.87,
             130.833333, 147.325),
    lat  = c(-31.952222, -33.859972, -37.813611, -27.467917, -34.929, -23.7,
             -12.45, -42.880556),
    vjust = c(1, 1, 0, 0.5, 0, 0.5, 0.5, 0),
    hjust = 0.5
)
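
If the coordinates you find are in degrees, minutes, and seconds rather than decimal degrees, a small helper will convert them. This is only a sketch: dms_to_decimal is a hypothetical name, not a function from any package, and the inputs are chosen to reproduce Perth's coordinates from the data frame above.

```r
# Hypothetical helper: convert degrees/minutes/seconds to decimal degrees.
# Southern and western hemispheres are negative.
dms_to_decimal <- function(deg, min, sec, hemisphere) {
    sign <- ifelse(hemisphere %in% c("S", "W"), -1, 1)
    sign * (deg + min / 60 + sec / 3600)
}

perth_lat  <- dms_to_decimal(31, 57, 8, "S")
perth_long <- dms_to_decimal(115, 51, 32, "E")

# Rounded to six decimal places, as in the poi data frame.
round(c(long = perth_long, lat = perth_lat), 6)
```

The helper is vectorised, so it would also work on whole columns of degrees, minutes, and seconds at once.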

The resulting poi data frame can be overlaid on the previous ggplot object using geom_text. To get away from the “plot” feeling, I also removed most of the theme elements.

require(grid, quietly = TRUE) # For the unit() function.

p <- p +
    geom_text(aes(x = long, y = lat, label = name, vjust = vjust,
                  hjust = hjust), data = poi, size = 4) +
    theme(panel.background = element_rect(fill = "gray90", colour = NA),
          plot.background = element_rect(fill = "gray90", colour = NA),
          # Remove titles, ticks, gridlines, and borders.
          axis.text = element_blank(),
          axis.title = element_blank(),
          axis.ticks = element_blank(),
          panel.grid = element_blank(),
          panel.border = element_blank(),
          # Set the legend background.
          legend.background = element_rect(fill = NA, colour = NA),
          legend.key = element_rect(fill = NA, colour = NA),
          # Set margins so that the graphic fills the whole space.
          plot.margin = unit(c(0, 0, -0.5, -0.5), "line")
    ) 

print(p)
![plot of chunk toilet-hexbin3](images/r-b27dc3c2/toilet-hexbin3.png)

There were some other minor adjustments I made (to the legend’s position and the transparency of the hexbins, for example), but that’s the gist of my approach. To get a sense of how I incorporated fonts and created the title/subtitle, you can check out the fully reproducible code on GitHub.
