As the venerable xkcd comic points out, unadjusted geographic data often ends up looking like a population map. Normally, this makes it kind of boring, since it doesn’t tell you anything new.
Except, of course, when visualizing the population is exactly what you had in mind. I discovered recently that the Australian government keeps track of every public toilet in the country, for example, and what better way is there to learn about Australian geography than through such an important public utility?
As usual, the remainder of this post is a technical discussion of how I created the graphic above. It was a neat opportunity to make use of hexbinning, and I'm quite fond of the end result. I haven't yet figured out a way to add a "shadow" made of hexagons to indicate the overall shape of Australia, though I would like to. The fully reproducible code can be found in my visualization repository on GitHub.
Using Hex Bins in ggplot2
You can grab the data on Australia's public toilets here. Each entry in the .csv file contains quite a lot of other interesting information, but for now I'm most interested in the Longitude and Latitude fields. I've used the brilliant dplyr package to rearrange and select what I need, as follows:
library(dplyr, warn.conflicts = FALSE)

# Read the raw data, then keep only the ID and the coordinates,
# renamed to the shorter long/lat used throughout.
toilets <-
  read.csv("~/Code/R/data/toiletmap.csv",
           header = TRUE, stringsAsFactors = FALSE) %>%
  mutate(long = Longitude, lat = Latitude) %>%
  select(ToiletID, long, lat)
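If you'd rather skip the separate mutate() step, dplyr's select() can rename columns as it picks them. Here is a minimal sketch on a made-up three-row data frame standing in for toiletmap.csv (the Name column and all the values are hypothetical):

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for toiletmap.csv: same key column names, made-up values.
raw <- data.frame(
  ToiletID  = c(1, 2, 3),
  Name      = c("A", "B", "C"),
  Longitude = c(151.2, 144.9, 115.9),
  Latitude  = c(-33.9, -37.8, -32.0),
  stringsAsFactors = FALSE
)

# select() keeps ToiletID and renames the coordinate columns in one step;
# every other column (here, Name) is dropped.
toilets_demo <- raw %>%
  select(ToiletID, long = Longitude, lat = Latitude)

print(toilets_demo)
```

The result has the same three columns as the mutate()/select() pipeline, so either form works with the plotting code that follows.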
The hex binning features of ggplot2 rely on the hexbin package, so you may need to run install.packages("hexbin") before the following code will work. But it's actually quite simple to get them working:
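library(ggplot2)

# geom_hex() counts the points falling in each hexagon and maps that count
# to fill; coord_equal() keeps one degree the same length on both axes.
p <-
  ggplot(toilets, aes(x = long, y = lat)) +
  geom_hex() +
  coord_equal()
print(p)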
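Each hexagon's colour is simply the number of points that land inside it. A minimal sketch calling the hexbin package directly shows those per-cell counts; it uses random points rather than the toilet data, and xbins = 30 is an arbitrary choice for illustration:

```r
library(hexbin)

# Hypothetical points (random, not the toilet data), spread over a
# longitude/latitude box roughly the size of Australia.
set.seed(1)
x <- runif(500, 113, 154)
y <- runif(500, -44, -10)

# xbins sets how many hexagons span the x range.
hb <- hexbin(x, y, xbins = 30)

# hexbin only stores non-empty cells: their centres and their counts.
centres <- hcell2xy(hb)
cells <- data.frame(x = centres$x, y = centres$y, count = hb@count)
head(cells)
```

Every point falls in exactly one cell, so the counts add up to the number of input points.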
This looks pretty good. But it can be reasonably difficult to compare different places on a continuous colour scale, so I decided to break the bin counts into discrete levels. A very useful base R function here was cut(), which does exactly this: it breaks a continuous variable into a factor. I chose seven levels, because I think that's about the most one can digest easily. To do this, I made use of stat_binhex as opposed to geom_hex (much as you might use stat_density instead of geom_density when you want more control). It's easy to add a nicer colour scale using the RColorBrewer palettes, too.
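p <-
  ggplot(toilets, aes(x = long, y = lat)) +
  stat_binhex(
    # Turn each hexagon's count into one of seven levels. right = FALSE
    # makes the intervals [0, 5), [5, 10), ..., [1000, Inf), which is what
    # the labels describe (the default right = TRUE would put a count of
    # exactly 5 in "<5").
    aes(fill = after_stat(cut(count,
                              breaks = c(0, 5, 10, 50, 100, 500, 1000, Inf),
                              labels = c("<5", "5-9", "10-49", "50-99",
                                         "100-499", "500-999", "1000+"),
                              right = FALSE))),
    colour = NA
  ) +
  coord_equal() +
  labs(fill = NULL) +
  # A sequential ColorBrewer palette running from light orange to dark red.
  scale_fill_brewer(palette = "OrRd")
print(p)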
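Because cut() closes its intervals on the right by default, it is worth checking that the level labels line up with the breaks. A quick base R check on some made-up counts, chosen to sit on and around the break points:

```r
# Hypothetical hexagon counts, not taken from the toilet data.
counts <- c(1, 4, 5, 9, 10, 49, 50, 999, 1000, 2500)

breaks <- c(0, 5, 10, 50, 100, 500, 1000, Inf)
labels <- c("<5", "5-9", "10-49", "50-99", "100-499", "500-999", "1000+")

# right = FALSE gives half-open intervals [0, 5), [5, 10), ..., [1000, Inf),
# matching labels such as "5-9" and "1000+".
binned <- cut(counts, breaks = breaks, labels = labels, right = FALSE)
print(data.frame(count = counts, level = binned))

# With the default right = TRUE the intervals are (0, 5], (5, 10], ...,
# so a count of exactly 5 is filed under "<5".
print(cut(5, breaks = breaks, labels = labels))
```

Passing the labels to cut() itself, rather than to the colour scale, also keeps each label attached to its interval if some level happens to be empty and is dropped from the legend.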
Adding Points of Interest
Not surprisingly, the highest concentrations of toilets are found in the largest urban areas. I thought the map might be more interesting with labels for these cities overlaid, so I looked up the list of the largest cities in Australia on Wikipedia and then extracted their longitude and latitude from the site's links to GeoHack. This gave me the following data frame:
poi <- data.frame(
  name = c("Perth", "Sydney", "Melbourne", "Brisbane", "Adelaide",
           "Alice\nSprings", "Darwin", "Hobart\n(Tasmania)"),
  long = c(115.858889, 151.209444, 144.963056, 153.027778, 138.601, 133.87,
           130.833333, 147.325),
  lat = c(-31.952222, -33.859972, -37.813611, -27.467917, -34.929, -23.7,
          -12.45, -42.880556),
  # vjust = 0 places a label above its point, 1 below it, 0.5 centred on it.
  vjust = c(1, 1, 0, 0.5, 0, 0.5, 0.5, 0),
  hjust = 0.5
)
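Coordinates are often published in degrees, minutes and seconds (DMS) rather than decimal degrees. If you need to convert them yourself, a small helper is enough. dms_to_decimal() below is a hypothetical function written for this sketch; the Perth input (31 57' 8" S, 115 51' 32" E) is simply the DMS form of the decimal values in the data frame above:

```r
# Hypothetical helper: convert degrees/minutes/seconds plus a hemisphere
# letter ("N", "S", "E" or "W") into signed decimal degrees. Vectorised,
# so whole columns can be converted at once.
dms_to_decimal <- function(deg, min, sec, hemisphere) {
  value <- deg + min / 60 + sec / 3600
  # Southern latitudes and western longitudes are negative.
  ifelse(hemisphere %in% c("S", "W"), -value, value)
}

# Perth: 31 57' 8" S, 115 51' 32" E
perth_lat  <- dms_to_decimal(31, 57, 8, "S")
perth_long <- dms_to_decimal(115, 51, 32, "E")
round(c(long = perth_long, lat = perth_lat), 6)
```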
These can then be overlaid on the previous ggplot object using geom_text(). To get away from the "plot" feeling, I also removed most of the theme elements.
library(grid)  # For the unit() function.
p <- p +
  geom_text(aes(x = long, y = lat, label = name, vjust = vjust,
                hjust = hjust), data = poi, size = 4) +
  theme(panel.background = element_rect(fill = "gray90", colour = NA),
        plot.background = element_rect(fill = "gray90", colour = NA),
        # Remove titles, ticks, gridlines, and borders.
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        # Set the legend background.
        legend.background = element_rect(fill = NA, colour = NA),
        legend.key = element_rect(fill = NA, colour = NA),
        # Set margins so that the graphic fills the whole space.
        plot.margin = unit(c(0, 0, -0.5, -0.5), "line")
  )
print(p)
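Recent versions of ggplot2 also include theme_void(), a complete theme that drops the axes, ticks, gridlines and panel background in one call, so it could stand in for most of the element_blank() lines. A sketch of that alternative, drawn on just two of the points of interest so it runs on its own (the gray90 background is carried over from the code above):

```r
library(ggplot2)

# Two of the points of interest, enough to draw something.
demo <- data.frame(
  name = c("Perth", "Sydney"),
  long = c(115.858889, 151.209444),
  lat  = c(-31.952222, -33.859972)
)

q <- ggplot(demo, aes(x = long, y = lat, label = name)) +
  geom_text(size = 4) +
  coord_equal() +
  # theme_void() removes axes, ticks, gridlines and the panel background...
  theme_void() +
  # ...so only the background colour needs to be added back.
  theme(plot.background = element_rect(fill = "gray90", colour = NA))

print(q)
```

Margins and legend styling can still be adjusted with theme() on top of theme_void(), exactly as in the longer version.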
There were some other minor adjustments I made (to the legend's position and the transparency of the hexbins, for example), but that's the gist of my approach. To get a sense of how I incorporated fonts and created the title and subtitle, you can check out the fully reproducible code on GitHub.