Don't Forget to Reproject Spatial Data when Exporting to GeoJSON

In the process of working on a recent choropleth piece for work, I discovered that it’s easy to stumble when moving spatial data out of R and onto the web. It’s a poorly documented reality that many web-based mapping libraries (including both D3 and Leaflet) expect GeoJSON data to be in EPSG:4326 (WGS 84 longitude/latitude), and it is by no means a given that your spatial data will start off in this coordinate reference system.

If you’re like me and do your spatial data pre-processing in R before exporting to GeoJSON, you may have to reproject your data before these libraries will handle them properly. Thankfully, this is fairly easy to do with the modern spatial packages.
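The post itself covers the R side. To show what reprojection actually does to the coordinates, here is a minimal Python sketch. It assumes the source data are in spherical Web Mercator (EPSG:3857), which is only one common case, and converts each GeoJSON position to EPSG:4326 longitude/latitude using the standard inverse Mercator formulas.

The sketch has a few limits:

- It drops any third (elevation) coordinate.
- It does not handle GeometryCollection.
- The function names are hypothetical.

For any other source projection, a projection library such as pyproj is the safer route.

```python
import json
import math

# Sphere radius used by Web Mercator (EPSG:3857), in metres.
EARTH_RADIUS = 6378137.0


def mercator_to_lonlat(x, y):
    """Convert one EPSG:3857 (x, y) pair in metres to EPSG:4326 [lon, lat] in degrees."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2)
    return [lon, lat]


def reproject_coords(coords):
    """Recursively walk a GeoJSON coordinate array of any nesting depth."""
    # A position is a list of numbers; anything else is a list of positions,
    # rings or polygons, so recurse into it.
    if coords and isinstance(coords[0], (int, float)):
        return mercator_to_lonlat(coords[0], coords[1])  # drops any elevation value
    return [reproject_coords(c) for c in coords]


def reproject_geometry(geometry):
    """Return a copy of a GeoJSON geometry with EPSG:4326 coordinates."""
    out = dict(geometry)
    out["coordinates"] = reproject_coords(geometry["coordinates"])
    return out


if __name__ == "__main__":
    point = {"type": "Point", "coordinates": [-8836000.0, 5410000.0]}
    print(json.dumps(reproject_geometry(point)))
```

Because the geometry is copied rather than modified in place, the original projected data stay available for any further processing before export.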

Forecasting YYZ Passengers in the Tidyverse

Buried in the Toronto Economic Bulletin (warning: Excel document) is a column listing the number of passengers passing through Pearson International Airport (YYZ) each month, going back more than fifteen years. There’s a story to tell in the forecast, too:

Passengers at Pearson

Flight data are a great forecasting example because they display such clear seasonal patterns, in this case peaking in the summer months and falling off in the winter. R has excellent tools for working with time series data and whipping up simple forecasts like this one. But there’s some friction with the modern tidyverse tools, because they expect a data.frame as the common interchange format, while R’s time series tools work with dedicated time series objects.

In this post, I’ll outline an approach to fitting many time series models using the tidyverse tools, including model selection based on out-of-sample performance. To ease the transition between these two worlds, I make extensive use of list columns and the broom package.
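The post does this in R with list columns and broom. Stripped of those tools, the model-selection step works as follows:

1. Hold out the last stretch of the series.
2. Forecast that stretch with each candidate model, using only the earlier data.
3. Keep the model with the lowest error on the holdout.

A minimal Python sketch of that loop follows. Several choices here are illustrative assumptions, not the models or YYZ data from the post:

- the two candidates (a naive forecast and a 12-month seasonal naive forecast),
- the mean absolute error metric,
- the synthetic series in the usage example.

```python
import math


def naive_forecast(train, horizon):
    """Predict that every future value equals the last observed value."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train, horizon, period=12):
    """Predict each future value as the value from the same season one period earlier.

    Requires at least `period` observations in `train`.
    """
    last_season = train[-period:]
    return [last_season[h % period] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute forecast error over the holdout window."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def select_model(series, holdout, models):
    """Run each candidate on all but the last `holdout` points and score it on the rest.

    Returns the name of the model with the lowest holdout MAE, plus all scores.
    """
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mean_absolute_error(test, model(train, holdout))
              for name, model in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy monthly series with a 12-month cycle (synthetic, not the YYZ data).
    series = [100 + 20 * math.sin(2 * math.pi * t / 12) for t in range(48)]
    models = {"naive": naive_forecast, "seasonal_naive": seasonal_naive_forecast}
    print(select_model(series, holdout=12, models=models))
```

On a strongly seasonal series like this one, the seasonal naive model wins easily. A real comparison would add richer candidates (for example ARIMA or exponential smoothing models) to the same dictionary.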

An Update to the Choropleth Post

A few years ago I published a post outlining how to make nice-looking choropleth maps in R, and this piece still draws a reasonable share of my hits each month. Unfortunately, some of the techniques I used at the time are now quite out of date, and I was starting to feel bad for anyone taking my advice.

As of today the post has received a makeover and takes a more modern approach. For any returning readers, the changes are marked with HTML <ins> tags, which I only recently discovered.

Exporting Clock Entries from org-mode to CSV

If you’ve used the clocking features of org-mode, you’re no doubt familiar with the clock table, which allows you to summarise time spent on different tasks. This is great for getting an overview of projects, but it’s not a very flexible tool if you want a more detailed picture of how you spend your time.

At this point I’ve accumulated about a year’s worth of clocked work time in org, and while clock tables have served me well so far, eventually I just wanted to get my data into R or Python for more fine-grained analysis, and charts like the following:

Calendar heatmap example

However, I haven’t come across a reliable way to get individual clock entries out of org-mode files and into a more widely readable format. So I’ve written one.
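The exporter from the post isn't reproduced here. As a rough illustration of the approach in Python (one of the two targets mentioned above), the sketch below scans an org file line by line, remembers the current headline path, and turns each closed `CLOCK:` line into a CSV row.

It makes a few assumptions:

- Timestamps use org's default format.
- Clocks that are still running are skipped.
- TODO keywords and tags are kept as part of the headline text.
- The function names are hypothetical.

```python
import csv
import io
import re
from datetime import datetime

# A closed clock line in org's default timestamp format, e.g.
#   CLOCK: [2016-03-01 Tue 09:15]--[2016-03-01 Tue 10:45] =>  1:30
# Open (still running) clocks have no "--[...]" part and are skipped.
CLOCK_RE = re.compile(
    r"^\s*CLOCK:\s*\[(\d{4}-\d{2}-\d{2})[^\]]*?(\d{2}:\d{2})\]"
    r"--\[(\d{4}-\d{2}-\d{2})[^\]]*?(\d{2}:\d{2})\]"
)
HEADLINE_RE = re.compile(r"^(\*+)\s+(.*?)\s*$")
TIME_FORMAT = "%Y-%m-%d %H:%M"


def clock_entries(lines):
    """Yield (headline path, start, end, minutes) for every closed CLOCK line."""
    path = []  # titles of the current headline and its ancestors
    for line in lines:
        heading = HEADLINE_RE.match(line)
        if heading:
            level = len(heading.group(1))
            # Drop headlines at this level or deeper, then push the new one.
            path = path[: level - 1] + [heading.group(2)]
            continue
        clock = CLOCK_RE.match(line)
        if clock:
            start = datetime.strptime(f"{clock.group(1)} {clock.group(2)}", TIME_FORMAT)
            end = datetime.strptime(f"{clock.group(3)} {clock.group(4)}", TIME_FORMAT)
            minutes = int((end - start).total_seconds() // 60)
            yield " / ".join(path), start.isoformat(sep=" "), end.isoformat(sep=" "), minutes


def write_clock_csv(lines, out):
    """Write a header plus one CSV row per clock entry to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["headline", "start", "end", "minutes"])
    writer.writerows(clock_entries(lines))


if __name__ == "__main__":
    sample = """* Work
** Write report
   :LOGBOOK:
   CLOCK: [2016-03-01 Tue 09:15]--[2016-03-01 Tue 10:45] =>  1:30
   :END:
"""
    buffer = io.StringIO()
    write_clock_csv(sample.splitlines(), buffer)
    print(buffer.getvalue())
```

Minutes are recomputed from the two timestamps rather than parsed from the `=>` duration. This keeps entries that cross midnight correct without relying on org's own formatting of the total.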

Custom Hexbin Functions with ggplot

Recently, I wanted to create a map similar to James Cheshire’s crime map of London, which shows the most common crimes committed across a rectangular grid of points laid over London. Instead of a rectangular grid, I wanted to use hexbins, but it turns out that ggplot needs a bit of prodding to do anything other than simply count the number of observations in each bin.

At the time I couldn’t find a good tutorial on writing custom hexbin functions, so this post is a reasonably thorough explanation of what I managed to get working.
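The post works through this in ggplot, but the underlying idea doesn't depend on the plotting library:

1. Assign each point to a hexagon.
2. Group the points by hexagon.
3. Apply an arbitrary summary function to each group instead of a count.

Below is a small Python sketch of that idea. It uses axial coordinates with cube rounding for pointy-top hexagons, and "most common category" as the summary, echoing the crime map. The function names are hypothetical, and this is not how ggplot bins internally.

```python
import math
from collections import Counter, defaultdict


def hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Rounding each coordinate independently can break q + r + s == 0;
    # recompute whichever one moved furthest from its fractional value.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def point_to_hex(x, y, size):
    """Axial (q, r) of the pointy-top hexagon of circumradius `size` containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return hex_round(q, r)


def hex_center(q, r, size):
    """Cartesian centre of hexagon (q, r), e.g. for drawing it."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def hexbin_summary(points, size, summary):
    """Group (x, y, value) points by hexagon and apply `summary` to each group's values."""
    bins = defaultdict(list)
    for x, y, value in points:
        bins[point_to_hex(x, y, size)].append(value)
    return {cell: summary(values) for cell, values in bins.items()}


def most_common(values):
    """Summary function: the most frequent value (ties go to the first one seen)."""
    return Counter(values).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy points: (x, y, crime category) in arbitrary units.
    points = [(0.0, 0.0, "theft"), (0.1, 0.1, "theft"),
              (0.05, -0.1, "burglary"), (10.0, 10.0, "assault")]
    for cell, label in hexbin_summary(points, 1.0, most_common).items():
        print(cell, hex_center(*cell, 1.0), label)
```

Because `summary` is just a function of a list of values, swapping in a mean, a median or a share of one category needs no change to the binning code. That separation is what the default count-only behaviour lacks.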

