Unconstant Conjunction latest posts

Don't Forget to Reproject Spatial Data when Exporting to GeoJSON

In the process of working on a recent choropleth piece for work, I discovered that it’s easy to stumble when moving spatial data out of R and onto the web. It’s a poorly-documented reality that many web-based mapping libraries (including both D3 and Leaflet) expect GeoJSON data to be in EPSG:4326, and it is by no means a given that your spatial data will start off in this projection.

If you’re like me and do your spatial data pre-processing in R before exporting to GeoJSON, you may have to re-project your data before these libraries will handle them properly. Thankfully, this is fairly easy to do with the modern spatial packages.
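
As a rough sketch of what that re-projection step can look like, here is a minimal example that assumes the sf package (the excerpt does not name a specific package). The two points and their starting CRS, NAD83 / UTM zone 17N (EPSG:26917), are made up for illustration.

```r
library(sf)

# Illustrative data only: two points in a projected CRS (EPSG:26917),
# whose coordinates are in metres rather than degrees.
pts <- st_as_sf(
  data.frame(name = c("a", "b"),
             x = c(630000, 640000),
             y = c(4833000, 4840000)),
  coords = c("x", "y"),
  crs = 26917
)

# Re-project to WGS 84 longitude/latitude (EPSG:4326) before exporting.
pts_wgs84 <- st_transform(pts, 4326)

# Write GeoJSON to a temporary file; the driver is inferred from the extension.
out <- tempfile(fileext = ".geojson")
st_write(pts_wgs84, out, quiet = TRUE)
```

Checking `st_crs()` on the data before export is a cheap way to find out whether the transform is needed at all.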

Forecasting YYZ Passengers in the Tidyverse

2 March 2017

Buried in the Toronto Economic Bulletin (warning: Excel document) is a column listing the number of passengers passing through Pearson International Airport (YYZ) each month, going back more than fifteen years. There’s a story to tell in the forecast, too:

[Figure: Passengers at Pearson]

Flight data are a great forecasting example because they display such clear seasonal patterns, in this case peaking in the summer months and falling off in the winter. R has excellent tools for working with time series data and whipping up simple forecasts like this one. But there’s some friction with the modern tidyverse tools, because the latter expect a data.frame as the common interchange format.

In this post, I’ll outline an approach to fitting many time series models using the tidyverse tools, including model selection for out-of-sample performance. To ease the transition between these two worlds I make extensive use of list columns and the broom package.
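
To give a flavour of the list-column idea, here is a minimal sketch under some loud assumptions: it uses the built-in `AirPassengers` series as a stand-in for the Pearson data, base R's `arima()` rather than whatever models the full post fits, and two arbitrary model specifications. The `rmse_holdout()` helper is hypothetical.

```r
library(dplyr)
library(purrr)
library(tibble)
library(broom)

# Stand-in data: built-in AirPassengers, split into training and hold-out sets.
train   <- window(AirPassengers, end = c(1958, 12))
holdout <- window(AirPassengers, start = c(1959, 1))

# Hypothetical helper: forecast over the hold-out window and compute RMSE
# on the original (un-logged) scale.
rmse_holdout <- function(fit, holdout) {
  pred <- exp(predict(fit, n.ahead = length(holdout))$pred)
  sqrt(mean((as.numeric(pred) - as.numeric(holdout))^2))
}

# One row per candidate model; the specifications are arbitrary examples.
models <- tibble(
  name     = c("ar1", "airline"),
  order    = list(c(1, 1, 0), c(0, 1, 1)),
  seasonal = list(c(0, 0, 0), c(0, 1, 1))
) %>%
  mutate(
    # Fitted model objects are stored in a list column.
    fit = map2(order, seasonal,
               ~ arima(log(train), order = .x,
                       seasonal = list(order = .y, period = 12))),
    # broom::glance() returns a one-row summary per model, including AIC.
    aic = map_dbl(fit, ~ glance(.x)$AIC),
    # Out-of-sample performance on the held-out months.
    rmse = map_dbl(fit, rmse_holdout, holdout = holdout)
  )

models %>% select(name, aic, rmse) %>% arrange(rmse)
```

Because each fit sits in an ordinary data frame row, comparing models reduces to the usual `arrange()` and `filter()` calls.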

An Update to the Choropleth Post

A few years ago I published a post outlining how to make nice-looking choropleth maps in R, and this piece still draws a reasonable share of my hits each month. Unfortunately, some of the techniques I used at the time are now quite out of date, and I was starting to feel bad for anyone taking my advice.

As of today the post has received a makeover, and takes a more modern approach. For any returning readers, the changes are explained in a series of HTML <ins> tags — which I have only recently discovered.

Exporting Clock Entries from org-mode to CSV

If you’ve used the clocking features of org-mode, you’re no doubt familiar with the clock table, which allows you to summarise time spent on different tasks. This is great for getting an overview of projects, but it’s not a very flexible tool if you want to have a more detailed idea of how you spend your time.

At this point I’ve accumulated about a year’s worth of clocked work time in org, and while clock tables have served me well so far, eventually I just wanted to get my data into R or Python for more minute analysis, and charts like the following:

[Figure: Calendar heatmap example]

However, I haven’t come across a reliable way to get individual clock entries out of org-mode files and into a more widely readable format. So I’ve written one.
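
To make the goal concrete, here is a small base R sketch of the kind of extraction involved. It is not the tool described in the post, just an illustration: the org content is invented, the regular expression only handles closed `CLOCK:` lines in the standard timestamp format, and every clock line is assumed to sit under some heading.

```r
# Hypothetical org-mode content, inlined so the sketch is self-contained.
org_lines <- c(
  "* Write report",
  "  :LOGBOOK:",
  "  CLOCK: [2017-03-01 Wed 09:00]--[2017-03-01 Wed 10:30] =>  1:30",
  "  :END:",
  "* Review code",
  "  CLOCK: [2017-03-02 Thu 14:00]--[2017-03-02 Thu 14:45] =>  0:45"
)

# Capture start date, start time, end date and end time of a closed clock.
clock_re <- paste0(
  "CLOCK: \\[([0-9-]{10}) [^ ]+ ([0-9:]{5})\\]",
  "--\\[([0-9-]{10}) [^ ]+ ([0-9:]{5})\\]"
)

# Number the headings so each line can be traced to the heading above it.
is_heading  <- grepl("^\\*+ ", org_lines)
heading_idx <- cumsum(is_heading)
headings    <- sub("^\\*+ ", "", org_lines[is_heading])

is_clock <- grepl(clock_re, org_lines)
matches  <- regmatches(org_lines[is_clock],
                       regexec(clock_re, org_lines[is_clock]))
parts    <- do.call(rbind, matches)  # column 1 is the full match

fmt   <- "%Y-%m-%d %H:%M"
start <- as.POSIXct(paste(parts[, 2], parts[, 3]), format = fmt, tz = "UTC")
end   <- as.POSIXct(paste(parts[, 4], parts[, 5]), format = fmt, tz = "UTC")

clocks <- data.frame(
  task    = headings[heading_idx[is_clock]],
  start   = format(start, fmt),
  end     = format(end, fmt),
  minutes = as.numeric(difftime(end, start, units = "mins"))
)

csv_path <- tempfile(fileext = ".csv")
write.csv(clocks, csv_path, row.names = FALSE)
```

A line-based approach like this ignores nesting, tags and running clocks, which is part of why a purpose-built exporter is the more reliable route.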

Custom Hexbin Functions with ggplot

Recently, I wanted to create a map similar to James Cheshire’s crime map of London, which shows the most common crimes committed in a rectangular grid of points laid over London. Instead of using a rectangular grid, I wanted to use hexbins, but it turns out that ggplot needs a bit of prodding to do anything other than simply count the number of observations in each bin.

At the time I couldn’t find a good tutorial on writing custom hexbin functions, so this post is a reasonably thorough explanation of what I’ve made work.
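
For a sense of what "something other than counting" can mean, here is one minimal sketch using ggplot2's `stat_summary_hex()`, which applies a function to a `z` variable within each hexagon (it needs the hexbin package installed). This is not necessarily the approach the full post takes, and the simulated points, the `severity` variable and the `share_high()` function are all made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Simulated points; `severity` is an invented variable.
pts <- data.frame(x = rnorm(500), y = rnorm(500), severity = runif(500))

# Custom per-bin summary: the share of observations above a threshold.
share_high <- function(z) mean(z > 0.5)

p <- ggplot(pts, aes(x, y, z = severity)) +
  stat_summary_hex(fun = share_high, bins = 10) +
  scale_fill_viridis_c(name = "Share > 0.5")

p
```

Any function that reduces the values in a bin to a single number can be swapped in for `share_high()`.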
