Unconstant Conjunction latest posts

Principles for Pipe-Friendly APIs in R Packages

    28 March 2018

In the past few years the APIs of the wildly popular tidyverse packages have coalesced around the pipe operator (%>%). R users familiar with this ecosystem of packages are now accustomed to writing “pipelines” where code is expressed as a series of steps composed with the pipe operator.

For example, plenty of data science code basically boils down to:

data() %>% transform() %>% summarise()

(This is even more evident if you subscribe to Wickham & Grolemund’s view that models are a “low-dimensional summary of your data”.)

There are plenty of resources on how to use pipes to rewrite your R code. There are fewer on the implications of pipes for how R functions and package APIs are designed. What should package authors keep in mind, knowing that their users are likely to use their functions in pipelines?

My own experience has led me to compile four principles for writing pipe-friendly APIs for R packages:

  1. Only one argument is going to be “piped” into your functions, so you should design them to accommodate this. The first argument should be what you’d expect to be piped in; all other arguments should be parameters with meaningful default values. If you think you’ll always need more than one argument, consider wrapping them up in a lightweight S3 class.

  2. Think carefully about the output of your functions, since this will be what is passed down the pipeline. Strive to always return a single type of output (a data frame, a numeric vector, etc). Never return NULL, because very few functions can take NULL as an input. (To signify empty values, use zero-row data frames or vectors with NA.)

  3. Prefer “pure” functions (i.e. those that have no “side effects” that mutate global state) whenever possible. Users think about pipelines as a linear progression of steps. The transparency of pure functions makes them easy to reason about in this fashion.

  4. When your functions must have side effects (e.g. when printing or plotting), return the original object (instead of the conventional NULL) so that pipelines can continue.
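The four principles above can be sketched in a few lines. This is only an illustration: `rescale()` and `report()` are hypothetical functions invented for this example, not part of any package, and the magrittr package is assumed to be installed for the pipe.

```r
library(magrittr)  # provides %>%

# Principles 1-3: the piped object is the first argument, the remaining
# arguments have defaults, the return type is always a numeric vector
# (never NULL), and nothing outside the function is modified.
# `rescale()` is a hypothetical example function.
rescale <- function(x, center = TRUE, scale = TRUE) {
  if (length(x) == 0) return(numeric(0))  # empty input, same output type
  if (center) x <- x - mean(x, na.rm = TRUE)
  if (scale) x <- x / sd(x, na.rm = TRUE)
  x
}

# Principle 4: a function called for its side effect (printing a message)
# returns its input invisibly so the pipeline can continue.
# `report()` is also hypothetical.
report <- function(x, label = "values") {
  message(sprintf("%s: n = %d, mean = %.2f", label, length(x), mean(x)))
  invisible(x)
}

result <- c(2, 4, 6) %>%
  rescale() %>%
  report(label = "rescaled") %>%
  range()
result
#> [1] -1  1
```

Because `report()` hands back its input, `range()` still receives the rescaled vector rather than `NULL`.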

I’ve found that being explicit about these design goals has improved the quality of my own package interfaces; perhaps they can be of use to others.

Continue Reading →

A Note on Migrating Packages from rjson to jsonlite

    19 February 2018

Recently I was working on an R package that had historically used both rjson and jsonlite to serialize R objects to JSON (before sending it off to an API). In this case we wanted to remove the rjson dependency, which was only used in a few places.

The most noticeable hiccup I encountered while porting the code was during encoding of parameter lists generated in R, which looked something like

params <- list(key1 = "param", key2 = NULL,
               key3 = c("paired", "params"))

In this case, rjson produced exactly what our API was looking for:

cat(rjson::toJSON(params))
#> {"key1":"param","key2":null,"key3":["paired","params"]}

But by default the jsonlite package will behave very differently:

jsonlite::toJSON(params)
#> {"key1":["param"],"key2":{},"key3":["paired","params"]}
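One possible way to close that gap, sketched here under the assumption that the goal is simply to match the rjson output above, is to pass jsonlite's `auto_unbox` and `null` arguments to `toJSON()`. The excerpt does not say which approach the package ultimately took, so treat this as an illustration rather than the fix that was used.

```r
library(jsonlite)

params <- list(key1 = "param", key2 = NULL,
               key3 = c("paired", "params"))

# auto_unbox = TRUE encodes length-one vectors as JSON scalars;
# null = "null" encodes NULL as JSON null rather than an empty object.
json <- toJSON(params, auto_unbox = TRUE, null = "null")
cat(json)
#> {"key1":"param","key2":null,"key3":["paired","params"]}
```

Note that `auto_unbox` applies to every length-one vector, so a field that the API expects to always be an array would need to be protected (for example with `I()`).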

Continue Reading →

Don't Forget to Reproject Spatial Data when Exporting to GeoJSON

In the process of working on a recent choropleth piece for work, I discovered that it’s easy to stumble when moving spatial data out of R and onto the web. It’s a poorly-documented reality that many web-based mapping libraries (including both D3 and Leaflet) expect GeoJSON data to be in EPSG:4326, and it is by no means a given that your spatial data will start off in this projection.

If you’re like me and do your spatial data pre-processing in R before exporting to GeoJSON, you may have to re-project your data before these libraries will handle them properly. Thankfully, this is fairly easy to do with the modern spatial packages.
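As a minimal sketch of that workflow, assuming the sf package is the "modern spatial package" in use (the excerpt does not name one) and using two made-up points in Web Mercator in place of real data:

```r
library(sf)

# Hypothetical point layer in Web Mercator (EPSG:3857), standing in
# for whatever projection the real data arrives in.
pts <- st_sf(
  name = c("a", "b"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1e6, 1e6)), crs = 3857)
)

# Re-project to WGS 84 longitude/latitude (EPSG:4326) before exporting.
pts_wgs84 <- st_transform(pts, 4326)

# Write the re-projected layer out as GeoJSON (to a temporary file here).
out <- tempfile(fileext = ".geojson")
st_write(pts_wgs84, out, driver = "GeoJSON", quiet = TRUE)
```

Checking `st_crs(pts)` before the transform is a quick way to see which projection the data started in.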

Continue Reading →

Forecasting YYZ Passengers in the Tidyverse

    2 March 2017

Buried in the Toronto Economic Bulletin (warning: Excel document) is a column listing the number of passengers passing through Pearson International Airport (YYZ) each month, going back more than fifteen years. There’s a story to tell in the forecast, too:

[Figure: Passengers at Pearson]

Flight data are a great forecasting example because they display such clear seasonal patterns, in this case peaking in the summer months and falling off in the winter. R has excellent tools for working with time series data and whipping up simple forecasts like this one. But there’s some friction with the modern tidyverse tools, because the latter expect a data.frame as the common interchange format.

In this post, I’ll outline an approach to fitting many time series models using the tidyverse tools, including model selection for out-of-sample performance. To ease the transition between these two worlds I make extensive use of list columns and the broom package.
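The general shape of that approach might look like the sketch below. It is not the code from the full post: the built-in `AirPassengers` series stands in for the YYZ data, the two candidate ARIMA specifications are arbitrary choices for illustration, and the tibble, dplyr, purrr, and broom packages are assumed to be installed.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hold out the last two years to measure out-of-sample performance.
train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))

# Candidate model specifications, stored in list columns.
candidates <- tibble(
  model = c("AR(1), seasonal difference", "airline model"),
  ord   = list(c(1, 0, 0), c(0, 1, 1)),
  seas  = list(c(0, 1, 0), c(0, 1, 1))
)

fits <- candidates %>%
  mutate(
    # One fitted model per row, kept in a list column.
    fit = map2(ord, seas, ~ arima(log(train), order = .x, seasonal = .y)),
    # broom::glance() gives a one-row data frame of fit statistics per model.
    stats = map(fit, broom::glance),
    # Forecast the hold-out period and back-transform from the log scale.
    forecast = map(fit, ~ exp(predict(.x, n.ahead = length(test))$pred)),
    # Root mean squared error against the held-out observations.
    rmse = map_dbl(forecast, ~ sqrt(mean((.x - test)^2)))
  )

# Select the specification with the lowest out-of-sample error.
best <- fits %>% arrange(rmse) %>% slice(1)
best$model
```

Keeping the fitted models, their summaries, and their forecasts side by side in one data frame is what lets the usual dplyr verbs do the model selection.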

Continue Reading →

Using hledger with ledger-mode

    10 February 2017

Last summer I landed a few patches in Emacs’s ledger-mode that make it easier to use with alternative implementations of Ledger, such as hledger. Since the competing hledger-mode garnered some attention last week on Hacker News, I thought these new features might be worth highlighting to those interested in plain-text accounting in Emacs.

Continue Reading →