Unconstant Conjunction latest posts

Copyright in Closed-Source R Packages: The Right Way

    18 March 2019

There is a wealth of excellent resources for R users on creating and maintaining R packages, which remain the best way to share your code within your organisation or the larger community. However, almost all of these resources focus on open-source packages, tools, and workflows. As a result, they tend to skim over more corporate concerns such as copyright assignment.

Yet many R users are working on their code in proprietary environments, creating packages for internal use by their company. That code is closed source, with the copyright belonging to their organisation. If you fall into this category, how should you communicate that in your R package? This post is intended to provide a very clear answer to that question.
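
The full answer is in the post itself. Purely as an illustration of the pieces involved, and not necessarily the post's recommendation, one widely used convention is to list the organisation in the package's `Authors@R` field with the `"cph"` (copyright holder) role, which `utils::person()` accepts. The person, company, and email address below are hypothetical.

```r
# Hypothetical names. Standard roles accepted by utils::person():
# "aut" = author, "cre" = maintainer, "cph" = copyright holder.
authors <- c(
  utils::person("Jane", "Doe", email = "jane.doe@example.com",
                role = c("aut", "cre")),
  utils::person("Example Corp", role = "cph")
)

# Show the entries as they would be rendered from an Authors@R field.
print(authors)

# Extract the roles: a list with one character vector per person.
authors$role
```

In a package, the same `c(person(...), person(...))` expression goes in the `Authors@R` field of the DESCRIPTION file; a closed-source package would often pair it with a non-standard `License: file LICENSE` entry.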

What Does R Value?

    13 December 2018

I’m a fan of Bryan Cantrill’s argument that we ought to think about “platform values” when assessing technologies, an idea highlighted in a recent talk but explained more thoroughly in an earlier one on his experience with the Node.js community.

In my reading, he argues that of the many values (such as performance, security, or expressiveness) that programming languages or platforms may hold, many are in conflict, and the platform inevitably emphasises some of these values over others. These decisions are a reflection of explicit or implicit “platform values”.

Cantrill illustrates this with a series of examples, but, no surprise, the R platform does not make his shortlist. I couldn’t resist trying to cook up my own taxonomy of platform values for the R language and its community.

More broadly, though, Cantrill believes that the values of a platform affect the projects that adopt it, and that conflicts can arise between the values of a project and those of its chosen platform. This strongly echoes my own experience in the R community, and learning to articulate these conflicts has given me a measure of clarity about both future and existing projects.

Why There is No importFrom() Function in R

    27 August 2018

For those coming to R from other languages, it seems very odd that R users import code from packages the way that they do. For instance, if you are used to Python, this is the general pattern of “using code from elsewhere”:

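import math
import numpy as np
from random import randint

# Usage: the origin of each name is visible in the code itself.
math.floor(3.2)   # qualified by the module name
np.array(...)     # qualified by an alias
randint(1, 10)    # imported by name from random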

Meanwhile, in R, you’re likely to see

library(dplyr)

data_frame(x = 1, y = "A") %>%
  mutate(z = TRUE)

… with the subtext being that users must simply know that the data_frame(), mutate(), and %>% functions are actually all from the dplyr package. Why is R so unusual here?
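
R does let you make a function's origin explicit at the call site with the `::` operator. To make the contrast with Python's `from random import randint` concrete, here is a minimal sketch of a hypothetical `import_from()` helper (not part of base R or of any package mentioned here) that binds selected exports into the caller's environment using base R's `getExportedValue()`. It uses the `tools` package, which ships with R but is not attached by default.

```r
# Qualified access, roughly like Python's `math.floor(3.2)`:
tools::file_ext("report.csv")

# Hypothetical helper, roughly like `from tools import file_ext`:
# copies each named export of `pkg` into the calling environment.
import_from <- function(pkg, ..., envir = parent.frame()) {
  force(envir)
  fns <- c(...)
  for (fn in fns) {
    assign(fn, getExportedValue(pkg, fn), envir = envir)
  }
  invisible(fns)
}

import_from("tools", "file_ext", "file_path_sans_ext")

# Both names now work unqualified, without attaching all of tools.
file_ext("report.csv")
file_path_sans_ext("report.csv")
```

Unlike `library()`, this loads the namespace without attaching the package to the search path, so every other `tools` function still needs the `tools::` prefix.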

Communicating with UDP Sockets from R

    6 July 2018

Recently I wanted to send some Shiny usage data from R to a certain metrics server. Since R includes write.socket() and friends for opening arbitrary network sockets, it seemed at the outset that this would be quite simple. However, I ran into an interesting roadblock along the way.

It turns out that R’s socket API only supports TCP connections, which you can confirm by looking at the source code – and in this case I needed to send UDP packets instead. This was a little surprising to me, since most other languages would include UDP support out of the box; it’s a core internet protocol, after all. For whatever reason, this seems not to be the case with R, and even after searching CRAN and GitHub I wasn’t able to find an existing package that provides UDP socket support.

To remedy this, I put together a simple way to write messages to UDP sockets from R.
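
The post's own approach is not reproduced in this excerpt. As a rough stand-in only, and a different technique from whatever the post uses, an R script can hand the message to the external `nc` (netcat) utility, whose `-u` flag selects UDP. This sketch assumes `nc` is installed and on the `PATH`; the `send_udp()` name, host, and port are hypothetical.

```r
# Hypothetical helper: sends `message` as a UDP datagram by delegating to
# the external `nc` (netcat) program. `-u` selects UDP and `-w 1` caps the
# idle wait at one second. Requires `nc` on the PATH.
send_udp <- function(message, host = "127.0.0.1", port = 9999L) {
  nc <- unname(Sys.which("nc"))
  if (!nzchar(nc)) {
    stop("`nc` was not found on the PATH")
  }
  # system2() copies `input` to a temporary file and redirects it to stdin.
  status <- system2(nc, args = c("-u", "-w", "1", host, as.character(port)),
                    input = message)
  invisible(status)
}
```

Netcat variants differ in their flags, so check `man nc` on the target machine before relying on this.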

Principles for Pipe-Friendly APIs in R Packages

    28 March 2018

In the past few years the APIs of the wildly popular tidyverse packages have coalesced around the pipe operator (%>%). R users familiar with this ecosystem of packages are now accustomed to writing “pipelines” where code is expressed as a series of steps composed with the pipe operator.

For example, plenty of data science code basically boils down to:

data() %>% transform() %>% summarise()

(This is even more evident if you subscribe to Wickham & Grolemund’s view that models are a “low-dimensional summary of your data”.)

There are plenty of resources on how to use pipes to rewrite your R code. There are fewer on the implications of pipes for how R functions and package APIs are designed. What should package authors keep in mind, knowing that their users are likely to call their functions in pipelines?

My own experience has led me to compile four principles for writing pipe-friendly APIs for R packages:

  1. Only one argument is going to be “piped” into your functions, so you should design them to accommodate this. The first argument should be what you’d expect to be piped in; all other arguments should be parameters with meaningful default values. If you think you’ll always need more than one argument, consider wrapping them up in a lightweight S3 class.

  2. Think carefully about the output of your functions, since this will be what is passed down the pipeline. Strive to always return a single type of output (a data frame, a numeric vector, etc.). Never return NULL, because very few functions accept NULL as input. (To signify empty values, use zero-row data frames or vectors with NA.)

  3. Prefer “pure” functions – i.e. those that have no “side effects” that mutate global state – whenever possible. Users think about pipelines as a linear progression of steps, and the transparency of pure functions makes them easy to reason about in this fashion.

  4. When your functions must have side effects (e.g. when printing or plotting), return the original object (instead of the conventional NULL) so that pipelines can continue.

I’ve found that being explicit about these design goals has improved the quality of my own package interfaces; perhaps they can be of use to others.
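
As a small illustration, here are three hypothetical functions written with these principles in mind. The sketch uses R's native pipe `|>` (R 4.1 or later) so that it runs without magrittr; for simple calls like these, `%>%` would behave the same way.

```r
# Principles 1 and 2: the data frame comes first, other arguments have
# defaults, and the result is always a data frame (possibly zero rows).
keep_above <- function(df, column = "value", threshold = 0) {
  df[df[[column]] > threshold, , drop = FALSE]
}

# Principle 3: a pure function; it returns a modified copy and touches
# no global state.
add_ratio <- function(df, num = "value", den = "total", name = "ratio") {
  df[[name]] <- df[[num]] / df[[den]]
  df
}

# Principle 4: a side effect (a message), but the input is returned
# invisibly so the pipeline can continue.
report_rows <- function(df, label = "rows") {
  message(label, ": ", nrow(df))
  invisible(df)
}

sales <- data.frame(value = c(5, -2, 10), total = c(10, 10, 20))

result <- sales |>
  keep_above() |>
  report_rows(label = "after filter") |>
  add_ratio()

# An empty result is still a data frame, not NULL.
empty <- sales |> keep_above(threshold = 100)
```

Because `report_rows()` hands back its input, it can be dropped into or removed from the middle of a pipeline without changing the result.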